Course:Cons452/UsingR

From UBC Wiki

What is R

RStudio relies on the R programming language: you write statistical programs in R, and RStudio gives you a convenient place to write and run them and to perform statistical analysis. R is the engine, while RStudio is like the dashboard.

R is a programming language and an open-source software environment for statistical computing, and it is widely used for data analysis. R supports an extraordinary range of statistical calculations. It is free software, developed largely through voluntary contributions from statisticians around the world. R has its home page at https://www.r-project.org/.

In this course we will run our statistical analyses in R through RStudio, which is an Integrated Development Environment (IDE). Other IDEs are available for running R, but RStudio is the most widely used. The picture below depicts the main difference between the two.

Installing R and RStudio Desktop

You need to install both R and RStudio on your computer. Install R first, followed by RStudio. We will be relying on the RStudio Desktop version (there is also a browser-based version called RStudio Server).

R
Download:
  1. Go to https://cran.r-project.org/
  2. Click the download link for your operating system at the top of the page.
    • For Mac OS, click on Download R for (Mac) OS X
      • Click on the latest .pkg file, e.g. R-3.6.1.pkg
    • For Windows, click on Download R for Windows
      • Click on install R for the first time
      • Click on the topmost Download link
  3. Click on the downloaded files and follow the installation instructions.

RStudio
Download:
  1. Go to https://rstudio.com/products/rstudio/download/
  2. Click and download the appropriate file depending on your operating system.
  3. Click on the downloaded files and follow the installation instructions.

After installing, open RStudio like you would any other application on your computer. It may be useful to add a desktop shortcut for easy access.

Getting familiar

Data types and Data structures[1]

Everything in R is an object.

R has 6 basic data types. The five listed below are the ones you will use most; the sixth, raw, is rarely needed:

  • character: "cons452", "lab"
  • numeric: 2, 13.4
  • integer: 3L (L is for telling R to store this as an integer)
  • logical: TRUE, FALSE
  • complex: 3+4i
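
As a quick check of the types listed above, the sketch below asks R for the class of one example value of each type. The variable names are made up for illustration.

```r
# One example value per data type, using the values from the list above
course  <- "cons452"   # character
reading <- 13.4        # numeric
count   <- 3L          # integer (the L suffix requests integer storage)
is_lab  <- TRUE        # logical (must be written in capitals)
z       <- 3+4i        # complex

# class() reports how R stores each value
class(course)    # "character"
class(reading)   # "numeric"
class(count)     # "integer"
class(is_lab)    # "logical"
class(z)         # "complex"
```

Note that R only recognises TRUE and FALSE in capitals; typing True gives an "object not found" error.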

A simple object in R could be a collection of elements, e.g. a sequence of numbers. When all elements are of the same data type, the collection is called a vector (more specifically, an atomic vector). The vector is the simplest data structure in R. R data structures include:

  • atomic vector
  • list
  • matrix
  • data frame
  • factors

For the purpose of CONS 452, the data frame is the most relevant data structure. A typical data file (a spreadsheet where columns represent different variables and rows represent observations) resembles a data frame in R.
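
To make the idea of a data frame concrete, here is a minimal sketch that builds a small one by hand. The column names and values are invented for illustration and do not come from any course data set.

```r
# A small, made-up data frame: each column is a variable, each row an observation
trees <- data.frame(
  plot_id  = c(1L, 2L, 3L),
  species  = c("fir", "cedar", "hemlock"),
  height_m = c(21.5, 18.2, 25.0)
)

str(trees)                   # shows the type of every column
heights <- trees$height_m    # a single column is an atomic vector
nrow(trees)                  # number of observations
ncol(trees)                  # number of variables

# A list, unlike an atomic vector, can mix data types
info <- list(course = "cons452", n_plots = nrow(trees))
```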

Navigating the Interface

RStudio's interface has 4 main sections:

  1. The Editor: this is where you will write your code. What you write here is saved as a script file on your computer.
  2. The Console: when you run your script, its commands are entered here and the output is printed.
  3. Environment & History: provides a list of datasets loaded and the history of commands used.
  4. Files, Plots, Packages & Help: this section will help you keep track of data, packages, and plots produced.
R Studio Interface

Setting A Working Directory

  • Before you start importing data, installing packages and exploring your data, you will have to set your working directory.
  • The files on your computer are organized hierarchically into folders, or “directories.” It is convenient to tell R at the beginning of a session which directory to look in for files, to minimize typing later. This is essentially setting up a folder path.
  • To set the working directory for RStudio from the “Session” tab in the menu bar, choose “Set Working Directory”, and then “Choose Directory...” This will open a dialog box that will let you find and select the directory you want. It is also possible to type the code in the editor to set the working directory.
Setting a Working Directory in R
  • !! WARNING !! File paths on a Mac use forward slashes (/), while file paths on Windows use backslashes (\). Inside R code, forward slashes work on both systems, but a backslash has to be doubled (\\). Keep this in mind when switching computers or working on scripts with other people!
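
Setting the working directory from code could look like the sketch below. The folder name is hypothetical, and a temporary folder is used so the example runs on any computer; in practice you would put the path to your own project folder in its place.

```r
# Remember where R is currently looking for files
old_dir <- getwd()

# Hypothetical project folder, created inside R's temporary directory.
# file.path() joins the pieces with forward slashes, which R accepts on Mac and Windows.
project_dir <- file.path(tempdir(), "cons452_project")
dir.create(project_dir, showWarnings = FALSE)

# Point R at the project folder and confirm the change
setwd(project_dir)
getwd()

# Switch back to the original directory
setwd(old_dir)
```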

Installing Packages

  • There are numerous approaches to any particular task within R, and they are linked to various packages. Follow these instructions to download a package of your choice. You can download a package either through written code in your script or by manually navigating the interface.
What is an R package?
  • Follow this tutorial to install a package using the editor through written script: How to Install Packages in R
  • Follow these steps to download a package using R interface:
Downloading R packages
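
As a sketch of the written-code route: install.packages() downloads a package once, and library() loads it in each new session. The helper function below is hypothetical (it is not part of R) and only installs a package that is missing.

```r
# Hypothetical helper (not part of R): install a package only when it is missing
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from CRAN, so this needs internet access
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call never triggers a download
stats_ready <- ensure_package("stats")

# For a package from CRAN you would write, for example:
# ensure_package("dplyr")
# library(dplyr)   # load the package at the start of each session
```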

Functions and Variables

  • Most of the work in R is done by functions. A function has a name and usually one or more arguments. For example, log(4) calls the function log, which calculates the logarithm in base e of the value 4 given as input. Different packages offer you access to new functions!
  • In R, we can store information of various sorts by assigning it to variables. For example, to create a variable called x and give it a value of 4, we would write "x <- 4". After running this command, whenever we use x in a command it is replaced by its value, 4.
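
The two points above can be tried directly in the console. This short sketch uses only base R functions; the variable y is added for illustration.

```r
# Assign the value 4 to a variable called x
x <- 4

# Wherever x appears, R substitutes its value
natural_log <- log(x)            # logarithm in base e of 4
log_base_2  <- log(x, base = 2)  # a second argument changes the base
root        <- sqrt(x)           # square root of 4

# Variables can be combined into new variables
y <- x * 3
y
```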

Naming Variables

Naming variables and functions in R is pretty flexible. Here is a list of important things to remember when naming variables:

  • A name should start with a letter, and can then contain letters, numbers, underscores (_) and periods (.).
  • There can't be any spaces.
  • Names in R are case-sensitive. This means that Weights and weights are completely different things to R. Unfortunately, this is a common and frustrating error many of us make while using R.
  • It’s a good idea to have your names be as descriptive as possible, so that you will know what you meant later on when looking at it. (However, if they get too long, it becomes painful and error prone to type them each time we use them, so this, as with all things, requires moderation.) Underscores often become useful.
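
A small sketch of these naming rules follows. The values are made up, and make.names() is a base R function that shows how R would repair a name that breaks the rules.

```r
# Case matters: these are two different variables
Weights <- c(61.2, 70.5)
weights <- c(1, 2)
same_object <- identical(Weights, weights)   # FALSE

# A descriptive name, with underscores separating the words
tree_height_m <- 21.5

# make.names() turns invalid names into valid ones:
# a leading digit gets an X prefix, and a space becomes a period
fixed <- make.names(c("tree_height_m", "2nd_plot", "plot weight"))
fixed
```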

Get familiar with basic coding using variables and vectors by watching these tutorials:

Reading Data

Importing Data Cheatsheet : Data Import Cheatsheet

  • A quick tutorial on importing data: Importing Data on R
  • Data comes in many different formats, so it's helpful to know how to convert between them. It's actually quite easy to convert data to a preferred format.
How to convert R
  • Exporting Data : There are many different ways to export data in R. Here are a few you might use for data tables.
Exporting files
  • In this R stats video, learn to export data out of R using functions such as "write.table", "write.csv" and "write.csv2", and to save it in various formats such as csv, tab-delimited and space-delimited.
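
A minimal round trip through export and import might look like the sketch below. The data and file names are invented, and the files are written to R's temporary folder so the example does not touch your own files.

```r
# Made-up data table to export
plots <- data.frame(plot_id = 1:3, height_m = c(21.5, 18.2, 25.0))

# Export as a comma-separated file, then read it back in
csv_path <- file.path(tempdir(), "plots.csv")
write.csv(plots, csv_path, row.names = FALSE)
plots_csv <- read.csv(csv_path)

# Export as a tab-delimited text file, then read it back in
txt_path <- file.path(tempdir(), "plots.txt")
write.table(plots, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
plots_txt <- read.delim(txt_path)

head(plots_csv)
```

Setting row.names = FALSE keeps R from writing an extra column of row numbers into the file.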

Data Transformation

Important packages: tidyr, dplyr

Data Cleaning with tidyr

Turn messy data into clean data! The available functions allow you to deal with missing values, nest data, separate or unite columns, pivot your data and more. Refer to the second page of the Data Import Cheatsheet for an overview of some of these useful tools.

  • Cleaning your data makes it easier to work with once you start analyzing and modelling it.
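
The sketch below shows three of these cleaning steps on a small, made-up table. It assumes the tidyr package is already installed; the column names are invented for illustration.

```r
library(tidyr)

# Made-up messy data: site and year share one column, and one height is missing
messy <- data.frame(
  plot     = c("A_2019", "B_2019", "C_2020"),
  height_m = c(21.5, NA, 25.0),
  dbh_cm   = c(30.1, 28.4, 41.0)
)

# 1. Drop rows where height_m is missing
no_missing <- drop_na(messy, height_m)

# 2. Split the plot column into two columns at the underscore
split_cols <- separate(no_missing, plot, into = c("site", "year"), sep = "_")

# 3. Pivot the two measurement columns into long format
long <- pivot_longer(
  split_cols,
  cols      = c(height_m, dbh_cm),
  names_to  = "measurement",
  values_to = "value"
)
long
```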

Data Transformation with dplyr

The dplyr cheatsheet: dplyr Cheatsheet

  • dplyr provides a grammar for manipulating tables in R. This cheat sheet will guide you through the grammar, reminding you how to select, filter, arrange, mutate, summarise, group, and join data frames and tibbles.
  • dplyr allows you to perform Excel-like analysis: it provides the functions to add or change columns, delete or subset variables and perform simple algebra!

Popular dplyr functions:

  • mutate(): adds new variables
  • recode(): recodes values in the variable
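  • rowSums(): computes a sum score across columns (a base R function rather than part of dplyr, but often used together with mutate())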
  • filter(): selects cases based on condition
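
The four functions above can be chained together, as in this sketch on made-up survey data. It assumes the dplyr package is installed; the column names and recoded labels are invented for illustration.

```r
library(dplyr)

# Made-up survey data: two questions answered at two sites
survey <- data.frame(
  site = c("A", "B", "A", "B"),
  q1   = c(1, 3, 5, 2),
  q2   = c(2, 4, 4, 1)
)

result <- survey %>%
  # recode(): replace the site codes with readable labels
  mutate(site_name = recode(site, A = "north", B = "south")) %>%
  # rowSums(): add up the two questions into a sum score per row
  mutate(score = rowSums(cbind(q1, q2))) %>%
  # filter(): keep only rows with a score above 4
  filter(score > 4)

result
```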

Data Manipulation with data.table Cheatsheet

Data Visualization

Data Visualization using ggplot2

Our favourite R YouTuber, MarinStats, provides a tutorial covering all sorts of graphs that will come in handy:

A helpful summary tool of graphs produced using practice R datasets:

Ggplot Commands

R will not automatically produce very nice-looking plots; you will need to write code to modify and customize your graphs. Here are a few more helpful tutorials:

Explore your options with ggplot. This package becomes really important when it comes to presenting data.

ggplot graphs
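
As a starting point, here is a sketch of a customized scatter plot. It assumes the ggplot2 package is installed and uses iris, a practice data set that ships with R; the title and axis labels are only examples.

```r
library(ggplot2)

# Scatter plot built in layers: data and aesthetics, then points, labels and a theme
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  labs(
    title  = "Petal length against sepal length",
    x      = "Sepal length (cm)",
    y      = "Petal length (cm)",
    colour = "Species"
  ) +
  theme_minimal()

# Draw the plot (it appears in the Plots pane in RStudio)
print(p)
```

Each line added with + changes one aspect of the graph, so you can customize a plot step by step.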

Data Analysis

Descriptive statistics

Descriptive statistics provide a summary of your data. Exploring them will help you:

  • Check whether data is loaded properly
  • Explore data to identify potential group differences and associations between variables.
  • Create sample descriptions by looking at percentages, means and standard deviations.

Use the summary() function to get a quick overview of your data table!
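
For example, the following sketch computes a few common descriptive statistics with base R functions, using iris, a practice data set that ships with R.

```r
# Quick overview of every column in the data table
summary(iris)

# Mean and standard deviation of one variable
mean_length <- mean(iris$Sepal.Length)
sd_length   <- sd(iris$Sepal.Length)

# Frequencies and percentages for a categorical variable
species_counts  <- table(iris$Species)
species_percent <- prop.table(species_counts) * 100

# Group means: average sepal length for each species
group_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
group_means
```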

A helpful guide to basic summary statistics: Calculating mean, standard deviation, frequencies and more in R

Descriptive statistics command summary

Basic Inferential Statistics:

T-test command summary
ANOVA command summary
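
A minimal sketch of both tests is given below, using base R functions and two practice data sets that ship with R (sleep and iris). The choice of variables is for illustration only.

```r
# Two-sample t-test: does the extra sleep differ between the two groups?
t_result <- t.test(extra ~ group, data = sleep)
t_result

# One-way ANOVA: does mean sepal length differ among the three species?
anova_model <- aov(Sepal.Length ~ Species, data = iris)
summary(anova_model)
```

In both calls the formula reads as "outcome ~ grouping variable", and the data argument tells R which data frame holds those columns.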

Cheatsheets

Cheatsheets are pamphlet-like reference documents for specific purposes. They contain shortcut instructions either for numerous functions from within a particular R package or for a certain category of useful functions. We encourage you to keep these cheatsheet pdf files handy.

Organizing your workspace and files

  • To minimize revision of the code within the team while working on your project, it would be useful to save your files on the course drive.
  • Have a folder with your project name on the course drive.
  • Write the instruction for reading your .csv file like the following:


Extra Resources

Open Source R Guide Books:

References

  1. "Data Types and Structures". Retrieved November 7, 2019.