LFS:Seminars/R
The R Project for Statistical Computing
R Project Web Site
The Comprehensive R Archive Network.
R Studio Web Site
Video Resources
Online resource available on YouTube by Mike Marin.
https://www.youtube.com/user/marinstatlectures?feature=g-high-rec
Video series by the Google engineers.
http://www.youtube.com/playlist?list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP
What is R
R is a programming language and software environment for statistical computing and graphics. Windows, MAC, and Linux platform.
Load packages
Menu -> Packages -> Install Packages Menu -> Packages -> Load Packages
Examples
>example(lm) >demo(persp) >demo(graphics) >demo(Hershery) >demo(plotmath)
Data in
>x <- 1:12 >dim(x) <- c(3,4) >x >Matrix(1:12, nrow=3, byrow=T) >t(x) >dd <- data.frame() >fix(dd)
Edit a table
>aq <- edit(airquality) >fix(airquality)
Read from a text file
>x <- read.table(“c:\\folder\\filename.txt”, header=T) Note: header=T, the first row in the data file is the variable name
Read from Excel files
Install xlsx package, >require(xlsx) >mydata <- read.xlsx(“h:\\aa.xls”) >mydata >fix(mydata)
Get help
From menu Menu -> help >?read.table >help.search(“data input”) >find(“lowess”) >apropos(“lm”)
Statistical Operations
Simple statistic
>x <- rnorm(50) >x >mean(x) >sd(x) >var(x) >median(x) >quantile(x) >hist(x)
Normal distribution
>x <- seq(-4,4,0.1) >plot(x,dnorm(x),type=‘l’) >curve(dnorm(x),from=-4, to=4)
Binomial distribution
>x <- 0:50 >plot(x,dbinom(x,size=50,prob=0.33),type=‘h’)
Example
>weight <- c(60,72,57,90,95,72) >height <- c(1.75,1.80,1.65,1.90,1.74,1.91) >bmi <- weight/height^2 >sum(weight) >sum(weight)/length(weight) >mean(weight) >xbar <- sum(weight)/length(weight) >sqrt(sum((weight-xbar)^2)/(length(weight)-1)) >sd(weight)
Formular of SD
SD = √(Σ(Xi –X)^2/(n-1))
T test
>t.test(bmi,mu=22.5)
Calculate t-value
>qt(0.95, df=5)
Plotting
>plot(height, weight) >plot(height, weight, pch=2) >Lines(height, 22.5*height^2)
Regression
Please Install and Load package ISwR >attach(thuesen) >y <- short.velocity >x <- blood.glucose >lm(y~x) >summary(lm(y~x))
Short.velocity=1.098 + 0.022 * blood.glucose
Regression
>beta<-sum((x-mean(x))*(y-mean(y)))/sum((x-mean(x))^2) >alfa <- mean(y)-beta*mean(x) >SSX<-sum(x^2)-(sum(x))^2/23 >SSY<-sum(y^2)-(sum(y))^2/23 >SSXY<-sum(x*y)-(sum(y)*sum(x))/23 >SSR<-SSXY^2/SSX >SSE<- SSY-SSR
>s2<-SSE/21 #s2 error variance >seb <- sqrt(s2/SSX) #seb Standard Error of slope >sea <- sqrt(s2*sum(x^2)/23/SSX) #sea Standard Error of the intercept
>summary(aov(lm(y~x)))
Regression(more example) and ANOVA
# Grouth ~ tannin
> reg <- read.table("s:\\temp\\tannin.txt",header=T) > attach(reg) > names(reg)
> plot(tannin,grouth,pch=16) > lm(grouth~tannin)
> plot(tannin,grouth,pch=16) > abline(lm(grouth~tannin)) > SSX<-sum(tannin^2)-sum(tannin)^2/length(tannin) > SSX > SSY<-sum(grouth^2)-sum(grouth)^2/length(grouth) > SSY > SSXY <- sum(tannin*grouth)-sum(tannin)*sum(grouth)/length(tannin) > SSXY
> b <- SSXY/SSX > b > a<-sum(grouth)/length(grouth)-b*sum(tannin)/length(tannin) > a > SSR <- b*SSXY > SSR > SSE <- SSY-SSR > SSE > model<-lm(grouth~tannin) > summary.aov(model) # Anova > summary(model) #T test > plot(model)
Correlation(example)
>cor(1:10,2:11) >cor(1:10,11:2)
Correlation Matrix of Multivariable
>longley >Ci <- cor(longley)
Graphical display
>Symnum(Ci)
Correlation
>attach(thuesen) >thuesen >cor(blood.glucose,short.velocity) >cor(blood.glucose,short.velocity,use="complete.obs") >cor.test(blood.glucose,short.velocity,use="complete.obs")
Google Visualization
googleVis is an R package providing an interface between R and the Google Chart Tools to visualise data. The output of googleVis functions is html code that contains the data and references to JavaScript. To view the output a browser with Flash and Internet connection is required.
Examples:
http://code.google.com/p/google-motion-charts-with-r/ http://wmc.landfood.ubc.ca/webapp/VWM/course/global-food-challenges/introduction/
Principle Component Analysis
*reduce dimantions *create a new coordinate system for the data, the principal components being the unit vectors along the axes. The first principal component gives the direction of the maximum spread of the data. The second gives the direction of maximum spread perpendicular to the first direction.
Example 1
Matrix: 100 X 2 data file : marks.txt >dat = read.table("marks.txt".head=T) >dim(dat) >names(dat) >pc1 = princomp(~Stat + Phys, dat) >pc1$loadings >pc1 >names(pc1) >pc1$scores
Example 2
Matrix 46420 X 23 data file: quasar.txt >quas = read.table("quasar.txt", head=T) >dim(quas) >names(quas) >pc2 = princomp(quas[,-1], scores=T) >pc2 >plot(pc2) >screeplot(pc2) >screeplot(pc2, type="lines") >pc2$loadings[,1:2]