Documentation:Manipulating Data in R

From UBC Wiki

Once you have extracted data from a file in R, you may want to manipulate the data in certain ways. This could involve adding headers to tables, combining tables, removing columns, replacing or extracting values, and handling missing data in a data set.

Adding Headers to a Table

You can add headers to the columns of a table by using the command
>names(mydata) = c("name for column 1", "name for column 2", "name for column 3")
This assigns a header to each column, in order. Alternatively, you can use the command
>colnames(mydata) = c("name for column 1", "name for column 2", "name for column 3")
For a data frame, colnames() gives the same result as names(). For a matrix, however, only colnames() sets the column headers; names() would instead attach a name to each individual element.
The same idea works on a plain vector, where names() labels each element, as in the example below.

>mydata= c(1, 2, 3, 4, 5)
>mydata
[1] 1 2 3 4 5
>names(mydata) = c("A", "B", "C", "D", "E")
>mydata

A B C D E
1 2 3 4 5
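To see how names() and colnames() behave on actual tables, here is a minimal sketch. The data frame, the matrix and the column names height, weight and age are made up for illustration.

```r
# Hypothetical data: three columns created without meaningful headers
mydata <- data.frame(c(170, 182), c(65, 80), c(21, 34))
names(mydata) <- c("height", "weight", "age")   # same effect as colnames() for a data frame
print(mydata)

# For a matrix, colnames() is what sets the column headers
mymatrix <- matrix(c(170, 182, 65, 80, 21, 34), nrow = 2, ncol = 3)
colnames(mymatrix) <- c("height", "weight", "age")
print(mymatrix)

# names() on the matrix is still empty: column headers are stored in dimnames
print(names(mymatrix))
```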

Combining Multiple Tables

You can combine tables by stacking them on top of each other or by placing them side by side. Examples, with the code needed, are shown below.
>x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) ## Make a vector of values
>y = c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20) ## Make a second vector of values
The rbind() command binds the two vectors together as rows, so y is placed underneath x. (To place them side by side as columns instead, use cbind(x, y).)
>stacked = rbind(x, y) ## combine the two vectors as rows
>stacked ## return the newly created table

  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    4    5    6    7    8    9    10
y   11   12   13   14   15   16   17   18   19    20

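The merge() command is meant for joining data frames on the columns they share, so it does not simply stack tables. When the two inputs have no columns in common, as with the two vectors below, merge() returns every combination of their values (a Cartesian product):
>x = c(1, 2, 3)
>y = c(4, 5, 6)
>combinations = merge(x, y)
>combinations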

x y
1 1 4
2 2 4
3 3 4
4 1 5
5 2 5
6 3 5
7 1 6
8 2 6
9 3 6
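For comparison, here is a short sketch of cbind(), which places vectors side by side as columns, next to rbind(), and of merge() used the way it is intended, matching rows on a shared column. The students and grades data frames and their id column are hypothetical.

```r
# cbind() places vectors side by side as columns; rbind() stacks them as rows
x <- c(1, 2, 3)
y <- c(4, 5, 6)
side_by_side <- cbind(x, y)   # 3 rows, 2 columns named "x" and "y"
stacked <- rbind(x, y)        # 2 rows named "x" and "y", 3 columns
print(side_by_side)
print(stacked)

# merge() matches rows on a shared column; this data is made up
students <- data.frame(id = c(1, 2, 3), name = c("Ann", "Ben", "Cal"))
grades <- data.frame(id = c(3, 1, 2), grade = c(80, 95, 70))
joined <- merge(students, grades, by = "id")   # rows lined up by id, sorted by id
print(joined)
```

Because merge() matches on id, each grade ends up next to the right student even though grades lists the ids in a different order.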


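Tables of the same size can also be combined element by element with arithmetic operators. Each result below is computed position by position (1 and 4, 2 and 5, 3 and 6):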
>x=c(1, 2, 3)
>y=c(4, 5, 6)
>add = x+y
>add
[1] 5 7 9
>subtract= x-y
>subtract
[1] -3 -3 -3
>multiply = x*y
>multiply
[1] 4 10 18
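The same element-by-element arithmetic is a common way to add a derived column to a data frame. A minimal sketch with a hypothetical scores data frame (the midterm and final columns are made up):

```r
# Hypothetical exam scores for three students
scores <- data.frame(midterm = c(70, 85, 90), final = c(80, 75, 95))

# Arithmetic on two columns works element by element, row by row
scores$total <- scores$midterm + scores$final
scores$change <- scores$final - scores$midterm
print(scores)
```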

Removing Columns

There are several ways to remove columns from a set of data in R. One way is shown below.
>mymatrix = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3) ## create mock data
>mymatrix ## display the data created

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

To remove a column from this data, put a negative column index after the comma. Here the second column is removed. (Without the comma, mymatrix[-2] would drop the second element rather than the second column.)
>removecolumn2 = mymatrix[, -2]
>removecolumn2

     [,1] [,2]
[1,]    1    7
[2,]    2    8
[3,]    3    9
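For data frames you can also remove columns by name, which is less fragile than relying on positions. A sketch of a few common approaches, using a made-up data frame df with columns a, b and c:

```r
# Hypothetical data frame with three named columns
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

by_index <- df[, -2]                   # drop the 2nd column by position
by_name <- df[, names(df) != "b"]      # drop column "b" by name
by_subset <- subset(df, select = -b)   # drop column "b" with subset()

df_copy <- df
df_copy$b <- NULL                      # delete column "b" from the copy
print(df_copy)
```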

Handling Missing Values in R

There are many commands in R that help with handling missing data, which R represents as NA. A few of the simplest commands are shown below, using three example matrices.
>missinginputs = matrix(NA, nrow=3, ncol=3)
>missinginputs

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA

>Fullarray = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), ncol = 3, nrow = 3)
>Fullarray

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

>partialarray=matrix(c(1, 2, 3, NA, NA, 6, 7, 8, 9), nrow=3, ncol=3)
>partialarray

     [,1] [,2] [,3]
[1,]    1   NA    7
[2,]    2   NA    8
[3,]    3    6    9

The command na.fail(object) returns the object unchanged if it does not contain any NAs; if it does contain NAs, it stops with an error. The commands na.omit(object) and na.exclude(object) both return the object with every row that contains an NA removed; the two differ only in model fitting, where na.exclude lets residuals and predictions be padded back to the original length with NAs. The command na.pass(object) returns the object unchanged, whether or not it contains NAs.
An example of each is shown below.
>na.fail(missinginputs)
Error in na.fail.default(missinginputs) : missing values in object
>na.fail(Fullarray)

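     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

>na.pass(Fullarray)

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

>na.omit(partialarray) ## could also have used na.exclude

     [,1] [,2] [,3]
[1,]    3    6    9
attr(,"na.action")
[1] 1 2
attr(,"class")
[1] "omit"

Only the third row is free of NAs, so it is the only row kept. The na.action attribute records which rows (1 and 2) were removed.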
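Rather than dropping whole rows, you may want to count, locate or fill in missing values. The sketch below rebuilds the partialarray matrix so it runs on its own. Filling NAs with 0 is only an illustrative choice; whether that is sensible depends on your data.

```r
partialarray <- matrix(c(1, 2, 3, NA, NA, 6, 7, 8, 9), nrow = 3, ncol = 3)

n_missing <- sum(is.na(partialarray))               # how many entries are NA
complete_rows <- complete.cases(partialarray)       # TRUE only for rows with no NA
col_means <- colMeans(partialarray, na.rm = TRUE)   # skip NAs when averaging

filled <- partialarray
filled[is.na(filled)] <- 0                          # replace every NA with 0
print(filled)
```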

Replacing Values Inside a Table

To change a value inside a table, select its position in the table and assign the new value to it. An example is shown below. Note that matrix() fills in its values column by column unless byrow = TRUE is given.
>x = matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
>x

     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7

>x[1, 1] = 6
>x

     [,1] [,2]
[1,]    6    1
[2,]    4    5
[3,]    3    7
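Values can also be replaced a whole row at a time, or wherever a condition holds. A brief sketch on the same x matrix; the replacement values 0 and 10 are arbitrary:

```r
x <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
x[1, 1] <- 6          # a single element
x[2, ] <- c(0, 0)     # an entire row at once
x[x > 5] <- 10        # every element that meets a condition (here 6 and 7)
print(x)
```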

Extracting Data from a Table and Using for Loops

There are many ways to extract data from a table. If you want to extract a single number and you know its position, you can index the matrix by row and column. An example is shown below.
>x = matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
>x

     [,1] [,2]
[1,]    2    1
[2,]    4    5
[3,]    3    7

>x[1, 1]
[1] 2

The syntax for extracting a row in R is
>x = matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
>x[1, ] ## extract the first row
[1] 2 1

Similarly, for extracting a column
>x = matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
>x[, 1] ## extract the first column
[1] 2 4 3
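Indexing is not limited to one row or column. A sketch of a few other selections on the same x matrix; the thresholds 2 and 3 are arbitrary:

```r
x <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)

first_two_rows <- x[1:2, ]              # several rows at once
first_row_kept <- x[1, , drop = FALSE]  # keep the result as a 1 x 2 matrix
above_three <- x[x > 3]                 # all elements greater than 3, as a vector
selected_rows <- x[x[, 2] > 2, ]        # rows whose second column exceeds 2
print(selected_rows)
```

drop = FALSE is worth knowing about: without it, selecting a single row or column returns a plain vector rather than a matrix.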

Another task you may be interested in is searching through a matrix and extracting information about its elements. For example, you may want to replace all the zeros in a matrix with a different value, or count how many times a value occurs. One way to do this is with for loops, which step through every row and column of the matrix in turn. Two examples are shown below.
>Matrix = matrix(c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1), nrow = 4, ncol = 4) ## create a matrix of 16 alternating 0s and 1s
>Matrix

     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    1    1    1    1
[3,]    0    0    0    0
[4,]    1    1    1    1

># Create a for loop that replaces all zeros with a 5
>for (i in 1:dim(Matrix)[1]) {
   for (j in 1:dim(Matrix)[2]) {
     if (Matrix[i, j] == 0) {
       Matrix[i, j] = 5
     }
   }
 }
>Matrix

     [,1] [,2] [,3] [,4]
[1,]    5    5    5    5
[2,]    1    1    1    1
[3,]    5    5    5    5
[4,]    1    1    1    1


The command dim(Matrix) returns the dimensions of the matrix as a vector of two numbers: the number of rows and the number of columns. Indexing this result with [1] and [2] gives those two values, which we use as the bounds of the for loops.
Another example is shown below. It uses for loops to count how many times an element (here, 5) occurs inside the matrix.
>count = 0
>for (i in 1:dim(Matrix)[1]) {
   for (j in 1:dim(Matrix)[2]) {
     if (Matrix[i, j] == 5) {
       count = count + 1
     }
   }
 }
>print(count)
[1] 8
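For these particular tasks, R can also work on the whole matrix at once using a logical index, which is usually shorter and faster than an explicit loop. A minimal sketch reproducing both loop examples, plus a loop written with seq_len(), nrow() and ncol() (the variable name ones is illustrative):

```r
Matrix <- matrix(c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1), nrow = 4, ncol = 4)

# Replace all zeros with 5 in one step using a logical index
Matrix[Matrix == 0] <- 5

# Count the 5s: when summed, TRUE counts as 1 and FALSE as 0
count <- sum(Matrix == 5)
print(count)

# nrow() and ncol() are equivalent to dim(Matrix)[1] and dim(Matrix)[2];
# seq_len() gives an empty range when its argument is 0, unlike 1:0
ones <- 0
for (i in seq_len(nrow(Matrix))) {
  for (j in seq_len(ncol(Matrix))) {
    if (Matrix[i, j] == 1) ones <- ones + 1
  }
}
print(ones)
```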

Vectors and Mathematics

In R, arithmetic on vectors is performed element by element (elementwise). For example, suppose we have two vectors A and B.
>A = c(1, 2, 3, 4, 5)
>B = c(6, 7, 8, 9, 10)
If we add these vectors together, each element of A is added to the element of B in the same position.
>A + B
[1]  7  9 11 13 15

If we perform other arithmetic operations on vectors, the effect is very similar.
>A=c(2, 4, 6, 8, 10)
>B=c(1, 2, 3, 4, 5)
>A/B
[1] 2 2 2 2 2
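Other operators and many built-in functions follow the same element-by-element pattern. A short sketch with the same A and B:

```r
A <- c(2, 4, 6, 8, 10)
B <- c(1, 2, 3, 4, 5)

squares <- A^2          # each element raised to the power 2
remainders <- A %% 3    # remainder of each element divided by 3
roots <- sqrt(B)        # functions such as sqrt() also act element by element
bigger <- A > 5         # comparisons return one TRUE/FALSE per element
print(squares)
```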

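If two vectors are not the same length, R applies the "recycling rule": the elements of the shorter vector are reused, in order, until it is as long as the longer vector.
>A = c(1, 2, 3, 4, 5)
>B = c(1, 2, 3)
>A + B
[1] 2 4 6 5 7
Warning message:
In A + B :
  longer object length is not a multiple of shorter object length
Note that 1, 2 and 3 in B were first added to 1, 2 and 3 in A, and then the 1 and the 2 in B were recycled and added to the 4 and the 5 respectively. R warns here because the length of A (5) is not a multiple of the length of B (3).

The same rule explains why multiplying a vector by a single number multiplies every element by that number:
>A * 5
[1]  5 10 15 20 25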
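To make the recycling rule visible, you can perform it yourself with rep_len(), which repeats a vector to a given length. This sketch shows that the explicit and implicit versions agree; suppressWarnings() is used only to silence R's length-mismatch warning.

```r
A <- c(1, 2, 3, 4, 5)
B <- c(1, 2, 3)

# rep_len() repeats B until it is as long as A
B_recycled <- rep_len(B, length(A))
explicit <- A + B_recycled

# Implicit recycling gives the same numbers (plus a warning, silenced here)
implicit <- suppressWarnings(A + B)
print(explicit)
```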

If instead you want to join two vectors into one longer vector, you can combine them as if they were elements of a vector (i.e., using the c() function).
> A = c(1, 2, 3, 4)
> B = c(5, 6, 7, 8)
> C = c(A, B)
> C
[1] 1 2 3 4 5 6 7 8
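c() is not the only way to put vectors together. A sketch contrasting it with append(), which inserts one vector at a chosen position, and rbind(), which keeps the vectors as separate rows:

```r
A <- c(1, 2, 3, 4)
B <- c(5, 6, 7, 8)

joined <- c(A, B)                    # one longer vector
inserted <- append(A, B, after = 2)  # B placed after the 2nd element of A
as_rows <- rbind(A, B)               # a 2 x 4 matrix, one vector per row
print(inserted)
```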

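We can also perform vector products. The %*% operator performs matrix multiplication; when both operands are plain vectors of the same length, it returns their inner (dot) product as a 1 x 1 matrix.
> A = c(1, 2, 3, 4)
> B = c(1, 2, 3, 4)
> A %*% B

     [,1]
[1,]   30

Here 30 = 1*1 + 2*2 + 3*3 + 4*4. Swapping the order of two plain vectors does not change this result. To get the outer product instead, use A %o% B (equivalently outer(A, B)), which returns a 4 x 4 matrix. Be aware that R's crossprod(A, B) computes t(A) %*% B, which for two vectors is again the dot product, not the geometric cross product.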
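The sketch below gathers these vector products in one place. To our knowledge, base R does not provide a function for the geometric cross product of two 3-element vectors, so cross3 here is a hypothetical hand-written helper, not a built-in.

```r
A <- c(1, 2, 3, 4)
B <- c(1, 2, 3, 4)

dot_matrix <- A %*% B             # 1 x 1 matrix holding the dot product
dot_plain <- sum(A * B)           # the same value as a plain number
dot_crossprod <- crossprod(A, B)  # t(A) %*% B, again the dot product here
outer_prod <- A %o% B             # 4 x 4 matrix of every pairwise product

# Hypothetical helper: geometric cross product of two length-3 vectors
cross3 <- function(u, v) {
  c(u[2] * v[3] - u[3] * v[2],
    u[3] * v[1] - u[1] * v[3],
    u[1] * v[2] - u[2] * v[1])
}
print(cross3(c(1, 0, 0), c(0, 1, 0)))
```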