DATA STRUCTURES IN R
Pavan Kumar A
Senior Project Engineer
Big Data Analytics Team
CDAC-KP
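Types of data structures in R
Vector : A one-dimensional structure that holds one or more values of a single
type (for example numeric, character, or logical). If values of different
types are combined, R coerces them to a common type.
Matrices And Arrays : A matrix is a 2-dimensional representation of data of a
single type; an array extends this to more than two dimensions.
Data Frames: A rectangular 2-dimensional representation of data in which
different columns can hold different types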
DATA STRUCTURES IN R- INTEGER VECTORS
The following functions are used to create numeric vectors
c() : Concatenate (joins items end to end)
seq() : Sequence (generates an equidistant series of numbers)
rep() : Replicate (generates repeated values)
c() examples
> c(42,57,12,39,1,3,4)
[1] 42 57 12 39 1 3 4
You can also concatenate vectors of more than one element
> x <- c(1, 2, 3)
> y <- c(10, 20)
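The two vectors defined above can themselves be passed to c(). A minimal sketch, reusing x and y from the slide; the name z and the trailing value 5 are illustrative additions, not part of the original example:

```r
x <- c(1, 2, 3)
y <- c(10, 20)

# c() flattens its arguments into one vector: all of x, then all of y, then 5
z <- c(x, y, 5)
print(z)
```

The result has six elements, because c() joins the elements end to end rather than nesting one vector inside another.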
seq(): It is used to generate an equidistant series of numbers
It accepts three arguments
Start element (from)
Stop element (to)
Jump element (by)
> seq(4,9) # Generates the numbers from 4 to 9; only two arguments are given
[1] 4 5 6 7 8 9
> seq(4,10,2) # Three arguments are given; jump by 2
[1] 4 6 8 10
Vectors created with seq() are often used for the x-axis and y-axis values in
graphical analysis.
For example:
If the x-axis co-ordinates are created as
c(1.65,1.70,1.75,1.80,1.85,1.90)
then the same vector can be created with the following command
Syntax :
seq(from, to, by)
seq(1.65, 1.90, 0.05)
> 4:9 # exactly the same as seq(4,9)
[1] 4 5 6 7 8 9
> sum(1:10)
[1] 55
Another example of the seq() command: here we add the length.out
argument, which sets the number of elements in the result
from = Starting element
to = Ending element
by = ((to - from)/(length.out - 1)), computed by seq() when length.out is given
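The relationship between by and length.out can be checked directly. A minimal sketch; the variable names s, step, and heights are illustrative:

```r
# Ask for 4 equally spaced values between 4 and 10
s <- seq(from = 4, to = 10, length.out = 4)
print(s)

# The implied step is (to - from) / (length.out - 1)
step <- (10 - 4) / (4 - 1)
print(step)

# The x-axis example can be written with length.out instead of by
heights <- seq(1.65, 1.90, length.out = 6)
print(heights)
```

Using length.out is convenient when the number of points is known but the step size is awkward to write out by hand.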
rep() is used to generate repeated values.
It comes in two variants, depending on whether the second argument is a
vector or a single number
> oops <- c(7,9,13)
> rep(oops,3) # It repeats the entire vector oops 3 times
[1] 7 9 13 7 9 13 7 9 13
> rep(oops,1:3)
[1] 7 9 9 13 13 13
Here, each element of oops is repeated according to the matching value in 1:3,
so 7 is repeated once, 9 twice, and 13 three times
Look at the following examples
> rep(oops,1:4) # fails: oops has 3 elements but 4 repeat counts are given
Error in rep(oops, 1:4) : invalid 'times' argument
> rep(1:2,c(10,15))
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
> rep(1:2,each=10)
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
> rep(1:2,c(10,10))
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
Integer vectors : Indexing
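The original illustration for this slide is not available, so the following is only a sketch of common indexing forms, reusing the numbers from the c() example. The variable names are illustrative:

```r
v <- c(42, 57, 12, 39, 1, 3, 4)

first    <- v[1]         # R indexes from 1, so this is the first element
some     <- v[c(2, 4)]   # elements 2 and 4
no_first <- v[-1]        # a negative index drops that position
large    <- v[v > 10]    # a logical condition keeps matching elements

print(first)
print(some)
print(no_first)
print(large)
```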
DATA STRUCTURES IN R- CHARACTER VECTORS
Character Vector: A character vector is a vector of text strings, whose elements
are specified and printed in quotes
> c("Huey","Dewey","Louie")
[1] "Huey" "Dewey" "Louie"
Single quotes or double quotes can be used for strings
> c('Huey','Dewey','Louie')
[1] "Huey" "Dewey" "Louie"
"Huey" is a string of four characters, not six.
The quotes are not actually part of the string, they are just there so that the
system can tell the difference between a string and a variable name.
If you print a character vector, it usually comes out with quotes added to each
element. There is a way to avoid this, namely to use the cat() function.
For instance,
> cat(c("Huey","Dewey","Louie"))
Huey Dewey Louie>
The prompt (>) appears on the same line because cat() does not print a newline character.
To avoid this and get the system prompt on the next line, add "\n":
> cat("Huey","Dewey","Louie", "\n")
Huey Dewey Louie
>
Quoting and escape sequences
Strings may themselves contain quotation marks or newline characters.
These are written using escape sequences.
Here, \n is an example of an escape sequence.
The backslash (\) is known as the escape character.
To insert quotes within a string, put \ before each quote. For example
> cat("What is \"R\"?\n")
What is "R"?
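One way to confirm that neither the enclosing quotes nor the backslashes are stored in the string is to count characters with nchar(). A small sketch; the variable names are illustrative:

```r
name <- "Huey"
n_name <- nchar(name)       # the enclosing quotes are not counted
print(n_name)

question <- "What is \"R\"?\n"
n_question <- nchar(question)  # each escape sequence counts as one character
print(n_question)

cat(question)               # cat() prints the string with escapes interpreted
```

Here \" and \n each contribute a single character, so the second string has 13 characters even though 16 characters are typed between the outer quotes.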
DATA STRUCTURES IN R- LOGICAL VECTORS
Logical vectors can take the value TRUE or FALSE
In input, you may use the convenient abbreviations T and F
> c(T,T,F,T)
[1] TRUE TRUE FALSE TRUE
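Logical vectors usually arise from comparisons rather than being typed in. A brief sketch with illustrative names:

```r
flags <- c(TRUE, TRUE, FALSE, TRUE)

# In arithmetic, TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
n_true <- sum(flags)
print(n_true)

# A comparison on a numeric vector returns a logical vector of the same length
big <- c(42, 57, 12) > 20
print(big)
```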
Example of Character Vector: Indexing
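The original screenshot for this slide is not available; the sketch below shows how the same indexing forms apply to a character vector. The name ducks and the chosen conditions are illustrative:

```r
ducks <- c("Huey", "Dewey", "Louie")

second   <- ducks[2]                 # select by position
not_huey <- ducks[ducks != "Huey"]   # select by condition on the text
longer   <- ducks[nchar(ducks) > 4]  # select by string length

print(second)
print(not_huey)
print(longer)
```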
Missing values
In many data sets, you may find missing values.
We need some method to deal with the missing values.
R allows vectors to contain a special NA value.
The result of a computation involving NA is generally NA
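A short sketch of how NA propagates and how it can be detected or skipped. The data values are made up for illustration:

```r
x <- c(10, NA, 30)

total <- sum(x)                   # NA, because one input is unknown
print(total)

missing_flags <- is.na(x)         # locate the missing entries
print(missing_flags)

total_ok <- sum(x, na.rm = TRUE)  # na.rm = TRUE drops NA before summing
print(total_ok)
```

Note that NA must be tested with is.na(); a comparison such as x == NA itself returns NA.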
DATA STRUCTURES IN R- COMBINATION OF INT AND CHAR
Example of c()
It is also possible to assign names to the elements
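The screenshots for this slide are not available. As a sketch of the two points above: mixing numbers and text in c() produces a character vector, and elements can be given names. The names and ages below are hypothetical:

```r
# Mixed input: every element is coerced to character
mixed <- c(1, "a", TRUE)
print(mixed)
print(class(mixed))

# Named elements (hypothetical values)
ages <- c(Huey = 10, Dewey = 11, Louie = 12)
print(names(ages))
print(ages["Dewey"])   # elements can then be selected by name
```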
DATA STRUCTURES IN R- MATRICES AND ARRAYS
Matrix: It is a 2-dimensional representation of numbers.
Matrices and arrays are represented as vectors with dimensions
> x <- 1:12
> dim(x) <- c(3,4) # The dim assignment function sets or changes the
                   # dimension attribute of x, causing R to treat the
                   # vector of 12 numbers as a 3 x 4 matrix
Another way to create a matrix is to use the matrix() function
Syntax
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE)
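The effect of the byrow argument is easiest to see side by side. A minimal sketch with illustrative names:

```r
# Default (byrow = FALSE): values fill the matrix column by column
m_col <- matrix(1:12, nrow = 3)
print(m_col)

# byrow = TRUE: values fill the matrix row by row
m_row <- matrix(1:12, nrow = 3, byrow = TRUE)
print(m_row)

print(dim(m_col))   # both are 3 x 4; ncol is inferred from the data length
```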
You can glue vectors together, columnwise or rowwise, using the cbind and
rbind functions.
The cbind() : Column bind
The rbind() : Row bind
Arrays are similar to matrices but can have more than two dimensions.
See help(array) for details
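A sketch of cbind(), rbind(), and a three-dimensional array. The vectors a and b and the array dimensions are chosen only for illustration:

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

by_col <- cbind(a, b)   # a and b become the columns of a 3 x 2 matrix
by_row <- rbind(a, b)   # a and b become the rows of a 2 x 3 matrix
print(by_col)
print(by_row)

# An array with three dimensions: 2 rows, 3 columns, 4 layers
arr <- array(1:24, dim = c(2, 3, 4))
print(dim(arr))
print(arr[2, 3, 4])     # last element of the last layer
```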
DATA STRUCTURES IN R- MATRICES
Matrix Operations
We can extract the desired rows from a matrix
Functions like rowSums() and rowMeans() are used to calculate the sum of all
row elements and the mean of all row elements respectively
Functions like colSums() and colMeans() are used to calculate the sum of all
column elements and the mean of all column elements respectively
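The original screenshot is not available; the following sketch applies these operations to the 3 x 4 matrix built from 1:12. The variable names are illustrative:

```r
m <- matrix(1:12, nrow = 3)   # columns are 1:3, 4:6, 7:9, 10:12

rows_1_3 <- m[c(1, 3), ]      # extract rows 1 and 3, keeping all columns
print(rows_1_3)

rs <- rowSums(m)    # 22 26 30
rm_ <- rowMeans(m)  # 5.5 6.5 7.5
cs <- colSums(m)    # 6 15 24 33
cm <- colMeans(m)   # 2 5 8 11
print(rs); print(rm_); print(cs); print(cm)
```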
DATA STRUCTURES IN R-DATA FRAMES
A data frame is also a 2-dimensional object, just like a matrix, used for
storing data tables.
Here, different columns can have different modes (numeric, character, factor,
etc).
All data frames are rectangular: every column has the same number of rows, and
missing entries are represented by NA
Creating Data Frame
Error: data.frame() fails here because the second vector, e, is a 3-element
vector while d and f are 4-element vectors.
A data frame is a collection of vectors (integer/character) of equal lengths
Each column in a data frame can be a separate type of data. In the previous
example, the mydata data frame is a combination of numerical, character, and
factor data types.
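Since the original screenshots are missing, here is a sketch of what creating mydata might look like. The names d, e, f, and mydata come from the text above; the values, and the name e_short, are assumptions made for illustration:

```r
d <- c(1, 2, 3, 4)                              # numeric column
e <- c("red", "white", "red", "blue")           # character column
f <- factor(c("low", "high", "high", "low"))    # factor column

mydata <- data.frame(d, e, f, stringsAsFactors = FALSE)
str(mydata)

# A 3-element vector cannot be combined with 4-element vectors
e_short <- c("red", "white", "red")
result <- tryCatch(
  data.frame(d, e_short, f),
  error = function(err) conditionMessage(err)   # capture the error message
)
print(result)
```

tryCatch() is used only so the sketch keeps running after the expected error; at the console the same call would simply stop with that message.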
ACCESSING DATA FRAMES
There are several ways to identify the elements of a data frame, such as by
column name, by position, or with the $ operator.
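A sketch of the most common access forms. The data frame df and its contents are hypothetical and stand in for the missing screenshots:

```r
df <- data.frame(
  id    = 1:3,
  name  = c("Huey", "Dewey", "Louie"),
  score = c(90, 85, 77),
  stringsAsFactors = FALSE
)

col_dollar <- df$name                 # one column, by name, as a vector
col_double <- df[["score"]]           # same idea with double brackets
row_two    <- df[2, ]                 # one row, all columns
two_cols   <- df[, c("id", "score")]  # several columns, as a data frame
one_cell   <- df[2, "name"]           # a single cell by row and column

print(col_dollar)
print(row_two)
print(one_cell)
```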
BUILT-IN DATA FRAMES IN R
R has some built-in data frames; mtcars is one of them
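A quick way to look at a built-in data frame such as mtcars is sketched below; the variable name dims is illustrative:

```r
data(mtcars)            # make the built-in data set available by name

print(head(mtcars, 3))  # first three rows
dims <- dim(mtcars)     # number of rows and columns
print(dims)
print(names(mtcars))    # column names
```

Running data() with no arguments lists the other data sets that ship with the installed packages.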
DATA STRUCTURES IN R-LISTS
Lists: A list is a collection of related objects, which can be of different types.
A list is not fixed in length and can contain other lists.
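A sketch of a list holding objects of different types, growing after creation, and containing another list. All names and values are hypothetical:

```r
duck <- list(
  name    = "Huey",
  age     = 10,
  friends = c("Dewey", "Louie")
)

# A list can grow after creation, and an element can itself be a list
duck$address <- list(city = "Duckburg")

n_items <- length(duck)
print(n_items)
print(duck[["friends"]][2])   # second friend
print(duck$address$city)      # element of the nested list
```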
CREATING DATA SUBSETS
R is often used with large data sets, not all of which are useful.
Therefore, the first step is to select the data containing the relevant information.
Extracted data sets can be further divided into smaller subsets of data.
The function used for extracting data is subset().
The following operators are also used to subset data.
$ (Dollar) : Used to select a single named element of the data, such as one column.
[] (Single Square Brackets) : Used to extract multiple elements of data.
We can extract (subset) part of a data table based on some condition
using the subset() function
Syntax
subset(x, condition) # x is the data set, condition is a logical expression
Example
writer_names_df <- subset(writers_df, Age.At.Death <= 40 & Age.As.Writer >= 18)
writer_names_df <- subset(writers_df, Name == "Jane")
## Age.At.Death Age.As.Writer Name Surname Gender Death
## 1 22 16 Jane Doe FEMALE 2015-05-10
## 4 41 36 Jane Austen FEMALE 1817-07-18
male_writers <- writers_df[writers_df$Gender == "MALE", ] # with [], refer to the column through the data frame
writers_df[1,3] <- NA # mark a value as missing (NULL cannot be assigned to a single cell)
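The writers_df data frame itself is not shown in the slides. The sketch below rebuilds a small stand-in so the calls above can be run: rows 1 and 4 follow the output shown, rows 2 and 3 are hypothetical placeholders, and the Death column is omitted.

```r
writers_df <- data.frame(
  Age.At.Death  = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name          = c("Jane", "John", "Richard", "Jane"),    # rows 2-3 are placeholders
  Surname       = c("Doe", "Roe", "Miles", "Austen"),      # rows 2-3 are placeholders
  Gender        = c("FEMALE", "MALE", "MALE", "FEMALE"),
  stringsAsFactors = FALSE
)

young_writers <- subset(writers_df, Age.At.Death <= 40 & Age.As.Writer >= 18)
janes         <- subset(writers_df, Name == "Jane")
male_writers  <- writers_df[writers_df$Gender == "MALE", ]

print(young_writers)
print(janes)          # keeps the original row names 1 and 4
print(male_writers)

writers_df[1, 3] <- NA   # column 3 is Name; the value is now missing
print(writers_df[1, ])
```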
CREATING SUBSETS IN VECTORS
To create subsets in vectors, subset() or [] can be used
## A simple vector
v <- c(1,5,6,4,2,4,2)
# Using the subset() function: keeps the numbers less than 4
subset(v, v < 4)
# Using square brackets: keeps the numbers less than 4
v[v < 4]
# Another vector
t <- c("one", "one", "two", "three", "four", "two")
# Remove the "one" entries using the subset() function
subset(t, t != "one")
# Remove the "one" entries using []
t[t != "one"]
Execution of code on R console
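The console screenshot is not available. The sketch below runs the vector examples and stores the results so they can be compared; the text vector is named txt here (an illustrative choice) to avoid masking the base function t():

```r
v <- c(1, 5, 6, 4, 2, 4, 2)

small_subset  <- subset(v, v < 4)
small_bracket <- v[v < 4]
print(small_subset)     # [1] 1 2 2
print(small_bracket)    # same result

txt <- c("one", "one", "two", "three", "four", "two")

kept_subset  <- subset(txt, txt != "one")
kept_bracket <- txt[txt != "one"]
print(kept_subset)      # [1] "two"   "three" "four"  "two"
print(kept_bracket)     # same result
```

For plain vectors without missing values the two forms give the same answer; they differ when NA is present, because subset() drops elements whose condition is NA while [] keeps them as NA.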
CREATING SUBSETS IN DATA FRAMES
Subsets of data frames can also be created using the subset() function and the [] operator
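As the original screenshots are missing, here is a sketch using the built-in mtcars data frame. The threshold of 30 miles per gallon and the chosen columns are arbitrary illustrations:

```r
# subset(): the condition and the select argument can use column names directly
efficient_a <- subset(mtcars, mpg > 30, select = c(mpg, cyl))

# []: rows are chosen by a logical vector, columns by name
efficient_b <- mtcars[mtcars$mpg > 30, c("mpg", "cyl")]

print(efficient_a)
print(efficient_b)
```

Both forms return a data frame with the same rows and columns; subset() is shorter to type, while [] is generally the safer choice inside functions.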