Unit 5
R Programming
What is R
• R is a popular programming language used for statistical computing
and graphical presentation.
• Its most common use is to analyze and visualize data.
Difference between R & Python
S.No | Parameter | R | Python
1. | Objective | Its main objective is to perform data analysis | Python is used for deployment and …
… primary users of R.
3. | Flexibility | In R, we can easily use the available libraries. | In Python, we can easily construct new …
… beginning.
5. | Popularity of programming language | Less popular compared to Python | More popular compared to R
7. | Integration | R runs locally. | It is well-integrated with the app.
9. | Database size | R handles the huge size of data. | It will also handle the huge size of data.
11. | Packages and library | tidyverse, ggplot2, caret, and zoo. | Pandas, scipy, scikit-learn, TensorFlow, …
… caret.
12. | Advantages | 1. Beautiful graph construction. | 1. Notebooks help to share data with …
Among R's reserved words, if, else, repeat, while, function, for, in, next and break are used for control-flow statements and for declaring user-defined functions.
The remaining reserved words are constants: TRUE and FALSE are the Boolean constants.
NaN represents a Not-a-Number value, and NULL represents an undefined value.
Inf represents infinity.
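The constants described above can be seen in a short sketch. The variable names (not_a_number, infinite, nothing) are chosen only for illustration.

```r
# NaN: the result of an undefined numeric operation
not_a_number <- 0 / 0
is.nan(not_a_number)     # TRUE

# Inf: the result of dividing a positive number by zero
infinite <- 1 / 0
is.infinite(infinite)    # TRUE

# NULL: an empty, undefined value of length 0
nothing <- NULL
is.null(nothing)         # TRUE
length(nothing)          # 0
```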
Basic Data Types
• Each variable in R has an associated data type.
• Each R-Data Type requires different amounts of memory and has
some specific operations which can be performed over it.
• Basic data types in R can be divided into the following types:
• numeric - (10.5, 55, 787)
• integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
• complex - (9 + 3i, where "i" marks the imaginary part)
• character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
• logical (a.k.a. boolean) - (TRUE or FALSE)
We can use the class() function to check the data type of a variable:
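A minimal sketch of class() applied to one value of each basic type listed above. The variable names are hypothetical and used only for illustration.

```r
num_value     <- 10.5
int_value     <- 55L
complex_value <- 9 + 3i
char_value    <- "R is exciting"
logical_value <- TRUE

class(num_value)       # "numeric"
class(int_value)       # "integer"
class(complex_value)   # "complex"
class(char_value)      # "character"
class(logical_value)   # "logical"
```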
Creating and Calling a Function
To create a function, use the function() keyword. To call it, write its name followed by parentheses:
# create a function with the name my_function
my_function <- function() {
  print("Hello World!")
}
my_function() # call the function; prints [1] "Hello World!"
Output:
a <- "Hello"
[1] "Hello How are you?"
[1] "Hello-How-are you?"
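The two output lines above are consistent with joining strings using paste(), once with the default separator and once with sep = "-". The code that produced them did not survive in the text, so the following is only a sketch under that assumption; the variables b and d are hypothetical, and only a appears in the original.

```r
a <- "Hello"
b <- "How"        # hypothetical: not in the original text
d <- "are you?"   # hypothetical: not in the original text

# paste() joins its arguments with a single space by default
spaced <- paste(a, b, d)
print(spaced)

# sep sets the separator placed between the arguments
dashed <- paste(a, b, d, sep = "-")
print(dashed)
```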
while ( condition )
{
statement
}
Program to display numbers from 1 to 5 using a while
loop in R.
# R program to demonstrate the use of a while loop
val <- 1
# using while loop
while (val <= 5)
{
  print(val)
  val <- val + 1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Repeat Loop in R
It is a simple loop that runs the same statement or group of statements
repeatedly until a stop condition is encountered. A repeat loop has no built-in
condition to terminate it, so the programmer must place a condition within the
loop's body and use a break statement to terminate the loop. If no such
condition is present in the body of the repeat loop, it will iterate
infinitely.
R – Repeat loop Syntax:
repeat
{
  statement
  if( condition )
  {
    break
  }
}
Example 1: Program to display numbers from 1
to 5 using a repeat loop in R.
val <- 1
repeat
{
  print(val)
  val <- val + 1
  if(val > 5)
  {
    break
  }
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Jump Statements in Loop
Break Statement: The break keyword is a jump statement that is used to terminate the loop at a particular
iteration.
Next Statement: The next keyword is a jump statement which is used to skip a particular iteration in the loop.
Example of next statement:
for (val in 1:5)
{
  if (val == 3)
  {
    next
  }
  print(val)
}
Output:
[1] 1
[1] 2
[1] 4
[1] 5
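For comparison, here is a matching sketch of the break statement on the same loop. The printed vector is a hypothetical helper added only to record which values were reached.

```r
printed <- integer(0)   # hypothetical helper: records the values that were printed
for (val in 1:5) {
  if (val == 3) {
    break               # leave the loop entirely when val reaches 3
  }
  print(val)
  printed <- c(printed, val)
}
# Only 1 and 2 are printed; the loop stops before printing 3.
```

Unlike next, which skips only the third iteration, break ends the loop, so 4 and 5 are never reached.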
R Vectors
A vector in R is a sequence of data values of the same type, much like an
array in other programming languages.
An important point is that vector indexing in R starts from 1, not
from 0.
We can create numeric vectors and character vectors as well.
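The 1-based indexing described above can be checked with a small sketch. The vector scores and its values are hypothetical.

```r
scores <- c(10, 20, 30, 40)   # hypothetical example values

first  <- scores[1]     # the first element: 10
last   <- scores[4]     # the last element: 40
middle <- scores[2:3]   # elements 2 and 3: 20 30
none   <- scores[0]     # index 0 selects nothing: an empty numeric vector
```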
Types of R vectors
• Numeric vectors: Numeric vectors contain numeric values, either integers or doubles (floating-point numbers).
• Example:
• v1 <- c(4, 5, 6, 7)
• typeof(v1) # "double"
• v2 <- c(1L, 4L, 2L, 5L)
• typeof(v2) # "integer"
• Character vectors: Character vectors in R contain strings, which may include letters, digits and special characters. A non-character value such as 57 is converted to the string "57".
• Example:
• v1 <- c('geeks', '2', 'hello', 57)
• typeof(v1) # "character"
• Logical vectors: Logical vectors in R contain the Boolean values TRUE and FALSE, and NA for missing values.
• Example:
• v1 <- c(TRUE, FALSE, TRUE, NA)
• typeof(v1) # "logical"
List creation & Accessing
List: A list in R can contain many different data types inside it.
A list is a collection of data which is ordered and changeable.
To create a list, use the list() function:
# List of strings
thislist <- list("apple", "banana", "cherry") # Create a list
thislist # Print the list
thislist[1] # Access the first item
thislist[1] <- "blackcurrant" # Change an item's value
length(thislist) # List length
"cherry" %in% thislist # Check whether an item exists with the %in% operator
append(thislist, "orange") # Returns a new list with the item added; thislist itself is unchanged
newlist <- thislist[-1] # Remove the first item
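One detail of list access is worth a short sketch: single brackets return a shorter list, while double brackets return the element itself. The list fruits and the names sub and item are hypothetical.

```r
fruits <- list("apple", "banana", "cherry")   # hypothetical example list

sub  <- fruits[1]     # single brackets: a list of length 1
item <- fruits[[1]]   # double brackets: the element itself

class(sub)    # "list"
class(item)   # "character"
```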
Join Two Lists
list1 <- list("a", "b", "c")
list2 <- list(1, 2, 3)
# The most common way to join lists is the c() function, which combines its arguments
list3 <- c(list1, list2)
list3
Matrices
A matrix is a two dimensional data set with columns and rows.
A column is a vertical representation of data, while a row is a horizontal representation of data.
A matrix can be created with the matrix() function. Specify the nrow and ncol parameters to set the number of rows
and columns:
Example:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
print(thismatrix)
thismatrix[1, 2] # Access the item in row 1, column 2
thismatrix[2, ] # Access the whole second row
thismatrix[, 2] # Access the whole second column
thismatrix[c(1, 2), ] # Access more than one row
thismatrix[, c(1, 2)] # Access more than one column
# rbind() adds a row to a matrix; the new row needs one value per column
newmatrix <- rbind(thismatrix, c("strawberry", "blueberry"))
"apple" %in% thismatrix # Check whether an item exists
dim(thismatrix) # Number of rows and columns
length(thismatrix) # Matrix length (total number of items)
smallmatrix <- thismatrix[-c(1), -c(1)] # Remove the first row and the first column
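By default matrix() fills the matrix column by column, which affects what an index such as [1, 2] returns. The sketch below contrasts this with the byrow = TRUE argument; the names by_col and by_row are hypothetical.

```r
fruits <- c("apple", "banana", "cherry", "orange")

# Default: values fill the first column, then the second
by_col <- matrix(fruits, nrow = 2, ncol = 2)
by_col[1, 2]   # "cherry"

# byrow = TRUE: values fill the first row, then the second
by_row <- matrix(fruits, nrow = 2, ncol = 2, byrow = TRUE)
by_row[1, 2]   # "banana"
```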
R Factors
• Factors are used to categorize data.
• Examples of factors are:
• Demography: Male/Female
• Music: Rock, Pop, Classic, Jazz
• Training: Strength, Stamina
• To create a factor, use the factor() function and add a vector as argument:
• Example:
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
print(music_genre) # Print the factor
Output:
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock
Example 2:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
music_genre # Print the factor
levels(music_genre) # Print only the levels
length(music_genre) # Factor length
music_genre[4] # Access the fourth item
music_genre[3] <- "Pop" # Change an item's value
music_genre[3]
Output:
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock
[1] "Classic" "Jazz" "Pop" "Rock"
[1] 8
[1] Classic
Levels: Classic Jazz Pop Rock
[1] Pop
Levels: Classic Jazz Pop Rock
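Because a factor categorizes data, a common follow-up is to count how often each category occurs. The sketch below uses table() and nlevels() on the same factor; the name genre_counts is hypothetical.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

# Count the observations in each level
genre_counts <- table(music_genre)
print(genre_counts)

genre_counts["Jazz"]    # Jazz occurs 3 times
nlevels(music_genre)    # 4 distinct levels
```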
Arrays
• Arrays are data storage structures defined by a fixed number of dimensions. Their elements
are stored in contiguous memory locations.
• In R, one-dimensional arrays are called vectors, with the length being
their only dimension.
• Two-dimensional arrays are called matrices, consisting of fixed numbers of rows and columns.
• All elements of an R array have the same data type.
• A vector is supplied as input to the array() function, which arranges its values into an array
with the requested dimensions.
• An R array can be created with the array() function. A vector of
elements is passed to array() along with the required
dimensions.
• Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
where
• nmat: Number of matrices of dimensions nrow * ncol
• dimnames: Optional list of names for each dimension. Default value = NULL.
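As a minimal sketch of the syntax above, the following builds two 2 x 3 matrices from the values 1 to 12. The names values and arr are hypothetical.

```r
values <- 1:12

# dim = c(nrow, ncol, nmat): 2 rows, 3 columns, 2 matrices
arr <- array(values, dim = c(2, 3, 2))

dim(arr)        # 2 3 2
arr[1, 2, 1]    # row 1, column 2 of the first matrix: 3
arr[2, 3, 2]    # row 2, column 3 of the second matrix: 12
arr[, , 2]      # the whole second matrix
```

The values fill each matrix column by column, which is why arr[1, 2, 1] is 3 rather than 2.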
Data Frames & Structure
• Data frames in R are generic data objects that are
used to store tabular data.
• A data frame can be thought of as a matrix in which each column
can have a different data type.
• A data frame is made up of three principal components: the data,
the rows, and the columns.
• The data is presented in tabular form, which makes it easier to
operate on and understand.
Create Dataframe in R Programming Language
• To create an R data frame use data.frame() function and then pass each of the vectors
you have created as arguments to the function.
# R program to create a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)
# display the structure of the data frame
str(friend.data)
Output:
'data.frame': 5 obs. of 2 variables:
$ friend_id : int 1 2 3 4 5
$ friend_name: chr "Sachin" "Sourav" "Dravid" "Sehwag" ...
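The rows and columns of a data frame can be inspected and accessed directly. This sketch rebuilds the same data frame so that it runs on its own.

```r
friend.data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

nrow(friend.data)                # number of rows: 5
ncol(friend.data)                # number of columns: 2
friend.data$friend_name[2]       # second value of a column: "Sourav"
friend.data[3, "friend_name"]    # row 3 of the same column: "Dravid"
```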
R – Charts and Graphs
R is mostly used for statistics and data analytics, where results are
often presented graphically. Charts and graphs are used in R for this
purpose.
• R – graphs: There are hundreds of charts and graphs present in R. For
example, bar plot, box plot, mosaic plot, dot chart, coplot, histogram,
pie chart, scatter graph, etc.
• Types of R – Charts
1. Bar Plot or Bar Chart
2. Pie Diagram or Pie Chart
3. Histogram
4. Scatter Plot
5. Box Plot
6. Line Graphs
Bar Plot or Bar Chart
A bar plot (bar chart) in R represents the values of a data vector
as the heights of bars. The data vector passed to the function is
plotted along the y-axis of the graph.
A bar chart can behave like a histogram when the output of the table() function
is passed instead of a raw data vector.
• Syntax: barplot(data, xlab, ylab)
• where:
• data is the data vector to be represented on y-axis
• xlab is the label given to x-axis
• ylab is the label given to y-axis
Note: To learn about the other optional parameters of the barplot() function,
run help(barplot) in the R console.
# defining vector
x <- c(7, 15, 23, 12, 44, 56, 32)
# plotting vector
barplot(x, xlab = "Audience",
        ylab = "Count", col = "white",
        col.axis = "darkgreen",
        col.lab = "darkgreen")
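The table() variant mentioned above can be sketched as follows: table() counts how often each value occurs, and barplot() then draws one bar per distinct value. The answers vector is hypothetical example data, not taken from the text.

```r
# hypothetical survey answers; not data from the text
answers <- c("yes", "no", "yes", "maybe", "yes", "no")

# table() counts the occurrences of each distinct value
counts <- table(answers)
print(counts)

# one bar per distinct answer, with the count as its height
barplot(counts, xlab = "Answer", ylab = "Count")
```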
To show the relation between two or more vectors as a scatter plot matrix, use the pairs() function.
• Syntax: pairs(~formula, data)
• where,
• ~formula is a formula such as ~a+b+c that names the variables to plot
• data is the dataset from which the variables in the formula are taken.
# take input from the Orange dataset, which is built into R
orange <- Orange[, c('age', 'circumference')]
# plotting
plot(x = orange$age, y = orange$circumference, xlab = "Age",
     ylab = "Circumference", main = "Age VS Circumference",
     col.lab = "darkgreen", col.main = "darkgreen",
     col.axis = "darkgreen")
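The pairs() syntax described earlier can be tried on the same built-in Orange dataset. This is a minimal sketch; the choice of columns and the title are assumptions made for illustration.

```r
# Scatter plot matrix of two columns of the built-in Orange dataset
selected <- c("age", "circumference")
pairs(~ age + circumference, data = Orange,
      main = "Orange: age and circumference")
```

With two variables the matrix has two off-diagonal panels, each showing the same pair of variables with the axes swapped.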
# plotting
boxplot(x, xlab = "Box Plot", ylab = "Age",
        col.axis = "darkgreen", col.lab = "darkgreen")