R Language Lab Manual Lab 1
R is a programming language and free software environment first developed by Ross Ihaka and Robert
Gentleman in 1993. R offers an extensive catalog of statistical and graphical methods, including
machine learning algorithms, linear regression, time series analysis and statistical inference, to
name a few. Most R libraries are written in R itself, but C, C++ and Fortran code is preferred for
heavy computational tasks. R is trusted not only in academia; many large companies, including Uber,
Google, Airbnb and Facebook, also use the R programming language.
Discover: Investigate and analyze the data, and refine your hypothesis
Model: R provides a wide array of tools to capture the right model for your data
Communicate: Integrate code, graphs, and outputs into a report with R Markdown, or build Shiny
apps to share with the world
Statistical inference
Data analysis
Machine learning algorithm
Enter/browse the path to the installation folder and click Next to proceed.
Select the folder for the start menu shortcut or click on do not create shortcuts and then click
Next.
Installing Packages:-
The most common place to get packages from is CRAN. To install packages from CRAN you
use install.packages("package name"). For instance, if you want to install the ggplot2
package, which is a very popular visualization package, you would type the following in the
console:-
Syntax:-
# install package from CRAN
install.packages("ggplot2")
Loading Packages:-
Once the package is downloaded to your computer you can access the functions and
resources provided by the package in two different ways:
# load the package to use in the current R session
library(packagename)
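The text mentions two ways of accessing a package but shows only library(). The second way is presumably the package::function form, which calls one function without attaching the whole package. A minimal sketch of both, using the stats package (it ships with R, so nothing needs to be installed) as a stand-in for packagename:

```r
# Way 1: attach the whole package for the current R session.
# 'stats' is only a stand-in here; it is bundled with every R installation.
library(stats)
m1 <- median(c(4, 8, 15))

# Way 2: call a single function with the package::function form,
# without attaching the package first.
m2 <- stats::median(c(4, 8, 15))

print(m1)
print(m2)
```

The second form is handy when only one or two functions from a package are needed, or when two loaded packages define functions with the same name.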
Assignment Operators:-
The first operator you’ll run into is the assignment operator. The assignment operator is used
to assign a value. For instance we can assign the value 3 to the variable x using the <-
assignment operator.
# assignment
x <- 3
Interestingly, R actually allows for five assignment operators:
# leftward assignment
x <- value
x = value
x <<- value
# rightward assignment
value -> x
value ->> x
The original assignment operator in R was <-, and it remains the preferred one among R users.
The = assignment operator was added in 2001, primarily because it is the accepted assignment
operator in many other languages and beginners coming to R from those languages were prone to
use it.
The <<- operator is normally used only inside functions, which we will not cover in detail here.
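To make the five operators concrete, here is a small sketch. The variable names (a, b, c_val, d_val, counter) and the increment() function are illustrative only. The last part hints at why <<- is mostly seen inside functions: it assigns to a variable in an enclosing environment rather than creating a local one.

```r
# Leftward assignment: the name is on the left.
a <- 3
b = 3

# Rightward assignment: the name is on the right.
3 -> c_val
3 ->> d_val

# <<- looks for 'counter' outside the function and modifies it there.
counter <- 0
increment <- function() {
  counter <<- counter + 1
}
increment()
increment()

print(c(a, b, c_val, d_val))
print(counter)
```

With a plain <- inside increment(), the function would only change a local copy and counter would stay at 0.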
Evaluation
We can then evaluate the variable by simply typing x at the command line, which returns the
value of x. Note that the returned value is preceded by ## [1]. The [1] is the index of the
first element shown on that line of output. You can add comments to your code by preceding
them with the hash symbol (#). Any values, symbols, and text following # will not be
evaluated.
# evaluation
x
## [1] 3
Case Sensitivity
Lastly, note that R is a case-sensitive programming language. All variables, functions, and
objects must be called by their exact spelling:
x <- 1
y <- 3
z <- 4
x*y*z
## [1] 12
x*Y*z
## Error in eval(expr, envir, enclos): object 'Y' not found
Basic Arithmetic
At its most basic, R can be used as a calculator. When applying basic arithmetic, the
PEMDAS order of operations applies: parentheses first, followed by exponentiation,
multiplication and division, and finally addition and subtraction.
8+9/5^2
## [1] 8.36
8 + 9 / (5 ^ 2)
## [1] 8.36
8 + (9 / 5) ^ 2
## [1] 11.24
(8 + 9) / 5 ^ 2
## [1] 0.68
By default R displays seven significant digits, but this can be changed using options().
1/7
## [1] 0.1428571
options(digits = 3)
1/7
## [1] 0.143
options(digits = 10)
pi
## [1] 3.141592654
options(digits = 22)
pi
## [1] 3.141592653589793115998
We can also perform integer divide (%/%) and modulo (%%) functions. The integer divide
function will give the integer part of a fraction while the modulo will provide the remainder.
42 / 4 # regular division
## [1] 10.5
42 %/% 4 # integer division
## [1] 10
42 %% 4 # modulo (remainder)
## [1] 2
Before we get rolling with the exploratory data analysis (EDA), we want to download our data
set. For this example, we are going to use the data set produced by a recent science,
technology, engineering, art and math (STEAM) project.
Now that we have the data set loaded, it's time to run some very simple commands to preview
the data set and its structure.
Head:-
To begin, we are going to run the head function, which shows the first 6 rows by default.
We are going to override the default and ask to preview the first 10 rows.
> head(df, 10)
Tail:-
The tail function shows the last n observations of a given data frame. The default value for
n is 6. The user can specify a different value of n as required.
> tail(mtcars, n = 5)
Next, we will run the dim function, which displays the dimensions of the table. The output
takes the form of row, column. And then we run the glimpse function from the dplyr
package. This will display a vertical preview of the dataset. It allows us to easily preview
data type and sample data.
dim(df)
# Displays the type and a preview of all columns as a row so that it's very easy to take in.
library(dplyr)
glimpse(df)
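The preview commands above can be tried without downloading anything. The sketch below assumes the built-in mtcars data frame as a stand-in for df, and uses base R's str() in place of glimpse() so that no extra package is required; the variable names are illustrative.

```r
# mtcars is bundled with R; here it stands in for the lab's df.
df <- mtcars

first_rows <- head(df, 10)    # first 10 rows instead of the default 6
last_rows  <- tail(df, n = 5) # last 5 rows
dims       <- dim(df)         # c(number of rows, number of columns)

print(first_rows)
print(last_rows)
print(dims)

# str() is a base R alternative to dplyr::glimpse(): one line per column,
# showing the column type and the first few values.
str(df)
```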
In contrast to programming languages like C and Java, variables in R are not declared with a
data type. Variables are assigned R-objects, and the data type of the R-object becomes the data
type of the variable. There are many types of R-objects. The frequently used ones are:
R Objects:-
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
Vectors:-
In R programming, the most basic R-objects are vectors, which hold elements of one of six atomic
classes: logical, numeric, integer, complex, character and raw. Please note that in R the number
of classes is not confined to these six types. For example, we can combine many atomic vectors
into an array, whose class will be array.
When you want to create a vector with more than one element, you should use the c() function,
which combines the elements into a vector.
# Create a vector.
apple <- c("red", "green", "yellow")
print(apple)
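As a quick check of how the class of a vector follows from its elements, here is a short sketch. All variable names other than apple are made up for illustration.

```r
apple   <- c("red", "green", "yellow")  # character vector
counts  <- c(2L, 5L, 3L)                # integer vector (note the L suffix)
weights <- c(1.5, 2.25, 3)              # numeric (double) vector
ripe    <- c(TRUE, FALSE, TRUE)         # logical vector

print(class(apple))
print(class(counts))
print(class(weights))
print(class(ripe))

# A vector holds a single class: mixing types coerces every element
# to the most general one, here character.
mixed <- c(1, "a", TRUE)
print(mixed)
print(class(mixed))
```

Because of this coercion, a list rather than a vector is the right container when elements of different types must keep their own class.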
Lists:-
You can use subscripts to select specific components of a list.
> x <- list(1:3, TRUE, "Hello", list(1:2, 5))
Here x has 4 elements: a numeric vector, a logical, a string and another list.
> x[[3]]
[1] "Hello"
> x[c(1,3)]
[[1]]
[1] 1 2 3
[[2]]
[1] "Hello"
We can also name some or all of the entries in our list, by supplying argument names to list().
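Naming list entries might look like the following sketch. The list person and its entry names are hypothetical and only illustrate the three usual ways of accessing a named component.

```r
# Supplying argument names to list() names the entries.
person <- list(name = "Ada", scores = c(90, 85), passed = TRUE)

print(person$name)         # access by name with $
print(person[["scores"]])  # access by name with double brackets
print(person["passed"])    # single brackets return a (shorter) list

print(names(person))
```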
Matrices:-
Matrices are much used in statistics, and so play an important role in R. To create a matrix
use the function matrix(), specifying elements by column first:
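> matrix(1:12, nrow = 3, ncol = 4)
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
This is called column-major order. Of course, we need only give one of the dimensions, since
the other follows from the number of elements. A vector shorter than the matrix is recycled to
fill it:
> matrix(1:3, nrow = 3, ncol = 4)
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    2    2    2    2
[3,]    3    3    3    3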
> diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
> diag(1:3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
> 1:5 %o% 1:5
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    4    6    8   10
[3,]    3    6    9   12   15
[4,]    4    8   12   16   20
[5,]    5   10   15   20   25
The last operator performs an outer product, so it creates a matrix whose (i, j)-th entry is
x_i y_j. The function outer() generalizes this to any function f of two arguments, creating a
matrix with entries f(x_i, y_j). (More on functions later.)
> outer(1:3, 1:4, "+")
     [,1] [,2] [,3] [,4]
[1,]    2    3    4    5
[2,]    3    4    5    6
[3,]    4    5    6    7
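To see what "by column first" means in practice, the sketch below fills the same values by column (the default) and by row (byrow = TRUE), and then passes a user-defined function to outer(). The names by_col, by_row and pow are illustrative.

```r
# Default: elements fill the matrix column by column (column-major order).
by_col <- matrix(1:6, nrow = 2)

# byrow = TRUE fills row by row instead.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

print(by_col)
print(by_row)

# outer() with an arbitrary function of two arguments:
# entry (i, j) is x_i raised to the power y_j.
pow <- outer(1:3, 1:2, function(x, y) x^y)
print(pow)
```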
Array:
If we have a data set consisting of more than two pieces of categorical information about each
subject, then a matrix is not sufficient. The generalization of matrices to higher
dimensions is the array. Arrays are defined much like matrices, with a call to the array()
command. Here is a 2 × 3 × 3 array:
> arr <- array(1:18, dim = c(2, 3, 3))
> arr
, , 1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
, , 2
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
, , 3
     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18
Each 2-dimensional slice defined by the last co-ordinate of the array is shown as a 2 × 3
matrix. Note that we no longer specify the number of rows and columns separately, but use a
single vector dim whose length is the number of dimensions. You can recover this vector
with the dim() function.
> dim(arr)
[1] 2 3 3
> arr[1,2,3]
[1] 15
> arr[,2,]
     [,1] [,2] [,3]
[1,]    3    9   15
[2,]    4   10   16
> arr[,,1,drop=FALSE]
, , 1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Factors:-
R has a special data structure, the factor, to store categorical variables. Making a variable
a factor tells R that it is nominal or ordinal.
data$x = as.factor(data$x)
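The difference between a nominal and an ordinal factor can be sketched as follows. The vector sizes and the factor names are made up for illustration.

```r
sizes <- c("small", "large", "medium", "small")

# Nominal factor: levels have no order and are stored alphabetically.
f_nominal <- factor(sizes)
print(levels(f_nominal))
print(table(f_nominal))   # frequency of each level

# Ordinal factor: levels are given explicitly, in increasing order.
f_ordinal <- factor(sizes,
                    levels = c("small", "medium", "large"),
                    ordered = TRUE)
print(f_ordinal)

# Comparisons are meaningful only for ordered factors.
print(f_ordinal[1] < f_ordinal[2])
```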
Data Frames:-
A data frame is a table, or two-dimensional array-like structure, in which each column contains
values of one variable and each row contains one set of values from each column.
The data stored in a data frame can be of numeric, factor or character type.
The structure of the data frame can be seen by using the str() function.
1. Add Column:- Just add the column vector using a new column name.
2. Add Row:- To add more rows permanently to an existing data frame, we need to bring in the new
rows in the same structure as the existing data frame and use the rbind() function.
v <- emp.data
print(v)
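The manual does not show how emp.data is built, so the sketch below assumes a small hypothetical employee table (the column names and values are invented) and walks through str(), adding a column, and adding rows with rbind().

```r
# Hypothetical data; the real emp.data used in the lab may differ.
emp.data <- data.frame(
  emp_id   = 1:3,
  emp_name = c("Rick", "Dan", "Michelle"),
  salary   = c(623.3, 515.2, 611.0),
  stringsAsFactors = FALSE
)

# Inspect the structure: one line per column with its type.
str(emp.data)

# Add Column: assign a vector to a new column name.
emp.data$dept <- c("IT", "Operations", "HR")

# Add Row: build the new rows with the same columns, then rbind().
new_rows <- data.frame(
  emp_id   = 4:5,
  emp_name = c("Ryan", "Gary"),
  salary   = c(729.0, 843.25),
  dept     = c("IT", "Finance"),
  stringsAsFactors = FALSE
)
emp.data <- rbind(emp.data, new_rows)

print(emp.data)
```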
That will open up a dialog box that looks like the following:
You can also choose different options for how the figure is constructed (e.g., frequencies or
percentages) by clicking on the percentages tab, which switches the dialog to the one below:
That should produce a bar plot that looks like the following:
Looping in R:-
“Looping”, “cycling”, “iterating” or just replicating instructions is an old practice that originated
well before the invention of computers. It is nothing more than automating a multi-step process by
organizing sequences of actions or ‘batch’ processes and by grouping the parts that need to be
repeated.
All modern programming languages provide special constructs that allow for the repetition of
instructions or blocks of instructions.
According to the R base manual, the loop constructs among the control flow commands
are for, while and repeat, with the additional clauses break and next. Remember that control
flow commands are the commands that enable a program to branch between alternatives, or to “take
decisions”, so to speak.
If the condition is not met and the resulting outcome is False, the loop is never executed. This is
indicated by the loose arrow on the right of the for loop structure. The program will then execute
the first instruction found after the loop block.
If the condition is verified, an instruction (or block of instructions) i1 is executed. This
block of instructions may itself be another loop; in that case, you speak of a nested loop.
The initialization statement describes the starting point of the loop, where the loop variable is
initialized with a starting value. A loop variable or counter is simply a variable that controls the
flow of the loop. The test expression is the condition that must hold for the loop to keep repeating.
Syntax of for loop:-
for (val in sequence) {
statement
}
Here, sequence is a vector and val takes on each of its values during the loop. In each
iteration, statement is evaluated.
x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x) {
  if (val %% 2 == 0) count <- count + 1
}
print(count)
Output
[1] 3
In the above example, the loop iterates 7 times as the vector x has 7 elements.
In each iteration, val takes on the value of corresponding element of x.
We have used a counter to count the number of even numbers in x. We can see that x contains 3
even numbers.
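The break and next clauses and the repeat construct mentioned earlier can be sketched in the same style. The variable names below are illustrative and not part of the lab.

```r
# next skips the rest of the current iteration;
# break leaves the loop altogether.
odd_values <- c()
for (val in 1:10) {
  if (val %% 2 == 0) next   # skip even numbers
  if (val > 7) break        # stop once we pass 7
  odd_values <- c(odd_values, val)
}
print(odd_values)

# repeat has no condition of its own, so it needs a break to terminate.
i <- 0
repeat {
  i <- i + 1
  if (i >= 3) break
}
print(i)
```

Here the for loop collects 1, 3, 5 and 7 and stops when val reaches 9, and the repeat loop ends with i equal to 3.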
R while Loop
Loops are used in programming to repeat a specific block of code. In R programming, a while
loop keeps repeating as long as a specific condition holds.
while (test_expression) {
statement
}
Here, test_expression is evaluated and the body of the loop is entered if the result is TRUE.
The statements inside the loop are executed and the flow returns to evaluate
the test_expression again.
This is repeated each time until test_expression evaluates to FALSE, in which case, the loop exits.
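As a small worked example of this flow, the sketch below sums the integers 1 to 5. The names i and total are illustrative.

```r
i <- 1
total <- 0

# The test i <= 5 is checked before every pass through the body.
while (i <= 5) {
  total <- total + i
  i <- i + 1   # without this update the loop would never end
}

print(total)
print(i)
```

The loop body runs five times; when i reaches 6 the test evaluates to FALSE and the loop exits with total equal to 15.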
A bar chart is a pictorial representation of categorical data that uses rectangular bars with
heights or lengths proportional to the values they represent. The data set supplies the
numerical values that determine the length or height of each bar.
R uses the function barplot() to create bar charts. Both vertical and horizontal bars can be
drawn.
Syntax:
barplot(H, xlab, ylab, main, names.arg, col)
Parameters:
H: This parameter is a vector or matrix containing numeric values which are used in bar
chart.
xlab: This parameter is the label for x axis in bar chart.
ylab: This parameter is the label for y axis in bar chart.
main: This parameter is the title of the bar chart.
names.arg: This parameter is a vector of names appearing under each bar in bar chart.
col: This parameter is used to give colors to the bars in the graph.
Labels, a title and colors are properties of a bar chart that can be set by passing the
corresponding arguments to barplot().
Approach:
1. To add a title to the bar chart:
barplot(A, main = title_name)
2. To label the X-axis and Y-axis of the bar chart:
barplot(A, xlab = x_label_name, ylab = y_label_name)
3. To add color to the bar chart:
barplot(A, col = color_name)
Example :
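A minimal sketch that combines the arguments above might look as follows. The values in A, the month labels, the axis titles and the color are all made up for illustration.

```r
# Illustrative data: one value per category.
A <- c(17, 32, 8, 53, 1)
months <- c("Jan", "Feb", "Mar", "Apr", "May")

# Vertical bar chart with title, axis labels, bar names and color.
# barplot() also returns the midpoint of each bar on the category axis.
bar_mids <- barplot(A,
                    names.arg = months,
                    xlab = "Month",
                    ylab = "Count",
                    main = "Monthly counts",
                    col = "steelblue")
print(bar_mids)

# horiz = TRUE draws the same data as horizontal bars.
barplot(A, names.arg = months, horiz = TRUE, col = "steelblue")
```

When run non-interactively with Rscript, the plots are written to a PDF file instead of a window.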
Histograms in R language:-
A histogram groups a set of data points into specified ranges (bins) and draws one rectangle per
range, with a height proportional to the frequency of the variable in that range and a width
equal to the interval. It is similar to a vertical bar graph, with the special feature that
there are no gaps between the bars. We can create a histogram in the R programming language
using the hist() function.
The following creates a simple histogram: the vector v is plotted using hist().
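Example:
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
hist(v)
Using histogram return values for labels using text()
# m is an illustrative name for the object returned by hist()
m <- hist(v, breaks = 5)
# Setting labels (the placement arguments are an assumption)
text(m$mids, m$counts, labels = m$counts, pos = 1)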
Boxplots in R Language:-
A boxplot is a chart that displays the distribution of a data set, drawing one box for each
group. The distribution is summarized by five values (minimum, first quartile, median, third
quartile, maximum).
Boxplots are created in R by using the boxplot() function.
print(head(input))
Output:
Creating the Boxplot
Output:
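The code that builds input is not shown above, so the following is only a sketch under the assumption that input holds the mpg and cyl columns of the built-in mtcars data set. The axis labels and title are illustrative.

```r
# Assumed input: mileage (mpg) and number of cylinders (cyl) from mtcars.
input <- mtcars[, c("mpg", "cyl")]
print(head(input))

# One box per cylinder count; the formula mpg ~ cyl means "mpg grouped by cyl".
# boxplot() also returns the statistics it used to draw the boxes.
bp <- boxplot(mpg ~ cyl,
              data = input,
              xlab = "Number of cylinders",
              ylab = "Miles per gallon",
              main = "Mileage by cylinder count")

# Each column of bp$stats holds the five values for one group:
# lower whisker, first quartile, median, third quartile, upper whisker.
print(bp$stats)
print(bp$names)
```

Note that the whiskers reported in bp$stats exclude points that boxplot() treats as outliers, so they can differ from the raw minimum and maximum.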
Scatter plot :-
Scatterplots show many points plotted in the Cartesian plane. Each point represents the values of
two variables. One variable is chosen in the horizontal axis and another in the vertical axis.
The simple scatterplot is created using the plot() function.
Syntax
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Following is the description of the parameters used −
x is the data set whose values are the horizontal coordinates.
y is the data set whose values are the vertical coordinates.
main is the title of the graph.
xlab is the label in the horizontal axis.
ylab is the label in the vertical axis.
xlim is the limits of the values of x used for plotting.
ylim is the limits of the values of y used for plotting.
axes indicates whether both axes should be drawn on the plot.
Example
We use the data set "mtcars" available in the R environment to create a basic scatterplot. Let's use
the columns "wt" and "mpg" in mtcars.
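A sketch of that scatterplot could look like this. The axis labels, title and axis limits are illustrative choices, with the limits picked to cover the range of both columns.

```r
# Keep only the two columns of interest from the built-in mtcars data set.
input <- mtcars[, c("wt", "mpg")]
print(head(input))

# Weight on the horizontal axis, mileage on the vertical axis.
plot(x = input$wt,
     y = input$mpg,
     main = "Weight vs Mileage",
     xlab = "Weight",
     ylab = "Mileage",
     xlim = c(1.5, 5.5),
     ylim = c(10, 35))
```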
The lm() function estimates the intercept and slope coefficients for the linear model that it has fit
to our data.
Whether we can use our model to make predictions will depend on:
1. Whether we can reject the null hypothesis that there is no relationship between our
variables.
2. Whether the model is a good fit for our data.
We can view the output of our model using summary(). The model output will provide us with the
information we need to test our hypothesis and assess how well the model fits our data.
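Continuing with the mtcars columns used for the scatterplot, a simple linear model could be fitted and inspected as sketched below. The choice of mpg as the response and wt as the predictor, and all object names, are assumptions made for illustration.

```r
# Fit mileage as a linear function of weight.
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope.
coefs <- coef(fit)
print(coefs)

# Full model output: coefficient table, R-squared, F-statistic.
fit_summary <- summary(fit)
print(fit_summary)

# 1. Test of the null hypothesis of no relationship:
#    the p-value of the slope coefficient.
p_value_slope <- fit_summary$coefficients["wt", "Pr(>|t|)"]
print(p_value_slope)

# 2. Goodness of fit: share of variance in mpg explained by wt.
r_squared <- fit_summary$r.squared
print(r_squared)

# If both checks look acceptable, the model can be used for prediction.
print(predict(fit, newdata = data.frame(wt = 3)))
```

A small p-value for the slope lets us reject the null hypothesis, and an R-squared closer to 1 indicates a better fit.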