5 Sem Notes R Programing Unit 12345 Bca BSC

The document provides an overview of R programming, detailing its origins, features, and basic syntax for statistical analysis and data representation. It explains the creation and manipulation of data types such as vectors and matrices, along with examples of basic operations. R is highlighted as a widely used programming language in statistics, supported by a strong community and suitable for various applications.



R programming (Nalanda Open University)


Statistical Computing & R Programming (Theory) 1


_________________________________________________________________________

R - Overview
R is a programming language and software environment for statistical analysis,
graphics representation and reporting. R was created by Ross Ihaka and Robert
Gentleman at the University of Auckland, New Zealand, and is currently
developed by the R Development Core Team.

The core of R is an interpreted computer language which allows branching and
looping as well as modular programming using functions. R allows integration
with procedures written in C, C++, .NET, Python or FORTRAN for efficiency.

R is freely available under the GNU General Public License, and pre-compiled
binary versions are provided for various operating systems like Linux, Windows
and Mac.

R is free software distributed under a GNU-style copyleft, and an official part of
the GNU project called GNU S.

Evolution of R

R was initially written by Ross Ihaka and Robert Gentleman at the
Department of Statistics of the University of Auckland in Auckland, New Zealand.
R made its first appearance in 1993.

• A large group of individuals has contributed to R by sending code and bug
reports.

• Since mid-1997 there has been a core group (the "R Core Team") who can
modify the R source code archive.

Features of R

As stated earlier, R is a programming language and software environment for
statistical analysis, graphics representation and reporting. The following are the
important features of R −

• R is a well-developed, simple and effective programming language which
includes conditionals, loops, user-defined recursive functions and input
and output facilities.
• R has an effective data handling and storage facility.
• R provides a suite of operators for calculations on arrays, lists, vectors and
matrices.
• R provides a large, coherent and integrated collection of tools for data
analysis.

_________________________________________________________________________
Prepared by MUHAMMAD YOUSUF

• R provides graphical facilities for data analysis and display, either on screen
or in print.

In conclusion, R is one of the world's most widely used programming languages for
statistics. It is a popular choice among data scientists and is supported by a
vibrant and talented community of contributors. R is taught in universities and
deployed in mission-critical business applications. This tutorial will teach you R
programming along with suitable examples in simple and easy steps.

________________________________________________________________

R - Basic Syntax

As a convention, we will start learning R programming by writing a "Hello,
World!" program. Depending on your needs, you can program either at the R
command prompt or in an R script file. Let's check both one by one.

R Command Prompt
Once you have the R environment set up, it's easy to start the R command
prompt by just typing the following command at your system's command prompt −

$ R

This will launch the R interpreter and you will get a prompt > where you can start
typing your program as follows −

> myString <- "Hello, World!"
> print(myString)
[1] "Hello, World!"

Here the first statement defines a string variable myString and assigns it the
string "Hello, World!", and the next statement uses print() to print the value
stored in the variable myString.

R Script File
Usually, you will do your programming by writing your programs in script files
and then executing those scripts at your command prompt with the help of
Rscript, the command-line front end of the R interpreter. So let's start by writing
the following code in a text file called test.R −

# My first program in R Programming
myString <- "Hello, World!"


print(myString)

Save the above code in a file test.R and execute it at the Linux command prompt as
given below. Even if you are using Windows or another system, the syntax remains
the same.

$ Rscript test.R

When we run the above program, it produces the following result.

[1] "Hello, World!"

Comments
Comments are like helping text in your R program and they are ignored by the
interpreter while executing your actual program. A single-line comment is written
by putting # at the beginning of the statement as follows −

# My first program in R Programming

R does not support multi-line comments but you can perform a trick which is
something as follows −

if(FALSE) {
"This is a demo for multi-line comments and it should be put
inside either a
single OR double quote"
}

myString <- "Hello, World!"
print(myString)
[1] "Hello, World!"

Although the text inside if(FALSE) { } is read by the R interpreter, it is never
executed, so it does not interfere with your actual program. You should put such
comments inside either single or double quotes.

_____________________________________________________

R - Data Types

Generally, while doing programming in any programming language, you need to
use various variables to store various information. Variables are nothing but
reserved memory locations to store values. This means that, when you create a
variable you reserve some space in memory.

You may like to store information of various data types like character, wide
character, integer, floating point, double floating point, Boolean etc. Based on the
data type of a variable, the system allocates memory and decides what can be
stored in the reserved memory.

In contrast to other programming languages like C and Java, variables in R are
not declared with a data type. Variables are assigned R-objects, and the data
type of the R-object becomes the data type of the variable. There are many types
of R-objects. The frequently used ones are −

• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames

The simplest of these objects is the vector object, and there are six data types
of these atomic vectors, also termed the six classes of vectors. The other
R-Objects are built upon the atomic vectors.

Data Type: Logical
Example: TRUE, FALSE
Verify:
v <- TRUE
print(class(v))
it produces the following result −
[1] "logical"

Data Type: Numeric
Example: 12.3, 5, 999
Verify:
v <- 23.5
print(class(v))
it produces the following result −
[1] "numeric"

Data Type: Integer
Example: 2L, 34L, 0L
Verify:
v <- 2L
print(class(v))
it produces the following result −
[1] "integer"

Data Type: Complex
Example: 3 + 2i
Verify:
v <- 2+5i
print(class(v))
it produces the following result −
[1] "complex"

Data Type: Character
Example: 'a', "good", "TRUE", '23.4'
Verify:
v <- "TRUE"
print(class(v))
it produces the following result −
[1] "character"

Data Type: Raw
Example: "Hello" is stored as 48 65 6c 6c 6f
Verify:
v <- charToRaw("Hello")
print(class(v))
it produces the following result −
[1] "raw"

In R programming, the most basic data types are the R-objects called vectors,
which hold elements of the different classes shown above. Please note that in R the
number of classes is not confined to only the above six types. For example, we
can use many atomic vectors and create an array whose class will become array.
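A minimal sketch that checks all six atomic classes at once, assuming a standard R installation. It also calls typeof(), which reports the internal storage type (for numeric values this is "double"). The names values, classes and types are illustrative only.

```r
# One example value for each of the six atomic vector classes listed above
values <- list(TRUE, 23.5, 2L, 2+5i, "TRUE", charToRaw("Hello"))

# class() gives the class name; typeof() gives the internal storage type
classes <- sapply(values, class)
types <- sapply(values, typeof)

print(classes)
print(types)
```

The two results differ only for numeric values, whose class is "numeric" but whose storage type is "double".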
_______________________________________________

Vectors

When you want to create a vector with more than one element, you should use
the c() function, which combines the elements into a vector.

# Create a vector.
apple <- c("red", "green", "yellow")
print(apple)

# Get the class of the vector.
print(class(apple))

When we execute the above code, it produces the following result −

[1] "red"    "green"  "yellow"
[1] "character"
> x <- c(1, -1, 3.5, 2)
> x
[1]  1.0 -1.0  3.5  2.0
Then if we want to add 2 to everything in this vector, or to square each
entry:
> x + 2
[1] 3.0 1.0 5.5 4.0
> x^2
[1]  1.00  1.00 12.25  4.00
This is very useful in statistics; for example, the sum of squared deviations
from the mean takes a single line:
> sum((x - mean(x))^2)
[1] 10.6875
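The sum of squared deviations is the numerator of the sample variance. A short sketch connecting the one-liner above to R's built-in var(), which divides by n - 1; the names ss and sample_var are illustrative.

```r
x <- c(1, -1, 3.5, 2)

# Vectorised arithmetic: subtract the mean from every element, square, then add up
ss <- sum((x - mean(x))^2)

# Dividing by n - 1 gives the sample variance, which var() computes directly
sample_var <- ss / (length(x) - 1)

print(ss)
print(sample_var)
print(var(x))
```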

___________________________________________________________

R - Matrices

Matrices are the R objects in which the elements are arranged in a
two-dimensional rectangular layout. They contain elements of the same
atomic type. Though we can create a matrix containing only characters or
only logical values, they are not of much use. We use matrices containing
numeric elements to be used in mathematical calculations.

A Matrix is created using the matrix() function.

Syntax

The basic syntax for creating a matrix in R is −

matrix(data, nrow, ncol, byrow, dimnames)

Following is the description of the parameters used −

• data is the input vector which becomes the data elements of the
matrix.
• nrow is the number of rows to be created.
• ncol is the number of columns to be created.
• byrow is a logical value. If TRUE then the input vector elements are
arranged by row; the default, FALSE, arranges them by column.
• dimnames is a list of the names assigned to the rows and columns.
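Only one of nrow and ncol is needed when the length of data fills the matrix exactly; R works out the other. A small sketch with a hypothetical matrix A and made-up names r1 to r4 and c1 to c3, also showing that dimnames takes a list of two character vectors:

```r
# Give ncol instead of nrow: with 12 values and 3 columns, R infers 4 rows
A <- matrix(3:14, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3", "r4"), c("c1", "c2", "c3")))

print(dim(A))       # number of rows and columns
print(dimnames(A))  # the row and column names supplied as a list
```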

Example
Create a matrix taking a vector of numbers as input.


# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)

# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)

# Define the column and row names.
rownames <- c("row1", "row2", "row3", "row4")
colnames <- c("col1", "col2", "col3")

P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14
     [,1] [,2] [,3]
[1,]    3    7   11
[2,]    4    8   12
[3,]    5    9   13
[4,]    6   10   14
     col1 col2 col3
row1    3    4    5
row2    6    7    8
row3    9   10   11
row4   12   13   14
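One way to see what byrow = TRUE does: filling a 3 x 4 matrix column by column and then transposing it with t() gives the same matrix M. This is an illustration of the result, not a claim about how matrix() works internally; M_alt is a hypothetical name.

```r
M <- matrix(3:14, nrow = 4, byrow = TRUE)

# Fill column-wise into a 3 x 4 matrix, then transpose to 4 x 3
M_alt <- t(matrix(3:14, nrow = 3))

print(identical(dim(M), dim(M_alt)))
print(all(M == M_alt))
```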

Accessing Elements of a Matrix

Elements of a matrix can be accessed by using the row and column index of
the element. We consider the matrix P above to find the specific elements
below.

# Define the column and row names.
rownames <- c("row1", "row2", "row3", "row4")
colnames <- c("col1", "col2", "col3")

# Create the matrix.
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))

# Access the element at 3rd column and 1st row.
print(P[1,3])

# Access the element at 2nd column and 4th row.
print(P[4,2])

# Access only the 2nd row.
print(P[2,])

# Access only the 3rd column.
print(P[,3])

When we execute the above code, it produces the following result −

[1] 5
[1] 13
col1 col2 col3
   6    7    8
row1 row2 row3 row4
   5    8   11   14
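Because P has row and column names, elements can also be selected by name. A short sketch; by_name and second_row are hypothetical names, and drop = FALSE is the standard indexing argument that stops R from simplifying a single row to a plain vector.

```r
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3", "row4"),
                            c("col1", "col2", "col3")))

# Index by names instead of positions: same element as P[1, 3]
by_name <- P["row1", "col3"]

# drop = FALSE keeps a single row as a 1 x 3 matrix rather than a vector
second_row <- P[2, , drop = FALSE]

print(by_name)
print(second_row)
```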

Matrix Computations

Various mathematical operations are performed on the matrices using the R
operators. The result of the operation is also a matrix.

The dimensions (number of rows and columns) must be the same for the
matrices involved in the operation.

Matrix Addition & Subtraction

# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)

matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)

# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")

print(result)

# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
print(result)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    3   -1    2
[2,]    9    4    6
     [,1] [,2] [,3]
[1,]    5    0    3
[2,]    2    9    4
Result of addition
     [,1] [,2] [,3]
[1,]    8   -1    5
[2,]   11   13   10
Result of subtraction
     [,1] [,2] [,3]
[1,]   -2   -1   -1
[2,]    7   -5    2

Matrix Multiplication & Division

The * and / operators work element by element: each element of matrix1 is
multiplied by (or divided by) the element in the same position of matrix2.

# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)

matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)

# Multiply the matrices element by element.
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result)

# Divide the matrices element by element.
result <- matrix1 / matrix2
cat("Result of division","\n")
print(result)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    3   -1    2
[2,]    9    4    6
     [,1] [,2] [,3]
[1,]    5    0    3
[2,]    2    9    4
Result of multiplication
     [,1] [,2] [,3]
[1,]   15    0    6
[2,]   18   36   24
Result of division
     [,1]      [,2]      [,3]
[1,]  0.6      -Inf 0.6666667
[2,]  4.5 0.4444444 1.5000000
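Element-wise * is not the matrix product from linear algebra. R's %*% operator computes that product and requires the number of columns of the left matrix to equal the number of rows of the right one. A short sketch reusing matrix1 and matrix2, with t() transposing matrix2 so the shapes match; product and attempt are hypothetical names.

```r
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)   # 2 x 3
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)    # 2 x 3

# %*% is true matrix multiplication: (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
product <- matrix1 %*% t(matrix2)
print(product)

# Two 2 x 3 matrices are not conformable for %*%, so this raises an error
attempt <- try(matrix1 %*% matrix2, silent = TRUE)
print(inherits(attempt, "try-error"))
```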

_______________________________________________________

R - Arrays

Arrays are the R data objects which can store data in more than two
dimensions. For example − If we create an array of dimension (2, 3, 4) then
it creates 4 rectangular matrices each with 2 rows and 3 columns. Arrays can
store only one data type.

An array is created using the array() function. It takes vectors as input and
uses the values in the dim parameter to create an array.

Example

The following example creates an array of two 3x3 matrices, each with 3 rows
and 3 columns.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)


When we execute the above code, it produces the following result −

, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15
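Why are both matrices in this output identical? The two vectors supply only 9 values, but dim = c(3, 3, 2) asks for 18 cells, so array() reuses (recycles) the input from the start. A quick check; values is a hypothetical name.

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
values <- c(vector1, vector2)

result <- array(values, dim = c(3, 3, 2))

# Only 9 values are supplied for 3 * 3 * 2 = 18 cells, so array() recycles them
print(length(values))
print(prod(dim(result)))
print(identical(result[, , 1], result[, , 2]))
```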

Naming Columns and Rows
We can give names to the rows, columns and matrices in the array by using
the dimnames parameter.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,
column.names, matrix.names))
print(result)

When we execute the above code, it produces the following result −

, , Matrix1

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

, , Matrix2

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Accessing Array Elements

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,
column.names, matrix.names))

# Print the third row of the second matrix of the array.
print(result[3,,2])

# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])

# Print the 2nd Matrix.
print(result[,,2])


When we execute the above code, it produces the following result −

COL1 COL2 COL3
   3   12   15
[1] 13
     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Manipulating Array Elements
As an array is made up of matrices in multiple dimensions, the operations on
elements of the array are carried out by accessing elements of the matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
array1 <- array(c(vector1,vector2),dim = c(3,3,2))

# Create two vectors of different lengths.
vector3 <- c(9,1,0)
vector4 <- c(6,0,11,3,14,1,2,6,9)
array2 <- array(c(vector3,vector4),dim = c(3,3,2))

# Create matrices from these arrays.
matrix1 <- array1[,,2]
matrix2 <- array2[,,2]

# Add the matrices.
result <- matrix1+matrix2
print(result)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    7   19   19
[2,]   15   12   14
[3,]   12   12   26

Calculations Across Array Elements

We can do calculations across the elements in an array using the apply()
function.

Syntax
apply(X, MARGIN, FUN)

Following is the description of the parameters used −

• X is an array.
• MARGIN gives the dimension(s) the function is applied over: 1 for rows,
2 for columns, and c(1, 2) for each row-and-column cell.
• FUN is the function to be applied across the elements of the array.

Example

We use the apply() function below to calculate the sum of the elements in
the rows of an array across all the matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim = c(3,3,2))
print(new.array)


# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)

When we execute the above code, it produces the following result −

, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

[1] 56 68 60
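Changing MARGIN changes which dimension is kept. A sketch on the same array (the names row_sums, col_sums and cell_sums are hypothetical); for plain sums, base R's rowSums() gives the same row totals.

```r
new.array <- array(c(c(5, 9, 3), 10:15), dim = c(3, 3, 2))

row_sums <- apply(new.array, 1, sum)         # one total per row, across both matrices
col_sums <- apply(new.array, 2, sum)         # one total per column, across both matrices
cell_sums <- apply(new.array, c(1, 2), sum)  # 3 x 3 matrix: each cell summed over the two matrices

print(row_sums)
print(col_sums)
print(cell_sums)
```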
_______________________________________________________

Non-numerics

In R programming, non-numeric values refer to data types and values that are not
represented as numbers. Non-numeric values are essential for working with diverse
types of data, including text, categorical data, logical values, and more. Here are
some common non-numeric data types in R:

• Character (String) Data:
● Character data is used to represent text, such as words, sentences, and
labels.
● You can create character vectors by enclosing text in quotation marks,
either single (') or double (").
● Example: string <- "Hello, World!"

• Factors:
● Factors are used to represent categorical data with predefined levels.
● They are particularly useful when you have data with a limited number
of distinct categories.
● You can create factors using the factor() function.
● Example: my_factor <- factor(c("Low", "Medium", "High"))

• Logical Values:
● Logical values represent binary data with only two possible values:
TRUE or FALSE.
● They are often used for conditional statements and logical operations.
● Example: is_raining <- TRUE

• Date and Time Data:
● R provides data types for working with date and time information,
including Date, POSIXct, and POSIXlt.
● You can manipulate and perform calculations with dates and times
using these data types.
● Example: my_date <- as.Date("2023-11-09")

• Complex Data:
● R supports complex numbers for advanced mathematical operations.
● Complex numbers have both a real and an imaginary part.
● Example: complex_number <- 3 + 2i

• Missing Values:
● Missing values in R are represented by the special value NA.
● NA is used when a value is missing or not available.
● Example: my_value <- NA

• Special Values:
● R includes special values such as NaN (Not-a-Number) and Inf (Infinity)
for specific mathematical situations.
● NaN is used to indicate undefined or unrepresentable results in
calculations, for example 0 / 0.
● Inf represents positive infinity and -Inf negative infinity.
● Example: result <- 1 / 0 # Results in Inf

These non-numeric data types are crucial for working with a wide range of data in R,
allowing you to handle text, categorical data, logical conditions, date and time
information, and special values effectively in your data analysis and programming
tasks.
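A minimal sketch exercising several of these types together, assuming base R only. The variable names priority, later, with_na and special are hypothetical. Note that is.na() is also TRUE for NaN, while is.nan() picks out NaN alone.

```r
# Factor: the levels are stored once and each value is stored as an integer code
priority <- factor(c("Low", "High", "Medium", "Low"),
                   levels = c("Low", "Medium", "High"))
print(levels(priority))
print(as.integer(priority))

# Dates support arithmetic in days
my_date <- as.Date("2023-11-09")
later <- my_date + 30
print(later)

# Missing and special values
with_na <- c(1, NA, 3)
special <- c(1 / 0, -1 / 0, 0 / 0)
print(is.na(with_na))
print(mean(with_na, na.rm = TRUE))  # NA is skipped when na.rm = TRUE
print(is.infinite(special))
print(is.nan(special))
print(is.na(special))
```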

_____________________________________________

Lists
A list is an R-object which can contain many different types of elements inside it
like vectors, functions and even another list inside it.

# Create a list.
list1 <- list(c(2,5,3),21.3,sin)

# Print the list.
print(list1)

When we execute the above code, it produces the following result −

[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x) .Primitive("sin")
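List elements are usually read back with [[ ]] or, for named lists, with $. A sketch on the same list1; first, sub_list, sine_of_zero and named are hypothetical names.

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

# [[ ]] returns the element itself; [ ] returns a smaller list
first <- list1[[1]]
sub_list <- list1[1]

# The third element is a function and can be called directly
sine_of_zero <- list1[[3]](0)

# Named elements can be accessed with $
named <- list(numbers = c(2, 5, 3), value = 21.3)
print(named$value)
```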


__________________________________________________

R - Data Frames

A data frame is a table or a two-dimensional array-like structure in which
each column contains values of one variable and each row contains one set of
values from each column.

Following are the characteristics of a data frame.

• The column names should be non-empty.
• The row names should be unique.


• The data stored in a data frame can be of numeric, factor or character
type.
• Each column should contain the same number of data items.

Create Data Frame

# Create the data frame.
emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23",
"2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)

When we execute the above code, it produces the following result −

  emp_id emp_name salary start_date
1      1     Rick 623.30 2012-01-01
2      2      Dan 515.20 2013-09-23
3      3 Michelle 611.00 2014-11-15
4      4     Ryan 729.00 2014-05-11
5      5     Gary 843.25 2015-03-27

Get the Structure of the Data Frame

The structure of the data frame can be seen by using the str() function.

# Create the data frame.
emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23",
"2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Get the structure of the data frame.
str(emp.data)

When we execute the above code, it produces the following result −

'data.frame': 5 obs. of 4 variables:
 $ emp_id    : int 1 2 3 4 5
 $ emp_name  : chr "Rick" "Dan" "Michelle" "Ryan" ...
 $ salary    : num 623 515 611 729 843
 $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...


Summary of Data in Data Frame

The statistical summary and nature of the data can be obtained by applying the
summary() function.

# Create the data frame.
emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),

start_date = as.Date(c("2012-01-01", "2013-09-23",
"2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Print the summary.
print(summary(emp.data))

When we execute the above code, it produces the following result −

     emp_id    emp_name             salary        start_date
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27
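summary() displays rounded values (the output above shows 664.4 and 843.2). Computing the same statistics directly gives the unrounded numbers. A sketch for the salary column alone; avg, mid, top and q are hypothetical names.

```r
salary <- c(623.3, 515.2, 611.0, 729.0, 843.25)

# The same statistics computed directly, without summary()'s rounding for display
avg <- mean(salary)                  # 3321.75 / 5
mid <- median(salary)
top <- max(salary)
q <- quantile(salary, c(0.25, 0.75)) # 1st and 3rd quartiles

print(c(mean = avg, median = mid, max = top))
print(q)
```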

Extract Data from Data Frame

Extract specific columns from a data frame using column names.

# Create the data frame.
emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date =
as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Extract specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)

When we execute the above code, it produces the following result −

  emp.data.emp_name emp.data.salary
1              Rick          623.30
2               Dan          515.20
3          Michelle          611.00
4              Ryan          729.00
5              Gary          843.25
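Selecting columns by name with [ , ] keeps the original column names instead of producing names such as emp.data.emp_name, and a logical condition can select rows. A sketch that omits start_date for brevity; cols and high_paid are hypothetical names.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Selecting by column name keeps the original column names
cols <- emp.data[, c("emp_name", "salary")]
print(cols)

# A logical condition selects rows: employees earning more than 600
high_paid <- emp.data[emp.data$salary > 600, ]
print(high_paid)
```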

Extract the first two rows and then all columns

# Create the data frame.
emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23",
"2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Extract first two rows.
result <- emp.data[1:2,]
print(result)

When we execute the above code, it produces the following result −

  emp_id emp_name salary start_date
1      1     Rick  623.3 2012-01-01
2      2      Dan  515.2 2013-09-23


Extract the 3rd and 5th rows with the 2nd and 4th columns

# Create the data frame.
emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23",
"2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)

# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)

When we execute the above code, it produces the following result −

  emp_name start_date
3 Michelle 2014-11-15
5     Gary 2015-03-27
U
M

Expand Data Frame


A data frame can be expanded by adding columns and rows.

Add Column

Assign a vector with one value per row to a new column name using the $ operator.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
      "2014-05-11", "2015-03-27")),
   stringsAsFactors = FALSE
)

# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)
AD

When we execute the above code, it produces the following result −

  emp_id emp_name salary start_date       dept
1      1     Rick 623.30 2012-01-01         IT
2      2      Dan 515.20 2013-09-23 Operations
3      3 Michelle 611.00 2014-11-15         IT
4      4     Ryan 729.00 2014-05-11         HR
5      5     Gary 843.25 2015-03-27    Finance

Add Row

To add rows to an existing data frame, create the new rows as a data frame with
the same column names and column types as the existing one, then combine the two
with the rbind() function. rbind() returns a new data frame, so assign the result
to a variable to keep it.

In the example below we create a data frame with the new rows and append it to
the existing data frame to create the final data frame.

# Create the first data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
      "2014-05-11", "2015-03-27")),
   dept = c("IT","Operations","IT","HR","Finance"),
   stringsAsFactors = FALSE
)

# Create the second data frame.
emp.newdata <- data.frame(
   emp_id = c(6:8),
   emp_name = c("Rasmi","Pranab","Tusar"),
   salary = c(578.0,722.5,632.8),
   start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("IT","Operations","Finance"),
   stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

When we execute the above code, it produces the following result −

  emp_id emp_name salary start_date       dept
1      1     Rick 623.30 2012-01-01         IT
2      2      Dan 515.20 2013-09-23 Operations
3      3 Michelle 611.00 2014-11-15         IT
4      4     Ryan 729.00 2014-05-11         HR
5      5     Gary 843.25 2015-03-27    Finance
6      6    Rasmi 578.00 2013-05-21         IT
7      7   Pranab 722.50 2013-07-30 Operations
8      8    Tusar 632.80 2014-06-17    Finance
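Because the rbind() method for data frames matches columns by name rather than by position, "the same structure" means the same column names. The sketch below uses small hypothetical data frames (old_rows, new_rows, reordered, bad_rows) to show a successful bind, a bind where the columns are in a different order, and the error raised when one column name differs; the error is caught with tryCatch() only so its message can be printed.

```r
# Hypothetical data frames with identical column names and types
old_rows <- data.frame(id = 1:2, name = c("Rick", "Dan"),
                       stringsAsFactors = FALSE)
new_rows <- data.frame(id = 3L, name = "Ryan", stringsAsFactors = FALSE)

# Matching names: rbind() appends the new row
combined <- rbind(old_rows, new_rows)
print(combined)

# Same names in a different order: columns are matched by name, not position
reordered <- data.frame(name = "Gary", id = 5L, stringsAsFactors = FALSE)
final <- rbind(combined, reordered)
print(final)

# Hypothetical data frame whose second column has a different name
bad_rows <- data.frame(id = 4L, emp = "Gary", stringsAsFactors = FALSE)

# Mismatched names: rbind() signals an error; capture its message to print it
outcome <- tryCatch(rbind(old_rows, bad_rows),
                    error = function(err) conditionMessage(err))
print(outcome)
```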
_____________________________________________

SPECIAL VALUES

In R, special values are reserved values with distinct meanings that represent
exceptional situations, such as missing data or the result of an undefined or
overflowing calculation. The most commonly used special values in R are:

U
NA (Not Available):
● NA represents missing data: a value that is not available or could not be determined.
● It is commonly used to mark missing values in data sets.
● Example: my_data <- c(1, NA, 3, 4, NA)

NaN (Not-a-Number):
● NaN is used to represent undefined or unrepresentable results in mathematical operations, especially those involving zero divided by zero or infinity divided by infinity.
● It is a way to signal that a computation doesn't produce a meaningful numeric result.
● Example: result <- 0/0 # Results in NaN

Inf (Infinity):
● Inf represents positive infinity. It's used to indicate values that are larger than
any finite number.
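● It results from operations such as dividing a positive number by zero, or from calculations whose result is too large to store (for example, exp(1000)).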
● Example: result <- 1 / 0 # Results in Inf


-Inf (Negative Infinity):


● -Inf represents negative infinity. It's used to indicate values that are smaller
(more negative) than any finite number.
● It can also result from operations like division by zero with a negative
dividend.
● Example: result <- -1 / 0 # Results in -Inf
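The special values are usually detected with dedicated test functions rather than with ==. The sketch below builds one vector containing an ordinary number and each special value, then applies is.na(), is.nan(), is.infinite() and is.finite() to it; the vector name v and the sample numbers are illustrative.

```r
# A vector containing an ordinary number and each special value
v <- c(1, NA, 0/0, 1/0, -1/0)
print(v)           # 1 NA NaN Inf -Inf

is.na(v)           # FALSE TRUE TRUE FALSE FALSE (NaN also counts as missing)
is.nan(v)          # FALSE FALSE TRUE FALSE FALSE
is.infinite(v)     # FALSE FALSE FALSE TRUE TRUE
is.finite(v)       # TRUE FALSE FALSE FALSE FALSE

# Comparing with == does not detect NA: the comparison itself is NA
NA == NA           # NA

# Missing values propagate through summaries unless they are removed
mean(c(2, NA, 4))                # NA
mean(c(2, NA, 4), na.rm = TRUE)  # 3
```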
_______________________________________________________

Classes and Coercion

In R programming, "Classes" and "Coercion" are concepts related to data types and
type conversion. Let's explore these concepts:

Classes:

What are Classes?
● In R, a class is a way to categorize objects based on their data type or structure.
● Each object in R belongs to one or more classes that define its behavior and available methods.
● Classes are essential for object-oriented programming (OOP) in R.

Common Classes in R:
● Numeric: Objects of class "numeric" represent real numbers.
● Character: Objects of class "character" store text data.
● Integer: Objects of class "integer" represent whole numbers.
● Factor: Objects of class "factor" represent categorical data with predefined levels.
● Data Frame: Objects of class "data.frame" represent structured data tables.
● List: Objects of class "list" can hold elements of different classes.
● Many other classes are defined by R packages for specialized purposes.
Defining Custom Classes:
● You can define your own classes in R using object-oriented programming principles.
● This allows you to create objects with specific attributes and methods tailored to your needs.
● The setClass() function from the "methods" package is commonly used to define formal (S4) classes; a simpler, informal S3 class can be created by setting an object's class attribute, for example with class(x) <- "myclass".
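The class() function reports an object's class, and setClass() defines a new formal class. The sketch below checks the common classes listed above and then defines a minimal S4 class; the class name "Employee", its slots name and salary, and the object e are hypothetical, chosen to match the employee examples earlier in these notes.

```r
# class() reports the class of an object
class(3.14)                        # "numeric"
class(7L)                          # "integer" (the L suffix creates an integer)
class("hello")                     # "character"
class(factor(c("low", "high")))    # "factor"
class(data.frame(x = 1:2))         # "data.frame"
class(list(1, "a", TRUE))          # "list"

# A minimal formal (S4) class; "Employee" and its slots are hypothetical
library(methods)
setClass("Employee", slots = c(name = "character", salary = "numeric"))

# new() creates an object of the class; slots are read with @
e <- new("Employee", name = "Rick", salary = 623.3)
e@name                             # "Rick"
is(e, "Employee")                  # TRUE

# setClass() records slot types, so a character salary is rejected by new()
bad <- tryCatch(new("Employee", name = "Dan", salary = "high"),
                error = function(err) "invalid slot type")
bad                                # "invalid slot type"
```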
Coercion:

What is Coercion?
● Coercion refers to the automatic or manual conversion of objects from one class to another.
● R automatically coerces objects when an operation or function combines values of different classes, provided a sensible conversion exists.
Implicit and Explicit Coercion:
● Implicit Coercion: R performs implicit (automatic) coercion when necessary to make operations work. For example, when adding a numeric and an integer, the integer is implicitly coerced to a numeric. Likewise, logical values become 1 (TRUE) and 0 (FALSE) in arithmetic, and combining numbers and text with c() turns every element into a character string.
● Explicit Coercion: You can explicitly coerce objects from one class to another using functions like as.numeric(), as.character(), etc.
Example of Coercion:
● If you try to add a numeric vector and a character vector, R does not convert the character vector to numeric for you. Arithmetic on a character vector stops with the error "non-numeric argument to binary operator", even when the strings look like numbers.
● Explicit coercion lets you control the conversion: use as.numeric() to convert the character vector to numeric first. Elements that cannot be read as numbers become NA, and R issues a warning.

Example of Implicit Coercion:

x <- 5L           # integer
y <- 2.5          # numeric (double)
result <- x + y   # x is implicitly coerced to numeric: result is 7.5
flag <- TRUE + 1  # TRUE is implicitly coerced to 1: flag is 2
AM

Example of Explicit Coercion:

x <- 5
y <- "10"
y <- as.numeric(y) # Explicitly coerce "y" to numeric
result <- x + y # No coercion needed: result is 15
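The as.*() family covers the other common explicit conversions. The sketch below shows what each conversion returns for a few illustrative inputs, including the NA produced when a string cannot be parsed (the warning is silenced with suppressWarnings() only to keep the output short) and the common pitfall of converting a factor of numbers; the factor f is a hypothetical example.

```r
# as.numeric() converts strings that look like numbers
as.numeric(c("1.5", "20", "3"))                 # 1.5 20.0 3.0

# Strings that cannot be parsed become NA (R normally also issues a warning)
suppressWarnings(as.numeric(c("7", "seven")))   # 7 NA

# Other explicit conversions
as.integer(3.9)              # 3 (the decimal part is dropped, not rounded)
as.character(42)             # "42"
as.logical(c("TRUE", "no"))  # TRUE NA
as.numeric(c(TRUE, FALSE))   # 1 0

# Factors: as.numeric() returns the internal level codes, not the labels
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1
as.numeric(as.character(f))   # 10 20 10
```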



Understanding classes and coercion is important for handling different data types
and ensuring that your data manipulations and calculations are performed correctly
in R.

______________________________________________________________

BASIC PLOTTING

In R programming, "Basic Plotting" refers to the process of creating simple graphical
representations of data using various plotting functions and tools. Basic plotting is
an essential part of data visualization and exploratory data analysis in R. It allows
you to create a wide range of charts and graphs to better understand your data and
convey insights to others. Here are some key aspects of basic plotting in R:

Plotting Functions:
● R provides a variety of plotting functions, including plot(), hist(), barplot(), boxplot(), and pie(), to create different types of plots.
● The choice of function depends on the nature of your data and the type of plot you want to generate.
Common Types of Basic Plots:
● Scatter Plots: Used to visualize the relationship between two continuous variables.
● Histograms: Display the distribution of a single variable by dividing it into bins.
● Bar Charts: Represent categorical or discrete data using bars or columns.
● Line Plots: Show trends or patterns in data over time or another continuous variable.
● Box Plots: Summarize the distribution of a dataset through its median, quartiles, and outliers.
● Pie Charts: Display the proportion of different categories in a dataset.
Customization:
● R provides extensive customization options for plots. You can adjust colors, labels, titles, axis scales, and more to tailor the plot's appearance to your needs.
● You can add text, points, lines, and legends to an existing plot with functions such as text(), points(), lines(), and legend() to make it more informative.
Graphics Parameters:
● You can set global graphical parameters using functions like par() to control aspects of the graphical output, such as the layout, margins, and fonts.
Using Packages for Advanced Plots:
● While basic plotting is sufficient for many purposes, R has packages like ggplot2, lattice, and ggvis that offer more advanced and flexible options for creating complex and customized data visualizations.

Example of Creating a Scatter Plot (Basic Plotting):

# Create sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
# Create a scatter plot
plot(x, y, main = "Scatter Plot Example", xlab = "X-Axis", ylab = "Y-Axis",
     pch = 19, col = "blue")

In this example, we use the plot() function to create a scatter plot of two variables x
and y. We customize the plot with a title, axis labels, point style (pch), and color. This
is a basic example of how you can create and customize plots in R for data
exploration and presentation.
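The other basic plot types follow the same pattern: one function call per plot, customized through arguments such as main, xlab, and col. The sketch below, assuming small hypothetical data sets (scores, dept_counts, month, sales), draws a histogram, box plot, bar chart, line plot, and pie chart side by side using par(mfrow = ...). It writes to a temporary PDF file through pdf() so that it also runs where no screen is available; in an interactive session you can drop the pdf() and dev.off() calls.

```r
# Send the plots to a temporary PDF file so the sketch runs without a screen
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Arrange the plots in a 2 x 3 grid; par() returns the old settings for restoring
old_par <- par(mfrow = c(2, 3))

# Hypothetical sample data
scores <- c(55, 62, 68, 70, 71, 75, 78, 80, 85, 91)
dept_counts <- c(IT = 2, Operations = 1, HR = 1, Finance = 1)
month <- 1:6
sales <- c(12, 15, 11, 18, 21, 19)

hist(scores, main = "Histogram", xlab = "Score", col = "lightblue")
boxplot(scores, main = "Box Plot", ylab = "Score")
barplot(dept_counts, main = "Bar Chart", col = "orange")
plot(month, sales, type = "l", main = "Line Plot", xlab = "Month", ylab = "Sales")
pie(dept_counts, main = "Pie Chart")

par(old_par)          # restore the previous graphical parameters
invisible(dev.off())  # close the PDF device so the file is written
```

Saving and restoring the value returned by par() keeps the layout change from affecting later plots.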
