Unit 1
Introduction to R Programming
R is a programming language and software environment for statistical analysis and
graphics representation.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland,
New Zealand, and is currently developed by the R Development Core Team.
R is freely available under the GNU General Public License.
This programming language was named R, based on the first letter of the first names of the two R
authors (Robert Gentleman and Ross Ihaka).
R is one of the most popular data analytics tools because it is open source, flexible, offers many
packages and has a huge community.
Why R?
R is a programming and statistical language.
R is used for data analysis and visualization.
R is simple and easy to learn, read and write.
R is an example of FLOSS (Free/Libre and Open Source Software): one can freely
distribute copies of this software, read its source code, modify it, etc.
Who uses R?
The Consumer Financial Protection Bureau uses R for data analysis.
Statisticians at John Deere use R for time series modeling and geospatial analysis in a
reliable and reproducible way.
Bank of America uses R for reporting.
R is part of the technology stack behind Foursquare's famed recommendation engine.
ANZ, the fourth largest bank in Australia, uses R for credit risk analysis.
Google uses R to predict economic activity.
Mozilla, the foundation responsible for the Firefox web browser, uses R to visualize web
activity.
Evolution of R
R is an implementation of the S programming language, which was created by John
Chambers at Bell Labs.
R was initially written by Ross Ihaka and Robert Gentleman at the Department of
Statistics of the University of Auckland in Auckland, New Zealand.
1
Dr.V.N.Gopiraju, CIET, LAM
Data Analytics through R
R made its first public appearance in 1993.
A large group of individuals has contributed to R by sending code and bug reports. Since
mid-1997 there has been a core group (the "R Core Team") who can modify the R source
code archive.
R 1.0.0 was released in 2000.
R 3.0.0 was released in 2013.
Features of R:
R supports procedural programming with functions and object-oriented
programming with generic functions. Procedural programming involves procedures,
records, modules, and procedure calls, while object-oriented programming
involves classes, objects, and functions.
Packages are part of R programming. They are useful for collecting sets of R
functions into a single unit.
R is a well-developed, simple and effective programming language which includes
conditionals, loops, user defined recursive functions and input and output facilities.
R has an effective data handling and storage facility.
R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
R provides a large, coherent and integrated collection of tools for data analysis. It
provides graphical facilities for data analysis and display, either directly at the computer
or printed on paper.
R's programming features include database input, exporting data, viewing data, variable
labels, missing data, etc.
R is an interpreted language, so we can access it through a command-line interpreter.
R supports matrix arithmetic.
R, SAS, and SPSS are three statistical languages. Of these three, R
is the only one that is open source.
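Two of the features listed above, user-defined recursive functions and object-oriented programming with generic functions, can be illustrated with a minimal sketch. The names factorial_rec and describe are hypothetical and chosen only for this illustration; they are not part of R itself.

```r
# A user-defined recursive function (procedural style).
# 'factorial_rec' is a hypothetical name used only in this sketch.
factorial_rec <- function(n) {
  if (n <= 1) return(1)
  n * factorial_rec(n - 1)
}

# A generic function with two methods (S3 object-oriented style).
# 'describe' is a hypothetical generic; UseMethod() picks the method
# that matches the class of the argument.
describe <- function(x, ...) UseMethod("describe")
describe.numeric <- function(x, ...) paste("numeric vector of length", length(x))
describe.character <- function(x, ...) paste("character vector of length", length(x))

print(factorial_rec(5))
print(describe(c(1.5, 2.5)))
print(describe("a"))
```

The same call, describe(x), runs different code depending on the class of x, which is what "generic function" means here.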
Usually, you will write your programs in script files and then execute those scripts
at the command prompt with the help of the R interpreter called Rscript. So
let's start by writing the following code in a text file called test.R:
# My first program in R Programming
myString <- "Hello, World!"
print(myString)
Save the above code in a file test.R. To execute it, open the script in the R editor, select all
(Ctrl+A) and choose the Run line or selection (Ctrl+R) option in the Edit menu of the R console.
When we run the above program, it produces the following result.
[1] "Hello, World!"
It is very important to understand data types, because these are the objects you will manipulate
on a day-to-day basis in R. Dealing with object conversions is one of the most common sources of
frustration for beginners.
character
numeric (real or decimal)
integer
logical
complex
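The five basic types listed above can be checked with class(). The following is a minimal sketch; the example values are arbitrary and chosen only for illustration.

```r
# One example value per basic type; the values themselves are arbitrary.
values <- list("hello", 3.14, 5L, TRUE, 2+3i)

# class() reports the type of each value.
for (v in values) {
  print(class(v))
}
```

Note the L suffix: 5L is stored as an integer, whereas a plain 5 is stored as numeric.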
Data Objects in R:
Data types are used to store information. In R, we do not need to declare a variable as some data
type. Variables are assigned R objects, and the data type of the R object becomes the
data type of the variable. There are six main kinds of data objects in R:
1. Vectors
2. Lists
3. Matrices
4. Arrays
5. Factors
6. Data Frames
Scalar: A scalar is a single number. The following code creates a scalar variable
with the numeric value 5: x = 5. A vector, in contrast, is a sequence of numbers.
1. Vector: A Vector is a sequence of data elements of the same basic type.
Example 1:
> vtr = c(1, 3, 5, 7, 9)    or    > vtr <- c(1, 3, 5, 7, 9)
> print(vtr)
o/p: [1] 1 3 5 7 9
> v = 2:12
> print(v)
o/p: [1] 2 3 4 5 6 7 8 9 10 11 12
> v = 3.5:10.5
> v
o/p: [1] 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5
Example 3: If the final element specified does not belong to the sequence, it is discarded.
> v <- 3.8:11
> v
o/p: [1] 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8
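Two points about vectors can be sketched in code: seq() is a more explicit alternative to the colon operator, and all elements of a vector share one basic type. This is a minimal illustration; the variable names v_seq and mixed are chosen only for this sketch.

```r
# seq() with an explicit step gives the same values as 3.8:11;
# 11 is not reached by steps of 1 from 3.8, so the sequence stops at 10.8.
v_seq <- seq(3.8, 11, by = 1)
print(v_seq)
print(length(v_seq))

# A vector holds a single basic type, so mixed input is coerced.
# Here the number and the logical value both become character strings.
mixed <- c(1, "a", TRUE)
print(mixed)
print(class(mixed))
```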
2. List: Lists are R objects that can contain elements of different types, such as numbers,
strings, vectors and even another list. A list can also contain a matrix or a function as its
elements. A list is created using the list() function.
Example 1:
> n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> x = list(n, s, TRUE)
> x
O/p:
[[1]]
[1] 2 3 5
[[2]]
[1] "aa" "bb" "cc" "dd" "ee"
[[3]]
[1] TRUE
[[2]]
[[2]][[1]]
[1] "green"
[[2]][[2]]
[1] 12.3
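Elements of a list are usually retrieved with double square brackets. The sketch below rebuilds the list x from Example 1 and shows the difference between [[ ]] and [ ]; the named list y is an extra illustration that does not appear in the original example.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")
x <- list(n, s, TRUE)

# [[ ]] extracts the element itself (here, the character vector).
second <- x[[2]]
print(second[3])

# [ ] returns a shorter list that still wraps the element.
sub <- x[2]
print(class(sub))

# Elements can also be named and then accessed with $.
# 'y' is a hypothetical list used only in this sketch.
y <- list(numbers = n, strings = s)
print(y$numbers)
```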
> Mat
Output:
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
Example 2: using byrow = TRUE/FALSE
# Create a matrix.
> M = matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
> print(M)
o/p:
     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"
d 12 13 14
1 3 4 5
2 6 7 8
3 9 10 11
4 12 13 14
4. Arrays: Arrays are R data objects that can store data in more than two
dimensions. For example, if we create an array of dimension (2, 3, 4), it creates 4
rectangular matrices, each with 2 rows and 3 columns. While matrices are confined to two
dimensions, arrays can have any number of dimensions. An array is created using
the array() function. It takes vectors as input and uses the values in the dim parameter to
create an array. In the example below we create an array made up of two 3x3 matrices.
Example 1: Here we create an array with two layers, each of which is a 3x3 matrix.
> v1 <- c(5,9,3)
> v2 <- c(10,11,12,13,14,15)
> result <- array(c(v1,v2), dim = c(3,3,2))
> result
Output:
, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

Both matrices hold the same values because the nine input values are recycled to fill all 18 cells of the array.
5. Factors: Factors are data objects that are used to categorize data and store it as
levels. They can store both strings and integers. They are useful in data analysis for
statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the count of levels.
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')
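Continuing from the apple_colors vector, a minimal sketch of how factor(), levels() and nlevels() could be applied to it. The name factor_apple is chosen only for this illustration.

```r
apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')

# Convert the character vector into a factor.
factor_apple <- factor(apple_colors)
print(factor_apple)

# The distinct categories, sorted alphabetically by default.
print(levels(factor_apple))

# The number of distinct categories.
print(nlevels(factor_apple))

# How many times each level occurs.
print(table(factor_apple))
```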
Output (a data frame):
  std_id std_name  marks
1      1     Rick 623.30
2      2      Dan 515.20
3      3 Michelle 611.00
4      4     Ryan 729.00
5      5     Gary 843.25
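A data frame with the columns shown in the output above could be built with data.frame(). The code that produced that output is not shown in these notes, so the following is a reconstruction under that assumption; the variable name std is hypothetical.

```r
# Each argument becomes one column; all columns must have the same length.
# 'std' is a hypothetical name, and the values are copied from the output above.
std <- data.frame(
  std_id   = 1:5,
  std_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  marks    = c(623.30, 515.20, 611.00, 729.00, 843.25),
  stringsAsFactors = FALSE
)

print(std)

# Columns can mix types: integer, character and numeric here.
print(sapply(std, class))

# A single column is extracted with $.
print(std$marks)
```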
This brings us to the end of the different data types in R. Next, let us move forward
and understand another key concept: flow control statements.
Variables: A variable provides us with named storage that our programs can manipulate. A
variable in R can store an atomic vector, a group of atomic vectors or a combination of many
R objects. A valid variable name consists of letters, numbers and the dot or underscore characters.
The variable name must start with a letter, or with a dot that is not followed by a number.
Variable Name          Validity   Reason
var_name2.             valid      Has letters, numbers, dot and underscore.
var_name%              invalid    Has the character '%'. Only dot (.) and underscore are allowed.
2var_name              invalid    Starts with a number.
.var_name, var.name    valid      Can start with a dot (.), but the dot (.) should not be followed by a number.
.2var_name             invalid    The starting dot is followed by a number, making it invalid.
_var_name              invalid    Starts with _, which is not valid.
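The rules in the table can be checked programmatically. The sketch below relies on make.names(), which rewrites a string into a syntactically valid name; a name that comes back unchanged was already valid. The helper is_valid_name is hypothetical and written only for this illustration. It also treats reserved words such as if as invalid, which goes slightly beyond the table.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("var_name2.", "var_name%", "2var_name", ".var_name",
                "var.name", ".2var_name", "_var_name")

# Pair each candidate with its validity.
validity <- is_valid_name(candidates)
names(validity) <- candidates
print(validity)
```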
Variable Assignment
Variables can be assigned values using the leftward, rightward and equal-to operators. The
values of the variables can be printed using the print() or cat() function. The cat() function
combines multiple items into a continuous print output.
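# Assignment using equal operator.
var.1 = c(0,1,2,3)
# Assignment using leftward operator (value inferred from the output shown below).
var.2 <- c("learn","R")
# Assignment using rightward operator (value taken from the note below).
c(TRUE,1) -> var.3
print(var.1)
cat ("var.1 is ", var.1 ,"\n")
cat ("var.2 is ", var.2 ,"\n")
cat ("var.3 is ", var.3 ,"\n")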
When we execute the above code, it produces the following result −
[1] 0 1 2 3
var.1 is 0 1 2 3
var.2 is learn R
var.3 is 1 1
Note − The vector c(TRUE,1) has a mix of logical and numeric class. So logical class is
coerced to numeric class making TRUE as 1.
Deleting Variables
Variables can be deleted by using the rm() function. Below we delete the variable var.3.
Printing the variable afterwards throws an error.
rm(var.3)
print(var.3)
When we execute the above code, it produces the following result −
Error in print(var.3) : object 'var.3' not found
All the variables can be deleted by using the rm() and ls() function together.
rm(list = ls())
print(ls())
When we execute the above code, it produces the following result −
character(0)
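Instead of triggering an error, the existence of a variable can be tested with exists() before and after rm(). This is a minimal sketch; the variable name tmp_value is hypothetical.

```r
# 'tmp_value' is a hypothetical variable used only in this sketch.
tmp_value <- 42

# exists() takes the variable name as a string.
print(exists("tmp_value"))

# Remove the variable and check again.
rm(tmp_value)
print(exists("tmp_value"))
```

This prints TRUE before the call to rm() and FALSE afterwards, without raising an error.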
Operators:
An operator is a symbol that tells the interpreter to perform specific mathematical or logical
manipulations. The R language is rich in built-in operators and provides the following types of
operators.
Types of Operators
We have the following types of operators in R programming −
Arithmetic Operators
Relational Operators
Logical Operators
Assignment Operators
Miscellaneous Operators
Arithmetic Operators:
The following table shows the arithmetic operators supported by the R language. The operators act on
each element of the vector.
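The element-wise behaviour of the arithmetic operators can be sketched with two small vectors. The values of v and w are chosen only for this illustration.

```r
# Two example vectors of the same length; the values are arbitrary.
v <- c(2, 5.5, 6)
w <- c(8, 3, 4)

print(v + w)    # addition
print(v - w)    # subtraction
print(v * w)    # multiplication
print(v / w)    # division
print(v %% w)   # remainder of the division
print(v %/% w)  # integer quotient of the division
print(v ^ w)    # exponentiation
```

Each operator pairs the first element of v with the first element of w, the second with the second, and so on. For example, v %/% w gives 0 1 1.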
[1] 0 1 1
The database is attached to the R search path. This means that the database is searched
by R when evaluating a variable, so objects in the database can be accessed by simply giving
their names.
attach() function makes the data available to the R Search Path.
Syntax:
attach(what, pos = 2L, name = deparse(substitute(what), backtick=FALSE),
warn.conflicts = TRUE)
Arguments
what: the 'database'. This can be a data.frame, a list, an R data file created with save, NULL, or an environment. See also 'Details'.
pos: integer specifying the position in search() where to attach.
name: name to use for the attached database. Names starting with package: are reserved for library.
warn.conflicts: logical. If TRUE, warnings are printed about conflicts from attaching the database, unless that database contains an object .conflicts.OK. A conflict is a function masking a function, or a non-function masking a non-function.
Details
By attaching a data frame (or list) to the search path it is possible to refer to the variables in the
data frame by their names alone, rather than as components of the data frame (e.g., in the
example below, height rather than women$height).
By default the database is attached in position 2 in the search path, immediately after the user's
workspace and before all previously attached packages and previously attached databases. This
can be altered to attach later in the search path with the pos option, but you cannot attach at
pos = 1.
attach(x)
x: data frame, list or environment
There are 3 variables, "Expression", "Gender" and "Subtype". We can display the variables by:
> x$Gender
[1] m m m m m f m m f m m f m m m m f m m m m m m f m m m f m m m m f m m m m
> Gender
Error: object 'Gender' not found
> attach(x)
> Gender
[1] m m m m m f m m f m m f m m m m f m m m m m m f m m m f m m m m f m m m m
[38] m m m m m m m m m f m f m m m m m f m m f m m f m m m m f m m m m m m m m
[75] m m f m m m m m f m m m m m m m m m f m m f m m f m f m m f m m f m m f m
[112] m f m m f m m m f m m m f m f m f f f f f f m f m f f f m f f f f m f m f
[149] m f f m f f f f f m f m f f m f f m f f m f f f m f f f m f f f m f f m f
[186] f f m f f m f m m f m f m f f m f f f f f m f f m f f f m m m f m m m f f
[223] f f f f f m m m f m f f m f f f m f f f m f f f f m f m f f f f m f f f m
[260] f f m f f f f f f m f f m f f f f f f m f f
Levels: f m
> detach(x)
> Gender
Error: object 'Gender' not found
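The same attach/detach cycle can be tried on the built-in women data set, which has the columns height and weight. The data set x used above is not available in these notes, so women is used here as a stand-in under that assumption.

```r
# Before attaching, columns are reached through the data frame.
print(mean(women$height))

# attach() places the data frame on the search path ...
attach(women)
print("women" %in% search())

# ... so its columns can be used by name alone.
print(mean(height))

# detach() removes it from the search path again.
detach(women)
print("women" %in% search())
```

Calling detach() as soon as the work is done keeps the search path clean and avoids accidental name clashes with other objects.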