Unit I
UNIT I - Chapter 1
What is R?
4. No Compilation
The R language is interpreted rather than compiled. As a result, no compiler is
needed to turn code into an executable program: the R interpreter evaluates the code
step by step, one expression at a time. This removes the separate compile step, so you
can run an R script, or try out a single line, immediately.
Why Use R?
It has lots of packages. For example, the R language has more than
10,000 packages stored in the CRAN repository, and the number is
continuously increasing.
It’s great for statistics. R was designed for statistical computing and comes with
a large set of built-in statistical functions. As a result, many programmers prefer it
over other languages for statistical analysis and statistical tool development.
It’s well suited for Machine Learning. R is ideal for machine learning
operations such as regression and classification. It even offers many
features and packages for artificial neural network development.
R Command Prompt
Once you have the R environment set up, it’s easy to start the R command prompt
by typing the following command at your system's command prompt:
$ R
This launches the R interpreter and gives you a prompt > where you can start
typing your program.
R Script File
Usually, you will write your programs in script files and then execute those
scripts at your command prompt with the help of Rscript, R's scripting front end.
So let's start by writing the following code in a text file called test.R:
myString <- "Hello, World!"
print(myString)
Save the above code in a file test.R and execute it at the Linux command prompt as
shown below. The syntax is the same on Windows and other systems.
$ Rscript test.R
Comments
Comments are like helping text in your R program; the interpreter ignores them
while executing your program. A single-line comment starts with # at the
beginning of the statement, as follows:
# My first program in R Programming
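R has no dedicated block-comment syntax, so each line of a multi-line comment needs its own #. The sketch below also shows a common workaround (an idiom, not an official comment syntax): wrapping code in if (FALSE) { ... } so that R parses it but never runs it. The variable x is only for illustration.

```r
# Each line of a multi-line comment needs its own hash sign.
# This line is ignored by the interpreter as well.
x <- 10  # a comment can also follow code on the same line

# Workaround for "block comments": this code is parsed but never run,
# so it must still be valid R syntax.
if (FALSE) {
  x <- 99
  print("this is never printed")
}

print(x)
```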
Installation of R
Installing R on a local computer is easy. First, we must know which operating
system we are using so that we can download the matching version.
If we work with R in an IDE, we can use the menu instead of the install.packages()
function to install the necessary packages from the CRAN repository. For
example, in RStudio, the most popular IDE for R, we need to complete the
following steps:
1. Click Tools > Install Packages...
2. Enter the name of the package in the Packages field.
3. Click Install.
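Getting help: the help() function displays the documentation for the required
function, for example help(mean), or its shorthand ?mean. Typing help() on its own
at the R command line and pressing Enter opens a page explaining how to use the
help() command itself.
The find() function, on the other hand, returns the location (for example, the
package) where objects of a given name can be found.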
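A short sketch of the two help-related functions. help() is commented out because it opens a documentation page rather than returning a value; find() is run on mean, which on a standard installation lives in the base package.

```r
# Open the documentation page for mean() in an interactive session;
# ?mean is a shorthand for the same call.
# help(mean)

# find() reports where on the search path objects with this name exist.
location <- find("mean")
print(location)
```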
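The library() function, called without arguments, lists the packages installed on
your system:
library()
When we execute the above code, it produces output like the following. It may vary
depending on the local settings of your PC.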
Packages in library ‘C:/Program Files/R/R-3.2.2/library’:
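The search() function lists the environments on the search path, including the
packages currently attached to the R session:
search()
When we execute the above code, it produces the following result. It may vary
depending on the local settings of your PC.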
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:utils" "package:datasets"
[7] "package:methods" "Autoloads" "package:base"
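Packages that are installed but not yet loaded can be attached with library(name), after which they show up in the output of search(). The sketch below uses splines, chosen only because it ships with every R installation; packages from CRAN must first be installed with install.packages("name").

```r
# Attach the splines package (part of the standard R distribution).
library(splines)

# The newly attached package now appears on the search path.
attached <- search()
print(attached)
print("package:splines" %in% attached)
```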
To read data from the keyboard, we mainly use two functions: scan() and
readline(). The print() function does the opposite and writes values to the screen.
Read Data Values: scan() is used for reading data into an input vector or an
input list from the console or from a file.
Keywords: file, connection.
For example:
> #Author DataFlair
> inp = scan()
> inp
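Because scan() waits for keyboard input, the example above only works in an interactive session. For a quick non-interactive test, scan() also accepts a text argument that reads values from a string instead; the numbers and words below are arbitrary.

```r
# Read three numbers from a string instead of the keyboard.
# quiet = TRUE suppresses the "Read 3 items" message.
inp <- scan(text = "3 5 7", quiet = TRUE)
print(inp)

# what = "" tells scan() to read character values instead of numbers.
words <- scan(text = "red green blue", what = "", quiet = TRUE)
print(words)
```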
With readline(), we read a single line typed at the console. To read multiple lines
from a connection such as a file, use readLines() instead.
For example:
> str = readline()
> str
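The sketch below contrasts readline() with readLines(). In a script run with Rscript there is no keyboard to read from, so readline() returns an empty string right away. The two lines fed to readLines() come from a textConnection(), standing in for a file; their contents are arbitrary.

```r
# readline() prompts for one line; in a non-interactive session it
# returns "" immediately instead of waiting for input.
name <- readline(prompt = "Enter your name: ")

# readLines() reads several lines from a connection; a text connection
# stands in for a file here.
con <- textConnection(c("first line", "second line"))
lines <- readLines(con)
close(con)
print(lines)
```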
Printing to the screen: In interactive mode, one can print the value of a
variable by just typing the variable name or expression. In batch mode, use the
print() function:
print(x)
The argument can be any R object. For plain output, cat() is often more convenient
than print(): print() displays one object at a time and numbers its output with an
index such as [1], which may be a nuisance, whereas cat() joins several items and
prints them without indices or quotes. Here is an example:
> print("DataFlair")
[1] "DataFlair"
> cat("DataFlair\n")
DataFlair
> int <- 24
> cat(int, "DataFlair", "Big Data\n")
24 DataFlair Big Data
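To see the difference between print() and cat() programmatically, the sketch below uses capture.output(), which returns whatever a call writes to the console as a character vector. It also shows cat()'s sep argument, which sets the separator placed between items (a single space by default).

```r
# capture.output() records what each call writes to the console.
p <- capture.output(print("DataFlair"))
k <- capture.output(cat("DataFlair\n"))
print(p)  # includes the [1] index and the quotes
print(k)  # plain text only

# sep replaces the default single space between items.
cat("Big", "Data", sep = "-")
cat("\n")
```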
NA
In R, the NA values are used to represent missing values. (NA stands for “not
available.”) You may encounter NA values in text loaded into R (to represent
missing values) or in data loaded from databases (to replace NULL values).
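A minimal sketch of working with NA values, using a made-up vector of scores: is.na() finds the missing entries, and many summary functions, such as mean(), accept na.rm = TRUE to skip them.

```r
# A vector with one missing value.
scores <- c(12, NA, 7, 9)

print(is.na(scores))               # TRUE marks the missing position
print(sum(is.na(scores)))          # number of missing values
print(mean(scores))                # NA, because one value is missing
print(mean(scores, na.rm = TRUE))  # mean of the remaining values
```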
UNIT I - Chapter 3
R - Variables
Variables can be assigned values using the leftward (<-), rightward (->) and
equal-to (=) operators. The values of the variables can be printed using the
print() or cat() function.
The cat() function combines multiple items into a continuous print output.
Eg:
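# Assignment using equal operator.
var.1 = c(0,1,2,3)
# Assignment using leftward operator.
var.2 <- c("learn","R")
# Assignment using rightward operator.
c(TRUE,FALSE) -> var.3
print(var.1)
cat("var.2 is", var.2, "\n")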
Finding Variables
To list all the variables currently available in the workspace, we use the ls()
function. The ls() function can also use patterns to match variable names.
print(ls())
o/p:
[1] "my var" "my_new_var" "my_var" "var.1"
[5] "var.2" "var.3" "var.name" "var_name2."
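Here is a sketch of the pattern argument of ls(), which takes a regular expression and keeps only the matching variable names. The variables are created just for this example.

```r
# Create a few variables in the workspace.
var.1 <- c(0, 1, 2, 3)
var.2 <- c("learn", "R")
my_var <- 42

# "^var" is a regular expression: names that start with "var".
print(ls(pattern = "^var"))
```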
Deleting Variables
Variables can be deleted with the rm() function. Below we delete the variable
var.3; printing it afterwards throws an error because it no longer exists.
rm(var.3)
print(var.3)
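Since printing a deleted variable stops a script with an error, two safer checks are sketched below: exists(), which returns TRUE or FALSE, and tryCatch(), which captures the error message so the script can continue.

```r
var.3 <- c(TRUE, FALSE)
rm(var.3)

# exists() checks for a variable without raising an error.
print(exists("var.3"))

# tryCatch() turns the "object not found" error into a value.
result <- tryCatch(print(var.3), error = function(e) conditionMessage(e))
print(result)
```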
UNIT I - Chapter 2
R - DATA TYPES
The basic data types in R are:
Numeric
Integer
Complex
Character
Logical
Raw Data Type
Numeric
A numeric data type is the most common type in R and contains any number with or
without a decimal, such as 10.5, 55 or 787:
Eg:
x <- 10.5
y <- 55
# Print values of x and y
print(x)
print(y)
# Print the class name of x and y
print(class(x))
print(class(y))
o/p:
[1] 10.5
[1] 55
[1] "numeric"
[1] "numeric"
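class() reports "numeric" for both values, but typeof() reveals how R actually stores them: as double-precision floating-point numbers, even when the number has no decimal part. A short sketch:

```r
x <- 10.5
y <- 55

# Both are stored as doubles, so y is not an integer.
print(typeof(x))
print(typeof(y))
print(is.integer(y))
```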
Integer
Integers are numeric data without decimals. Use this type when you are certain that
a variable will never need to contain decimals. To create an integer variable, add
the letter L after the integer value:
Eg:
x <- 1000L
y <- 55L
print(x); print(y)
print(class(x)); print(class(y))
o/p:
[1] 1000
[1] 55
[1] "integer"
[1] "integer"
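A few integer operations are sketched below. as.integer() converts a number by dropping its fractional part, and arithmetic on integers stays integer, except for /, which always returns a double (class "numeric"). The value 7.9 is arbitrary.

```r
x <- 1000L
print(is.integer(x))

# as.integer() drops the decimal part.
z <- as.integer(7.9)
print(z)

# Integer addition stays integer; division returns a double.
print(class(x + 1L))
print(class(x / 2L))
```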
Complex
A complex number is written with i as the imaginary part:
Eg:
x <- 3+5i
y <- 5i
print(x); print(y)
print(class(x)); print(class(y))
o/p:
[1] 3+5i
[1] 0+5i
[1] "complex"
[1] "complex"
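Base R provides helper functions for complex numbers; the sketch below applies them to 3+4i, chosen because its modulus is a whole number. Note that sqrt(-1) on a plain numeric gives NaN with a warning, so the argument is converted with as.complex() first.

```r
z <- 3+4i

print(Re(z))    # real part
print(Im(z))    # imaginary part
print(Mod(z))   # modulus: sqrt(3^2 + 4^2)
print(Conj(z))  # complex conjugate

# Square root of -1 as a complex number.
print(sqrt(as.complex(-1)))
```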
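Character
The character data type is used to store text (strings). Values are written inside
single or double quotes, and even a single letter such as 'A' has the class character.
Eg: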
# create a string variable
fruit <- "Apple"
print(class(fruit))
# create a character variable
my_char <- 'A'
print(class(my_char))
o/p:
[1] "character"
[1] "character"
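Some common base R string functions, applied to the fruit variable from the example above (redefined so the sketch runs on its own):

```r
fruit <- "Apple"

print(nchar(fruit))           # number of characters
print(toupper(fruit))         # convert to upper case
print(paste(fruit, "juice"))  # join strings with a space
print(substr(fruit, 1, 3))    # characters 1 to 3
```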
Logical:
The logical data type in R is also known as boolean data type. It can only have
2 values: TRUE or FALSE.
Eg:
x <- TRUE
print(x)
print(class(x))
y <- FALSE
print(y)
print(class(y))
o/p:
[1] TRUE
[1] "logical"
[1] FALSE
[1] "logical"
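Logical values usually come from comparisons rather than being typed directly. The sketch below, with arbitrary numbers a and b, shows comparison and logical operators, and the fact that TRUE counts as 1 and FALSE as 0 in arithmetic.

```r
a <- 10
b <- 3

# Comparisons return logical values.
print(a > b)
print(a == b)

# Logical operators: & (and), | (or), ! (not).
print(a > b & b > 5)

# sum() counts the TRUE values.
print(sum(c(TRUE, FALSE, TRUE)))
```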
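Raw Data Type
The raw data type stores values as raw bytes, which R prints in hexadecimal. The
charToRaw() and rawToChar() functions convert between character and raw values.
Eg:
# convert character to raw
raw_variable <- charToRaw("Welcome to Programiz")
print(raw_variable)
print(class(raw_variable))
# convert raw back to character
char_variable <- rawToChar(raw_variable)
print(char_variable)
print(class(char_variable))
o/p: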
[1] 57 65 6c 63 6f 6d 65 20 74 6f 20 50 72 6f 67 72 61 6d 69 7a
[1] "raw"
[1] "Welcome to Programiz"
[1] "character"
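As a further sketch, as.raw() turns integers from 0 to 255 into bytes, and as.integer() turns bytes back into their decimal values. The numbers are arbitrary.

```r
# Each raw value is one byte, printed in hexadecimal.
r <- as.raw(c(1, 15, 255))
print(r)

# The byte for the letter "A" and its decimal value.
bytes <- charToRaw("A")
print(bytes)
print(as.integer(bytes))
```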
R-objects
There are many types of R-objects. The frequently used ones are –
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
Vectors
Vectors are one of the basic types of objects in R programming. Atomic vectors can
store homogeneous data types such as character, doubles, integers, raw, logical, and
complex.
A vector is a group of elements of the same data type. To create a vector with more
than one element, use the c() function, which combines the elements into a vector.
Eg:
# Create a vector.
apple <- c('red','green','yellow')
print(apple)
# Get the class of the vector.
print(class(apple))
o/p:
[1] "red" "green" "yellow"
[1] "character"
A vector is the basic data structure in R for storing data of the same type. For
example, suppose we need to record the ages of 5 employees. Instead of creating 5
separate variables, we can simply create one vector.
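The employee example can be sketched as follows; the five ages are made up for illustration. Note that c() coerces mixed inputs to a common type, which is why a vector always holds data of one type.

```r
# Hypothetical ages of 5 employees, stored in one vector.
ages <- c(28, 35, 42, 31, 25)

print(length(ages))      # number of elements
print(ages[2])           # second element (indexing starts at 1)
print(mean(ages))        # average age
print(ages[ages > 30])   # elements that satisfy a condition

# Mixing types coerces every element to character here.
mixed <- c(1, "two", TRUE)
print(class(mixed))
```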
Lists
A list is an R-object that can contain elements of many different types, such as
vectors, functions and even other lists.
Eg:
list1 <- list(24, 29, 32, 34)
list2 <- list("Ranjy", 38, TRUE)
print(list1)
print(list2)
o/p:
[[1]]
[1] 24
[[2]]
[1] 29
[[3]]
[1] 32
[[4]]
[1] 34
[[1]]
[1] "Ranjy"
[[2]]
[1] 38
[[3]]
[1] TRUE
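Elements of a list are usually accessed by name with $ or [[ ]]. The sketch below builds a hypothetical named list (the names name, age and active are illustrative) and shows that single brackets [ ] return a smaller list rather than the element itself.

```r
# A list with named elements of different types.
employee <- list(name = "Ranjy", age = 38, active = TRUE)

print(employee$name)       # access an element by name
print(employee[["age"]])   # same idea, using double brackets
print(class(employee[2]))  # single brackets return a list

# A new element can be a whole vector.
employee$skills <- c("R", "SQL")
print(length(employee))
```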
Matrices
A matrix is a two-dimensional rectangular data set. It can be created by passing a
vector to the matrix() function.
Syntax:
matrix(data, nrow, ncol, byrow = FALSE)
here
data - a vector of data items, all of the same type
nrow - number of rows
ncol - number of columns
byrow - if TRUE, the matrix is filled row by row; by default (FALSE) it is filled column by column
Eg:
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
o/p:
[,1] [,2] [,3]
[1,] "a" "a" "b"
[2,] "c" "b" "a"
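To see what byrow does, the sketch below fills two matrices with the same values 1 to 6, once column by column (the default) and once row by row, and then reads their dimensions and a single element.

```r
# The same six values, filled by column (default) and by row.
by_col <- matrix(1:6, nrow = 2, ncol = 3)
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(by_col)
print(by_row)

print(dim(by_row))   # number of rows and columns
print(by_row[2, 3])  # element in row 2, column 3
```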
Arrays
Arrays can have any number of dimensions. The array() function takes a dim attribute
that sets the required number of dimensions. In the example below we create an array
of two 3x3 matrices; the two values 'green' and 'yellow' are recycled to fill all 18 cells.
Eg:
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
o/p:
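, , 1

     [,1]     [,2]     [,3]
[1,] "green"  "yellow" "green"
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green"

, , 2

     [,1]     [,2]     [,3]
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green"
[3,] "yellow" "green"  "yellow"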
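Individual cells of an array are addressed with one index per dimension. A sketch using the array a from the example above, recreated so it runs on its own:

```r
a <- array(c('green','yellow'), dim = c(3,3,2))

print(dim(a))      # 3 rows, 3 columns, 2 matrices
print(a[1, 2, 2])  # row 1, column 2 of the second matrix
print(a[, , 1])    # the whole first matrix
```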
Factors
Factors are R-objects created from a vector. A factor stores the vector along with
the distinct values of its elements as labels (levels). The labels are always character,
whether the input vector is numeric, character, logical, etc. Factors are useful in
statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the
number of levels.
Eg:
apple_colors <- c('green','green','yellow','red','red','red','green')  # create a vector
factor_apple <- factor(apple_colors)  # create a factor object
print(factor_apple)
print(nlevels(factor_apple))
o/p:
[1] green green yellow red red red green
Levels: green red yellow
[1] 3
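Beyond nlevels(), a few other functions help inspect a factor. The sketch below reuses the apple_colors vector; note that levels are sorted alphabetically by default and that R stores each element internally as an integer code pointing to its level.

```r
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)

print(levels(factor_apple))      # distinct values, sorted alphabetically
print(table(factor_apple))       # how often each level occurs
print(as.integer(factor_apple))  # internal integer codes
```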
Data Frames
Data frames are tabular data objects. Unlike a matrix, each column of a data frame
can contain a different mode of data: the first column can be numeric, the second
character and the third logical. A data frame is a list of vectors of equal length.
Data Frames are created using the data.frame() function.
Eg:
# Create the data frame.
BMI <- data.frame(
gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age = c(42,38,26)
)
print(BMI)
o/p:
gender height weight Age
1 Male 152.0 81 42
2 Male 171.5 93 38
3 Female 165.0 78 26
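A few common operations on the BMI data frame from the example above, recreated so the sketch runs on its own. In R 4.0 and later, character columns such as gender stay character by default; older versions converted them to factors.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)

print(dim(BMI))             # number of rows and columns
print(BMI$height)           # one column as a vector
print(BMI[BMI$Age > 30, ])  # rows that meet a condition
print(sapply(BMI, class))   # each column keeps its own type
```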