Practical File

The document is a practical file on statistical programming with R, authored by Harish Kumar Chouhan for a Master's in Business Administration. It covers the introduction and installation of R and RStudio, basic operations, advanced mathematical functions, and data structures in R. The file serves as a comprehensive guide for using R for statistical analysis and data manipulation.

PRACTICAL FILE

ON

STATISTICAL PROGRAMMING WITH R

MASTERS OF BUSINESS ADMINISTRATION

By

HARISH KUMAR CHOUHAN

ENROLMENT NO. 22MBA00331

Under The Supervision Of

DR. TARUNIMA MISHRA

Submitted To

SCHOOL OF MANAGEMENT

SIR PADAMPAT SINGHANIA UNIVERSITY

UDAIPUR – 313601, RAJASTHAN INDIA


Practical No.1
Introduction and installation of R studio

What is R?

R is a free and open-source scripting language developed by Ross Ihaka and Robert
Gentleman in 1993. It's an alternative implementation of the S programming language,
which was widely used in the 1980s for statistical computing. The R environment is designed
to perform complex statistical analysis and display results using a wide range of graphics. The R
programming language is written in C, Fortran, and R itself. Most R packages are written in
the R programming language, but computationally heavy chunks are written in C, C++, and
Fortran. R allows integration with Python, C, C++, .NET, and Fortran.

R is both a programming language and a software development environment. In other
words, the name R describes both the R programming language and the R software
development environment used to run R code. R is lexically scoped. Lexical scoping is
another name for static scoping. This means that the place where a function is defined in the
program text determines which variables it can see, not the place it is called from. Here is an example of
lexical scoping in R:

x <- 5
y <- 2
multiple <- function(y) y * x
func <- function() {
x <- 3
multiple(y)
}
func()
Output:
10
The variables x <- 5 and y <- 2 are global variables, and we can use them inside and outside R
functions. However, when a variable is assigned (x <- 3) inside a function (func()) and
another function (multiple()) is called from inside it, the called function does not see that
local variable unless its value is passed in as an argument. In other words, the scope of the
local variable (x <- 3) doesn't extend into the called function (multiple()).

In the code snippet above, the func() function returns the result of calling multiple().
The multiple() function takes only one input parameter, y; however, it uses two
variables, x and y, to perform the multiplication. Because x isn't an input parameter and isn't
assigned inside multiple(), R looks for it in the environment where multiple() was defined,
which is the global environment. The variable y is an input parameter of
the multiple() function, so inside multiple() it refers to whatever value was passed in.
func() has no local y of its own, so the y it passes is the global one. In the code snippet above,
the multiple() function is therefore evaluated with the global value of x (x <- 5) and the global value
of y (y <- 2). It returns a value of 10, not 3 * 2 = 6.

In the next example, func() assigns a local y (y <- 10) and passes it to multiple(), so multiple() uses this
local value of y instead of the global one (y <- 2). Since the variable x isn't an input of
the multiple() function, its value is still taken from the global environment (x <- 5). The function returns a value
of 50.

x <- 5
y <- 2
multiple <- function(y) y * x
func <- function() {
x <- 3
y <- 10
multiple(y)
}
func()
Output:
50
You can compile and run R on the Windows, macOS, and Linux operating systems. It also
comes with a command line interface. The > character represents the command line
prompt. Here is a simple interaction with the R command line:

> a <- 5
> b <- 6
> a * b
[1] 30
The features of the R programming language are organized into packages. A standard
installation of the R programming language comes with 25 of these packages. You can
download additional packages from the Comprehensive R Archive Network (CRAN). The R
project is currently being developed and supported by the R Development Core Team.

Why Use R
R is a state-of-the-art programming language for statistical computing, data analysis, and
machine learning. It has been around for almost three decades, with over 12,000 packages
available for download on CRAN. This means that there is likely to be an R package that supports
whatever type of analysis you want to perform. Here are a few reasons why you
should learn and use R:

• Free and open-source: The R programming language is open-source and is released under the
GNU General Public License (GPL). This means that you can use all the functionality of R
free of charge. Since R is open-source, everyone is
welcome to contribute to the project, and since the source is freely available, bugs can be detected
and fixed by the open-source community.
• Popularity: The R programming language was ranked 7th in the 2021 IEEE Spectrum
ranking of top programming languages and 12th in the TIOBE Index ranking of January 2022.
It's the second most popular programming language for data science, just behind Python,
according to edX, and it is the most popular programming language for statistical analysis.
R's popularity also means that there is extensive community support on platforms like
Stack Overflow. R also has detailed online documentation that R users can consult for help.
• High-quality visualization: The R programming language is famous for high-quality
visualizations. R's ggplot2 is a detailed implementation of the grammar of graphics, a
system to concisely describe the components of a graph. With R's high-quality graphics, you
can easily implement intuitive and interactive graphs.
• A language for data analytics and data science: The R programming language isn't a
general-purpose programming language. It's a specialized programming language for
statistical computing. Therefore, most of R's functions carry out vectorized operations,
meaning you don't need to loop through each element. This makes typical R code concise and fast compared with explicit loops.
Distributed computing can be executed in R, whereby tasks are split among multiple
processing computers to reduce execution time. R can be integrated with Hadoop and Apache
Spark, and it can be used to process large amounts of data. R can connect to many kinds of
databases, and it has packages to carry out machine learning and deep learning operations.
• Opportunity to pursue an exciting career in academe and industry: The R programming
language is trusted and extensively used in the academic community for research. R is
increasingly being used by government agencies and by social media, telecommunications,
financial, e-commerce, manufacturing, and pharmaceutical companies. Top companies that
use R include Amazon, Google, ANZ Bank, Twitter, LinkedIn, Thomas Cook, Facebook,
Accenture, Wipro, the New York Times, and many more. A good mastery of the R
programming language opens all kinds of opportunities in academe and industry.
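The vectorized operations mentioned in the list above can be illustrated with a minimal sketch. The variable names (prices, quantities) and their values are hypothetical and chosen only for this example; the loop is shown purely for comparison.

```r
# Hypothetical example data (not from the original file)
prices <- c(10, 20, 30)
quantities <- c(2, 3, 4)

# Vectorized: a single expression multiplies element by element
totals_vectorized <- prices * quantities

# Equivalent explicit loop, shown only for comparison
totals_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  totals_loop[i] <- prices[i] * quantities[i]
}

print(totals_vectorized)                          # 20 60 120
print(identical(totals_vectorized, totals_loop))  # TRUE
```

Both versions give the same totals; the vectorized form is simply shorter and avoids the bookkeeping of the loop index.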

Installing R on Windows OS
To install R on Windows OS:
1. Go to the CRAN website.
2. Click on "Download R for Windows".
3. Click on the "install R for the first time" link to download the R executable (.exe) file.
4. Run the R executable file to start installation, and allow the app to make changes to
your device.
5. Select the installation language.
6. Follow the installation instructions.
7. Click on "Finish" to exit the installation setup.

R has now been successfully installed on your Windows OS. Open the R GUI to start writing R code.
Installing RStudio Desktop
To install RStudio Desktop on your computer, do the following:
1. Go to the RStudio website.
2. Click on "DOWNLOAD" in the top-right corner.
3. Click on "DOWNLOAD" under the "RStudio Open Source License".
4. Download RStudio Desktop recommended for your computer.
5. Run the RStudio executable file (.exe) for Windows OS or the Apple Disk Image file
(.dmg) for macOS.
6. Follow the installation instructions to complete RStudio Desktop installation.


RStudio is now successfully installed on your computer.
[Figure: RStudio Desktop IDE interface]

Another way to interface with R using RStudio is with the RStudio Server. RStudio Server
provides a browser-based R interface.
Practical No.2
How to run R? R as a calculator, R preliminaries

How to run R?
To run R, follow these steps:
Installation:
Download R from the official R Project website.
Choose a CRAN mirror (a server) to download R for your operating system (Windows,
macOS, or Linux).
Follow the installation instructions provided on the website.

Launch R:
Once installed, open the R application on your computer. On Windows, you may find an R
icon on your desktop or in the Start menu. On macOS, look for the R icon in your
Applications folder.
R Console:
After launching R, you'll see a console window with a prompt (>). This is where you can
interact with R.
Run Commands:
In the console, you can type R commands and press Enter to execute them. For example,
you can perform basic arithmetic operations:
7+8

Press Enter, and R will calculate the result

Result
7+8

[1] 15
Scripts:
For more extended tasks or to keep a record of your work, you can write scripts. Use a text
editor (like Notepad on Windows, TextEdit on macOS, or any code editor you prefer) to
create a file with R commands, save it with a .R extension, and run the script in R.
# Example script (save as example.R)
x <- 5
y <- 3
z <- x + y
print(z)
Result
[1] 8
In the R console, use the getwd() function to check the working directory, which is where R
looks for the script. A script saved there can then be run with source("example.R"):
getwd() ## get working directory
Result
[1] "C:/Users/DELL/Documents"
Now you've successfully installed R and run basic commands. You can explore further by
learning about R packages, data manipulation, statistical analysis, and data visualization to
leverage the full capabilities of R.
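As a minimal sketch of running a saved script, the example below writes the script commands shown earlier to a temporary file and runs them with source(). The temporary file is an assumption made here so that the example does not depend on any particular folder; in practice you would pass the path of your own .R file.

```r
# Write a small script to a temporary file (tempfile() avoids touching your own folders)
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 5", "y <- 3", "z <- x + y", "print(z)"), script_path)

# source() reads the file and runs every command in it, in order
source(script_path)   # prints [1] 8
```

After source() finishes, the variables created by the script (x, y and z) are available in the workspace.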

R as a Calculator:
You can use R as a simple calculator to perform arithmetic operations.
For example, you can add, subtract, multiply, and divide numbers directly in the R console.
##Arithmetic Operations
# Addition
2+3
# Subtraction
5-2
# Multiplication
4*6
# Division
8/2
Result
[1] 5
[1] 3
[1] 24
[1] 4
##Exponentiation and Roots
# Exponentiation
3 ^ 2 # 3 squared
# Square root
sqrt(9)
Result
[1] 9
[1] 3
##Order of Operations
# Use parentheses to specify the order of operations
(2 + 3) * 4
Result
[1] 20
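The effect of parentheses is easier to see next to the same expression without them. The short sketch below uses only base R arithmetic; the particular numbers are arbitrary.

```r
print(2 + 3 * 4)     # multiplication is done before addition: 14
print((2 + 3) * 4)   # parentheses are evaluated first: 20
print(2 ^ 3 ^ 2)     # exponentiation groups right to left, 2 ^ (3 ^ 2): 512
print(-2 ^ 2)        # ^ binds tighter than the unary minus, -(2 ^ 2): -4
```

When in doubt, adding parentheses makes the intended order explicit and costs nothing.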

R Preliminaries:
Preliminary Concepts
Variables
Variables store values for later use.
x <- 10 # Assigns the value 10 to the variable x
y <- 5
z <- x + y # Adds the values stored in x and y, assigns the result to z
print(z)
Result
[1] 15
Functions
Functions perform specific tasks.
# Built-in functions
mean(c(2, 4, 6)) # Calculates the mean of the given numbers
# User-defined functions
square <- function(x) {
return(x^2)
}
square(4) # Returns the square of 4
Result
[1] 4
[1] 16
Data Types
R has various data types, including numeric, character, logical, and more.
# Numeric
num_var <- 42; num_var
# Character
char_var <- "Hello, R!" ; char_var
# Logical
logical_var <- TRUE; logical_var
Result
[1] 42
[1] "Hello, R!"
[1] TRUE
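To check which data type a value has, or to convert between types, base R provides class() and the as.*() family of functions. The sketch below reuses the variable names from the example above; the values being converted are arbitrary.

```r
num_var <- 42
char_var <- "Hello, R!"
logical_var <- TRUE

# class() reports the data type of a value
print(class(num_var))       # "numeric"
print(class(char_var))      # "character"
print(class(logical_var))   # "logical"

# Explicit conversion between types
print(as.numeric("3.5") + 1)   # 4.5
print(as.character(42))        # "42"
print(as.integer(TRUE))        # 1
```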
Practical No.3
Experimenting with advanced math operations

R is well-suited for advanced mathematical operations. Let's explore some examples:


# Create matrices
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE)
# Matrix addition
C <- A + B
# Matrix multiplication
D <- A %*% t(B) # t() transposes matrix B
# Display results
print("Matrix A:")
print(A)
print("Matrix B:")
print(B)
print("Matrix Addition:")
print(C)
print("Matrix Multiplication:")
print(D)
Result
[1] "Matrix A:"
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[1] "Matrix B:"
     [,1] [,2]
[1,]    5    6
[2,]    7    8
[1] "Matrix Addition:"
     [,1] [,2]
[1,]    6    8
[2,]   10   12
[1] "Matrix Multiplication:"
     [,1] [,2]
[1,]   17   23
[2,]   39   53
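The example above uses %*% for the matrix product. A common source of confusion is that the plain * operator multiplies matrices entry by entry instead. The sketch below contrasts the two on the same A and B, and also shows det() and solve() from base R for the determinant and the inverse; it is an illustration, not part of the original exercise.

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE)

elementwise <- A * B     # multiplies matching entries
product <- A %*% B       # row-by-column matrix product

print(elementwise)
print(product)

print(det(A))            # determinant of A (1*4 - 2*3)
A_inv <- solve(A)        # inverse of A
print(A_inv %*% A)       # approximately the identity matrix
```

Because of floating-point rounding, A_inv %*% A may differ from the identity matrix by a tiny amount, which is why comparisons of such results are usually made with all.equal() rather than ==.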
##Calculating Derivatives:
# Define a function
f <- function(x) x^2 + 2*x + 1
# Calculate derivative using the deriv() function
# deriv() needs the formula itself; function.arg = TRUE makes it return an R function of x
f_prime <- deriv(~ x^2 + 2*x + 1, "x", function.arg = TRUE)
# Evaluate the function and its derivative at a specific point
result <- f_prime(3)
# Display results
print("Value of f at x = 3:")
print(f(3))
print("Result at x = 3:")
print(result) # the "gradient" attribute holds the derivative 2*x + 2
Result
[1] "Value of f at x = 3:"
[1] 16
[1] "Result at x = 3:"
[1] 16
attr(,"gradient")
     x
[1,] 8
# Generate random data from a normal distribution
data <- rnorm(1000, mean = 0, sd = 1)
# Calculate mean and standard deviation
mean_value <- mean(data)
sd_value <- sd(data)
# Display results
print("Generated Data:")
print(head(data))
print("Mean:")
print(mean_value)
print("Standard Deviation:")
print(sd_value)
Result
[1] "Generated Data:"
[1] -0.4824830 -0.2204402 -1.5153549 1.1339153 -0.4270449
[6] -0.1222682
[1] "Mean:"
[1] 0.04735421
[1] "Standard Deviation:"
[1] 1.015164
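Because rnorm() draws random numbers, the values above change on every run. If reproducible output is needed, base R's set.seed() can be called first. The sketch below assumes an arbitrary seed of 123; any integer would do.

```r
set.seed(123)            # 123 is an arbitrary seed chosen for this sketch
first_draw <- rnorm(5)

set.seed(123)            # resetting the same seed restarts the random sequence
second_draw <- rnorm(5)

print(identical(first_draw, second_draw))   # TRUE
```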

Practical No.4
Advanced Data Structures: Data Frames, Lists, Matrices,
Arrays, Vectors, Factors, Strings

Advanced Data Structures:


R offers various advanced data structures, including:
Data Frames: A table-like data structure with rows and columns, commonly used for
data analysis.
# Create a data frame
student.data<- data.frame(
student_id= c(1:5),
student_name = c("abc","def","ghi","jkl","mno"),
fee= c(520.12, 600, 700,800,900),
start_date= as.Date(c("2009-12-01", "2008-05-10", "2020-07-15", "2022-08-19",
"2023-08-20"))
)
student.data
str(student.data)
m1<- data.frame(student.data$student_id,student.data$fee);m1 # Column retrieval
m2<- student.data[2,]; m2
m3<- student.data[2:4,]; m3
m4<- student.data[c(2,3)];m4
m5<- student.data[2,3];m5
x<-list(6,"pqr", 780.50, "2022-12-01" )
rbind(student.data,x)
y<- c("indore","jaipur", "udaipur","newai", "bhopal")
cbind(student.data, address=y)
student.data<-student.data[-2,]
student.data$fee<-NULL
print(summary(student.data))
Result
  student_id student_name    fee start_date
1          1          abc 520.12 2009-12-01
2          2          def 600.00 2008-05-10
3          3          ghi 700.00 2020-07-15
4          4          jkl 800.00 2022-08-19
5          5          mno 900.00 2023-08-20
'data.frame': 5 obs. of 4 variables:
 $ student_id  : int 1 2 3 4 5
 $ student_name: chr "abc" "def" "ghi" "jkl" ...
 $ fee         : num 520 600 700 800 900
 $ start_date  : Date, format: "2009-12-01" ...
  student.data.student_id student.data.fee
1                       1           520.12
2                       2           600.00
3                       3           700.00
4                       4           800.00
5                       5           900.00
  student_id student_name fee start_date
2          2          def 600 2008-05-10
  student_id student_name fee start_date
2          2          def 600 2008-05-10
3          3          ghi 700 2020-07-15
4          4          jkl 800 2022-08-19
  student_name    fee
1          abc 520.12
2          def 600.00
3          ghi 700.00
4          jkl 800.00
5          mno 900.00
[1] 600
  student_id student_name    fee start_date
1          1          abc 520.12 2009-12-01
2          2          def 600.00 2008-05-10
3          3          ghi 700.00 2020-07-15
4          4          jkl 800.00 2022-08-19
5          5          mno 900.00 2023-08-20
6          6          pqr 780.50 2022-12-01
  student_id student_name    fee start_date address
1          1          abc 520.12 2009-12-01  indore
2          2          def 600.00 2008-05-10  jaipur
3          3          ghi 700.00 2020-07-15 udaipur
4          4          jkl 800.00 2022-08-19   newai
5          5          mno 900.00 2023-08-20  bhopal
   student_id   student_name         start_date
 Min.   :1.00   Length:4           Min.   :2009-12-01
 1st Qu.:2.50   Class :character   1st Qu.:2017-11-18
 Median :3.50   Mode  :character   Median :2021-08-01
 Mean   :3.25                      Mean   :2019-03-07
 3rd Qu.:4.25                      3rd Qu.:2022-11-18
 Max.   :5.00                      Max.   :2023-08-20
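Besides picking rows and columns by position, data frames are often filtered by a condition. The following sketch rebuilds a small version of the student data under the hypothetical name students (so that it does not depend on the modified student.data above) and selects rows where the fee exceeds an arbitrary threshold of 650.

```r
# Hypothetical copy of the student data, rebuilt so the example is self-contained
students <- data.frame(
  student_id = 1:5,
  student_name = c("abc", "def", "ghi", "jkl", "mno"),
  fee = c(520.12, 600, 700, 800, 900)
)

# Logical indexing: keep the rows where the condition is TRUE
high_fee <- students[students$fee > 650, ]
print(high_fee)

# subset() does the same and can also pick columns
same_rows <- subset(students, fee > 650, select = c(student_name, fee))
print(same_rows)

# order() sorts the rows, here from the highest fee to the lowest
sorted_students <- students[order(-students$fee), ]
print(sorted_students$student_name)
```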
Lists: A versatile data structure that can hold elements of different data types.
#List
# list
z<-c(1,2,3,4,5)
y<- c("abc","def","ghi","jkl")
x<- c(T,F,T,T)
list1<-list(x,y,z);list1
List2 <- list(c("abc","def","ghi"),c(60,70,80),list("btech","Bsc","Msc"));List2
names(List2) <- c("student", "marks", "courses"); List2
# accessing List
print(List2[3]) # By index
print(List2 ["student"])
print(List2$marks)
#Unlist()
list3<- list(1:5); list3
list4<- list(6:10); list4
o1<- unlist(list3); o2<- unlist(list4)
sum <- o1+o2; sum
Result
[[1]]
[1] TRUE FALSE TRUE TRUE
[[2]]
[1] "abc" "def" "ghi" "jkl"
[[3]]
[1] 1 2 3 4 5
[[1]]
[1] "abc" "def" "ghi"
[[2]]
[1] 60 70 80
[[3]]
[[3]][[1]]
[1] "btech"
[[3]][[2]]
[1] "Bsc"
[[3]][[3]]
[1] "Msc"
$student
[1] "abc" "def" "ghi"
$marks
[1] 60 70 80
$courses
$courses[[1]]
[1] "btech"
$courses[[2]]
[1] "Bsc"
$courses[[3]]
[1] "Msc"
$courses
$courses[[1]]
[1] "btech"
$courses[[2]]
[1] "Bsc"
$courses[[3]]
[1] "Msc"
$student
[1] "abc" "def" "ghi"
[1] 60 70 80
[[1]]
[1] 1 2 3 4 5
[[1]]
[1] 6 7 8 9 10
[1] 7 9 11 13 15
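The access examples above use single brackets and the $ operator. The difference between single and double brackets is worth a short illustration: single brackets return a smaller list, while double brackets return the element itself. The list name scores below is hypothetical.

```r
# Hypothetical list used only for this sketch
scores <- list(student = c("abc", "def", "ghi"), marks = c(60, 70, 80))

sub_list <- scores["marks"]      # single brackets: still a list of length 1
marks_vec <- scores[["marks"]]   # double brackets: the numeric vector itself

print(class(sub_list))    # "list"
print(class(marks_vec))   # "numeric"
print(mean(marks_vec))    # 70

# Elements can be modified in place through $ or [[ ]]
scores$marks[2] <- 75
print(scores$marks)       # 60 75 80
```

Functions such as mean() need the vector itself, so marks_vec works there while sub_list would not.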
Matrices: Two-dimensional arrays where all elements are of the same data type.
# A matrix is a 2D dataset (rows, columns)
# matrix(data, nrow, ncol, byrow, dimnames)
mat <- matrix(c(2:13), nrow=4, byrow=F); mat
a<- c(2:13); length(a)
x<- matrix(c(5:16), nrow=4, byrow=T);x
x1<- matrix(c(5:16), nrow=4, byrow=F);x1
y<- matrix(c(7:18), nrow=4, byrow=F);y
row_name<- c("T1","T2","T3","T4")
col_name<- c("C1", "C2", "C3")
y<- matrix(c(7:18), nrow=4, byrow=F, dimnames= list(row_name,col_name));y
print(y[3,1]) #accessing elements
print(y[,1]);print(y[1,])
y[4,3]<-0; y
y[y==11]<-0;y
y[y<=11]<-0;y
#cbind and rbind
rbind(y,c(2,3,4))
cbind(y,c(8,5,2,0))
t(y) # transpose
# matrix addition (element-wise)
sum<-x+y;sum
# element-wise multiplication (use %*% for the matrix product)
multiply<- x*y
Result
[,1] [,2] [,3]
[1,] 2 6 10
[2,] 3 7 11
[3,] 4 8 12
[4,] 5 9 13
[1] 12
[,1] [,2] [,3]
[1,] 5 6 7
[2,] 8 9 10
[3,] 11 12 13
[4,] 14 15 16
[,1] [,2] [,3]
[1,] 5 9 13
[2,] 6 10 14
[3,] 7 11 15
[4,] 8 12 16
[,1] [,2] [,3]
[1,] 7 11 15
[2,] 8 12 16
[3,] 9 13 17
[4,] 10 14 18
C1 C2 C3
T1 7 11 15
T2 8 12 16
T3 9 13 17
T4 10 14 18
[1] 9
T1 T2 T3 T4
7 8 9 10
C1 C2 C3
7 11 15
C1 C2 C3
T1 7 11 15
T2 8 12 16
T3 9 13 17
T4 10 14 0
C1 C2 C3
T1 7 0 15
T2 8 12 16
T3 9 13 17
T4 10 14 0
C1 C2 C3
T1 0 0 15
T2 0 12 16
T3 0 13 17
T4 0 14 0
C1 C2 C3
T1 0 0 15
T2 0 12 16
T3 0 13 17
T4 0 14 0
2 3 4
C1 C2 C3
T1 0 0 15 8
T2 0 12 16 5
T3 0 13 17 2
T4 0 14 0 0
T1 T2 T3 T4
C1 0 0 0 0
C2 0 12 13 14
C3 15 16 17 0
C1 C2 C3
T1 5 6 22
T2 8 21 26
T3 11 25 30
T4 14 29 16
Arrays: Multi-dimensional data structures.
## Arrays are data objects that allow us to store data in more than two dimensions
## array_name <- array(data, dim = c(...))
v1<- c(1,4,5)
v2<- c(10,20,30,40,50,60)
v3<- array(c(v1,v2), dim= c(3,3,4)); v3
#Naming (list)
row_name<- c("R1","R2","R3")
col_name<- c("c1","c2","c3")
mat_name<- c("m1","m2","m3","m4")
v3<- array(c(v1,v2), dim= c(3,3,4), dimnames= list(row_name, col_name, mat_name));
v3
print(v3[3,2,1])
v4<- c(1,4,5)
v5<- c(1,2,3,40,50,60)
# Note: v6 is built from v1 and v2 (v4 and v5 are not used), so v7 is twice v3
v6<- array(c(v1,v2), dim= c(3,3,4)); v3
v7<- v3+v6; v7
Result
, , 1
     [,1] [,2] [,3]
[1,]    1   10   40
[2,]    4   20   50
[3,]    5   30   60
, , 2
     [,1] [,2] [,3]
[1,]    1   10   40
[2,]    4   20   50
[3,]    5   30   60
, , 3
     [,1] [,2] [,3]
[1,]    1   10   40
[2,]    4   20   50
[3,]    5   30   60
, , 4
     [,1] [,2] [,3]
[1,]    1   10   40
[2,]    4   20   50
[3,]    5   30   60
, , m1
c1 c2 c3
R1 1 10 40
R2 4 20 50
R3 5 30 60
, , m2
c1 c2 c3
R1 1 10 40
R2 4 20 50
R3 5 30 60
, , m3
c1 c2 c3
R1 1 10 40
R2 4 20 50
R3 5 30 60
, , m4
c1 c2 c3
R1 1 10 40
R2 4 20 50
R3 5 30 60
[1] 30
, , m1
c1 c2 c3
R1 1 10 40
R2 4 20 50
R3 5 30 60
, , m2
c1 c2 c3
R1 1 10 40
R2 4 20 50
R3 5 30 60
, , m3
c1 c2 c3
R1 1 10 40
R2 4 20 50
R3 5 30 60
, , m4
c1 c2 c3
R1 1 10 40
R2 4 20 50
R3 5 30 60
, , m1
c1 c2 c3
R1 2 20 80
R2 8 40 100
R3 10 60 120
, , m2
c1 c2 c3
R1 2 20 80
R2 8 40 100
R3 10 60 120
, , m3
c1 c2 c3
R1 2 20 80
R2 8 40 100
R3 10 60 120
, , m4
c1 c2 c3
R1 2 20 80
R2 8 40 100
R3 10 60 120
Vectors: One-dimensional arrays that can contain elements of the same data type.
##Vector is a collection of similar datatypes
c1<-c (1,2,3)
c2<-c (4,5,6)
print(c1+c2) # addition
print(c1-c2) # subtraction
print(c1*c2) # Multiplication
print(c1/c2) # Division
print(c1%%c2)# remainder
print(c1%/%c2)# integer quotient
print(c1^c2) # Power
#Relational (a and b are vectors defined earlier in the session)
print(a<b) # less than
print(a>b) # greater than
print(a==b) # equal to
print(a>=b) # greater than or equal to
print(a<=b)# less than or equal to
print(a!=b)# not equal to
##Vector is a collection of similar datatypes
c1<-c (1,2,3)
c2<-c (4,5,6)
print(c1<c2) # less than
print(c1>c2) # greater than
print(c1==c2) # equal to
print(c1>=c2) # greater than or equal to
print(c1<=c2)# less than or equal to
print(c1!=c2)# not equal to
#logical & | ! && ||
print(a&b) # element-wise AND
print(a|b) # element-wise OR
print(!a) # NOT
print(a/b) # Division
print(a%%b)# remainder
print(a%/%b)# integer quotient
print(a^b) # Power

Result
[1] 5 7 9
[1] -3 -3 -3
[1] 4 10 18
[1] 0.25 0.40 0.50
[1] 1 2 3
[1] 0 0 0
[1] 1 32 729
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] TRUE TRUE TRUE
[1] FALSE FALSE FALSE
[1] FALSE FALSE FALSE
[1] FALSE FALSE FALSE
[1] TRUE TRUE TRUE
[1] TRUE TRUE TRUE
[1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[11] TRUE TRUE
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE
[1] -1.000000 -3.000000 Inf 5.000000 3.000000
[6] 2.333333 2.000000 -4.500000 -10.000000 Inf
[11] 12.000000 6.500000
[1] 0 0 NA 0 0 1 0 -1 0 NA 0 1
[1] -1 -3 NA 5 3 2 2 -5 -10 NA 12 6
[1] 2.500000e-01 3.333333e-01 1.000000e+00 5.000000e+00
[5] 3.600000e+01 3.430000e+02 4.096000e+03 1.234568e-02
[9] 1.000000e-01 1.000000e+00 1.200000e+01 1.690000e+02
#data structures in R
#vectors (is a collection of similar datatypes; elements known as components)
# length() : no. of elements present; Function used is c()
a<- c(1, 2, 3, 4, 5);a
b<- -2:4;b
d<-seq(1,4, by=0.4); d
e<- seq(1,4, length.out= 3);e
#Numeric Vector
numa<- c(12.3, 14.5,15.5,16.6);numa; class(numa)
#integer
numb<- c(12L,13L,14L,15L,16L);numb; class(numb)
#Character
numc<- c("abc","def","ghi","jkl");numc; class(numc)
numc<- c("abc"=1,"def"=2,"ghi"=3,"jkl"=35)
numc["def"]
class(numc)
#vector operations
#combine
a1<- c(12,13,14,15); a1
a2<- c("abc","def","ghi","jkl"); a2
a2[2:4]
a3 <- c(a1,a2); a3
a4 <- c (12,13,14,15)
a1+a4
a1-a4
a1*a4
Result
[1] 1 2 3 4 5
[1] -2 -1 0 1 2 3 4
[1] 1.0 1.4 1.8 2.2 2.6 3.0 3.4 3.8
[1] 1.0 2.5 4.0
[1] 12.3 14.5 15.5 16.6
[1] "numeric"
[1] 12 13 14 15 16
[1] "integer"
[1] "abc" "def" "ghi" "jkl"
[1] "character"
def
2
[1] "numeric"
[1] 12 13 14 15
[1] "abc" "def" "ghi" "jkl"
[1] "def" "ghi" "jkl"
[1] "12" "13" "14" "15" "abc" "def" "ghi" "jkl"
[1] 24 26 28 30
[1] 0 0 0 0
[1] 144 169 196 225
Factors: A categorical data type used for storing and analyzing categorical variables.
# Factors: they are used to store data in different levels; they are used when there are
# many repeated values
abs <- c("East", "West", "North", "South","East", "West", "West", "West" )
Factor_data <- factor(abs)
print(Factor_data)
Result
[1] East  West  North South East  West  West  West
Levels: East North South West
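Once a factor has been created, its levels and the number of observations per level can be inspected with base R functions. The sketch below repeats the region data under the hypothetical name regions, which avoids reusing abs (the name of a built-in function).

```r
regions <- c("East", "West", "North", "South", "East", "West", "West", "West")
region_factor <- factor(regions)

print(levels(region_factor))       # levels are sorted alphabetically
print(table(region_factor))        # number of observations in each level
print(as.integer(region_factor))   # internal integer codes: 1 4 2 3 1 4 4 4
```

The integer codes show how a factor is stored: each value is kept as the position of its level, which is what makes factors compact when many values repeat.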
Strings: You can work with strings in R using various functions and operations.
## String: values written inside quotes are known as strings
abc<- "Hello Students"
def <- "How"
ghi <- " are you doing"
print (paste(abc, def, ghi))
print (paste(abc, def, ghi, sep="_"))
print (paste(abc, def, ghi, sep="_", collapse = ""))
Result
[1] "Hello Students How are you doing"
[1] "Hello Students_How_ are you doing"
[1] "Hello Students_How_ are you doing"
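In the last call above, collapse has no visible effect because each argument is a single string. A few other base R string functions, and a case where collapse does matter, are sketched below; the variable name greeting is hypothetical.

```r
greeting <- "Hello Students"

print(nchar(greeting))            # number of characters: 14
print(toupper(greeting))          # "HELLO STUDENTS"
print(substr(greeting, 1, 5))     # characters 1 to 5: "Hello"
print(paste0("How", " are you"))  # paste0() joins without a separator: "How are you"

# collapse joins the elements of one vector into a single string
print(paste(c("a", "b", "c"), collapse = "-"))   # "a-b-c"
```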
Practical No.5
R Control Statements

R Control Statements
Control statements in R are programming constructs that allow you to control the flow
of your code. They determine which code blocks are executed and when. R provides
several control statements, including conditional statements (if-else), loops (for, while,
repeat), and branching statements (break, next).
Conditional Statements (if-else): Conditional statements in R allow you to make
decisions based on specified conditions. The basic form includes if, else if, and else. You
can use these to execute different code blocks depending on whether a condition is
met.
# IF- Else Statement
x<-25L
if (is.integer(x)){
print("x is an integer number")
} else {
print("x is not an integer")
}
#Vector (Single Condition)
y <- c("SPSU", "is", "a", "sheela","for","learning")
if ("sheela" %in% y)
{print("sheela is found in our vector" )
}else{ print ("sheela is not found in our vector")}

# Scenario 2 (Multiple Condition)


marks<- 80
if(marks>=75){
print ("First class")
}else if (marks>=65){
print ("Second class")
}else if (marks>=45){
print ("Third class")
}else {
print ("Fail")
}
Result
[1] "x is an integer number"
[1] "sheela is found in our vector"
[1] "First class"
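The if statement tests a single condition at a time. When the same decision has to be made for every element of a vector, base R's ifelse() function is a vectorized alternative. The sketch below applies the grading rule from the example above to a hypothetical vector of marks.

```r
# Hypothetical marks for four students
marks <- c(80, 70, 50, 30)

# ifelse(test, yes, no) is evaluated element by element; calls can be nested
grade <- ifelse(marks >= 75, "First class",
         ifelse(marks >= 65, "Second class",
         ifelse(marks >= 45, "Third class", "Fail")))

print(grade)   # "First class" "Second class" "Third class" "Fail"
```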
Loops (for, while, repeat): Loops in R are used to repeat a block of code multiple times.
for loop: Iterates over a sequence of values, typically a vector.
# Looping
## For Loop
# Numeric
for (y in 1:10) {
print(paste("Number:", y ))
}
#Vector
f<- c("abc","def", "ghi", "xyz" )
for (i in f){
print(i) # print the current element
}
Result
[1] "Number: 1"
[1] "Number: 2"
[1] "Number: 3"
[1] "Number: 4"
[1] "Number: 5"
[1] "Number: 6"
[1] "Number: 7"
[1] "Number: 8"
[1] "Number: 9"
[1] "Number: 10"
[1] "abc"
[1] "def"
[1] "ghi"
[1] "xyz"
while loop: Repeats a block of code as long as a specified condition is TRUE.
##While Loop
i <- 1
while (i <= 5) {
print(i)
i <- i + 1
}
Result
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
repeat loop: Repeats a block of code indefinitely until a break statement is
encountered.
## Repeat Loop
a <- 1
repeat{
print("Hello SPSU")
if(a>5)
break
a<- a+1
}
Result
[1] "Hello SPSU"
[1] "Hello SPSU"
[1] "Hello SPSU"
[1] "Hello SPSU"
[1] "Hello SPSU"
[1] "Hello SPSU"
Branching Statements (break, next):
break: Terminates the current loop or switch statement and continues with the next
statement.
next: Skips the current iteration of a loop and moves to the next iteration.
## Branching Statements
#next and break
x<- 1:10
for (val in x) {
if (val==5){
next
}
print(val)
}
Result
[1] 1
[1] 2
[1] 3
[1] 4
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
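The example above demonstrates next. For comparison, here is the same loop with break, which stops the loop entirely when the value 5 is reached. The vector printed is a hypothetical helper added only to record which values were printed.

```r
x <- 1:10
printed <- c()   # records the values that were printed

for (val in x) {
  if (val == 5) {
    break        # leave the loop completely
  }
  print(val)
  printed <- c(printed, val)
}
```

With next the loop printed every value except 5; with break it prints only 1 to 4 and then stops.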
Control Statements, Loops, Arithmetic, and Boolean Operators:
Control Statements: Control statements in R include if-else for making decisions based
on conditions and branching statements like break and next.
Loops: R provides different types of loops, including for, while, and repeat loops, for
executing code repeatedly.
Arithmetic Operators: R supports standard arithmetic operators (+, -, *, /, ^, %% for
modulus, %/% for integer division) for performing mathematical operations.
Boolean Operators: R provides logical operators (& for AND, | for OR, ! for NOT) to work
with logical values (TRUE and FALSE) and make logical comparisons.
#operators in R
#Arithmetic = + - / * %% (remainder) %/% (integer quotient) ^
#Relational = < > == <= >= !=
#logical & | ! && ||
#Assignment <- = ->
#Arithmetic
a<- 9.1
b<- 10
print(a+b) # addition
print(a-b) # subtraction
print(a*b) # Multiplication
print(a/b) # Division
print(a%%b)# remainder
print(a%/%b)# integer quotient
print(a^b) # Power
##Vector is a collection of similar datatypes
c1<-c (1,2,3)
c2<-c (4,5,6)
print(c1+c2) # addition
print(c1-c2) # subtraction
print(c1*c2) # Multiplication
print(c1/c2) # Division
print(c1%%c2)# remainder
print(c1%/%c2)# integer quotient
print(c1^c2) # Power
#Relational
print(a<b) # less than
print(a>b) # greater than
print(a==b) # equal to
print(a>=b) # greater than or equal to
print(a<=b)# less than or equal to
print(a!=b)# not equal to
##Vector is a collection of similar datatypes
c1<-c (1,2,3)
c2<-c (4,5,6)
print(c1<c2) # less than
print(c1>c2) # greater than
print(c1==c2) # equal to
print(c1>=c2) # greater than or equal to
print(c1<=c2)# less than or equal to
print(c1!=c2)# not equal to
#logical & | ! && ||
print(a&b) # AND (non-zero numbers count as TRUE)
print(a|b) # OR
print(!a) # NOT
print(a/b) # Division
print(a%%b)# remainder
print(a%/%b)# integer quotient
print(a^b) # Power
Result
[1] 19.1
[1] -0.9
[1] 91
[1] 0.91
[1] 9.1
[1] 0
[1] 3894161181
[1] 5 7 9
[1] -3 -3 -3
[1] 4 10 18
[1] 0.25 0.40 0.50
[1] 1 2 3
[1] 0 0 0
[1] 1 32 729
[1] TRUE
[1] FALSE
[1] FALSE
[1] FALSE
[1] TRUE
[1] TRUE
[1] TRUE TRUE TRUE
[1] FALSE FALSE FALSE
[1] FALSE FALSE FALSE
[1] FALSE FALSE FALSE
[1] TRUE TRUE TRUE
[1] TRUE TRUE TRUE
[1] TRUE
[1] TRUE
[1] FALSE
[1] 0.91
[1] 9.1
[1] 0
[1] 3894161181
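The operator list above mentions && and || but the example only uses & and |. A minimal sketch of the difference follows: & and | work element by element on vectors, while && and || compare single values and stop evaluating as soon as the answer is known. The variable names p, q and n are hypothetical.

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

print(p & q)   # element-wise AND: TRUE FALSE FALSE
print(p | q)   # element-wise OR:  TRUE TRUE TRUE

n <- 5
print(n > 0 && n < 10)   # single-value AND, typical inside if(): TRUE

# Short-circuit: the right-hand side is not evaluated when the left is FALSE
print(FALSE && stop("never reached"))   # FALSE, no error is raised
```

For this reason && and || are the usual choice inside if conditions, and & and | are used when working with whole vectors.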
Practical No.6
Input/ output (direct and importing from other spreadsheet
applications like Excel). exporting data method

Data Input and output Methods:


Direct Data Input: You can enter data directly in R using the assignment operator
"<-". For example, you can create vectors or data frames by manually specifying values.
Example:
# Direct data input
my_vector <- c(1, 2, 3, 4, 5)
Importing Data from Excel or Other Spreadsheet Applications: You can import data from
Excel or other spreadsheet applications using packages like readxl or openxlsx. These
packages allow you to read data from Excel files and use it in your R scripts.
# Install and load the readxl package
install.packages("readxl")
library(readxl)
##Import data Files
# Import data from an Excel file
Data_Sheet_MBA <- read_excel("C:/Users/DELL/Desktop/Data_Sheet_MBA.xlsx")
View(Data_Sheet_MBA) # opens the imported data in the RStudio data viewer
Direct Data Export: You can export data directly in R using write.csv() to save a data frame
as a CSV file. Saving to Excel format needs an add-on package, for example write.xlsx() from
the xlsx or openxlsx package.
Example:
# Export a data frame to a CSV file
write.csv(data_frame, "data.csv")
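To check that an export worked, the file can be read back with read.csv(). The sketch below is a self-contained round trip; the small data frame and the use of a temporary file are assumptions made for illustration, so that the example does not write into your working directory.

```r
# Hypothetical data frame used only for this sketch
data_frame <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# Write to a temporary CSV file; row.names = FALSE leaves out the row-number column
csv_path <- tempfile(fileext = ".csv")
write.csv(data_frame, csv_path, row.names = FALSE)

# Read the file back and compare with the original
read_back <- read.csv(csv_path)
print(read_back)
print(all.equal(data_frame, read_back))
```

Without row.names = FALSE, write.csv() adds an extra first column holding the row names, which then shows up as an additional column when the file is read back.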
Exporting with External Packages: R offers packages like writexl for exporting data frames to
Excel files. Install the package and use it to write data frames to Excel format.
Example (using the writexl package):
# Install and load the writexl package
install.packages("writexl")
library(writexl)
# Export a data frame to an Excel file
# Create a data frame
student.data<- data.frame(
student_id= c(1:5),
student_name = c("abc","def","ghi","jkl","mno"),
fee= c(520.12, 600, 700,800,900),
start_date= as.Date(c("2009-12-01", "2008-05-10", "2020-07-15", "2022-08-19", "2023-08-20"))
)
student.data
## Export data to Excel with the xlsx package (it depends on rJava, which requires Java to be installed)
install.packages("xlsx")
library(xlsx)
write.xlsx(student.data, "Data Frame.xlsx", row.names = FALSE)
## With the writexl package loaded above, the equivalent call is write_xlsx(student.data, "Data Frame.xlsx")
getwd() ## get working directory (the file is saved here)
Result
student_id student_name fee start_date
1 1 abc 520.12 2009-12-01
2 2 def 600.00 2008-05-10
3 3 ghi 700.00 2020-07-15
4 4 jkl 800.00 2022-08-19
5 5 mno 900.00 2023-08-20
[1] "C:/Users/DELL/Documents"

Practical No.7
Built-In Math Function, Cumulative Sums And Products,
Minima And Maxima. Matrix Operation Using R

Built-in Math Functions: R provides numerous built-in math functions for various
mathematical operations. Here are a few examples:
sqrt(x): Compute the square root of x.
# sqrt(x)
# it returns the square root of input x.
x<- 4
print (sqrt(x))
Result
[1] 2
log(x): Compute the natural logarithm of x.
#log(x)
# it returns the natural logarithm of input x.
x<- 4
print (log(x))
Result
[1] 1.386294
exp(x): Compute the exponential of x.
# exp(x)
# it returns e raised to the power x.
x<- 4
print (exp(x))
Result
[1] 54.59815
abs(x): Calculate the absolute value of x.
# abs(x)
# it returns the absolute value of input x.
x<- -4
print (abs(x))
Result
[1] 4
round(x, digits): Round x to a specified number of decimal places.
# round(x, digits = n)
# Returns x rounded to n decimal places (digits defaults to 0).
x<- -4
print (round(x))
x<- -4.3
print (round(x))
x<- -4.5
print (round(x))
x<- -4.6
print (round(x))
Result
[1] -4
[1] -4
[1] -4
[1] -5
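The result for -4.5 above is -4 rather than -5 because R rounds exact halves to the nearest even number. The short sketch below illustrates this and also shows the digits argument, which the examples above leave at its default; the value 3.14159 is only an illustrative input.

```r
# Rounding to a chosen number of decimal places
x <- 3.14159
two_places <- round(x, digits = 2)   # 3.14
print(two_places)

# Exact halves go to the nearest even integer
halves <- round(c(0.5, 1.5, 2.5, -4.5))   # 0 2 2 -4
print(halves)

# signif() keeps significant digits instead of decimal places
three_sig <- signif(x, 3)   # 3.14
print(three_sig)
```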
trunc(x): Truncate a numeric value to an integer.
# trunc(x)
# Returns x with the fractional part removed (rounds toward zero).
x<- c(1.2,2.5,8.1)
print (trunc(x))
Result
[1] 1 2 8
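For positive numbers trunc() and floor() agree, so the difference only shows up with negative values. The following sketch, using a small made-up vector, compares trunc(), floor() and ceiling() side by side.

```r
# Illustrative values, including negatives
v <- c(-2.7, -0.5, 1.2, 2.5)

truncated <- trunc(v)     # toward zero:        -2  0  1  2
floored   <- floor(v)     # toward -infinity:   -3 -1  1  2
ceiled    <- ceiling(v)   # toward +infinity:   -2  0  2  3

print(truncated)
print(floored)
print(ceiled)
```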

Cumulative Sums and Products: You can calculate cumulative sums and products
of a vector or a matrix using functions like cumsum() and cumprod().
Example:
x <- c(1, 2, 3, 4, 5)
cum_sum <- cumsum(x) # Cumulative sum
cum_prod <- cumprod(x) # Cumulative product
## Cumulative Sum and Product
# Sample data
sales <- c(100, 150, 120, 80, 200)
# Calculate the cumulative sum
cumulative_sales <- cumsum(sales)
# Display the cumulative sum
print(cumulative_sales)
Result
[1] 100 250 370 450 650
## Cumulative Product
# Sample data
population_growth_rate <- c(1.05, 1.03, 1.02, 1.04, 1.01)
# Calculate the cumulative product
cumulative_population <- cumprod(population_growth_rate)
# Display the cumulative product
print(cumulative_population)
Result
[1] 1.050000 1.081500 1.103130 1.147255 1.158728
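When cumsum() is given a matrix, it treats the matrix as one long vector (column by column) and returns a plain vector. If a running total within each column is wanted, one option is apply() with MARGIN = 2. The sketch below uses a small made-up matrix m to show both behaviours.

```r
# 2 x 3 matrix filled by columns: (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# cumsum() flattens the matrix and returns a vector
flat_total <- cumsum(m)   # 1 3 6 10 15 21
print(flat_total)

# Column-wise running totals keep the matrix shape
col_total <- apply(m, 2, cumsum)
print(col_total)
```

The same pattern works with cumprod() in place of cumsum().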

Minima and Maxima: R provides functions to find the minimum and maximum
values in a vector or a matrix.
min(x): Find the minimum value in x.
max(x): Find the maximum value in x.
## Maxima and Minima
# Sample data
data <- c(12, 5, 8, 17, 3, 21, 9)
data
# Calculate the minimum and maximum values
minimum_value <- min(data)
maximum_value <- max(data)
# Display the results
cat("Minimum Value:", minimum_value, "\n")
cat("Maximum Value:", maximum_value, "\n")
Result
[1] 12 5 8 17 3 21 9
Minimum Value: 3
Maximum Value: 21
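Besides the extreme values themselves, it is often useful to know where they occur. The sketch below reuses the sample values from above under the name values and shows which.min(), which.max() and range(), all of which are base R functions.

```r
values <- c(12, 5, 8, 17, 3, 21, 9)

# Positions (indices) of the smallest and largest elements
pos_min <- which.min(values)   # 5, because values[5] is 3
pos_max <- which.max(values)   # 6, because values[6] is 21

# range() returns the minimum and maximum together
value_range <- range(values)   # 3 21

cat("Minimum at position:", pos_min, "\n")
cat("Maximum at position:", pos_max, "\n")
print(value_range)
```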
Matrix Operations Using R: R supports various matrix operations, including matrix
multiplication, inversion, and determinant calculation. You can use the matrix() function to
create matrices and the %*% operator for matrix multiplication.
Example (matrix multiplication):
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)
result <- A %*% B # Matrix multiplication
# the matrix function
# R wants the data to be entered by columns starting with column one
# 1st arg: c(2,3,-2,1,2,2) the values of the elements filling the columns
# 2nd arg: 3 the number of rows
# 3rd arg: 2 the number of columns
A <- matrix(c(2,3,-2,1,2,2),3,2)
A
#Is Something a Matrix
is.matrix(A)
is.vector(A)
#Multiplication by a Scalar
c <- 3
c*A
#Matrix Addition & Subtraction
B <- matrix(c(1,4,-2,1,2,1),3,2)
B
C <- A + B
C

D <- A - B
D
#Matrix Multiplication
D <- matrix(c(2,-2,1,2,3,1),2,3)
D
C <- D %*% A
C
C <- A %*% D
C
D <- matrix(c(2,1,3),1,3)
D
C <- D %*% A
C
C <- A %*% D
#Error in A %*% D : non-conformable arguments
#Transpose of a Matrix
AT <- t(A)
AT
ATT <- t(AT)
ATT
Result
[,1] [,2]
[1,] 2 1
[2,] 3 2
[3,] -2 2
[1] TRUE
[1] FALSE
[,1] [,2]
[1,] 6 3
[2,] 9 6
[3,] -6 6
[,1] [,2]
[1,] 1 1
[2,] 4 2
[3,] -2 1
[,1] [,2]
[1,] 3 2
[2,] 7 4
[3,] -4 3
[,1] [,2]
[1,] 1 0
[2,] -1 0
[3,] 0 1
[,1] [,2] [,3]
[1,] 2 1 3
[2,] -2 2 1
[,1] [,2]
[1,] 1 10
[2,] 0 4
[,1] [,2] [,3]
[1,] 2 4 7
[2,] 2 7 11
[3,] -8 2 -4
[,1] [,2] [,3]
[1,] 2 1 3
[,1] [,2]
[1,] 1 10
[,1] [,2] [,3]
[1,] 2 3 -2
[2,] 1 2 2
[,1] [,2]
[1,] 2 1
[2,] 3 2
[3,] -2 2
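The introduction to this section also mentions inversion and determinant calculation, which the examples above do not cover. A minimal sketch follows, assuming a small square matrix M made up for illustration; det() and solve() are the base R functions for these two operations, and both require a square matrix (solve() additionally requires it to be invertible).

```r
# Hypothetical 2 x 2 matrix, filled by columns: rows are (1, 3) and (2, 4)
M <- matrix(c(1, 2, 3, 4), nrow = 2)

# Determinant: 1*4 - 3*2 = -2
M_det <- det(M)
print(M_det)

# Inverse of M
M_inv <- solve(M)
print(M_inv)

# Multiplying a matrix by its inverse gives the identity matrix
# (up to floating-point rounding)
identity_check <- M %*% M_inv
print(round(identity_check, 10))
```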

Practical No.8
Data visualization: diagrammatic representation of data

Measures of Central Tendency:


data <- c(30, 10, 25, 15, 20, 35, 40, 50, 5, 15)
# Sorting of data
sorted_data <- sort(data)
print(sorted_data) # Ascending order
Mean: The mean is the average of a dataset and is calculated by summing all values and
dividing by the total number of values.
mean_value <- mean(data);mean_value
Result
mean_value <- mean(data);mean_value
[1] 24.5
Median: The median is the middle value in a dataset when it's sorted. It's the 50th
percentile.
median_value <- median(data);median_value
Result
median_value <- median(data);median_value
[1] 22.5
Mode: The mode is the most frequently occurring value in a dataset.
# Custom mode function
custom_mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
mode_value <- custom_mode(data);mode_value
Result
[1] 15
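The custom_mode() function above returns only the first value when several values tie for the highest frequency, because which.max() stops at the first maximum. One possible variant that reports every tied value is sketched below; the name all_modes is hypothetical and the approach assumes numeric input.

```r
# Return every value that shares the highest frequency
all_modes <- function(x) {
  counts <- table(x)                            # frequency of each distinct value
  is_top <- as.vector(counts) == max(counts)    # TRUE for the most frequent values
  as.numeric(names(counts)[is_top])
}

# Same data as above: a single mode
single <- all_modes(c(30, 10, 25, 15, 20, 35, 40, 50, 5, 15))
print(single)   # 15

# Made-up data with a tie between 1 and 2
tied <- all_modes(c(1, 1, 2, 2, 3))
print(tied)     # 1 2
```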
# Create a box plot with a line for the mean, median, and mode
boxplot(data, main = "Mean, Median, and Mode Visualization", xlab = "Values", ylab =
"Data", col = "lightblue")
abline(h = mean_value, col = "red", lty = 2, lwd = 2)
abline(h = median_value, col = "green", lty = 2, lwd = 2)
abline(h = mode_value, col = "blue", lty = 2, lwd = 2)
legend("topright", legend = c("Mean", "Median", "Mode"), col = c("red", "green", "blue"), lty
= 2, lwd = 2)
# Generate a summary of the data
data_summary <- summary(data)
# Display the summary
print(data_summary)
Result

Practical No.9
Box plots, stem and leaf diagram, bar plots, pie diagram, scatter
plots

Data visualization is a crucial aspect of data analysis. It helps in presenting data in a
graphical form, making it easier to understand and draw insights. In R, you can create
various types of plots and graphs to visualize data.
Common Types of Data Visualization:

Box Plots:
Box plots, also known as box-and-whisker plots, display the distribution of a dataset's
values. They show the median, quartiles, and any outliers.
# Create a box plot
boxplot(data, main = "Box Plot of Data", ylab = "Values")
stem(data)
Result

The decimal point is 1 digit(s) to the right of the |

0 | 5
1 | 055
2 | 05
3 | 05
4 | 0
5 | 0
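The numbers that a box plot draws can also be inspected directly. The sketch below, using the same ten values as the earlier practical under the name plot_data, calls fivenum() for the five-number summary and boxplot.stats() for the values that boxplot() would mark as outliers. Both are standard R functions; the variable names are only illustrative.

```r
plot_data <- c(30, 10, 25, 15, 20, 35, 40, 50, 5, 15)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(plot_data)   # 5 15 22.5 35 50
print(five)

# boxplot.stats() reports the same box values plus any outliers
stats <- boxplot.stats(plot_data)
print(stats$stats)   # whisker ends, hinges and median
print(stats$out)     # empty here: no point lies beyond the whiskers
```

Note that the hinges used by the box plot can differ slightly from the quartiles reported by summary(), because the two use different interpolation rules.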
Stem and Leaf Diagrams:
Stem-and-leaf plots provide a way to represent a dataset by showing the individual data
points and their distribution.
Stem-and-Leaf Plot
A stem-and-leaf plot of a quantitative variable is a textual graph that classifies data items
according to their most significant numeric digits. In addition, we often merge each
alternating row with its next row in order to simplify the graph for readability.
Example
In the data set faithful, a stem-and-leaf plot of the eruptions variable identifies durations
with the same two most significant digits and queues them up in rows.
Problem
Find the stem-and-leaf plot of the eruption durations in faithful.
Solution
We apply the stem function to compute the stem-and-leaf plot of eruptions.

duration = faithful$eruptions
stem(duration)
Result
Answer
The stem-and-leaf plot of the eruption durations is
The decimal point is 1 digit(s) to the left of the |
16 | 070355555588
18 | 000022233333335577777777888822335777888
20 | 00002223378800035778
22 | 0002335578023578
24 | 00228
26 | 23
28 | 080
30 | 7
32 | 2337
34 | 250077
36 | 0000823577
38 | 2333335582225577
40 | 0000003357788888002233555577778
42 | 03335555778800233333555577778
44 | 02222335557780000000023333357778888
46 | 0000233357700000023578
48 | 00000022335800333
50 | 0370
Bar Plots:
Bar plots are used to represent categorical data, displaying the frequency or proportion of
each category.
# Create a bar plot
barplot(table(data), main = "Bar Plot of Data", xlab = "Values", ylab = "Frequency")
Result

Pie Charts:
Pie charts are used to represent data as a circle divided into segments, where each segment
represents a category's proportion.
# Create a pie chart
pie(table(data), main = "Pie Chart of Data")
Result
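Both the bar plot and the pie chart above are built from the frequency table returned by table(). The sketch below shows that table explicitly, converts it to proportions with prop.table(), and builds percentage labels that could be passed to pie() through its labels argument. The variable names are illustrative, and the plotting calls themselves are left out so the snippet only prints numbers.

```r
chart_data <- c(30, 10, 25, 15, 20, 35, 40, 50, 5, 15)

# Frequency of each distinct value (this is what barplot() draws)
counts <- table(chart_data)
print(counts)

# Share of each value (this is what the pie segments represent)
shares <- prop.table(counts)
print(shares)

# Labels such as "15 (20%)" for use as pie(counts, labels = pie_labels)
pie_labels <- paste0(names(counts), " (", round(100 * as.vector(shares)), "%)")
print(pie_labels)
```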

Scatter Plots:
Scatter plots display the relationship between two numerical variables, using points on a 2D
plane.
# Sample data for x and y
x <- c(1, 2, 3, 4, 5)
y <- c(10, 15, 5, 20, 25)
# Create a scatter plot
plot(x, y, main = "Scatter Plot", xlab = "X", ylab = "Y")

Result
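A scatter plot is usually read together with a numerical measure of the relationship. As a sketch, the same x and y values can be passed to cor() for the correlation coefficient and to lm() for a straight-line fit; the names sx, sy and fit are illustrative. After plot(sx, sy), calling abline(fit) would draw the fitted line on the existing plot.

```r
sx <- c(1, 2, 3, 4, 5)
sy <- c(10, 15, 5, 20, 25)

# Pearson correlation between the two variables
r <- cor(sx, sy)   # 0.7
print(r)

# Least-squares line sy = intercept + slope * sx
fit <- lm(sy ~ sx)
line_coef <- unname(coef(fit))   # intercept 4.5, slope 3.5
print(line_coef)
```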
Practical No.10
Generating Sequence Of Random Numbers

In R, you can generate sequences of random numbers with various probability distributions,
including uniform, normal, binomial, Poisson, exponential, and gamma distributions.
Additionally, you can generate non-random numbers as needed.

1. Generating Uniform Random Numbers: To generate a sequence of random
numbers from a uniform distribution, you can use the runif() function. It allows you to
specify the range of values for the uniform distribution.
Example:
# Generate 10 random numbers from a uniform distribution between 0 and 1
random_uniform <- runif(10);random_uniform
Result
[1] 0.90291602 0.49162426 0.07841426 0.53981263 0.69285577
[6] 0.32185117 0.60758166 0.26034883 0.12376122 0.87420309
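The example above uses the default range of 0 to 1, and its output changes on every run. The sketch below shows the min and max arguments of runif() and uses set.seed() so that the same numbers are produced each time; the seed value 42 and the range 5 to 10 are arbitrary choices for illustration.

```r
# Fix the random number generator state for reproducible output
set.seed(42)
first_draw <- runif(10, min = 5, max = 10)
print(first_draw)

# Resetting the same seed reproduces exactly the same numbers
set.seed(42)
second_draw <- runif(10, min = 5, max = 10)

same_numbers <- identical(first_draw, second_draw)
in_range <- all(first_draw >= 5 & first_draw <= 10)
print(same_numbers)   # TRUE
print(in_range)       # TRUE
```

The same set.seed() call works for the other generators in this practical (rnorm(), rbinom(), rpois(), rexp() and rgamma()).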

2. Generating Normal Random Numbers: To generate random numbers from
a normal (Gaussian) distribution, you can use the rnorm() function. You can specify the
mean and standard deviation of the distribution.
# Generate 10 random numbers from a normal distribution with mean 0 and
# standard deviation 1
random_normal <- rnorm(10, mean = 0, sd = 1);random_normal
Result
[1] -1.1777631 -0.4865430 -0.8943459 0.5205956 -1.3296384
[6] -1.1456649 -0.7065214 -0.3428695 -1.9390048 1.3340309

3. Generating Binomial Random Numbers: To generate random numbers
from a binomial distribution, you can use the rbinom() function. You need to specify the
number of trials and the probability of success.
# Generate 10 random numbers from a binomial distribution with 10 trials and
# a success probability of 0.5
random_binomial <- rbinom(10, size = 10, prob = 0.5);random_binomial
Result
[1] 2 6 5 6 5 3 2 3 6 4

4. Generating Poisson Random Numbers: To generate random numbers from
a Poisson distribution, you can use the rpois() function. You need to specify the mean
(lambda) of the distribution.
# Generate 10 random numbers from a Poisson distribution with a mean of 2
random_poisson <- rpois(10, lambda = 2);random_poisson
Result
[1] 2 3 2 0 1 1 2 4 2 1

5. Generating Exponential Random Numbers: To generate random numbers
from an exponential distribution, you can use the rexp() function. You need to specify the
rate parameter.
# Generate 10 random numbers from an exponential distribution with a rate of 0.5
random_exponential <- rexp(10, rate = 0.5);random_exponential
Result
[1] 0.5804149 3.2882927 0.2809124 0.5445340 2.1332177 0.9198697
[7] 1.8914299 0.4418417 0.6761787 1.3830410

6. Generating Gamma Random Numbers: To generate random numbers from
a gamma distribution, you can use the rgamma() function. You need to specify the shape
and scale parameters.
# Generate 10 random numbers from a gamma distribution with shape 2 and scale 1
random_gamma <- rgamma(10, shape = 2, scale = 1);random_gamma
Result
[1] 1.0857335 3.2681058 1.6512878 4.2494426 4.9649123 2.8315611
[7] 0.2436583 5.5397528 1.6349530 0.4491038
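One way to check that the parameters of these generators mean what the text says is to draw a large sample and compare its mean with the theoretical mean of the distribution. The sketch below does this for the six distributions using the same parameter values as the examples above; the sample size, the seed and the variable names are arbitrary choices made for illustration.

```r
set.seed(123)   # arbitrary seed for reproducibility
n <- 100000     # large sample so the means settle close to theory

sample_means <- c(
  uniform     = mean(runif(n)),                         # theory: (0 + 1) / 2 = 0.5
  normal      = mean(rnorm(n, mean = 0, sd = 1)),       # theory: 0
  binomial    = mean(rbinom(n, size = 10, prob = 0.5)), # theory: 10 * 0.5 = 5
  poisson     = mean(rpois(n, lambda = 2)),             # theory: lambda = 2
  exponential = mean(rexp(n, rate = 0.5)),              # theory: 1 / rate = 2
  gamma       = mean(rgamma(n, shape = 2, scale = 1))   # theory: shape * scale = 2
)

theoretical <- c(0.5, 0, 5, 2, 2, 2)

# Side-by-side comparison of simulated and theoretical means
print(round(rbind(sample_means, theoretical), 3))
```

With only 10 draws, as in the examples above, the sample mean can be far from the theoretical value; the agreement improves as n grows.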

7. Generating Non-random Numbers: If you need to generate a sequence of
non-random numbers, you can simply create a vector with the desired values.
non_random_numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);non_random_numbers
Result
[1] 1 2 3 4 5 6 7 8 9 10
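Typing every value by hand becomes impractical for longer sequences. Base R offers the colon operator, seq() and rep() for building regular sequences; the sketch below shows a few typical calls with illustrative values.

```r
# Integers from 1 to 10 with the colon operator
one_to_ten <- 1:10
print(one_to_ten)

# Fixed step size
by_quarter <- seq(0, 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
print(by_quarter)

# Fixed number of equally spaced values
four_points <- seq(2, 20, length.out = 4)   # 2 8 14 20
print(four_points)

# Repeating a pattern
repeated <- rep(c("a", "b"), times = 3)   # "a" "b" "a" "b" "a" "b"
print(repeated)
```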
