Practical File
ON
By
Submitted To
SCHOOL OF MANAGEMENT
What is R?
R is a free and open-source scripting language developed by Ross Ihaka and Robert
Gentleman in 1993. It's an alternative implementation of the S programming language,
which was widely used in the 1980s for statistical computing. The R environment is designed
to perform complex statistical analysis and display results using a wide range of graphics. The R
programming language is written in C, Fortran, and R itself. Most R packages are written in
the R programming language, but heavy computational chunks are written in C, C++, and
Fortran. R allows integration with Python, C, C++, .NET, and Fortran.
x <- 5
y <- 2
multiple <- function(y) y * x
func <- function() {
  x <- 3
  multiple(y)
}
func()
Output:
[1] 10
The variables x <- 5 and y <- 2 are global variables, and we can use them inside and outside R
functions. However, when a variable is declared (x <- 3) inside a function (func()) and
another function (multiple()) is called inside it, the called function uses the declared
variable only if it is passed to the called function as an input. In other words, the scope
of the declared variable (x <- 3) doesn't extend into the called function (multiple()).
In the code snippet above, the func() function returns the result of calling the multiple() function.
The multiple() function takes only one input parameter, y; however, it uses two
variables, x and y, to perform the multiplication. It searches for the x variable in the
environment where multiple() was defined, which here is the global environment, because x
isn't an input parameter of the function. The variable y is an input parameter of
the multiple() function, and its value comes from the call multiple(y) inside func(). R first
searches for y locally in func(). If it can't find the y variable declared locally, it searches
globally. In the code snippet above, the multiple() function is evaluated with the global value
of x (x <- 5) and the global value of y (y <- 2). It returns a value of 10.
In the next example, the multiple() function receives the y declared locally in func() (y <- 10). It uses this
local value of y instead of the global one (y <- 2). Since the variable x isn't an input of
the multiple() function, it searches for its value globally (x <- 5). The function returns a value
of 50.
x <- 5
y <- 2
multiple <- function(y) y * x
func <- function() {
  x <- 3
  y <- 10
  multiple(y)
}
func()
Output:
[1] 50
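To make the called function use the x declared inside func(), the value has to be passed in as an input. The following is a minimal sketch of that idea; multiple_xy() is a hypothetical two-parameter variant of multiple() that does not appear in the examples above.

```r
x <- 5
y <- 2
# Hypothetical variant of multiple(): x is now an explicit input parameter
multiple_xy <- function(y, x) y * x
func <- function() {
  x <- 3
  # Passes the global y (2) and the local x (3) to the called function
  multiple_xy(y, x)
}
result <- func()
print(result)
# [1] 6
```

Because x is now an input of the called function, the local value 3 is used instead of the global value 5.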
You can compile and run R on the Windows, macOS, and Linux operating systems. It also
comes with a command line interface. The > character represents the command line
prompt. Here is a simple interaction with the R command line:
> a <- 5
> b <- 6
> a * b
[1] 30
The features of the R programming language are organized into packages. A standard
installation of the R programming language comes with 25 of these packages. You can
download additional packages from the Comprehensive R Archive Network (CRAN). The R
project is currently being developed and supported by the R Development Core Team.
Why Use R
R is a state-of-the-art programming language for statistical computing, data analysis, and
machine learning. It has been around for almost three decades, with over 12,000 packages
available for download on CRAN. This means that there is likely an R package that supports
the type of analysis you want to perform. Here are a few reasons why you
should learn and use R:
Free and open-source: The R programming language is open-source and is issued under the
GNU General Public License (GPL). This means that you can use all the functionalities of R
free of charge. Since R is open-source, everyone is
welcome to contribute to the project, and since it's freely available, bugs are readily detected
and fixed by the open-source community.
Popularity: The R programming language was ranked 7th in the 2021 IEEE Spectrum
ranking of top programming languages and 12th in the TIOBE Index ranking of January 2022.
It's the second most popular programming language for data science, just behind Python,
according to edX, and it is the most popular programming language for statistical analysis.
R's popularity also means that there is extensive community support on platforms like
Stack Overflow. R also has detailed online documentation that R users can consult for help.
High-quality visualization: The R programming language is famous for high-quality
visualizations. R's ggplot2 package is a detailed implementation of the grammar of graphics, a
system to concisely describe the components of a graph. With R's high-quality graphics, you
can easily implement intuitive and interactive graphs.
A language for data analytics and data science: The R programming language isn't a
general-purpose programming language. It's a specialized programming language for
statistical computing. Therefore, most of R's functions carry out vectorized operations,
meaning you don't need to loop through each element. Vectorized code is usually shorter and
faster than an explicit loop.
Distributed computing can be executed in R, whereby tasks are split among multiple
processing computers to reduce execution time. R can be integrated with Hadoop and Apache
Spark, and it can be used to process large amounts of data. R can connect to many kinds of
databases, and it has packages to carry out machine learning and deep learning operations.
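The idea of a vectorized operation can be illustrated with a short sketch. The vector prices and the factor 1.5 are hypothetical values chosen only for illustration.

```r
# Hypothetical sample data
prices <- c(10, 20, 30, 40)
# Vectorized: the multiplication is applied to every element in one call
vectorized <- prices * 1.5
# Equivalent explicit loop, shown only for comparison
looped <- numeric(length(prices))
for (i in seq_along(prices)) {
  looped[i] <- prices[i] * 1.5
}
print(vectorized)
# [1] 15 30 45 60
print(identical(vectorized, looped))
# [1] TRUE
```

Both versions give the same result; the vectorized form needs one line instead of a loop.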
Opportunity to pursue an exciting career in academe and industry: The R programming
language is trusted and extensively used in the academic community for research. R is
increasingly being used by government agencies, social media, telecommunications,
financial, e-commerce, manufacturing, and pharmaceutical companies. Top companies that
use R include Amazon, Google, ANZ Bank, Twitter, LinkedIn, Thomas Cook, Facebook,
Accenture, Wipro, the New York Times, and many more. A good mastery of the R
programming language opens all kinds of opportunities in academe and industry.
Installing R on Windows OS
To install R on Windows OS:
1. Go to the CRAN website.
2. Click on "Download R for Windows".
3. Click on the "install R for the first time" link to download the R executable (.exe) file.
4. Run the R executable file to start the installation, and allow the app to make changes to
your device.
5. Select the installation language.
R has now been successfully installed on your Windows OS. Open the R GUI to start writing R code.
Installing RStudio Desktop
To install RStudio Desktop on your computer, do the following:
Another way to interface with R through RStudio is RStudio Server, which
provides a browser-based R interface.
Practical No.2
How to run R? R as a calculator, R preliminaries
How to run R?
To run R, follow these steps:
Installation:
Download R from the official R Project website.
Choose a CRAN mirror (a server) to download R for your operating system (Windows,
macOS, or Linux).
Follow the installation instructions provided on the website.
Launch R:
Once installed, open the R application on your computer. On Windows, you may find an R
icon on your desktop or in the Start menu. On macOS, look for the R icon in your
Applications folder.
R Console:
After launching R, you'll see a console window with a prompt (>). This is where you can
interact with R.
Run Commands:
In the console, you can type R commands and press Enter to execute them. For example,
you can perform basic arithmetic operations:
7+8
Result
> 7+8
[1] 15
Scripts:
For more extended tasks or to keep a record of your work, you can write scripts. Use a text
editor (like Notepad on Windows, TextEdit on macOS, or any code editor you prefer) to
create a file with R commands, save it with a .R extension, and run the script in R.
# Example script (save as example.R)
x <- 5
y <- 3
z <- x + y
print(z)
Result
[1] 8
A script saved in the working directory can be run from the R console with
source("example.R"). Use the getwd() function to check the working directory:
getwd() ## get working directory
Result
[1] "C:/Users/DELL/Documents"
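A script file is run from the console with the source() function. The following is a minimal sketch; it writes the example script to a temporary file (an assumption made so that the example is self-contained) instead of using a saved example.R.

```r
# Write the example script to a temporary file that stands in for example.R
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 5", "y <- 3", "z <- x + y", "print(z)"), script_path)
# Run every command in the script, in order
source(script_path)
# [1] 8
```

With a script saved by hand, the call would simply be source("example.R"), provided the file is in the working directory.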
Now you've successfully installed R and run basic commands. You can explore further by
learning about R packages, data manipulation, statistical analysis, and data visualization to
leverage the full capabilities of R.
R as a Calculator:
You can use R as a simple calculator to perform arithmetic operations.
For example, you can add, subtract, multiply, and divide numbers directly in the R console.
##Arithmetic Operations
# Addition
2+3
# Subtraction
5-2
# Multiplication
4*6
# Division
8/2
Result
[1] 5
[1] 3
[1] 24
[1] 4
##Exponentiation and Roots
# Exponentiation
3 ^ 2 # 3 squared
# Square root
sqrt(9)
Result
[1] 9
[1] 3
##Order of Operations
# Use parentheses to specify the order of operations
(2 + 3) * 4
Result
[1] 20
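The effect of parentheses is easier to see when the same numbers are evaluated with and without them. A minimal sketch; the variable names are chosen only for illustration.

```r
# Multiplication is evaluated before addition: 2 + 12
without_parens <- 2 + 3 * 4
# Parentheses are evaluated first: 5 * 4
with_parens <- (2 + 3) * 4
# Exponentiation binds tighter than the unary minus: -(2^2)
power_first <- -2^2
print(c(without_parens, with_parens, power_first))
# [1] 14 20 -4
```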
R Preliminaries:
Preliminary Concepts
Variables
Variables store values for later use.
x <- 10 # Assigns the value 10 to the variable x
y <- 5
z <- x + y # Adds the values stored in x and y, assigns the result to z
print(z)
Result
[1] 15
Functions
Functions perform specific tasks.
# Built-in functions
mean(c(2, 4, 6)) # Calculates the mean of the given numbers
# User-defined functions
square <- function(x) {
  return(x^2)
}
square(4) # Returns the square of 4
Result
[1] 4
[1] 16
Data Types
R has various data types, including numeric, character, logical, and more.
# Numeric
num_var <- 42; num_var
# Character
char_var <- "Hello, R!" ; char_var
# Logical
logical_var <- TRUE; logical_var
Result
[1] 42
[1] "Hello, R!"
[1] TRUE
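The type of a value can be checked with class(), and values can be converted between types. A minimal sketch using the variables defined above; as_number and as_text are hypothetical names added for illustration.

```r
num_var <- 42
char_var <- "Hello, R!"
logical_var <- TRUE
# class() reports the data type of each variable
print(class(num_var))
# [1] "numeric"
print(class(char_var))
# [1] "character"
print(class(logical_var))
# [1] "logical"
# Conversion between types
as_number <- as.numeric("3.14")  # character to numeric
as_text <- as.character(42)      # numeric to character
```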
Practical No.3
Experimenting with Advanced Math Operations
# Define two 2 x 2 matrices, filled row by row
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE)
# Matrix addition
C <- A + B
# Matrix multiplication
D <- A %*% B
# Display results
print("Matrix A:")
print(A)
print("Matrix B:")
print(B)
print("Matrix Addition:")
print(C)
print("Matrix Multiplication:")
print(D)
Result
[1] "Matrix A:"
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[1] "Matrix B:"
     [,1] [,2]
[1,]    5    6
[2,]    7    8
[1] "Matrix Addition:"
     [,1] [,2]
[1,]    6    8
[2,]   10   12
[1] "Matrix Multiplication:"
     [,1] [,2]
[1,]   19   22
[2,]   43   50
##Calculating Derivatives:
# Define the function as an expression so that R can differentiate it symbolically
f <- expression(x^2 + 2*x + 1)
# Calculate the derivative with the D() function
# (deriv(f(x), "x") would differentiate the number returned by f(x), not the formula)
f_prime <- D(f, "x")
# Evaluate the derivative at a specific point
x <- 3
result <- eval(f_prime)
# Display results
print("Original function:")
print(f)
print("Derivative:")
print(f_prime)
print("Result at x = 3:")
print(result)
Result
[1] "Original function:"
expression(x^2 + 2 * x + 1)
[1] "Derivative:"
2 * x + 2
[1] "Result at x = 3:"
[1] 8
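# Generate random data from a normal distribution
# (the sample size of 100 is an assumption; no seed is set, so the values differ on every run)
data <- rnorm(100, mean = 0, sd = 1)
# Calculate the mean and standard deviation
mean_value <- mean(data)
sd_value <- sd(data)
# Display results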
print("Generated Data:")
print(head(data))
print("Mean:")
print(mean_value)
print("Standard Deviation:")
print(sd_value)
Result
[1] "Generated Data:"
[1] -0.4824830 -0.2204402 -1.5153549 1.1339153 -0.4270449
[6] -0.1222682
[1] "Mean:"
[1] 0.04735421
[1] "Standard Deviation:"
[1] 1.015164
Practical No.4
Advanced Data Structures: Data Frames, Lists, Matrices,
Arrays, Vectors, Factors, Strings
Result
[1] 5 7 9
[1] -3 -3 -3
[1] 4 10 18
[1] 0.25 0.40 0.50
[1] 1 2 3
[1] 0 0 0
[1] 1 32 729
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] TRUE TRUE TRUE
[1] FALSE FALSE FALSE
[1] FALSE FALSE FALSE
[1] FALSE FALSE FALSE
[1] TRUE TRUE TRUE
[1] TRUE TRUE TRUE
[1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[11] TRUE TRUE
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE
[1] -1.000000 -3.000000 Inf 5.000000 3.000000
[6] 2.333333 2.000000 -4.500000 -10.000000 Inf
[11] 12.000000 6.500000
[1] 0 0 NA 0 0 1 0 -1 0 NA 0 1
[1] -1 -3 NA 5 3 2 2 -5 -10 NA 12 6
[1] 2.500000e-01 3.333333e-01 1.000000e+00 5.000000e+00
[5] 3.600000e+01 3.430000e+02 4.096000e+03 1.234568e-02
[9] 1.000000e-01 1.000000e+00 1.200000e+01 1.690000e+02
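The code that produced the results above is not shown. As a minimal sketch, the first seven result lines are consistent with element-wise arithmetic on two three-element vectors; the names v1 and v2 and their values are assumptions inferred from that output, and the remaining lines are not reconstructed here.

```r
# Assumed input vectors (inferred from the first seven result lines)
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
print(v1 + v2)    # addition
print(v1 - v2)    # subtraction
print(v1 * v2)    # multiplication
print(v1 / v2)    # division
print(v1 %% v2)   # remainder after division
print(v1 %/% v2)  # integer division
print(v1 ^ v2)    # exponentiation
```

Each operator works element by element, so the first element of v1 is combined with the first element of v2, and so on.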
# Data structures in R
# Vectors: a collection of elements of the same data type; the elements are known as components
# length() gives the number of elements present; vectors are created with c()
a <- c(1, 2, 3, 4, 5); a
b <- -2:4; b
d <- seq(1, 4, by = 0.4); d
e <- seq(1, 4, length.out = 3); e
# Numeric vector
numa <- c(12.3, 14.5, 15.5, 16.6); numa; class(numa)
# Integer vector
numb <- c(12L, 13L, 14L, 15L, 16L); numb; class(numb)
# Character vector
numc <- c("abc", "def", "ghi", "jkl"); numc; class(numc)
# Named numeric vector: elements can be accessed by name
numc <- c("abc" = 1, "def" = 2, "ghi" = 3, "jkl" = 35)
numc["def"]
class(numc)
# Vector operations
# Combine
a1 <- c(12, 13, 14, 15); a1
a2 <- c("abc", "def", "ghi", "jkl"); a2
a2[2:4]
a3 <- c(a1, a2); a3 # the numbers are converted to character
a4 <- c(12, 13, 14, 15)
# Element-wise arithmetic
a1 + a4
a1 - a4
a1 * a4
Result
[1] 1 2 3 4 5
[1] -2 -1 0 1 2 3 4
[1] 1.0 1.4 1.8 2.2 2.6 3.0 3.4 3.8
[1] 1.0 2.5 4.0
[1] 12.3 14.5 15.5 16.6
[1] "numeric"
[1] 12 13 14 15 16
[1] "integer"
[1] "abc" "def" "ghi" "jkl"
[1] "character"
def
2
[1] "numeric"
[1] 12 13 14 15
[1] "abc" "def" "ghi" "jkl"
[1] "def" "ghi" "jkl"
[1] "12" "13" "14" "15" "abc" "def" "ghi" "jkl"
[1] 24 26 28 30
[1] 0 0 0 0
[1] 144 169 196 225
Factors: A categorical data type used for storing and analyzing categorical variables.
# Factors store data as a set of distinct levels; they are useful when a variable has
# many repeated values
directions <- c("East", "West", "North", "South", "East", "West", "West", "West")
Factor_data <- factor(directions)
print(Factor_data)
Result
[1] East  West  North South East  West  West  West
Levels: East North South West
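A factor keeps track of its levels and stores each element as an integer code. The following sketch shows how these can be inspected; the names regions, region_factor, and the variables derived from them are hypothetical.

```r
regions <- c("East", "West", "North", "South", "East", "West", "West", "West")
region_factor <- factor(regions)
# levels() lists the distinct categories
region_levels <- levels(region_factor)
# table() counts how often each level occurs
region_counts <- table(region_factor)
# as.integer() shows the integer code stored for each element
region_codes <- as.integer(region_factor)
print(region_levels)
print(region_counts)
print(region_codes)
```

Here "West" occurs four times, which is the kind of repetition that makes a factor a compact way to store the data.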
Strings: You can work with strings in R using various functions and operations.
# Strings: values written inside single or double quotes are known as strings
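A few of the base R string functions are sketched below. The sample strings and variable names are chosen only for illustration.

```r
greeting <- "Hello, R!"
# Number of characters in the string
n_chars <- nchar(greeting)
# Convert to upper case
upper <- toupper(greeting)
# Extract characters 1 to 5
first_word <- substr(greeting, 1, 5)
# Join strings with a separator
combined <- paste("Learning", "R", sep = " ")
# Split a string at each comma
parts <- strsplit("a,b,c", ",")[[1]]
print(n_chars)
# [1] 9
print(first_word)
# [1] "Hello"
```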
R Control Statements
Control statements in R are programming constructs that allow you to control the flow
of your code. They determine which code blocks are executed and when. R provides
several control statements, including conditional statements (if-else), loops (for, while,
repeat), and branching statements (break, next).
Conditional Statements (if-else): Conditional statements in R allow you to make
decisions based on specified conditions. The basic form includes if, else if, and else. You
can use these to execute different code blocks depending on whether a condition is
met.
# If-else statement
x <- 25L
if (is.integer(x)) {
  print("x is an integer number")
} else {
  print("x is not an integer")
}
# Vector (single condition)
y <- c("SPSU", "is", "a", "sheela", "for", "learning")
if ("sheela" %in% y) {
  print("sheela is found in our vector")
} else {
  print("sheela is not found in our vector")
}
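The loops and branching statements mentioned above (for, while, repeat, break, next) can be sketched as follows. The tasks and variable names are hypothetical and serve only to show the syntax.

```r
# for loop: sum the numbers 1 to 5
total <- 0
for (i in 1:5) {
  total <- total + i
}
# while loop: count the halvings needed to bring 100 below 1
value <- 100
halvings <- 0
while (value >= 1) {
  value <- value / 2
  halvings <- halvings + 1
}
# repeat loop with break and next: collect the odd numbers below 10
odds <- c()
n <- 0
repeat {
  n <- n + 1
  if (n >= 10) break      # leave the loop
  if (n %% 2 == 0) next   # skip even numbers
  odds <- c(odds, n)
}
print(total)
# [1] 15
print(halvings)
# [1] 7
print(odds)
# [1] 1 3 5 7 9
```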
Practical No.7
Built-In Math Function, Cumulative Sums And Products,
Minima And Maxima. Matrix Operation Using R
Built-in Math Functions: R provides numerous built-in math functions for various
mathematical operations. Here are a few examples:
sqrt(x): Compute the square root of x.
# sqrt(x)
# It returns the square root of the input x.
x<- 4
print (sqrt(x))
Result
[1] 2
log(x): Compute the natural logarithm of x.
# log(x)
# It returns the natural logarithm of the input x.
x<- 4
print (log(x))
Result
[1] 1.386294
exp(x): Compute the exponential of x.
# exp(x)
# It returns e raised to the power x.
x<- 4
print (exp(x))
Result
[1] 54.59815
abs(x): Calculate the absolute value of x.
# abs(x)
# It returns the absolute value of the input x.
x<- -4
print (abs(x))
Result
[1] 4
round(x, digits): Round x to a specified number of decimal places.
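# round(x, digits = n)
# It returns the input rounded to n decimal places (0 by default).
# A value exactly halfway between two integers goes to the even one, so round(-4.5) gives -4.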
x<- -4
print (round(x))
x<- -4.3
print (round(x))
x<- -4.5
print (round(x))
x<- -4.6
print (round(x))
Result
[1] -4
[1] -4
[1] -4
[1] -5
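The examples above use the default of zero decimal places. The following sketch shows the digits argument and the round-half-to-even behaviour; the input values are chosen only for illustration.

```r
# Round to two decimal places
two_places <- round(3.14159, digits = 2)
# A negative digits value rounds to the left of the decimal point
nearest_hundred <- round(1234.567, digits = -2)
# Values exactly halfway between two integers go to the even one
halves <- round(c(0.5, 1.5, 2.5))
print(two_places)
# [1] 3.14
print(nearest_hundred)
# [1] 1200
print(halves)
# [1] 0 2 2
```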
trunc(x): Truncate a numeric value to an integer.
# trunc(x)
# It returns the input x with the decimal part removed.
x<- c(1.2,2.5,8.1)
print (trunc(x))
Result
[1] 1 2 8
Cumulative Sums and Products: You can calculate cumulative sums and products
of a vector or a matrix using functions like cumsum() and cumprod().
Example:
x <- c(1, 2, 3, 4, 5)
cum_sum <- cumsum(x) # Cumulative sum
cum_prod <- cumprod(x) # Cumulative product
## Cumulative Sum
# Sample data
sales <- c(100, 150, 120, 80, 200)
# Calculate the cumulative sum
cumulative_sales <- cumsum(sales)
# Display the cumulative sum
print(cumulative_sales)
Result
[1] 100 250 370 450 650
## Cumulative Product
# Sample data
population_growth_rate <- c(1.05, 1.03, 1.02, 1.04, 1.01)
# Calculate the cumulative product
cumulative_population <- cumprod(population_growth_rate)
# Display the cumulative product
print(cumulative_population)
Result
[1] 1.050000 1.081500 1.103130 1.147255 1.158728
Minima and Maxima: R provides functions to find the minimum and maximum
values in a vector or a matrix.
min(x): Find the minimum value in x.
max(x): Find the maximum value in x.
## Maxima and Minima
# Sample data
data <- c(12, 5, 8, 17, 3, 21, 9)
data
# Calculate the minimum and maximum values
minimum_value <- min(data)
maximum_value <- max(data)
# Display the results
cat("Minimum Value:", minimum_value, "\n")
cat("Maximum Value:", maximum_value, "\n")
Result
[1] 12 5 8 17 3 21 9
Minimum Value: 3
Maximum Value: 21
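Besides the values themselves, it is often useful to know where the minimum and maximum occur. A minimal sketch using the same sample data; the result variable names are hypothetical.

```r
data <- c(12, 5, 8, 17, 3, 21, 9)
# range() returns the minimum and maximum together
value_range <- range(data)
# which.min() and which.max() return the position of the extreme values
min_position <- which.min(data)
max_position <- which.max(data)
# cummax() returns the running maximum, in the same style as cumsum()
running_max <- cummax(data)
print(value_range)
# [1]  3 21
print(c(min_position, max_position))
# [1] 5 6
print(running_max)
# [1] 12 12 12 17 17 21 21
```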
Matrix Operations Using R: R supports various matrix operations, including matrix
multiplication, inversion, and determinant calculation. You can use the matrix() function to
create matrices and the %*% operator for matrix multiplication.
Example (matrix multiplication):
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)
result <- A %*% B # Matrix multiplication
# the matrix function
# R fills the matrix by columns, starting with column one
# 1st arg: c(2,3,-2,1,2,2) the values of the elements filling the columns
# 2nd arg: 3 the number of rows
# 3rd arg: 2 the number of columns
A <- matrix(c(2,3,-2,1,2,2),3,2)
A
#Is Something a Matrix
is.matrix(A)
is.vector(A)
#Multiplication by a Scalar
c <- 3
c*A
#Matrix Addition & Subtraction
B <- matrix(c(1,4,-2,1,2,1),3,2)
B
C <- A + B
C
D <- A - B
D
#Matrix Multiplication
D <- matrix(c(2,-2,1,2,3,1),2,3)
D
C <- D %*% A
C
C <- A %*% D
C
D <- matrix(c(2,1,3),1,3)
D
C <- D %*% A
C
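# A is 3 x 2 and D is 1 x 3, so A %*% D is not defined; try() reports the error and lets the script continue
try(C <- A %*% D)
#Error in A %*% D : non-conformable arguments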
#Transpose of a Matrix
AT <- t(A)
AT
ATT <- t(AT)
ATT
Result
     [,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2
[1] TRUE
[1] FALSE
     [,1] [,2]
[1,]    6    3
[2,]    9    6
[3,]   -6    6
     [,1] [,2]
[1,]    1    1
[2,]    4    2
[3,]   -2    1
     [,1] [,2]
[1,]    3    2
[2,]    7    4
[3,]   -4    3
     [,1] [,2]
[1,]    1    0
[2,]   -1    0
[3,]    0    1
     [,1] [,2] [,3]
[1,]    2    1    3
[2,]   -2    2    1
     [,1] [,2]
[1,]    1   10
[2,]    0    4
     [,1] [,2] [,3]
[1,]    2    4    7
[2,]    2    7   11
[3,]   -8    2   -4
     [,1] [,2] [,3]
[1,]    2    1    3
     [,1] [,2]
[1,]    1   10
     [,1] [,2] [,3]
[1,]    2    3   -2
[2,]    1    2    2
     [,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2
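The determinant and the inverse mentioned at the start of this section are only defined for square matrices, so they cannot be applied to the 3 x 2 matrix A above. A minimal sketch with a hypothetical 2 x 2 matrix M:

```r
# Hypothetical square matrix (filled by columns)
M <- matrix(c(2, 1, 1, 3), nrow = 2)
# Determinant: 2*3 - 1*1
det_M <- det(M)
# Inverse of M
M_inv <- solve(M)
# Multiplying a matrix by its inverse gives the identity matrix
identity_check <- M %*% M_inv
print(det_M)
print(M_inv)
print(round(identity_check, 10))
```

The determinant is 5 here; solve() raises an error when the determinant is zero, because such a matrix has no inverse.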
Practical No.8
Data visualization: diagrammatic representation of data
Practical No.9
Box plots, stem and leaf diagram, bar plots, pie diagram, scatter
plots
Box Plots:
Box plots, also known as box-and-whisker plots, display the distribution of a dataset's
values. They show the median, quartiles, and any outliers.
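The numbers that a box plot draws can also be computed directly. The following is a minimal sketch; it assumes that data is the sample vector used in the minima and maxima example, which may differ from the data behind the plots in this practical.

```r
# Assumed sample data (taken from the minima and maxima example)
data <- c(12, 5, 8, 17, 3, 21, 9)
# Minimum, lower hinge, median, upper hinge, maximum
five_numbers <- fivenum(data)
# boxplot.stats() returns the same statistics plus any outliers
stats <- boxplot.stats(data)
outliers <- stats$out
print(five_numbers)
# [1]  3.0  6.5  9.0 14.5 21.0
print(length(outliers))
# [1] 0
```

For this vector no value lies more than 1.5 times the interquartile range beyond the hinges, so no outliers would be drawn.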
# Create a box plot
boxplot(data, main = "Box Plot of Data", ylab = "Values")
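Result
Stem-and-Leaf Diagram:
A stem-and-leaf diagram groups the values of a dataset by their leading digits (the stems)
and lists the remaining digits (the leaves) next to each stem.
# Create a stem-and-leaf plot of a vector
stem(data)
# Create a stem-and-leaf plot of the eruption durations in the built-in faithful dataset
duration = faithful$eruptions
stem(duration)
Result
The stem-and-leaf plot of the eruption durations is: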
The decimal point is 1 digit(s) to the left of the |
16 | 070355555588
18 | 000022233333335577777777888822335777888
20 | 00002223378800035778
22 | 0002335578023578
24 | 00228
26 | 23
28 | 080
30 | 7
32 | 2337
34 | 250077
36 | 0000823577
38 | 2333335582225577
40 | 0000003357788888002233555577778
42 | 03335555778800233333555577778
44 | 02222335557780000000023333357778888
46 | 0000233357700000023578
48 | 00000022335800333
50 | 0370
Bar Plots:
Bar plots are used to represent categorical data, displaying the frequency or proportion of
each category.
# Create a bar plot
barplot(table(data), main = "Bar Plot of Data", xlab = "Values", ylab = "Frequency")
Result
Pie Charts:
Pie charts are used to represent data as a circle divided into segments, where each segment
represents a category's proportion.
# Create a pie chart
pie(table(data), main = "Pie Chart of Data")
Result
Scatter Plots:
Scatter plots display the relationship between two numerical variables, using points on a 2D
plane.
# Sample data for x and y
x <- c(1, 2, 3, 4, 5)
y <- c(10, 15, 5, 20, 25)
# Create a scatter plot
plot(x, y, main = "Scatter Plot", xlab = "X", ylab = "Y")
Result
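The relationship shown in a scatter plot can also be summarized numerically. A minimal sketch using the same sample data; the correlation coefficient and the fitted line are additions for illustration, and the names correlation and fit are hypothetical.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(10, 15, 5, 20, 25)
# Pearson correlation between x and y
correlation <- cor(x, y)
# Least-squares line y = intercept + slope * x
fit <- lm(y ~ x)
print(correlation)
# [1] 0.7
print(coef(fit))
```

A fitted line can be drawn on top of an existing scatter plot with abline(fit).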
Practical No.10
Generating Sequence Of Random Numbers
In R, you can generate sequences of random numbers with various probability distributions,
including uniform, normal, binomial, Poisson, exponential, and gamma distributions.
Additionally, you can generate non-random (deterministic) sequences with functions such as seq() and rep().
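Each of the distributions listed above has a generator function in base R. The following is a minimal sketch; the seed, the sample size of 5, and all distribution parameters are assumptions chosen only for illustration.

```r
# Fix the seed so that the same numbers are produced on every run
set.seed(42)
uniform_values <- runif(5, min = 0, max = 1)
normal_values <- rnorm(5, mean = 0, sd = 1)
binomial_values <- rbinom(5, size = 10, prob = 0.5)
poisson_values <- rpois(5, lambda = 3)
exponential_values <- rexp(5, rate = 2)
gamma_values <- rgamma(5, shape = 2, rate = 1)
# Non-random sequences
regular_sequence <- seq(0, 1, by = 0.25)
repeated_values <- rep(c(1, 2), times = 3)
print(uniform_values)
print(regular_sequence)
# [1] 0.00 0.25 0.50 0.75 1.00
print(repeated_values)
# [1] 1 2 1 2 1 2
```

Calling set.seed() with the same value before the generators reproduces the same random numbers, which is useful when results have to be repeated.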