R Prog Lab Manual Theory
Experiment No. 1
Title Installation of R and RStudio
Objective: To learn features of R programming and its installation.
Theory: To install R and RStudio on your computer, you need to follow two separate steps:
1. Installing R
R is the underlying software that you'll use to perform statistical computing and graphics. To install
R:
Step-by-Step Guide for Installing R:
1. Download R:
○ Visit the CRAN website and download the R installer for your operating system.
○ On Linux, R can be installed from your distribution’s package manager, or you can
follow the instructions on the CRAN website for a specific Linux distro.
2. Install R:
○ After downloading, run the installer and follow the on-screen instructions.
○ If you’re on Windows, you can usually accept the default installation options.
3. Test R Installation:
○ Open R (or type R in a terminal) and run a simple command such as 1 + 1 to confirm
that it works.
2. Installing RStudio
RStudio is an integrated development environment (IDE) that makes working with R easier. To
install RStudio:
1. Download RStudio:
○ Download the RStudio installer from the RStudio website.
2. Install RStudio:
○ After downloading, open the installer and follow the installation prompts.
○ On Windows or macOS, the installation should be straightforward. For Linux, use
the relevant package manager to install it, or you can follow detailed installation
steps provided on the RStudio website.
3. Test RStudio Installation:
○ Open RStudio and confirm that the R console starts and accepts commands.
Experiment No. 2
Title Data types, mathematical operators and functions in R.
Objective: To study variables, data types and literals in R.
Theory: In R, data types, mathematical operators, and functions form the core of programming and data
analysis. Understanding these will help you manipulate and analyze data effectively. Each R data type
has its own set of rules and restrictions.
Variables in R do not need to be declared with a data type, and the data type of a variable can even
be changed.
Example of R data types:
# numeric data type
var <- 30
# integer data type
var <- 80L
Data Types in R Programming Language
Each variable in R has an associated data type. Each R-Data Type requires different amounts of
memory and has some specific operations which can be performed over it.
Data Types in R are:
1. numeric – (3, 6.7, 121)
2. integer – (2L, 42L; the suffix ‘L’ declares the value as an integer)
3. logical – (TRUE, FALSE)
4. complex – (7 + 5i; where ‘i’ is the imaginary unit)
5. character – (“a”, “B”, “c is third”, “69”)
6. raw – (as.raw(55); raw creates a raw vector of the specified length)
The R programming language has the basic data types listed above, and the following table shows a
data type and the values that it can take.
Basic Data Type: Character
Values: “a”, “b”, “c”, …, “@”, “#”, “$”, …, “1”, “2”, …
Example: character_value <- "Hello Geeks"
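A short check of each basic type with typeof() (the literal values are taken from the list above):
typeof(6.7)        # "double" (R's numeric type)
typeof(2L)         # "integer"
typeof(TRUE)       # "logical"
typeof(7 + 5i)     # "complex"
typeof("a")        # "character"
typeof(as.raw(55)) # "raw"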
R Operators
R operators are the symbols that direct the interpreter to perform various kinds of operations on
the operands. Operators simulate the various mathematical, logical, and decision operations
performed on a set of complex numbers, integers, and numerics as input operands.
R supports the following major kinds of operators between a set of operands. In this section, we
will see the various types of operators in the R programming language and their usage, illustrated
in the sketch after the list below.
Types of the operator in R language
● Arithmetic Operators
● Logical Operators
● Relational Operators
● Assignment Operators
● Miscellaneous Operators
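A minimal sketch of each operator family (the variables a and b are just for demonstration):
# arithmetic operators
a <- 7; b <- 2
a + b; a - b; a * b; a / b # 9, 5, 14, 3.5
a %% b                     # modulus: 1
a ^ b                      # exponent: 49
# relational operators
a > b; a == b              # TRUE, FALSE
# logical operators
(a > 5) & (b < 5)          # AND: TRUE
(a > 5) | (b > 5)          # OR: TRUE
# assignment operators
x <- 10                    # leftward assignment
10 -> y                    # rightward assignment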
Functions in R Programming
A function accepts input arguments and produces the output by executing valid R commands that are
inside the function.
Functions are useful when you want to perform a certain task multiple times.
In the R programming language, when you are creating a function, the function name and the file in
which you create the function need not be the same, and a single file can contain one or more
functions.
Creating a Function in R Programming
Functions are created in R by using the command function(). The general structure of the function
file is as follows:
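A minimal sketch of that structure (the name new_function and its arguments are placeholders,
not a fixed convention):
# general structure of a function in R
new_function <- function(arg1, arg2) {
# body: valid R commands that compute the output
result <- arg1 + arg2
return(result)
}
# calling the function
new_function(3, 4) # returns 7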
Conclusion: ● Data Types in R include numeric, integer, character, logical, complex, factor, and others like
Date, Time, List, and Data Frames.
● Mathematical Operators such as +, -, *, /, %%, ^, and comparison operators (==, <, >, etc.)
are fundamental to performing calculations and comparisons.
● Functions in R (like sum(), mean(), length(), paste(), etc.) are used to perform common
operations like aggregation, string manipulation, and data handling.
These tools form the foundation for working with data in R, so understanding them is essential for
any analysis or experiment.
Experiment No. 3
Title Vectors, Factors, Lists, Matrix, Data Frames in R.
Objective: To study vectors, factors, lists, matrices and data frames in R.
Theory: One of the key features of R is that it can handle complex statistical operations in an easy and optimized
way.
R handles complex computations using:
● Vector – A basic data structure of R containing the same type of data
● Matrices – A matrix is a rectangular array of numbers or other mathematical objects. We can do
operations such as addition and multiplication on matrices in R.
● Lists – Lists store collections of objects that may be of different types and lengths, unlike vectors
and matrices, whose elements must all be of the same type.
● Data Frames – Generated by combining together multiple vectors such that each vector becomes
a separate column.
Vectors in R
In R, a vector is a basic data structure that contains elements of the same type. These types can
be logical, integer, double, character, complex or raw.
In R, using the function typeof(), one can check the data type of a vector.
One more significant property of R vector is its length. The function length() determines the number of
elements in the vector.
> c(2, 3, 5)
[1] 2 3 5
> length(c("aa", "bb", "cc", "dd", "ee"))
[1] 5
Indexing Vectors in R
To access individual elements and subsets in R, we use indexing techniques. Indexing can access,
extract and replace parts of an object.
It is possible to print or refer to the subset of a vector by appending an index vector, enclosed in square
brackets, to the vector name.
There are four forms of vector indexing used to extract a subset of vectors, illustrated in the sketch
after this list:
● Positive integers Vector– Indicates selected elements
● Negative integers Vector – Indicates rejected elements
● Character strings Vector – Used only for vectors with named elements
● Logical values Vector– They are the result of evaluated conditions.
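A minimal sketch of all four forms (the vector v and its element names are just for demonstration):
v <- c(10, 20, 30, 40)
names(v) <- c("a", "b", "c", "d")
v[c(1, 3)]     # positive integers: select elements 1 and 3
v[-2]          # negative integers: drop element 2
v[c("a", "d")] # character strings: select named elements
v[v > 15]      # logical values: keep elements where the condition is TRUE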
Experiment No. 4
Title Measurement of Central Tendency Mean, Median and Mode.
Objective: To study measures of central tendency: mean, median and mode.
Theory: Mean, median, and mode are the three most common measures of central tendency, which are numerical
values that summarize the middle of a data set:
Measure | Definition
Mean | The average of a data set, calculated by adding all values and dividing by the total number of values
Median | The middle number in an ordered data set, where there are an equal number of values higher and lower than it
Mode | The value that appears most often in the data set
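A short sketch computing these measures in R (the data vector x is just for demonstration; base R
has no built-in function for the statistical mode, so a small helper is defined):
x <- c(2, 4, 4, 7, 9, 4, 2)
mean(x)   # arithmetic mean: about 4.57
median(x) # middle value of the sorted data: 4
# helper returning the most frequent value
stat_mode <- function(v) {
  ux <- unique(v)
  ux[which.max(tabulate(match(v, ux)))]
}
stat_mode(x) # mode: 4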
Conclusion: I learned the measurement of central tendency: mean, median and mode.
Experiment No. 5
Title Measurement of Variation - Range, IQR and Standard Deviation.
Objective: To study measures of variation: range, IQR and standard deviation.
Theory: Range, interquartile range (IQR), and standard deviation are all measures of how spread out a data set is,
or its variability:
Range: The difference between the largest and smallest values in a data set.
Interquartile range (IQR): The distance between the first and third quartiles, which are the 25th and 75th
percentiles. IQR measures the spread of the middle 50% of the data.
Standard deviation: A measure of how far individual data points are from the mean of the data set. It's
calculated as the square root of the variance.
Here are some more details about these measures:
Variance: A measure of how spread out the data is, with larger variance values indicating more
significant dispersion. Variance is the average of squared deviations from the mean.
Standard deviation formula: The formula for standard deviation is the square root of the sum of squared
differences from the mean divided by the size of the data set.
IQR and median: IQR is used in conjunction with the median, which is the 50th percentile.
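A short sketch computing these measures in R (the data vector x is just for demonstration):
x <- c(4, 8, 15, 16, 23, 42)
max(x) - min(x) # range: 38
IQR(x)          # interquartile range: spread of the middle 50% of the data
var(x)          # variance: average squared deviation from the mean
sd(x)           # standard deviation: square root of the variance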
Conclusion: I learned measurement of variation: range, IQR and standard deviation.
Experiment No. 6
Title Descriptive Statistics Using psych Package.
Objective: To study descriptive statistics using the psych package in R.
Theory: psych Package in R
The “psych” package is an R package that provides various functions for psychological research and
data analysis. It contains tools for data visualization, factor analysis, reliability analysis, correlation
analysis, and more. This section covers the basics of using the “psych” package in the R
programming language; a short usage sketch follows the list of key concepts below.
Introduction to Psych Package
Before we proceed to the steps, it is important to understand some key concepts related to the
“psych” package:
1. Factor Analysis: This is a statistical technique used to identify underlying factors or
dimensions in a set of observed variables. The “psych” package provides various functions
for performing factor analysis, including “principal()”, “fa()”, and “fa.parallel()”.
2. Reliability Analysis: This involves testing the consistency and stability of a set of
measurements or items. The “psych” package provides functions for calculating various
types of reliability coefficients, including Cronbach’s alpha, Guttman’s lambda, and more.
3. Principal Component Analysis: This is another statistical technique used to identify
underlying dimensions in a set of variables. The psych package includes functions for
performing PCA and visualizing the results.
4. Cluster Analysis: This is a technique used to group objects or individuals based on their
similarities or differences. The psych package includes functions for hierarchical clustering,
k-means clustering, and other types of cluster analysis.
5. Correlation Analysis: This involves examining the relationship between two or more
variables. The “psych” package provides functions for calculating various types of
correlation coefficients, including Pearson’s r, Spearman’s rho, and Kendall’s tau.
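A minimal usage sketch (assumes the package has been installed with install.packages("psych");
the built-in mtcars dataset is used only for demonstration):
library(psych)
# descriptive statistics: n, mean, sd, median, skew, kurtosis, se, ...
describe(mtcars)
# correlation analysis with significance tests on the first four columns
corr.test(mtcars[, 1:4])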
Conclusion: I learned descriptive statistics using the psych package.
Experiment No. 7
Title One & two Sample z Test Using R
Objective: To study and apply one sample and two sample z-tests in R.
Theory: You can use the z.test() function from the BSDA package to perform one sample and two sample
z-tests in R.
This function uses the following basic syntax:
z.test(x, y, alternative='two.sided', mu=0, sigma.x=NULL, sigma.y=NULL, conf.level=.95)
where:
● x: values for the first sample
● y: values for the second sample (if performing a two sample z-test)
● alternative: the alternative hypothesis (‘greater’, ‘less’, ‘two.sided’)
● mu: mean under the null or mean difference (in two sample case)
● sigma.x: population standard deviation of first sample
● sigma.y: population standard deviation of second sample
● conf.level: confidence level to use
Example 1: One Sample Z-Test in R
Suppose the IQ in a certain population is normally distributed with a mean of μ = 100 and standard
deviation of σ = 15.
A scientist wants to know if a new medication affects IQ levels, so she recruits 20 patients to use it
for one month and records their IQ levels at the end of the month.
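A sketch of this one sample test (the 20 IQ values below are hypothetical data invented for
illustration; assumes BSDA is installed with install.packages("BSDA")):
library(BSDA)
# hypothetical IQ levels of the 20 patients after one month
iq <- c(88, 92, 94, 94, 96, 97, 97, 97, 99, 99,
105, 109, 109, 109, 110, 112, 112, 113, 114, 115)
# H0: mu = 100 vs. H1: mu != 100, with known sigma = 15
z.test(iq, mu = 100, sigma.x = 15)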
Example 2: Two Sample Z-Test in R
Suppose the IQ levels among individuals in two different cities are known to be normally distributed
each with population standard deviations of 15.
A scientist wants to know if the mean IQ level between individuals in city A and city B are different,
so she selects a simple random sample of 20 individuals from each city and records their IQ levels.
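A sketch of this two sample test (both samples are hypothetical data invented for illustration):
# hypothetical IQ levels of 20 individuals from each city
cityA <- c(82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
105, 109, 109, 109, 110, 112, 112, 113, 114, 114)
cityB <- c(90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
109, 114, 115, 116, 117, 117, 128, 129, 130, 133)
# H0: muA - muB = 0 vs. H1: muA - muB != 0, with known sigmas of 15
z.test(x = cityA, y = cityB, mu = 0, sigma.x = 15, sigma.y = 15)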
Experiment No. 10
Title Contingency Table Using R
Objective: To study and demonstrate contingency tables in R.
Theory: Contingency tables are very useful for condensing a large number of observations into smaller,
easier-to-maintain tables. A contingency table shows the distribution of one variable in its rows and
another in its columns. Contingency tables are not only useful for condensing data; they also
show the relations between variables. They are a way of summarizing categorical variables. A
contingency table that presents several variables in a single table is called a complex or a flat
contingency table.
Making Contingency tables
A contingency table is a way to redraw data and assemble it into a table. And, it shows the layout of
the original data in a manner that allows the reader to gain an overall summary of the original data.
The table() function is used in R to create a contingency table. The table() function is one of the
most versatile functions in R. It can take any data structure as an argument and turn it into a table.
The more complex the original data, the more complex is the resulting contingency table.
Creating contingency tables from Vectors
In R, a vector is an ordered collection of basic data types of a given length. The key constraint is that
all the elements of a vector must be of the identical data type, i.e., vectors are homogeneous data
structures. Vectors are one-dimensional data structures. The vector is the simplest data object from
which you can create a contingency table.
Example:
# R program to illustrate
# Contingency Table

# Creating a vector
vec <- c(2, 4, 3, 1, 6, 3, 2, 1, 4, 5)

# Turning the vector into a contingency table of counts
table(vec)
Experiment No. 11
Title Analysis of Variance (ANOVA) Using R
Objective: To study and demonstrate analysis of variance (ANOVA) in R.
Theory: ANOVA (ANalysis Of VAriance) is a statistical test to determine whether two or more population
means are different. In other words, it is used to compare two or more groups to see if they are
significantly different.
In practice:
● the Student t-test is used to compare 2 groups;
● ANOVA generalizes the t-test beyond 2 groups, so it is used to compare 3 or more groups.
Note that there are several versions of the ANOVA (e.g., one-way ANOVA, two-way ANOVA,
mixed ANOVA, repeated measures ANOVA, etc.). Here we present only the simplest form, the
one-way ANOVA, and we refer to it as ANOVA in the remainder of this section.
Although ANOVA is used to make inferences about the means of different groups, the method is
called “analysis of variance” because it compares the “between” variance (the variance between
the different groups) and the “within” variance (the variance within each group). If the between
variance is significantly larger than the within variance, the group means are declared to be
different. Otherwise, we cannot conclude one way or the other. The two variances are compared to
each other by taking the ratio F = variance_between / variance_within, and then by comparing this
ratio to a threshold from the Fisher probability distribution (a threshold based on a specific
significance level, usually 5%).
This is enough theory regarding the ANOVA method for now. In the remainder of this section, we
discuss it from a more practical point of view, and in particular we will cover the following points
(a short sketch of the R commands follows the list):
● the aim of the ANOVA, when it should be used and the null/alternative hypothesis
● the underlying assumptions of the ANOVA and how to check them
● how to perform the ANOVA in R
● how to interpret results of the ANOVA
● understand the notion of post-hoc test and interpret the results
● how to visualize results of ANOVA and post-hoc tests
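A minimal sketch of a one-way ANOVA in R, using the built-in PlantGrowth dataset (plant weight
as the response, group as the factor with levels ctrl, trt1 and trt2):
data(PlantGrowth)
# fit the one-way ANOVA model
res_aov <- aov(weight ~ group, data = PlantGrowth)
summary(res_aov) # F statistic and p-value for the group effect
# post-hoc test: which pairs of groups differ?
TukeyHSD(res_aov)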
Conclusion: Students reviewed the goals and hypotheses of an ANOVA and the assumptions that need to
be verified before the results can be trusted (namely independence, normality and homogeneity of
variances), then performed an ANOVA in R and interpreted the results.
Experiment No. 12
Title Central Limit Theorem Demonstration Using R
Objective: To study and demonstrate the Central Limit Theorem in R.
Theory: The Central Limit Theorem (CLT) is like a special rule in statistics. It says that if you gather a bunch
of data and calculate the average, even if the original data doesn't look like a neat bell-shaped curve,
the averages of those groups will start to look like one if you have enough data.
What is the Central limit theorem?
The Central Limit Theorem states that the distribution of the sample mean of independent,
identically distributed observations randomly selected from any population converges towards a
normal distribution (a bell-shaped curve) as the sample size increases, even if the population
distribution is not normally distributed.
Or
Let X1, X2, ..., Xn be independent and identically distributed (i.i.d.) random variables drawn
from a population with common mean μ and variance σ². Then, as the sample size n approaches
infinity (n → ∞), the sampling distribution of the sample mean X̄ converges to a normal
distribution with mean μ and variance σ²/n:
X̄ ∼ N(μ, σ²/n) as n → ∞
Assumptions for Central Limit Theorem
The key assumptions for the Central Limit Theorem (CLT) are as follows:
● Independence: The random variables in the sample must be independent of each other.
● Identical Distribution: The random variables are identically distributed, meaning each
observation is drawn from the same probability distribution with the same mean μ and the
same variance σ².
● Sample Size: The sample size n should be sufficiently large, typically n ≥ 30, for the
Central Limit Theorem (CLT) to provide accurate approximations.
Properties of Central limit theorem
Some of the key properties of the CLT are as follows:
● Regardless of the shape of the population distribution, the sampling distribution of the
sample mean X̄ approaches a normal distribution as the sample size n increases:
as n increases, X̄ → μ
● As the sample size n increases, the sampling distribution's variance approaches the
variance of the population distribution divided by the sample size:
as n increases, s² → σ²/n
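A small simulation sketch of the theorem: sample means drawn from a skewed (exponential)
population become approximately normal:
set.seed(42)
n <- 30       # sample size
reps <- 10000 # number of repeated samples
# draw 'reps' samples of size n from an exponential population (mean = 1)
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))
# the histogram is approximately bell-shaped despite the skewed population
hist(sample_means, breaks = 50,
main = "Sampling distribution of the mean (n = 30)",
xlab = "Sample mean")
mean(sample_means) # close to the population mean 1
sd(sample_means)   # close to sigma / sqrt(n) = 1 / sqrt(30), about 0.18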
Applications of Central Limit Theorem
● The CLT is widely used in scenarios where the characteristics of the population have to be
identified but analysing the complete population is difficult.
● In data science, the CLT is used to make accurate assumptions of the population in order to
build a robust statistical model.
● In applied machine learning, the CLT helps to make inferences about the model
performance.
● In statistical hypothesis testing the central limit theorem is used to check if the given sample
belongs to a designated population.
● CLT is used in calculating the mean family income in a particular country.
● The concept of the CLT is used in election polls to estimate the percentage of people
supporting a particular candidate as confidence intervals.
● The CLT explains why totals from rolling many identical, unbiased dice are approximately
normally distributed.
● Flipping many coins will result in a normal distribution for the total number of heads or,
equivalently, the total number of tails.
Conclusion: I learned to demonstrate the Central Limit Theorem using R.
Experiment No. 13
Title R Functions for Normal Distribution - rnorm, pnorm, qnorm and dnorm
Objective: To study the R functions for the normal distribution: rnorm, pnorm, qnorm and dnorm.
Theory: dnorm function
This function returns the value of the probability density function (pdf) of the normal distribution
given a certain random variable x, a population mean μ, and the population standard deviation σ.
Syntax: dnorm(x, mean, sd)
Parameters:
● x: vector of quantiles.
● mean: vector of means.
● sd: vector of standard deviations.
pnorm function
This function returns the value of the cumulative distribution function (cdf) of the normal distribution
given a certain random variable q, a population mean μ, and the population standard deviation σ.
Syntax: pnorm(q, mean, sd, lower.tail)
Parameters:
● q: It is a vector of quantiles.
● mean: vector of means.
● sd: vector of standard deviations.
● lower.tail: logical; if TRUE (default), probabilities are P[X ≤ q], otherwise P[X > q].
qnorm function
This function returns the value of the inverse cumulative distribution function (the quantile
function) of the normal distribution given a certain probability p, a population mean μ, and the
population standard deviation σ.
Syntax: qnorm(p, mean = 0, sd = 1, lower.tail = TRUE)
Parameters:
● p: the probability (significance level) to be used.
● mean: vector of means.
● sd: vector of standard deviations.
● lower.tail = TRUE: the quantile with probability p to its left in the normal distribution is
returned.
rnorm function
This function generates a vector of normally distributed random variables of length n, given a
population mean μ and population standard deviation σ.
Syntax: rnorm(n, mean, sd)
Parameters:
● n: number of observations to be simulated.
● mean: vector of means.
● sd: vector of standard deviations.
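A quick tour of the four functions (values stated in the comments are rounded):
dnorm(0, mean = 0, sd = 1)     # density at x = 0: about 0.3989
pnorm(1.96, mean = 0, sd = 1)  # P(X <= 1.96): about 0.975
qnorm(0.975, mean = 0, sd = 1) # quantile with 97.5% to its left: about 1.96
set.seed(1)
rnorm(5, mean = 100, sd = 15)  # five random draws from N(100, 15^2)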
Conclusion: I learned the R functions for the normal distribution: rnorm, pnorm, qnorm and dnorm.
Experiment No. 14
Title R Functions for Binomial Distribution - rbinom, pbinom, qbinom and dbinom
Objective: To study the R functions for the binomial distribution: rbinom, pbinom, qbinom and dbinom.
Theory: These functions provide information about the binomial distribution with parameters size and prob.
dbinom gives the density, pbinom gives the distribution function, qbinom gives the quantile function,
and rbinom generates random deviates.
dbinom function
This function returns the value of the probability mass function (pmf) of the binomial distribution
given a certain random variable x, the number of trials (size), and the probability of success on each
trial (prob).
Syntax: dbinom(x, size, prob)
Parameters:
● x : a vector of numbers.
● size : the number of trials.
● prob : the probability of success of each trial.
The dbinom function is also used to get the probability of getting a certain number of successes (x)
in a certain number of trials (size) where the probability of success on each trial is fixed (prob).
pbinom function
This function returns the value of the cumulative distribution function (cdf) of the binomial
distribution given a certain random variable q, the number of trials (size), and the probability of
success on each trial (prob).
Syntax: pbinom(q, size, prob)
Parameters:
● q: a vector of numbers.
● size: the number of trials.
● prob: the probability of success of each trial.
pbinom function returns the area to the left of a given value q in the binomial distribution. If you’re
interested in the area to the right of a given value q, you can simply add the argument lower.tail =
FALSE
Syntax:
pbinom(q, size, prob, lower.tail = FALSE)
qbinom function
This function returns the value of the inverse cumulative distribution function (the quantile
function) of the binomial distribution given a certain probability p, the number of trials (size), and
the probability of success on each trial (prob). With this function, we can find the pth quantile of
the binomial distribution.
Syntax: qbinom(p, size, prob)
Parameters:
● p: a vector of probabilities.
● size: the number of trials.
● prob: the probability of success of each trial.
rbinom function
This function generates a vector of binomially distributed random variables of length n, given the
number of trials (size) and the probability of success on each trial (prob).
Syntax: rbinom(n, size, prob)
Parameters:
● n: number of observations.
● size: the number of trials.
● prob: the probability of success of each trial.
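A quick tour of the four functions with 10 trials and success probability 0.5 (values in the
comments are rounded):
dbinom(3, size = 10, prob = 0.5)   # P(X = 3): about 0.117
pbinom(3, size = 10, prob = 0.5)   # P(X <= 3): about 0.172
qbinom(0.5, size = 10, prob = 0.5) # median number of successes: 5
set.seed(1)
rbinom(5, size = 10, prob = 0.5)   # five random counts of successes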
Conclusion: I learned the R functions for the binomial distribution: rbinom, pbinom, qbinom and dbinom.
Experiment No. 15
Title R Functions for Poisson Distribution - rpois, ppois, qpois and dpois
Objective: To study the R functions for the Poisson distribution: rpois, ppois, qpois and dpois.
Theory: These functions provide information about the Poisson distribution with parameter lambda. dpois
gives the density, ppois gives the distribution function, qpois gives the quantile function, and rpois
generates random deviates.
dpois() tells you the probability mass at a given number of counts. qpois() returns the number of
counts below which a certain fraction of the probability lies. ppois() returns how much of the total
probability lies at or below a given number of counts.
The R function dpois(x, lambda) gives the probability of x events in a period when the expected
number of events is lambda. The R function ppois(q, lambda, lower.tail) gives the cumulative
probability of less than or equal to q events.
The function rpois(n, lambda) is used for generating random numbers from a given Poisson
distribution, where n: number of random values needed; lambda: mean per interval.
The qpois() function in R is used to compute the value of the Poisson quantile function.
The Poisson distribution represents the probability of a given number of events happening in a fixed
period of space or time if these events occur with a known constant mean rate (independent of the
time since the last event). The Poisson distribution is named after Siméon Denis Poisson (French
mathematician).
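A quick tour of the four functions with lambda = 4 events per interval (values in the comments
are rounded):
dpois(2, lambda = 4)   # P(X = 2): about 0.147
ppois(2, lambda = 4)   # P(X <= 2): about 0.238
qpois(0.5, lambda = 4) # median count: 4
set.seed(1)
rpois(5, lambda = 4)   # five random counts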
Conclusion: I learned the R functions for the Poisson distribution: rpois, ppois, qpois and dpois.