R Prog Lab Manual Theory

The document outlines the installation of R and RStudio, detailing step-by-step instructions for both software. It also covers fundamental concepts in R programming, including data types, mathematical operators, functions, and data structures such as vectors, matrices, lists, and data frames. Additionally, it discusses measures of central tendency such as mean, median, and mode, emphasizing their importance in data analysis.

Experiment No. 1
Title Installation of R and RStudio
Objective: To learn features of R programming and its installation.
Theory: To install R and RStudio on your computer, you need to follow two separate steps:
1. Installing R
R is the underlying software that you'll use to perform statistical computing and graphics. To install
R:
Step-by-Step Guide for Installing R:
1. Download R:
   ○ Go to the official R website: https://cran.r-project.org/
   ○ Choose the appropriate version of R for your operating system (Windows, macOS, or Linux).
2. For Windows:
   ○ Click on "Download R for Windows".
   ○ Select "base" and then click on "Download R x.x.x for Windows".
   ○ Run the installer and follow the instructions.
3. For macOS:
   ○ Click on "Download R for macOS".
   ○ Choose the version that matches your macOS.
   ○ Download and open the .pkg file to install R.
4. For Linux:
   ○ R can be installed from your Linux distribution's package manager, or you can follow the instructions on the CRAN website for a specific Linux distro.
5. Install R:
   ○ After downloading, run the installer and follow the on-screen instructions.
   ○ If you're on Windows, you can usually accept the default installation options.
6. Test R Installation:
   ○ Open your terminal (or Command Prompt on Windows) and type: R
   ○ You should see the R console start up.
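Once the console opens, you can confirm the installed version from the R prompt (a minimal check; the exact output depends on your machine):
# Print the installed R version string
R.version.string
# Show platform, locale and attached packages for the session
sessionInfo()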
2. Installing RStudio
RStudio is an Integrated Development Environment (IDE) for R, providing a more user-friendly
interface to write and execute R code.
Step-by-Step Guide for Installing RStudio:
1.​ Download RStudio:​

○​ Go to the RStudio download page: https://posit.co/download/rstudio-desktop/


○​ Choose the version for your operating system (Windows, macOS, or Linux).
○​ Download the .exe for Windows, .dmg for macOS, or the appropriate package for
Linux.
2.​ Install RStudio:​

○​ After downloading, open the installer and follow the installation prompts.
○​ On Windows or macOS, the installation should be straightforward. For Linux, use
the relevant package manager to install it, or you can follow detailed installation
steps provided on the RStudio website.
3.​ Test RStudio Installation:​

○​ Open RStudio after the installation is complete.


○​ RStudio should automatically detect the installed R version and start a session with
the R console.
Additional Notes:
●​ Ensure R is installed first: RStudio requires that R be installed on your system. RStudio
won't work without R.
●​ Updating R and RStudio: Both R and RStudio receive regular updates, so it’s good practice
to check for updates from time to time.
○​ In RStudio, go to Help → Check for Updates.
○​ For R, you can visit the CRAN website for the latest version.
After completing both installations, you’re ready to start coding and analyzing data with R!
Conclusion: R is installed first, followed by RStudio (which provides a more user-friendly interface).
Install essential R packages based on your experiment's requirements (e.g., data manipulation,
statistical analysis, visualization).

Experiment No. 2
Title Data types, mathematical operators and functions in R.
Objective: To study variable, data types, literals in R
Theory: In R, data types, mathematical operators, and functions form the core of programming and data
analysis. Understanding these will help you manipulate and analyze data effectively. Each R data type
has its own set of rules and restrictions.
Variables do not need to be declared with a data type in R; the data type of a variable can even change during execution.
Example of R data Type:
#numeric data type
var <- 30
#integer data type
var <-80L
Data Types in R Programming Language
Each variable in R has an associated data type. Each R-Data Type requires different amounts of
memory and has some specific operations which can be performed over it.
Data Types in R are:
1. numeric – (3, 6.7, 121)
2. integer – (2L, 42L; the suffix 'L' declares the value as an integer)
3. logical – (TRUE, FALSE)
4. complex – (7 + 5i; where 'i' is the imaginary unit)
5. character – ("a", "B", "c is third", "69")
6. raw – (as.raw(55); raw creates a raw vector of the specified length)
R Programming language has the following basic R-data types and the following table shows the
data type and the values that each data type can take.
Basic Data Types – Values – Examples
Numeric – Set of all real numbers – numeric_value <- 3.14
Integer – Set of all integers, Z – integer_value <- 42L
Logical – TRUE and FALSE – logical_value <- TRUE
Complex – Set of complex numbers – complex_value <- 1 + 2i
Character – "a", "b", "c", ..., "@", "#", "$", ..., "1", "2", ... etc. – character_value <- "Hello Geeks"
Raw – Bytes created with as.raw() – single_raw <- as.raw(255)
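To verify how a value is stored, the class() and typeof() functions can be used. A minimal sketch (the variable names are the illustrative ones from the table above):
numeric_value   <- 3.14
integer_value   <- 42L
logical_value   <- TRUE
complex_value   <- 1 + 2i
character_value <- "Hello Geeks"
single_raw      <- as.raw(255)

class(numeric_value)     # "numeric"
typeof(integer_value)    # "integer"
class(logical_value)     # "logical"
class(complex_value)     # "complex"
class(character_value)   # "character"
class(single_raw)        # "raw"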

R Operators
R Operators are symbols that direct the interpreter to perform various kinds of operations on operands. Operators carry out the mathematical, logical, and decision operations performed on complex, integer, and numeric values supplied as input operands.
R mainly supports the following kinds of operators between operands; their usage is listed below, with a few examples shown after the list.
Types of the operator in R language
●​ Arithmetic Operators
●​ Logical Operators
●​ Relational Operators
●​ Assignment Operators
●​ Miscellaneous Operators
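A brief sketch of the main operator groups in action (the values are illustrative):
a <- 10; b <- 3

# Arithmetic operators
a + b      # 13
a %% b     # 1 (modulus)
a ^ b      # 1000 (exponent)

# Relational operators
a > b      # TRUE
a == b     # FALSE

# Logical operators
(a > 5) & (b < 5)   # TRUE

# Assignment operator
x <- a * b
x          # 30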
Functions in R Programming
A function accepts input arguments and produces the output by executing valid R commands that are
inside the function.
Functions are useful when you want to perform a certain task multiple times.
In R Programming Language when you are creating a function the function name and the file in
which you are creating the function need not be the same and you can have one or more functions in
R.
Creating a Function in R Programming
Functions are created in R by using the command function(). The general structure of the function
file is as follows:
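The code that was meant to follow is not shown in the document; below is a minimal sketch of the general structure (the function name and arguments are illustrative):
# General structure of a user-defined function in R
function_name <- function(arg1, arg2) {
  # statements that use the arguments
  result <- arg1 + arg2
  return(result)
}

# Calling the function
function_name(5, 3)   # returns 8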

Conclusion: ●​ Data Types in R include numeric, integer, character, logical, complex, factor, and others like
Date, Time, List, and Data Frames.
●​ Mathematical Operators such as +, -, *, /, %%, ^, and comparison operators (==, <, >, etc.)
are fundamental to performing calculations and comparisons.
●​ Functions in R (like sum(), mean(), length(), paste(), etc.) are used to perform common
operations like aggregation, string manipulation, and data handling.
These tools form the foundation for working with data in R, so understanding them is essential for
any analysis or experiment.
Experiment No. 3
Title Vectors, Factors, Lists, Matrix, Data Frames in R.
Objective: To study vectors, factors, lists, matrices and data frames in R.
Theory: One of the key features of R is that it can handle complex statistical operations in an easy and optimized
way.
R handles complex computations using:
●​ Vector – A basic data structure of R containing the same type of data
●​ Matrices – A matrix is a rectangular array of numbers or other mathematical objects. We can do
operations such as addition and multiplication on Matrix in R.
●​ Lists – Lists store collections of objects that may differ in type and length (unlike vectors and matrices, which hold elements of a single type).
●​ Data Frames – Generated by combining together multiple vectors such that each vector becomes
a separate column.
Vectors in R
A vector is a basic data structure in R that contains elements of the same type. These types can be logical, integer, double, character, complex or raw.
Using the function typeof(), one can check the data type of a vector.
One more significant property of R vector is its length. The function length() determines the number of
elements in the vector.
> c(2, 3, 5)
[1] 2 3 5
> length(c("aa", "bb", "cc", "dd", "ee"))
[1] 5
Indexing Vectors in R
To access individual elements and subsets in R, we use indexing technique. It can access, extract and
replace part of an object.
It is possible to print or refer to the subset of a vector by appending an index vector, enclosed in square
brackets, to the vector name.
There are four forms of vector indexing used to extract a subset of vectors:
●​ Positive integers Vector– Indicates selected elements
●​ Negative integers Vector – Indicates rejected elements
●​ Character strings Vector – Used only for vectors with named elements
●​ Logical values Vector– They are the result of evaluated conditions.

The vector of positive integers:


These are the set of integers which show the elements of the vector to be selected. These elements are
then concatenated in the specified order.
For example: To select the nth element
> TEMPERATURE[2]
  Q2
30.6
It can also select specific sets of elements or a range of elements, as below:
> TEMPERATURE[c(1, 5, 6, 9)]
> TEMPERATURE[2:5]
The vector of negative integers:
These are the set of integers which show the elements of the vector that are to be excluded from
concatenation.
For example: Select all but the nth element
> TEMPERATURE[-2]
This will give a vector of all elements except the 2nd element.
The vector of character strings:
A vector of character strings can be used only when the vector has named elements. We use a vector of element names to select the elements to be concatenated.
For example: Select the named elements
> TEMPERATURE[c("Q1", "Q4")]
This will give values of 1st and 4th elements of the vector.
The vector of logical values:
A vector of logical values must be the same length as the vector being subset and is usually the result of an evaluated condition. A logical value of TRUE means the corresponding element is included, and FALSE means it is excluded.
For example: Select elements for which the logical condition is true
> TEMPERATURE[TEMPERATURE < 15]
It will display elements whose value is less than 15.
We can also describe different conditions here like below:
> TEMPERATURE[TEMPERATURE < 34 & SHADE == "no"]
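Since the TEMPERATURE vector itself is not defined in this document, here is a small self-contained sketch (the values and names are hypothetical) that exercises all four forms of indexing:
# Hypothetical named temperature readings for five quadrats
TEMPERATURE <- c(Q1 = 12.4, Q2 = 30.6, Q3 = 8.9, Q4 = 22.1, Q5 = 14.3)
SHADE <- c("yes", "no", "yes", "no", "yes")

TEMPERATURE[2]                     # positive integer index
TEMPERATURE[-2]                    # negative integer index: drop Q2
TEMPERATURE[c("Q1", "Q4")]         # character string index
TEMPERATURE[TEMPERATURE < 15]      # logical index
TEMPERATURE[TEMPERATURE < 34 & SHADE == "no"]   # combined condition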
Matrices in R
Matrices are two-dimensional data structures that contain homogeneous data in a tabular format. We can perform arithmetic operations on individual elements of a matrix or on the whole matrix in R.
Let us see how to convert a single dimension vector into a two-dimensional array using the matrix()
function:
> matrix(TEMPERATURE, nrow = 5)
We use rownames() and colnames() functions to set row and column names as follows:
> colnames(XY)
> rownames(XY) <- LETTERS[1:5]
Matrices can represent the binding of two or more vectors of equal length. If we have the X and Y
coordinates for five quadrats within a grid, we can use cbind()(combine by columns) or rbind() (combine
by rows) functions to combine them into a single matrix, as follows:
> X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
> Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
> XY <- cbind(X, Y)
By default, a matrix is filled by columns. Optional argument byrow=T causes filling by row.
Indexing Matrices in R
Like vectors, we can index matrices from the vectors of positive integers, negative integers, character
strings and logical values.
The difference is, matrices have two dimensions (height and width) that require a set of 2 numbers for
indexing while vectors have one dimension (length) that enable indexing of each element by a single
number.
Matrices are in the form of [row.indices, col.indices] for row and column indices.
For example: Let’s consider the XY matrix as below:
      X     Y
A 16.92  8.37
B 24.03 12.93
C  7.61 16.65
D 15.49 12.20
E 11.77 13.12
attr(,"description")
[1] "coordinates of quadrats"
Below are a few examples of matrix indexing:
> XY[3, 2]
[1] 16.65
The above command selects the element at row 3 and column 2.
> XY[3, ] – displays the entire third row.
> XY[, 2] – displays the entire second column.
> XY[, -2] – displays all columns except the second.
> XY["A", 1:2] – displays columns 1 through 2 for row A.
> XY[, "X"] – displays the column named "X".
Lists in R
Lists store collections of objects of differing lengths and types and are created using the list() function.
For example, we can create many isolated vectors, such as temperature, shade, and names, to represent data from a single experiment and group them as components of a list object, as follows (an access example is shown after the component list below):
> EXPERIMENT <- list(SITE = SITE,
+                    COORDINATES = paste(X, Y, sep = ","),
+                    TEMPERATURE = TEMPERATURE,
+                    SHADE = SHADE)
List created in the above example consists of four components:
●​ SITE which is a two-character vector.
●​ A two character vector named COORDINATES, which is a vector of XY coordinates for sites A,
B, C, D, and E
●​ TEMPERATURE which is a numeric vector.
●​ A factor named SHADE
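Individual components of a list are retrieved with $ or [[ ]]. A small sketch (the component names follow the EXPERIMENT list above; the values are whatever was stored there):
EXPERIMENT$TEMPERATURE        # extract a component by name
EXPERIMENT[["SHADE"]]         # equivalent double-bracket extraction
names(EXPERIMENT)             # list the component names
str(EXPERIMENT)               # compact overview of the list structure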
Data Frames and DataSets in R
Rarely is a single variable collected in isolation. Data is mostly collected in sets of variables which reflect
investigations of patterns between different variables. So, data sets are best organized into matrices of
variables of the same lengths yet not necessarily of the same type. Here Data Frames come to the rescue
as they store a list of vectors of the same length, yet different types, in a rectangular matrix.
We can create Data Frames by combining together many vectors in a manner that each vector becomes a
separate column. In this way, the data frame is like a matrix in which each column can represent a
different vector type.
The sequence and number of observations in the vectors must be the same for each vector in the Data
Frame to represent a DataSet.
The first, second and third entries in each vector, for example, must represent the observations collected
from first, second and third sampling units respectively.
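A minimal sketch of building a data frame from vectors of equal length (X and Y reuse the coordinates above; SITE and SHADE are illustrative):
SITE  <- c("A", "B", "C", "D", "E")
X     <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y     <- c(8.37, 12.93, 16.65, 12.2, 13.12)
SHADE <- c("yes", "no", "yes", "no", "yes")

# Each vector becomes a separate column of the data frame
quadrats <- data.frame(SITE, X, Y, SHADE)
str(quadrats)    # columns keep their own types
head(quadrats)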
Conclusion: I learned vectors, factors, lists, matrices and data frames in R.

Experiment No. 4
Title Measurement of Central Tendency Mean, Median and Mode.
Objective: To study the measures of central tendency: mean, median and mode.
Theory: Mean, median, and mode are the three most common measures of central tendency, which are numerical
values that summarize the middle of a data set:
Measure – Definition
Mean – The average of a data set, calculated by adding all values and dividing by the total number of values.
Median – The middle number in an ordered data set, where there are an equal number of values higher and lower than it.
Mode – The value that occurs most often in a data set.

The mean is considered the most stable measure of central tendency because it uses all the observations in
a distribution. However, the mean is affected by extreme values, also known as outliers. In these
situations, the median is generally considered to be the best representative of the data
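Base R provides mean() and median(); there is no built-in function for the statistical mode (R's mode() returns the storage mode), so a small helper is usually written. A sketch with hypothetical data:
scores <- c(12, 15, 12, 18, 20, 12, 15, 22)   # hypothetical data set

mean(scores)      # arithmetic mean
median(scores)    # middle value of the ordered data

# Simple helper for the statistical mode (most frequent value)
get_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
get_mode(scores)  # 12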

Conclusion: I learned the measures of central tendency: mean, median and mode.
Experiment No. 5
Title Measurement of Variation - Range, IQR and Standard Deviation.
Objective: To study the measures of variation: range, IQR and standard deviation.
Theory: Range, interquartile range (IQR), and standard deviation are all measures of how spread out a data set is,
or its variability: ​

Range: The difference between the largest and smallest values in a data set.
Interquartile range (IQR): The distance between the first and third quartiles, which are the 25th and 75th
percentiles. IQR measures the spread of the middle 50% of the data.
Standard deviation: A measure of how far individual data points are from the mean of the data set. It's
calculated as the square root of the variance.
Here are some more details about these measures:​

Variance: A measure of how spread out the data is, with larger variance values indicating more
significant dispersion. Variance is the average of squared deviations from the mean.
Standard deviation formula: The formula for standard deviation is the square root of the sum of squared
differences from the mean divided by the size of the data set.
IQR and median: IQR is used in conjunction with the median, which is the 50th percentile.
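A quick sketch computing these measures in base R (the data values are hypothetical):
values <- c(4, 8, 15, 16, 23, 42)   # hypothetical data set

max(values) - min(values)   # range as a single number
range(values)               # smallest and largest values
IQR(values)                 # interquartile range (Q3 - Q1)
var(values)                 # variance
sd(values)                  # standard deviation = sqrt(var(values))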
Conclusion: I learned the measures of variation: range, IQR and standard deviation.

Experiment No. 6
Title Descriptive Statistics Using psych Package.
Objective: To study descriptive statistics using the psych package in R.
Theory: psych Package in R
The “psych” package is an R package that provides various functions for psychological research and
data analysis. It contains tools for data visualization, factor analysis, reliability analysis, correlation
analysis, and more. In this article, we will discuss the basics of using the “psych” package in R
Programming Language.
Introduction to Psych Package
Before we proceed to the steps, it is important to understand some key concepts related to the
“psych” package:
1.​ Factor Analysis: This is a statistical technique used to identify underlying factors or
dimensions in a set of observed variables. The “psych” package provides various functions
for performing factor analysis, including “principal()”, “fa()”, and “fa.parallel()”.
2.​ Reliability Analysis: This involves testing the consistency and stability of a set of
measurements or items. The “psych” package provides functions for calculating various
types of reliability coefficients, including Cronbach’s alpha, Guttman’s lambda, and more.
3.​ Principal Component Analysis: This is another statistical technique used to identify
underlying dimensions in a set of variables. The psych package includes functions for
performing PCA and visualizing the results.
4.​ Cluster Analysis: This is a technique used to group objects or individuals based on their
similarities or differences. The psych package includes functions for hierarchical clustering,
k-means clustering, and other types of cluster analysis.
5.​ Correlation Analysis: This involves examining the relationship between two or more
variables. The “psych” package provides functions for calculating various types of
correlation coefficients, including Pearson’s r, Spearman’s rho, and Kendall’s tau.
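For descriptive statistics specifically, the package's describe() function summarises each variable (n, mean, sd, median, skew, kurtosis, etc.). A minimal sketch using a built-in dataset:
# install.packages("psych")   # run once if the package is not installed
library(psych)

# Descriptive statistics for every numeric column of the built-in mtcars data
describe(mtcars)

# Descriptive statistics split by a grouping variable (number of cylinders)
describeBy(mtcars$mpg, group = mtcars$cyl)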
Conclusion: I learned descriptive statistics using the psych package.
Experiment No. 7
Title One & two Sample z Test Using R
Objective: To study one sample and two sample z-tests using R.
Theory: You can use the z.test() function from the BSDA package to perform one sample and two sample
z-tests in R.
This function uses the following basic syntax:
z.test(x, y, alternative='two.sided', mu=0, sigma.x=NULL, sigma.y=NULL, conf.level=.95)

where:
●​ x: values for the first sample
●​ y: values for the second sample (if performing a two sample z-test)
●​ alternative: the alternative hypothesis (‘greater’, ‘less’, ‘two.sided’)
●​ mu: mean under the null or mean difference (in two sample case)
●​ sigma.x: population standard deviation of first sample
●​ sigma.y: population standard deviation of second sample
●​ conf.level: confidence level to use
Example 1: One Sample Z-Test in R
Suppose the IQ in a certain population is normally distributed with a mean of μ = 100 and standard
deviation of σ = 15.
A scientist wants to know if a new medication affects IQ levels, so she recruits 20 patients to use it
for one month and records their IQ levels at the end of the month.
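The document does not list the 20 recorded IQ values, so the sketch below simulates them; everything except the z.test() call itself (from the BSDA package) is a hypothetical illustration:
# install.packages("BSDA")   # run once if needed
library(BSDA)

set.seed(1)
iq <- rnorm(20, mean = 103, sd = 15)   # hypothetical IQ scores of the 20 patients

# One sample z-test: H0: mu = 100, population sd known to be 15
z.test(iq, mu = 100, sigma.x = 15)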
Example 2: Two Sample Z-Test in R
Suppose the IQ levels among individuals in two different cities are known to be normally distributed
each with population standard deviations of 15.
A scientist wants to know if the mean IQ level between individuals in city A and city B are different,
so she selects a simple random sample of 20 individuals from each city and records their IQ levels.
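Again the raw data are not given in the document, so the samples below are simulated for illustration:
library(BSDA)

set.seed(2)
cityA <- rnorm(20, mean = 100, sd = 15)   # hypothetical IQ scores, city A
cityB <- rnorm(20, mean = 98,  sd = 15)   # hypothetical IQ scores, city B

# Two sample z-test with known population standard deviations of 15
z.test(x = cityA, y = cityB, mu = 0, sigma.x = 15, sigma.y = 15)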

Conclusion: I learned one sample and two sample z-tests using R.


Experiment No. 8
Title One & two Sample t Test Using R
Objective: To study one sample and two sample t-tests using R.
Theory: Two-Sample t-test in R
In statistics, the two-sample t-test is like a measuring stick we use to see if two groups are different
from each other. It helps us figure out if the difference we see is real or just random chance. In this
article, we will calculate a Two-Sample t-test in the R Programming Language.
What is a Two-Sample t-test?
The two-sample t-test is a statistical method used to determine if there's a significant difference
between the means of two independent groups. It assesses whether the means of these groups are
statistically different from each other or if any observed difference is due to random variation. For
example, if we're comparing test scores of two classes, we use this test to know if one class did
better than the other by a meaningful amount, or if it's just luck.
Before using a two-sample t-test, it's important to make sure of the following:
●​ The data in each group are separate and have similar distributions.
●​ The populations from which the samples are taken follow a typical bell-shaped curve.
●​ The variations within the populations are similar (equal variances).
Syntax:
t.test(x, y, alternative = "two.sided", mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95)
●​ x and y: These are the numeric vectors or data frames containing the two samples you want
to compare.
●​ alternative: This specifies the alternative hypothesis. It can take values "two.sided", "less",
or "greater", indicating whether you're testing for a two-tailed, left-tailed, or right-tailed test,
respectively.
●​ mu: This is the hypothesized difference in means under the null hypothesis. The default is
zero.
●​ paired: If set to TRUE, it indicates that the two samples are paired (e.g., before and after
measurements). The default is FALSE for unpaired samples.
●​ var.equal: If set to TRUE, it assumes equal variances in the two groups. The default is
FALSE.
●​ conf.level: This specifies the confidence level for the confidence interval. The default is
0.95.
How to Perform Two-Sample t-test
Suppose we want to compare the heights of two groups of students, male and female, to see if there's
a significant difference in their average heights.
Step 1: Input Data
Step 2: Conduct the t-test
Step 3: Interpretation
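The document lists the steps without code; a minimal sketch with hypothetical height data (in cm):
# Step 1: Input data (hypothetical heights in cm)
male_heights   <- c(172, 175, 168, 180, 177, 174, 171, 179)
female_heights <- c(160, 165, 158, 162, 170, 163, 159, 166)

# Step 2: Conduct the two-sample t-test
result <- t.test(male_heights, female_heights,
                 alternative = "two.sided", var.equal = FALSE)
print(result)

# Step 3: Interpretation - if the p-value is below 0.05,
# the difference in mean heights is considered statistically significant
result$p.value < 0.05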
Conclusion: The two-sample t-test is a handy way to compare averages between two groups. By following easy
steps and understanding the results, we can tell if the differences we see are real or just random
chance. This test helps us make sense of data and draw meaningful conclusions.
Experiment No. 9
Title Goodness of Fit Test Using R
Objective: To study the chi-square goodness of fit test using R.
Theory: The Chi-Square Goodness of Fit Test is a statistical test used to analyze the difference between the
observed and expected frequency distribution values in categorical data. This test is popularly used in
various domains such as science, biology, business, etc. In this article, we will understand how to perform
the chi-square test in the R Programming Language.
What is the Chi-Square Goodness of Fit Test?
The chi-square goodness of fit test is used to measure the significant difference between the expected and
observed frequencies under the null hypothesis that there is no difference between the expected and
observed frequencies. We can use the formula to calculate the chi-square test mathematically.
χ² = Σᵢ (Oᵢ - Eᵢ)² / Eᵢ
where,
●​ χ² is the Chi-Square statistic
●​ Oᵢ is the observed frequency for each category
●​ Eᵢ is the expected frequency for each category
●​ Σ denotes the sum over all categories.
Calculating the Chi-square goodness of fit test manually in R
We can calculate the chi-square test since we know the mathematical formula for it. In this example, we
will create a fictional dataset comparing the frequencies of transportation modes of cities.
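A sketch of such a fictional transportation data set and the test, using the built-in chisq.test() (the counts and expected proportions are invented for illustration):
# Observed counts of commuters by transport mode (hypothetical)
observed <- c(Bus = 90, Car = 120, Bike = 40, Walk = 50)

# Expected proportions under the null hypothesis (hypothetical)
expected_prop <- c(0.30, 0.40, 0.15, 0.15)

# Chi-square goodness of fit test
chisq.test(x = observed, p = expected_prop)

# Manual calculation of the same statistic for comparison
expected <- sum(observed) * expected_prop
sum((observed - expected)^2 / expected)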

Conclusion: I learned the goodness of fit test using R.

Experiment No. 10
Title Contingency Table Using R
Objective: To study contingency tables using R.
Theory: Contingency tables are very useful for condensing a large number of observations into a smaller, easier-to-maintain table. A contingency table shows the distribution of one variable in the rows and another in its columns. Contingency tables are not only useful for condensing data, they also show the relations between variables. They are a way of summarizing categorical variables. A contingency table that presents more than two variables in a single table is called a complex or flat contingency table.
Making Contingency tables
A contingency table is a way to redraw data and assemble it into a table. And, it shows the layout of
the original data in a manner that allows the reader to gain an overall summary of the original data.
The table() function is used in R to create a contingency table. The table() function is one of the
most versatile functions in R. It can take any data structure as an argument and turn it into a table.
The more complex the original data, the more complex is the resulting contingency table.
Creating contingency tables from Vectors
In R a vector is an ordered collection of basic data types of a given length. The only key thing here is
all the elements of a vector must be of the identical data type e.g homogeneous data structures.
Vectors are one-dimensional data structures. It is the simplest data object from which you can create
a contingency table.
Example:
# R program to illustrate
# Contingency Table

# Creating a vector
vec = c(2, 4, 3, 1, 6, 3, 2, 1, 4, 5)

# Creating contingency table from vec using table()
conTable = table(vec)
print(conTable)
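table() also builds two-way contingency tables when given two categorical vectors; a brief sketch with hypothetical factors:
# Hypothetical categorical data for ten respondents
gender <- c("M", "F", "F", "M", "F", "M", "M", "F", "F", "M")
smoker <- c("yes", "no", "no", "yes", "yes", "no", "no", "no", "yes", "yes")

# Two-way contingency table: gender in rows, smoking status in columns
twoWay <- table(gender, smoker)
print(twoWay)

# Row and column totals can be added with addmargins()
addmargins(twoWay)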

Conclusion: I learned contingency tables using R.

Experiment No. 11
Title Analysis of Variance (ANOVA) Using R
Objective: To study analysis of variance (ANOVA) using R.
Theory: ANOVA (ANalysis Of VAriance) is a statistical test to determine whether two or more population
means are different. In other words, it is used to compare two or more groups to see if they are
significantly different.
In practice, however, the:
●​ Student t-test is used to compare 2 groups;
●​ ANOVA generalizes the t-test beyond 2 groups, so it is used to compare 3 or more groups.
Note that there are several versions of the ANOVA (e.g., one-way ANOVA, two-way ANOVA, mixed ANOVA, repeated measures ANOVA, etc.). Here we present the simplest form only, the one-way ANOVA, and we refer to it as ANOVA in the rest of this section.
Although ANOVA is used to make inferences about the means of different groups, the method is called "analysis of variance". It is named this way because it compares the "between" variance (the variance between the different groups) with the "within" variance (the variance within each group). If the between variance is significantly larger than the within variance, the group means are declared to be different. Otherwise, we cannot conclude one way or the other. The two variances are compared by taking the ratio F = variance_between / variance_within and then comparing this ratio to a threshold from the Fisher probability distribution (a threshold based on a specific significance level, usually 5%).
This is enough theory regarding the ANOVA method for now. In the remainder of this section, we discuss it from a more practical point of view; in particular, we cover the following points (a minimal worked example follows the list):
●​ the aim of the ANOVA, when it should be used and the null/alternative hypothesis
●​ the underlying assumptions of the ANOVA and how to check them
●​ how to perform the ANOVA in R
●​ how to interpret results of the ANOVA
●​ understand the notion of post-hoc test and interpret the results
●​ how to visualize results of ANOVA and post-hoc tests
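A minimal sketch using the built-in PlantGrowth data set (plant weight by treatment group); the 5% significance threshold is the usual convention:
# One-way ANOVA: does mean plant weight differ between the three groups?
data(PlantGrowth)

model <- aov(weight ~ group, data = PlantGrowth)
summary(model)          # F statistic and p-value

# Post-hoc pairwise comparisons if the overall test is significant
TukeyHSD(model)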
Conclusion: Students reviewed the goals and hypotheses of an ANOVA and the assumptions that need to be verified before the results can be trusted (namely independence, normality and homogeneity of variances), then learned how to perform an ANOVA in R and how to interpret the results.
Experiment No. 12
Title Central Limit Theorem Demonstration Using R
Objective: To demonstrate the Central Limit Theorem using R.
Theory: The Central Limit Theorem (CLT) is like a special rule in statistics. It says that if you gather a bunch
of data and calculate the average, even if the original data doesn't look like a neat bell-shaped curve,
the averages of those groups will start to look like one if you have enough data.
What is the Central limit theorem?
The Central Limit theorem states that the distributions of the sample mean of the identically,
independent, randomly selected distributions from any population, converge towards the normal
distributions (a bell-shaped curve) as the sample size increases, even if the population distribution is
not normally distributed.
Or
Let X1, X2, ..., Xn be independent and identically distributed (i.i.d.) random variables drawn from a population with common mean μ and variance σ². Then, as the sample size n approaches infinity (n → ∞), the sampling distribution of the sample mean X̄ converges to a normal distribution with mean μ and variance σ²/n:
X̄ ~ N(μ, σ²/n) as n → ∞
Assumptions for Central Limit Theorem
The key assumptions for the Central Limit Theorem (CLT) are as follows:
●​ Independence: The random variables in the sample must be independent of each other.
●​ Identical Distribution: The random variables are identically distributed, meaning each observation is drawn from the same probability distribution with the same mean μ and the same variance σ².
●​ Sample Size: The sample size (n) should be sufficiently large, typically n ≥ 30, for the CLT to provide accurate approximations.
Properties of Central limit theorem
Some of the key properties of the CLT are as follows:
●​ Regardless of the shape of the population distribution, the sampling distribution of the sample mean X̄ approaches a normal distribution as the sample size n increases: as n increases, x̄ → μ.
●​ As the sample size n increases, the sampling distribution's variance will be equal to the variance of the population distribution divided by the sample size: as n increases, s² → σ²/n.
Applications of Central Limit Theorem
●​ The CLT is widely used in scenarios where the characteristics of the population have to be
identified but analysing the complete population is difficult.
●​ In data science, the CLT is used to make accurate assumptions of the population in order to
build a robust statistical model.
●​ In applied machine learning, the CLT helps to make inferences about the model
performance.
●​ In statistical hypothesis testing the central limit theorem is used to check if the given sample
belongs to a designated population.
●​ CLT is used in calculating the mean family income in a particular country.
●​ The concept of the CLT is used in election polls to estimate the percentage of people
supporting a particular candidate as confidence intervals.
●​ The CLT is used in rolling many identical, unbiased dice.
●​ Flipping many coins will result in a normal distribution for the total number of heads or,
equivalently, total number of tails.
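A small simulation sketch of the theorem: draw repeated samples from a clearly non-normal (exponential) population and watch the distribution of sample means become bell-shaped (the sample size and replication count are arbitrary choices):
set.seed(123)

# Non-normal population: exponential distribution with mean 1
n_samples   <- 1000   # number of repeated samples
sample_size <- 40     # observations per sample

# Compute the mean of each sample
sample_means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# The histogram of the sample means is approximately normal,
# centred near the population mean (1) with sd close to 1/sqrt(sample_size)
hist(sample_means, breaks = 30, main = "Sampling distribution of the mean")
mean(sample_means)
sd(sample_means)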
Conclusion: I learned how to demonstrate the Central Limit Theorem using R.
Experiment No. 13
Title R Functions for Normal Distribution - rnorm, pnorm, qnorm and dnorm
Objective: To study the R functions for the normal distribution: rnorm, pnorm, qnorm and dnorm.
Theory: dnorm function
This function returns the value of the probability density function (pdf) of the normal distribution
given a certain random variable x, a population mean μ, and the population standard deviation σ.
Syntax; dnorm(x, mean, sd)
Parameters:
●​ x: vector of quantiles.
●​ mean: vector of means.
●​ sd: vector standard deviation.
pnorm function
This function returns the value of the cumulative density function (cdf) of the normal distribution
given a certain random variable q, a population mean μ, and the population standard deviation σ.
Syntax: pnorm(q, mean, sd,lower.tail)
Parameters:
●​ q: It is a vector of quantiles.
●​ mean: vector of means.
●​ sd: vector standard deviation.
●​ lower.tail: logical; if TRUE (default), probabilities are P(X ≤ q), otherwise P(X > q).
qnorm function
This function returns the value of the inverse cumulative density function (cdf) of the normal
distribution given a certain random variable p, a population mean μ, and the population standard
deviation σ.
Syntax: qnorm(p, mean = 0, sd = 1, lower.tail = TRUE)
Parameters:
●​ p: vector of probabilities (the quantile level to be used)
●​ mean: vector of means.
●​ sd: vector standard deviation.
●​ lower.tail = TRUE: Then the probability to the left of p in the normal distribution is
returned.
rnorm function
This function generates a vector of normally distributed random variables given a vector length n, a
population mean μ and population standard deviation σ.
Syntax: rnorm(n, mean, sd)
Parameters:
●​ n: number of datasets to be simulated
●​ mean: vector of means.
●​ sd: vector standard deviation.
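A brief sketch exercising all four functions with a mean of 100 and sd of 15 (IQ-style parameters chosen for illustration):
# dnorm: density of the normal distribution at x = 100
dnorm(100, mean = 100, sd = 15)

# pnorm: P(X <= 115), i.e. cumulative probability
pnorm(115, mean = 100, sd = 15)

# qnorm: the quantile below which 97.5% of the distribution lies
qnorm(0.975, mean = 100, sd = 15)

# rnorm: generate 5 random values from N(100, 15^2)
set.seed(42)
rnorm(5, mean = 100, sd = 15)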

Conclusion: I learned the R functions for the normal distribution: rnorm, pnorm, qnorm and dnorm.
Experiment No. 14
Title R Functions for Binomial Distribution - rbinom, pbinom, qbinom and dbinom
Objective: To study the R functions for the binomial distribution: rbinom, pbinom, qbinom and dbinom.
Theory: These functions provide information about the binomial distribution with parameters size and prob. dbinom gives the density, pbinom gives the distribution function, qbinom gives the quantile function, and rbinom generates random deviates.
dbinom function
This function returns the value of the probability density function (pdf) of the binomial distribution
given a certain random variable x, number of trials (size), and probability of success on each trial
(prob).
Syntax: dbinom(x, size, prob)
Parameters:
●​ x : a vector of numbers.
●​ size : the number of trials.
●​ prob : the probability of success of each trial.
The dbinom function is also used to get the probability of getting a certain number of successes (x)
in a certain number of trials (size) where the probability of success on each trial is fixed (prob).
pbinom function
This function returns the value of the cumulative density function (cdf) of the binomial distribution
given a certain random variable q, number of trials (size), and probability of success on each trial
(prob).
Syntax: pbinom(q, size, prob)
Parameters:
●​ q: a vector of quantiles.
●​ size: the number of trials.
●​ prob: the probability of success of each trial.
pbinom function returns the area to the left of a given value q in the binomial distribution. If you’re
interested in the area to the right of a given value q, you can simply add the argument lower.tail =
FALSE
Syntax:
pbinom(q, size, prob, lower.tail = FALSE)
qbinom function
This function returns the value of the inverse cumulative density function (cdf) of the binomial
distribution given a certain random variable q, number of trials (size), and probability of success on
each trial (prob). And with the use of this function, we can find out the pth quantile of the binomial
distribution.
Syntax: qbinom(p, size, prob)
Parameters:
●​ p: a vector of probabilities.
●​ size: the number of trials.
●​ prob: the probability of success of each trial.
rbinom function
This function generates a vector of binomial distributed random variables given a vector length n,
number of trials (size), and probability of success on each trial (prob).
Syntax: rbinom(n, size, prob)
Parameters:
●​ n: number of observations.
●​ size: the number of trials.
●​ prob: the probability of success of each trial.
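A quick sketch of the four functions for 10 trials with success probability 0.5 (coin-flip style numbers, chosen for illustration):
# dbinom: probability of exactly 3 successes in 10 trials
dbinom(3, size = 10, prob = 0.5)

# pbinom: probability of 3 or fewer successes
pbinom(3, size = 10, prob = 0.5)

# pbinom with lower.tail = FALSE: probability of more than 3 successes
pbinom(3, size = 10, prob = 0.5, lower.tail = FALSE)

# qbinom: the number of successes at the 90th percentile
qbinom(0.90, size = 10, prob = 0.5)

# rbinom: simulate 5 experiments of 10 trials each
set.seed(7)
rbinom(5, size = 10, prob = 0.5)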

Conclusion: I learned the R functions for the binomial distribution: rbinom, pbinom, qbinom and dbinom.
Experiment No. 15
Title R Functions for Poisson Distribution - rpois, ppois, qpois and dpois
Objective: To study the R functions for the Poisson distribution: rpois, ppois, qpois and dpois.
Theory: These functions provide information about the Poisson distribution with parameter lambda. dpois gives the density, ppois gives the distribution function, qpois gives the quantile function, and rpois generates random deviates.
dpois() tells you the probability mass at a given number of counts. qpois() returns the number of
counts below which a certain fraction of the probability lies. ppois() returns how much of the total
probability lies at or below a given number of counts.
The R function dpois(x, lambda) gives the probability of x events in a period when the expected number of events is lambda. The R function ppois(q, lambda, lower.tail) gives the cumulative probability of observing less than or equal to q events.
The function rpois(n, lambda) is used for generating random numbers from a given Poisson distribution, where n is the number of random numbers needed and lambda is the mean per interval.
qpois() function in R Language is used to compute the value of Poisson Quantile Function.
The Poisson distribution represents the probability of a provided number of cases happening in a set
period of space or time if these cases happen with an identified constant mean rate (free of the period
since the ultimate event). Poisson distribution has been named after Siméon Denis Poisson(French
Mathematician).
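A short sketch of the four functions with a mean rate of 4 events per interval (the rate is an arbitrary illustration):
# dpois: probability of exactly 2 events when lambda = 4
dpois(2, lambda = 4)

# ppois: probability of 2 or fewer events
ppois(2, lambda = 4)

# ppois upper tail: probability of more than 2 events
ppois(2, lambda = 4, lower.tail = FALSE)

# qpois: smallest count with at least 95% of the probability at or below it
qpois(0.95, lambda = 4)

# rpois: simulate 5 Poisson counts
set.seed(11)
rpois(5, lambda = 4)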
Conclusion: I learned the R functions for the Poisson distribution: rpois, ppois, qpois and dpois.
