Bansilal Ramnath Agarwal Charitable Trust’s
VISHWAKARMA INSTITUTE OF TECHNOLOGY – PUNE
Department of SY Common
MD2201: Data Science
Name of the student: Saurabh Jadhav Roll No. 27
Div: B Batch: CS
Date of performance: 29-9-2021
Experiment No. 2
Title: Laboratory on Statistics
Aim: To compute probabilities for the Binomial and Normal distributions and to verify the Normal
approximation to the Binomial distribution.
Software used: Programming language R.
Code Statement:
Write a single R code to answer the following questions.
Data Set: Travelled Abroad
1. Find out the % of Indians in the sample who have travelled abroad using the data source.
2. Treating this value as ‘p’, calculate the following probabilities –
a. What is the probability that in a randomly chosen sample of 10 persons, no one has
travelled abroad?
b. What is the probability that in a randomly chosen sample of 10 persons, exactly one has
travelled abroad?
c. What is the probability that in a randomly chosen sample of 10 persons, exactly two persons
have travelled abroad?
d. What is the probability that in a randomly chosen sample of 10 persons, exactly three
persons have travelled abroad?
e. What is the probability that in a randomly chosen sample of 10 persons, exactly four
persons have travelled abroad?
f. What is the probability that in a randomly chosen sample of 10 persons, exactly five persons
have travelled abroad?
g. What is the probability that in a randomly chosen sample of 10 persons, exactly six persons
have travelled abroad?
h. What is the probability that in a randomly chosen sample of 10 persons, exactly seven
persons have travelled abroad?
i. What is the probability that in a randomly chosen sample of 10 persons, exactly eight
persons have travelled abroad?
j. What is the probability that in a randomly chosen sample of 10 persons, exactly nine
persons have travelled abroad?
k. What is the probability that in a randomly chosen sample of 10 persons, all 10 persons have
travelled abroad?
3. Present the probability values as a table or bar graph/plot and interpret the plot.
4. What is the probability that in a randomly chosen sample of 100 persons at least 59 have travelled
abroad?
Hint: You are expected to use the Normal approximation to the binomial distribution.
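The approximation named in the hint can be sketched as follows. This is only an illustration under stated assumptions: p_demo is a made-up placeholder and not the proportion from the Travelled Abroad data, and the variable names are hypothetical. It compares the exact binomial tail P(X >= 59) for n = 100 with the normal tail, with and without the usual continuity correction of 0.5.

```r
# Sketch only: p_demo is a placeholder value, NOT the proportion from the data set.
p_demo <- 0.5
n <- 100
k <- 59

# Exact binomial tail: P(X >= 59) = P(X > 58)
exact <- pbinom(k - 1, size = n, prob = p_demo, lower.tail = FALSE)

# Normal distribution with the same mean and standard deviation as the binomial
mu <- n * p_demo
sigma <- sqrt(n * p_demo * (1 - p_demo))

# Normal tail without and with the continuity correction (59 - 0.5 = 58.5)
normal_plain <- pnorm(k, mean = mu, sd = sigma, lower.tail = FALSE)
normal_cc <- pnorm(k - 0.5, mean = mu, sd = sigma, lower.tail = FALSE)

cat("Exact binomial P(X >= 59):      ", exact, "\n")
cat("Normal, no correction:          ", normal_plain, "\n")
cat("Normal, continuity correction:  ", normal_cc, "\n")
```

With this placeholder value, the corrected normal tail lies closer to the exact binomial value than the uncorrected one; with the p estimated from the data the numbers will differ.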
Code:
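# Load the survey data; the Travelledabroad column holds "Y" for people who have travelled abroad
f <- read.csv('travelled abroad data_csv.csv')
# Case 1: samples of n = 10 persons
# Sample proportion of "Y" responses, used as p for the binomial distribution
p <- sum(f$Travelledabroad == "Y") / nrow(f)
cat("\nThe percentage of Indians travelled abroad is:", p*100)
k <- 0:10
d <- dbinom(k, 10, p)
cat("\n\nThe probabilities for k=0 to k=10 are:", d)
cat('\n-----------------------------------------------------------------------------')
cat('\nThese are the probabilities that exactly k = 0 to 10 of the 10 sampled persons have travelled abroad.')
# type="h" draws vertical spikes, which suits a discrete distribution
plot(k, d, type="h", xlab="Number who have travelled abroad", ylab="Probability", main="Case 1: n=10 samples")
# Case 2: samples of n = 100 persons
n <- 100
# Exact binomial probability P(X >= 59)
sb <- sum(dbinom(59:n, n, p))
cat("\n\nThe probability with n=100 cases with binomial distribution:", sb)
m <- n*p
cat("\n\nMean of normal distribution is:", m)
sd1 <- sqrt(n*p*(1-p))
cat("\n\nStandard Deviation of normal distribution is:", sd1)
# Continuity correction: P(X >= 59) is approximated by the normal area above 58.5
sn <- pnorm(58.5, m, sd1, lower.tail = FALSE)
cat("\n\nProbability for n=100 case with normal distribution:", sn)
cat('\nAt least 59 have travelled abroad:', sn)
k1 <- 0:n
d1 <- dbinom(k1, n, p)
plot(k1, d1, type="h", xlab="Number who have travelled abroad", ylab="Probability", main="Case 2: n=100 samples")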
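Question 3 asks for the probabilities as a table or bar graph. A minimal sketch of one way to produce both is given here, assuming a placeholder proportion p_demo (not the value from the data set) and hypothetical variable names; the real script would substitute its own p.

```r
# Sketch only: p_demo is a placeholder value, NOT the proportion from the data set.
p_demo <- 0.5
k <- 0:10

# Probability that exactly k of 10 sampled persons have travelled abroad
prob <- dbinom(k, size = 10, prob = p_demo)

# Table of k against its probability, rounded for readability
prob_table <- data.frame(k = k, probability = round(prob, 4))
print(prob_table)

# Bar graph: one bar per value of k
barplot(prob, names.arg = k,
        xlab = "Number who have travelled abroad (k)",
        ylab = "Probability",
        main = "Binomial probabilities, n = 10 (placeholder p)")
```

A bar graph makes the most likely value of k easy to read off, since the tallest bar marks the mode of the distribution.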
Results: Display the output obtained on the R console for all the cases. Also add the plots you
obtained. Give each plot a proper title as per the condition.
Output:
Plots:
For n=10 samples
For n=100 samples
Conclusion: (Write the conclusion in your words)
In this lab, I learned to compute probabilities for the Binomial and Normal
distributions and to verify the Normal approximation to the Binomial distribution
using the Travelled Abroad data set. I also observed how the plot changes when the
sample size increases from 10 to 100, and learned to compute the mean and the
standard deviation of the approximating Normal distribution.
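One way to check the effect of sample size numerically is sketched below. It is an illustration only: p_demo is a placeholder rather than the proportion from the data set, and max_cdf_error is a hypothetical helper written for this example. It measures the largest gap between the binomial cumulative probabilities and the continuity-corrected normal approximation for n = 10 and n = 100.

```r
# Sketch only: p_demo is a placeholder value, NOT the proportion from the data set.
p_demo <- 0.5

# Hypothetical helper: largest absolute gap between the binomial CDF and the
# continuity-corrected normal CDF over k = 0, 1, ..., n
max_cdf_error <- function(n, p) {
  k <- 0:n
  exact <- pbinom(k, size = n, prob = p)
  normal_cc <- pnorm(k + 0.5, mean = n * p, sd = sqrt(n * p * (1 - p)))
  max(abs(exact - normal_cc))
}

err10 <- max_cdf_error(10, p_demo)
err100 <- max_cdf_error(100, p_demo)

cat("Largest gap for n = 10: ", err10, "\n")
cat("Largest gap for n = 100:", err100, "\n")
```

For this placeholder p the gap is smaller at n = 100 than at n = 10, which is consistent with the normal approximation improving as the sample size grows.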