Statistics and Probability With R Programming Language: Lab Report

This document provides a lab report on statistics and probability using R programming. It includes examples of simple operations in R like finding minimum, maximum, and length of a data list. It also demonstrates reading a data file, performing operations on the data like calculating summary statistics, plotting graphs, and performing hypothesis tests. Functions used include read.csv, mean, median, sd, plot, hist, and boxplot. The document contains examples of importing data, accessing and manipulating it to calculate measures and visualize the results.

STATISTICS AND PROBABILITY WITH R

PROGRAMMING LANGUAGE
Course Code – MAT1010

LAB REPORT

Under the guidance of: Prof. Dr. SANTANU MANDAL

Done by: AYUSH ANAND SAGAR (17BES7003)

ECE-EMBEDDED SYSTEM
INDEX

Introduction to R programming and Simple Operations
Operations on Data files
Matrix operations on Data files
Random Sampling, Probability and Choose functions
Binomial Distribution
Pnorm, Qnorm and Rnorm functions
Histogram
Test of Hypothesis: Z-Test
Test of Hypothesis: T-Test
Linear Regression and Correlation
INTRODUCTION TO R PROGRAMMING AND
SIMPLE OPERATIONS IN R
1. Simple Operations
a) Enter the data {2,5,3,7,1,9,6} directly and store it in a variable x.
b) Find the number of elements in x, i.e. in the data list.
c) Find the last element of x.
d) Find the minimum element of x.
e) Find the maximum element of x.

1.
a) x<-c(2,5,3,7,1,9,6)
x
2 5 3 7 1 9 6
b) length(x)
7
c) x[length(x)]
6
d) min(x)
1
e) max(x)
9

2. Enter the data {1, 2, …. ,19,20} in a variable x


a) Find the 3rd element in the data list.
b) Find 3rd to 5th element in the data list.
c) Find 2nd, 5th, 6th, and 12th element in the list.
d) Print the data as {20, 19, …, 2, 1} without again entering the data.
2.
x<-c(1:20)
x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
a) x[3]
3
b) x[3:5]
3 4 5
c) x[c(2,5,6,12)]
2 5 6 12
d) rev(x) or x[20:1]
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

3.
a) Create a data list (4, 4, 4, 4, 3, 3, 3, 5, 5, 5) using ‘rep’ function.
b) Create a list (4, 6, 3, 4, 6, 3, …, 4, 6, 3) where there are 10 occurrences of 4, 6, and
3 in the given order.
c) Create a list (3, 1, 5, 3, 2, 3, 4, 5, 7,7, 7, 7, 7,7, 6, 5, 4, 3, 2, 1, 34, 21, 54) using
one-line command.
d) First create a list (2, 1, 3, 4). Then append this list at the end with another list (5, 7,
12, 6, -8). Check whether the number of elements in the augmented list is 11.

3.

a) x<-c(rep(4,4),rep(3,3),rep(5,3))

>x

[1] 4 4 4 4 3 3 3 5 5 5

b) x<-c(rep(c(4,6,3),10))

>x

[1] 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3
c)

x<-c(3,1,5,3,c(2:5),rep(7,6),c(6:1),34,21,54)

>x

[1] 3 1 5 3 2 3 4 5 7 7 7 7 7 7 6 5 4 3 2 1 34 21 54

d)

> x<-c(2,1,3,4)

> y<-c(5,7,12,6,-8)

> append(x,y)

[1] 2 1 3 4 5 7 12 6 -8

> length(append(x,y))==11

[1] FALSE
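
The check above returns FALSE because the two lists hold 4 and 5 elements, so the augmented list has 9. A minimal sketch of the same constructions, assuming the `times` argument of `rep` and plain `c()` are acceptable alternatives to the commands used in the report:

```r
# rep() with a vector 'times' repeats each value a different number of times
x <- rep(c(4, 3, 5), times = c(4, 3, 3))
x

# c() joins two vectors just as append() does when no position is given
z <- c(c(2, 1, 3, 4), c(5, 7, 12, 6, -8))
length(z)        # 4 + 5 elements
length(z) == 11  # same check as in the exercise
```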
4.
(a) Print all numbers starting with 3 and ending with 7 with an increment
of 0.5. Store these numbers in x.
(b) Print all even numbers between 2 and 14 (both inclusive).
(c) Type 2*x and see what you get. Each element of x is multiplied by 2.

4.

a) x<-seq(3,7,0.5)

>x

[1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0

b)

> seq(2,14,2)

[1] 2 4 6 8 10 12 14

c) > x*2 or 2*x

[1] 6 7 8 9 10 11 12 13 14

5. Few simple statistical measures:


(a) Enter data as 1,2, … ,10.
(b) Find sum of the numbers.
(c) Find mean, median.
(d) Find sum of squares of these values.
(e) Find the value of $\frac{1}{n}\sum_{i=1}^{n}|x_i-\bar{x}|$. This is known as the mean deviation about the mean ($MD_{\bar{x}}$).

(f) Check whether $MD_{\bar{x}}$ is less than or equal to the standard deviation.

5.
a) x<-c(1:10)
b) sum(x)
55
c) mean(x)
5.5
median(x)
5.5
d) sum(x^2)
385
e) md<-sum(abs(x-mean(x)))/length(x)
>md
2.5

f) md<=sd(x)
TRUE
OPERATIONS ON DATA FILES

1. Few simple statistical measures:
a) Enter data as 1,2, … ,10.
b) Find sum of the numbers.
c) Find mean, median.
d) Find sum of squares of these values.
e) Find the value of $\frac{1}{n}\sum_{i=1}^{n}|x_i-\bar{x}|$. This is known as the mean deviation about the mean ($MD_{\bar{x}}$).
f) Check whether $MD_{\bar{x}}$ is less than or equal to the standard deviation.
g) Find standard deviation using formula.

1.
a) x<-c(1:10)
b) sum(x)
55
c) mean(x)
5.5
median(x)
5.5
d) sum(x^2)
385
e) md<-sum(abs(x-mean(x)))/length(x)
>md
2.5
f) md<=sd(x)
TRUE
g) sd(x)
3.02765
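
Part (g) asks for the standard deviation from its formula, while the answer above calls `sd` directly. A minimal sketch of the formula, assuming the sample version with divisor n - 1 (which is what `sd` uses); the name `sd_formula` is introduced here only for illustration:

```r
x <- 1:10
n <- length(x)

# sample standard deviation: square root of the sum of squared deviations over (n - 1)
sd_formula <- sqrt(sum((x - mean(x))^2) / (n - 1))

# mean deviation about the mean, as in part (e)
md <- sum(abs(x - mean(x))) / n

all.equal(sd_formula, sd(x))  # formula agrees with the built-in
md <= sd_formula              # the comparison from part (f)
```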
2. Reading a data file and working with it:
a) Read the file first and store it in a.
b) How many rows are there in this table? How many columns are there?
c) How to find the number of rows and number of columns by a single command?
d) What are the variables in the data file?
e) If the file is very large, naturally we cannot simply type 'a', because it will cover the entire
screen and we won't be able to understand anything. So how to see the top or bottom few
lines in this file?
f) If the number of columns is too large, again we may face the same problem. So how to
see the first 5 rows and first 3 columns?
g) How to get 1st, 3rd, 6th, and 10th row and 2nd, 4th, and 5th column?
h) How to get values in a specific row or a column?

2.
a)
a<-read.csv('house_data_1.csv')
>a
Price FloorArea Rooms Age CentralHeating
1 52.00 1225 3 6.2 no
2 54.75 1230 3 7.5 no
3 57.50 1200 3 4.2 no
4 57.50 1000 2 8.8 no
5 59.75 1420 4 1.9 yes
6 62.50 1450 3 5.2 no
7 64.75 1380 4 6.6 yes
8 67.25 1510 4 2.3 no
9 67.50 1400 5 6.1 no
10 69.75 1550 6 9.2 no
11 70.00 1720 6 4.3 yes
12 75.50 1700 5 4.3 no
13 77.50 1660 6 1.0 yes
14 78.00 1800 7 7.0 yes
15 81.25 1830 6 3.6 yes
16 82.50 1790 6 1.7 yes
17 86.25 2010 6 1.2 yes
18 87.50 2000 6 0.0 yes
19 88.00 2100 8 2.3 yes
20 92.00 2240 7 0.7 yes
b)
> nrow(a)
[1] 20
> ncol(a)
[1] 5

c)
> c(nrow(a),ncol(a)) or dim(a)
[1] 20 5

d)
> names(a)
[1] "Price" "FloorArea" "Rooms" "Age" "CentralHeating"

e)
> head(a)
Price FloorArea Rooms Age CentralHeating
1 52.00 1225 3 6.2 no
2 54.75 1230 3 7.5 no
3 57.50 1200 3 4.2 no
4 57.50 1000 2 8.8 no
5 59.75 1420 4 1.9 yes
6 62.50 1450 3 5.2 no

> tail(a)
Price FloorArea Rooms Age CentralHeating
15 81.25 1830 6 3.6 yes
16 82.50 1790 6 1.7 yes
17 86.25 2010 6 1.2 yes
18 87.50 2000 6 0.0 yes
19 88.00 2100 8 2.3 yes
20 92.00 2240 7 0.7 yes

f)
> a[1:5,1:3]
Price FloorArea Rooms
1 52.00 1225 3
2 54.75 1230 3
3 57.50 1200 3
4 57.50 1000 2
5 59.75 1420 4

g)
> a[c(1,3,6,10),c(2,4,5)]
FloorArea Age CentralHeating
1 1225 6.2 no
3 1200 4.2 no
6 1450 5.2 no
10 1550 9.2 no

h)
> a[5]
CentralHeating
1 no
2 no
3 no
4 no
5 yes
6 no
7 yes
8 no
9 no
10 no
11 yes
12 no
13 yes
14 yes
15 yes
16 yes
17 yes
18 yes
19 yes
20 yes
> a[2,]
Price FloorArea Rooms Age CentralHeating
2 54.75 1230 3 7.5 no

>a
Price FloorArea Rooms Age CentralHeating
1 52.00 1225 3 6.2 no
2 54.75 1230 3 7.5 no
3 57.50 1200 3 4.2 no
4 57.50 1000 2 8.8 no
5 59.75 1420 4 1.9 yes
6 62.50 1450 3 5.2 no
7 64.75 1380 4 6.6 yes
8 67.25 1510 4 2.3 no
9 67.50 1400 5 6.1 no
10 69.75 1550 6 9.2 no
11 70.00 1720 6 4.3 yes
12 75.50 1700 5 4.3 no
13 77.50 1660 6 1.0 yes
14 78.00 1800 7 7.0 yes
15 81.25 1830 6 3.6 yes
16 82.50 1790 6 1.7 yes
17 86.25 2010 6 1.2 yes
18 87.50 2000 6 0.0 yes
19 88.00 2100 8 2.3 yes
20 92.00 2240 7 0.7 yes
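
The indexing commands above can be tried without the CSV file. The sketch below is only an illustration: it rebuilds the first five rows of the printed table as an inline data frame (an assumption made so the block is self-contained) and shows the common ways of picking rows and columns.

```r
# first five rows of the printed table, typed in by hand for illustration
a <- data.frame(
  Price = c(52.00, 54.75, 57.50, 57.50, 59.75),
  FloorArea = c(1225, 1230, 1200, 1000, 1420),
  Rooms = c(3, 3, 3, 2, 4),
  Age = c(6.2, 7.5, 4.2, 8.8, 1.9),
  CentralHeating = c("no", "no", "no", "no", "yes")
)

dim(a)                          # rows and columns in one call
a[2, ]                          # a single row, still a data frame
a[, 2]                          # a single column as a plain vector
a[["Age"]]                      # the same idea, selecting by name
a[a$CentralHeating == "yes", ]  # rows chosen by a logical condition
```

Note that `a[5]` returns a one-column data frame, whereas `a[, 5]` and `a[[5]]` return the values themselves.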

3. Calculate simple statistical measures using the values in the data file.
a) Find means, medians, standard deviations of Price, Floor Area, Rooms,
and Age.
b) How many houses have central heating and how many don't have?
c) Plot Price vs. Floor, Price vs. Age, and Price vs. rooms, in separate graphs.
d) Draw histograms of Prices, FloorArea, and Age.
e) Draw box plots of Price, FloorArea, and Age.
f) Draw all the graphs in (c), (d), and (e) in the same graph paper.

3)
a)
> mean(a[,1])
[1] 71.5875
> mean(a[,2])
[1] 1610.75
> mean(a[,3])
[1] 5
> mean(a[,4])
[1] 4.205
> median(a[,1])
[1] 69.875
> median(a[,2])
[1] 1605
> median(a[,3])
[1] 5.5
> median(a[,4])
[1] 4.25
> sd(a[,1])
[1] 12.21094
> sd(a[,2])
[1] 331.9649
> sd(a[,3])
[1] 1.65434
> sd(a[,4])
[1] 2.786523

Alternatively

> names(a)
[1] "Price" "FloorArea" "Rooms" "Age" "CentralHeating"
> mean(a$Price)
[1] 71.5875
> mean(a$FloorArea)
[1] 1610.75
> mean(a$Rooms)
[1] 5
> mean(a$Age)
[1] 4.205
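
Calling `mean`, `median`, and `sd` column by column gets repetitive. As a hedged alternative, `sapply` can apply all three to every numeric column at once. The small data frame below is a stand-in typed from the first five rows of the table, not the full file, so its numbers differ from the results above.

```r
# stand-in data: first five rows of the numeric columns, typed by hand
a <- data.frame(
  Price = c(52.00, 54.75, 57.50, 57.50, 59.75),
  FloorArea = c(1225, 1230, 1200, 1000, 1420),
  Rooms = c(3, 3, 3, 2, 4),
  Age = c(6.2, 7.5, 4.2, 8.8, 1.9)
)

# one column of results per variable, one row per measure
stats <- sapply(a, function(col) c(mean = mean(col), median = median(col), sd = sd(col)))
stats
```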

b)
> sum(a$CentralHeating=="yes")
[1] 11
> sum(a$CentralHeating=="no")
[1] 9

c)
> plot(a$Price,a$FloorArea)
> plot(a$Price,a$Age)
> plot(a$Price,a$Rooms)
d)
> hist(a$Price,freq=F)

> hist(a$FloorArea,freq=F)
> hist(a$Age,freq=F)

e)
> boxplot(a$Price)
> boxplot(a$FloorArea)

> boxplot(a$Age)

f)
> par(mfrow=c(3,3))   # 3 x 3 grid so that all nine graphs share one page
> plot(a$Price,a$FloorArea)
> plot(a$Price,a$Age)
> plot(a$Price,a$Rooms)
> hist(a$Price,freq=F)
> hist(a$FloorArea,freq=F)
> hist(a$Age,freq=F)
> boxplot(a$Price)
> boxplot(a$FloorArea)
> boxplot(a$Age)
MATRIX OPERATIONS ON DATA FILES

1. Augmenting the file and saving the resultant file:


a) Calculate the value per square foot area of each apartment and store it in a
vector named “PriceSqFt”.
b) Place this vector after the last column in the data file.
c) Save the augmented file under name “HouseInfo.txt”.
d) Read the file "HouseInfo.txt".

1)
a)
>a<-read.csv('house_data_1.csv')
>a
Price FloorArea Rooms Age CentralHeating
1 52.00 1225 3 6.2 no
2 54.75 1230 3 7.5 no
3 57.50 1200 3 4.2 no
4 57.50 1000 2 8.8 no
5 59.75 1420 4 1.9 yes
6 62.50 1450 3 5.2 no
7 64.75 1380 4 6.6 yes
8 67.25 1510 4 2.3 no
9 67.50 1400 5 6.1 no
10 69.75 1550 6 9.2 no
11 70.00 1720 6 4.3 yes
12 75.50 1700 5 4.3 no
13 77.50 1660 6 1.0 yes
14 78.00 1800 7 7.0 yes
15 81.25 1830 6 3.6 yes
16 82.50 1790 6 1.7 yes
17 86.25 2010 6 1.2 yes
18 87.50 2000 6 0.0 yes
19 88.00 2100 8 2.3 yes
20 92.00 2240 7 0.7 yes
> PriceSqFt<-a$Price/a$FloorArea
> PriceSqFt
[1] 0.04244898 0.04451220 0.04791667 0.05750000 0.04207746 0.04310345
0.04692029 0.04453642 0.04821429 0.04500000
[11] 0.04069767 0.04441176 0.04668675 0.04333333 0.04439891 0.04608939
0.04291045 0.04375000 0.04190476 0.04107143

b)
>a<-cbind(a,PriceSqFt)
>a
Price FloorArea Rooms Age CentralHeating PriceSqFt
1 52.00 1225 3 6.2 no 0.04244898
2 54.75 1230 3 7.5 no 0.04451220
3 57.50 1200 3 4.2 no 0.04791667
4 57.50 1000 2 8.8 no 0.05750000
5 59.75 1420 4 1.9 yes 0.04207746
6 62.50 1450 3 5.2 no 0.04310345
7 64.75 1380 4 6.6 yes 0.04692029
8 67.25 1510 4 2.3 no 0.04453642
9 67.50 1400 5 6.1 no 0.04821429
10 69.75 1550 6 9.2 no 0.04500000
11 70.00 1720 6 4.3 yes 0.04069767
12 75.50 1700 5 4.3 no 0.04441176
13 77.50 1660 6 1.0 yes 0.04668675
14 78.00 1800 7 7.0 yes 0.04333333
15 81.25 1830 6 3.6 yes 0.04439891
16 82.50 1790 6 1.7 yes 0.04608939
17 86.25 2010 6 1.2 yes 0.04291045
18 87.50 2000 6 0.0 yes 0.04375000
19 88.00 2100 8 2.3 yes 0.04190476
20 92.00 2240 7 0.7 yes 0.04107143

c)
> write.table(a,'HouseInfo.txt')

d)
> dir()
[1] "desktop.ini"
[2] "house_data_1.csv"
[3] "HouseInfo.txt"
[4] "lab_22dec_2018.txt"
[5] "lab_29dec_2018.txt"
[6] "WIN(2018-19)_MAT1004_ELA_G04_AP2018195000032_Reference
Material I_Hands on exercise on R Day 3.pdf"
> read.table('HouseInfo.txt')
Price FloorArea Rooms Age CentralHeating PriceSqFt
1 52.00 1225 3 6.2 no 0.04244898
2 54.75 1230 3 7.5 no 0.04451220
3 57.50 1200 3 4.2 no 0.04791667
4 57.50 1000 2 8.8 no 0.05750000
5 59.75 1420 4 1.9 yes 0.04207746
6 62.50 1450 3 5.2 no 0.04310345
7 64.75 1380 4 6.6 yes 0.04692029
8 67.25 1510 4 2.3 no 0.04453642
9 67.50 1400 5 6.1 no 0.04821429
10 69.75 1550 6 9.2 no 0.04500000
11 70.00 1720 6 4.3 yes 0.04069767
12 75.50 1700 5 4.3 no 0.04441176
13 77.50 1660 6 1.0 yes 0.04668675
14 78.00 1800 7 7.0 yes 0.04333333
15 81.25 1830 6 3.6 yes 0.04439891
16 82.50 1790 6 1.7 yes 0.04608939
17 86.25 2010 6 1.2 yes 0.04291045
18 87.50 2000 6 0.0 yes 0.04375000
19 88.00 2100 8 2.3 yes 0.04190476
20 92.00 2240 7 0.7 yes 0.04107143

> write.table(a,'HouseInfo.txt',sep='\t')

> read.table('HouseInfo.txt')
Price FloorArea Rooms Age CentralHeating PriceSqFt
1 52.00 1225 3 6.2 no 0.04244898
2 54.75 1230 3 7.5 no 0.04451220
3 57.50 1200 3 4.2 no 0.04791667
4 57.50 1000 2 8.8 no 0.05750000
5 59.75 1420 4 1.9 yes 0.04207746
6 62.50 1450 3 5.2 no 0.04310345
7 64.75 1380 4 6.6 yes 0.04692029
8 67.25 1510 4 2.3 no 0.04453642
9 67.50 1400 5 6.1 no 0.04821429
10 69.75 1550 6 9.2 no 0.04500000
11 70.00 1720 6 4.3 yes 0.04069767
12 75.50 1700 5 4.3 no 0.04441176
13 77.50 1660 6 1.0 yes 0.04668675
14 78.00 1800 7 7.0 yes 0.04333333
15 81.25 1830 6 3.6 yes 0.04439891
16 82.50 1790 6 1.7 yes 0.04608939
17 86.25 2010 6 1.2 yes 0.04291045
18 87.50 2000 6 0.0 yes 0.04375000
19 88.00 2100 8 2.3 yes 0.04190476
20 92.00 2240 7 0.7 yes 0.04107143
> read.delim('HouseInfo.txt')
Price FloorArea Rooms Age CentralHeating PriceSqFt
1 52.00 1225 3 6.2 no 0.04244898
2 54.75 1230 3 7.5 no 0.04451220
3 57.50 1200 3 4.2 no 0.04791667
4 57.50 1000 2 8.8 no 0.05750000
5 59.75 1420 4 1.9 yes 0.04207746
6 62.50 1450 3 5.2 no 0.04310345
7 64.75 1380 4 6.6 yes 0.04692029
8 67.25 1510 4 2.3 no 0.04453642
9 67.50 1400 5 6.1 no 0.04821429
10 69.75 1550 6 9.2 no 0.04500000
11 70.00 1720 6 4.3 yes 0.04069767
12 75.50 1700 5 4.3 no 0.04441176
13 77.50 1660 6 1.0 yes 0.04668675
14 78.00 1800 7 7.0 yes 0.04333333
15 81.25 1830 6 3.6 yes 0.04439891
16 82.50 1790 6 1.7 yes 0.04608939
17 86.25 2010 6 1.2 yes 0.04291045
18 87.50 2000 6 0.0 yes 0.04375000
19 88.00 2100 8 2.3 yes 0.04190476
20 92.00 2240 7 0.7 yes 0.04107143
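
The write and read steps above can be checked as a round trip. This is a minimal sketch under stated assumptions: a two-row stand-in data frame, a temporary file instead of `HouseInfo.txt`, and `row.names = FALSE` so that the file contains only the data columns.

```r
# two rows typed from the table, enough to illustrate the round trip
df <- data.frame(Price = c(52.00, 54.75), FloorArea = c(1225, 1230))
df$PriceSqFt <- df$Price / df$FloorArea   # new column added at the end

path <- tempfile(fileext = ".txt")        # temporary file, removed below
write.table(df, path, sep = "\t", row.names = FALSE)

back <- read.delim(path)                  # read.delim expects tab-separated text with a header
names(back)
all.equal(df$PriceSqFt, back$PriceSqFt)   # values survive up to printing precision

unlink(path)
```

When row names are written, as in the report, `read.table` recognises them because the header line has one field fewer than the data lines.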
2. Matrices and arrays
a) Matrices and arrays are represented as vectors with dimensions:
Create one matrix x with the numbers 1 to 12, of order 3x4.
b) Create same matrix with matrix function.
c) Give name of rows of this matrix with A,B,C.
d) Transpose of the matrix.
e) Use functions cbind and rbind separately to create different matrices.
f) Use arbitrary numbers to create matrix.
g) Verify matrix multiplication.

2.a)
> m<-1:12
> dim(m)<-c(3,4)
>m
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12

b)
> rnames<-c("row1","row2","row3")
> cnames<-c("col1","col2","col3","col4")
> m<-matrix(c(1:12),nrow=3,ncol=4,dimnames=list(rnames,cnames))
>m
col1 col2 col3 col4
row1 1 4 7 10
row2 2 5 8 11
row3 3 6 9 12
> matrix(c(1:12),3,4)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12

c)
> m<-matrix(c(1:12),3,4,dimnames=list(c("A", "B", "C"),c("P", "Q", "R","S")))
>m
  P Q R S
A 1 4 7 10
B 2 5 8 11
C 3 6 9 12

d)
> t(m)
  A B C
P 1 2 3
Q 4 5 6
R 7 8 9
S 10 11 12

e)
> rbind(A=c(1,2,3,4),B=c(5,6,7,8),C=c(9,10,11,12))
[,1] [,2] [,3] [,4]
A 1 2 3 4
B 5 6 7 8
C 9 10 11 12
> cbind(P=c(1,5,9),Q=c(2,6,10),R=c(3,7,11),S=c(4,8,12))
     P Q R S
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12

f)
> w<-t(m)
>w
  A B C
P 1 2 3
Q 4 5 6
R 7 8 9
S 10 11 12

g)
> m%*%w

A B C
A 166 188 210
B 188 214 240
C 210 240 270
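
To verify the product, each entry can be recomputed as the dot product of a row of m with a column of w. The sketch below does this with explicit loops; the name `manual` is introduced here only for illustration.

```r
m <- matrix(1:12, nrow = 3, ncol = 4)
w <- t(m)          # 4 x 3, so m %*% w is defined and is 3 x 3
p <- m %*% w

# entry (i, j) of the product is the dot product of row i of m and column j of w
manual <- matrix(0, nrow(m), ncol(w))
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(w))) {
    manual[i, j] <- sum(m[i, ] * w[, j])
  }
}

all.equal(p, manual)
dim(p)
```

Note that `m * w` would fail here: `*` is element-wise and needs equal dimensions, while `%*%` needs the column count of the first matrix to match the row count of the second.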
RANDOM SAMPLING, PROBABILITY AND
CHOOSE FUNCTION

3. Random sampling
a) In R, you can simulate these situations with the sample function. Pick five numbers
at random from the set 1:40.

b) Notice that the default behavior of sample is sampling without replacement. That is, the
samples will not contain the same number twice, and size obviously cannot be bigger than
the length of the vector to be sampled. If you want sampling with replacement, then you
need to add the argument replace=TRUE.

Sampling with replacement is suitable for modelling coin tosses or throws of a die. So,
for instance, simulate 10 coin tosses.

c) In fair coin-tossing, the probability of heads should equal the probability of tails, but the
idea of a random event is not restricted to symmetric cases. It could be equally well applied
to other cases, such as the successful outcome of a surgical procedure. Hopefully, there
would be a better than 50% chance of this. Simulate data with nonequal probabilities for the
outcomes (say, a 90% chance of success) by using the prob argument to sample.

d) The choose function can be used to calculate the following expression: 40!/(5! 35!), the number of ways to choose 5 items out of 40.

e) Find 5!

3.a)
> x<-1:40

>x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36 37
[38] 38 39 40
> sample(x)
[1] 18 30 10 36 2 15 23 35 5 29 16 38 17 24 7 21 31 14 26 6 11 3 33 37 28 22 32
1 19 12 8 27 39 13 40 9 25
[38] 4 34 20
> sample(x,5)
[1] 38 10 24 32 20
b)
> sample(x,5,replace=T)
[1] 20 14 26 26 29
> sample(x,5,replace=T)
[1] 35 32 22 1 6
> sample(x,5,replace=T)
[1] 20 14 26 26 29
> sample(x,5,replace=T)
[1] 7 7 15 3 16
> sample(x,5,replace=T)
[1] 22 14 38 38 39
> sample(x,5,replace=T)
[1] 38 28 3 19 36
> sample(x,5,replace=T)
[1] 14 14 40 27 24
> sample(x,5,replace=T)
[1] 30 9 33 14 27
> sample(x,5,replace=T)
[1] 3 12 27 22 27
> sample(x,5,replace=T)
[1] 32 39 18 1 13
> sample(x,5,replace=T)
[1] 38 6 1 16 4
c)
> sample(c("H","T"),10,replace=T)
[1] "H" "H" "H" "T" "H" "T" "H" "T" "H" "T"
> sample(c("H","T"),10,replace=T)
[1] "T" "T" "H" "T" "H" "H" "T" "T" "H" "T"
> sample(c("H","T"),10,replace=T)
[1] "T" "T" "H" "H" "T" "H" "T" "H" "H" "T"
> sample(c("H","T"),10,replace=T)
[1] "T" "T" "T" "H" "H" "H" "T" "H" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "H" "H" "T" "T" "H" "T" "H" "T" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "H" "T" "H" "T" "T" "T" "T" "T" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "H" "H" "T" "H" "H" "T" "T" "T" "H" "T"

> sample(c("H","T"),10,replace=T)
[1] "H" "T" "H" "H" "T" "H" "T" "H" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "H" "T" "T" "H" "H" "H" "H" "H" "T" "H"
> sample(c("H","T"),10,replace=T)
[1] "T" "T" "T" "H" "T" "H" "T" "H" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "T" "H" "T" "T" "T" "H" "T" "H" "T" "T"
> sample(c("H","T"),10,replace=T,prob=c(90,10))
[1] "H" "T" "H" "H" "H" "H" "H" "H" "H" "H"

d)
> choose(40,5)
[1] 658008
> factorial(40)/(factorial(5)*factorial(35))
[1] 658008

e)
> factorial(5)
[1] 120
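
Because `sample` draws at random, every run above gives a different result. A minimal sketch of a repeatable run, assuming an arbitrary seed of 1 and a larger sample of 1000 draws so that the observed share of "H" can be compared with the 90% target:

```r
set.seed(1)  # arbitrary seed, used only so the run can be repeated

# weights passed to prob need not sum to one, so c(90, 10) and c(0.9, 0.1) behave alike
tosses <- sample(c("H", "T"), 1000, replace = TRUE, prob = c(0.9, 0.1))
mean(tosses == "H")   # observed share of "H", expected to be close to 0.9

# choose(n, k) equals n! / (k! (n - k)!)
all.equal(choose(40, 5), factorial(40) / (factorial(5) * factorial(35)))
```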
BINOMIAL DISTRIBUTION

> dbinom(3,5,0.95)
[1] 0.02143438

> dbinom(2,5,0.95)+dbinom(3,5,0.95)+dbinom(4,5,0.95)
[1] 0.2261891
> dbinom(c(0,1,2,3,4,5),5,0.95)
[1] 0.0000003125 0.0000296875 0.0011281250 0.0214343750 0.2036265625
0.7737809375
> dbinom(0:4,5,0.95)

[1] 0.0000003125 0.0000296875 0.0011281250 0.0214343750 0.2036265625


> dbinom(2:4,5,0.95)

[1] 0.001128125 0.021434375 0.203626563


> sum(dbinom(2:4,5,0.95))
[1] 0.2261891
> pbinom(3,5,0.95)
[1] 0.0225925
> pbinom(2:4,5,0.95)
[1] 0.001158125 0.022592500 0.226219063

> sum(pbinom(2:4,5,0.95))
[1] 0.2499697

> sum(pbinom(2:4,5,0.95))
[1] 0.2499697
> sum(dbinom(2:4,5,0.95))
[1] 0.2261891
> pbinom(2:4,5,0.95)

[1] 0.001158125 0.022592500 0.226219063


> dbinom(2:4,5,0.95)

[1] 0.001128125 0.021434375 0.203626563


> pbinom(4,5,0.95)-pbinom(1,5,0.95)
[1] 0.2261891
> x<-0:5
> prob<-dbinom(x,5,0.95)
> plot(x,prob,type="h")
> plot(x,prob,type="h",main='Binomial Distribution')
> plot(x,prob,type="h",main='Binomial Distribution',xlab='No.of.ready terminals(x)',ylab='p(x)')
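
The transcript above mixes `dbinom` (probability of exactly x successes) and `pbinom` (probability of at most x successes). The sketch below, using the same n = 5 and p = 0.95, shows how the two relate; note that summing `pbinom` values, as done above, does not give a probability.

```r
n <- 5
p <- 0.95
x <- 0:n

prob <- dbinom(x, n, p)   # P(X = x) for every possible x
sum(prob)                 # the probabilities add up to 1

# pbinom is the running total of dbinom
all.equal(pbinom(x, n, p), cumsum(prob))

# P(2 <= X <= 4) written two equivalent ways
between <- pbinom(4, n, p) - pbinom(1, n, p)
all.equal(between, sum(dbinom(2:4, n, p)))
```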
PNORM, QNORM AND RNORM FUNCTIONS

IQs are normally distributed with a mean of 100 and an S.D. of 15.

1) What percentage of people have an IQ less than 125?
> pnorm(125,100,15,lower.tail=T)
[1] 0.9522096

2)what percentage of people have IQ greater than 110


> pnorm(110,100,15,lower.tail=F)
[1] 0.2524925

3) What percentage of people have an IQ between 110 and 125?

> pnorm(125,100,15,lower.tail=T)-pnorm(110,100,15,lower.tail=T)
# about 0.2047, i.e. 0.9522096 - 0.7475075; both terms must be lower-tail probabilities

4) Find the 25th percentile of the standard normal distribution.

> qnorm(0.25)
[1] -0.6744898
Find the 25th percentile of a normal distribution with a mean of 2 and an S.D. of 3.
> qnorm(.25,2,3)
[1] -0.02346925

5) What IQ separates the lower 25% from the others (mean=100 and S.D.=15)?
> qnorm(0.25,100,15,T)
[1] 89.88265
6) What IQ separates the top 10% from the others (mean=100 and S.D.=15)?
> qnorm(0.10,100,15,F)
[1] 119.2233
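
`pnorm` turns a value into a cumulative probability and `qnorm` turns a probability back into a value, so the two undo each other. A minimal sketch with the IQ parameters (mean 100, S.D. 15); the variable names are chosen here only for illustration:

```r
mu <- 100
sigma <- 15

q <- qnorm(0.25, mu, sigma)   # IQ below which 25% of people fall
pnorm(q, mu, sigma)           # feeding it back to pnorm recovers 0.25

# standardising first gives the same probability from the standard normal
z <- (125 - mu) / sigma
all.equal(pnorm(125, mu, sigma), pnorm(z))

# probability of an interval: difference of two lower-tail probabilities
pnorm(125, mu, sigma) - pnorm(110, mu, sigma)
```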

Generating random numbers from a normal distribution with mean 572 and sd 51
> rnorm(n=20, mean=572, sd=51)
[1] 513.7724 651.8332 508.8003 561.2989 589.0541 510.5020 579.1214 595.0203
629.1108 543.1042 523.8742 620.4255 583.5188 461.0772 557.2757 477.0051
561.9530 498.8606 505.4049 556.4619
> b<-rnorm(n=20, mean=572, sd=51)
>b
[1] 547.4588 650.9067 626.0564 617.3190 527.6772 548.8507 570.9685
599.9386 581.5112 630.2935 593.9666 535.7267 529.0297 627.5354 632.6571
483.1399 577.9614 497.0527 624.7555 541.8844
> mean(b)
[1] 577.2345
> sd(b)
[1] 49.02958
HISTOGRAM
>hist(b,freq=F)
>curve(dnorm(x,572,51),add=T)
> hist(b,main="Normal Distribution",freq=F)
> curve(dnorm(x,mean(b),sd(b)),add=T)
TEST OF HYPOTHESIS: Z-TEST

Test the hypothesis that the mean systolic blood pressure in a certain population
equals 140 mmHg. The standard deviation has a known value of 20 and a data set
of 55 patients is available.
120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,109,
109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,148,
149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169
Sol:
x<-c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,
109,109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,
148,149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169)
>x
[1] 120 115 94 118 111 102 102 131 104 107 115 139 115 113 114 105 115 134
109 109 93 118 109 106 125 150 142 119 127 141 149 144 142 149 161 143 140
148 149 141 146 159 152 135 134 161 130 125 141 148 153 145 137 147 169
> length(x)
[1] 55

> mean(x)
[1] 130
> sd(x)

[1] 19.16691
> n<-length(x)
> n
[1] 55
> # note: sd(x) is the sample standard deviation, not the known value of 20
> z<-sqrt(n)*(mean(x)-140)/sd(x)
> z
[1] -3.869272
> # two-tailed p-value: twice the area beyond |z| in one tail
> p<-2*pnorm(-abs(z))
> p
[1] 0.0001091607
> pnorm(z)
[1] 5.458036e-05
> 1-pnorm(z)
[1] 0.9999454
> 2*(1-pnorm(3.86))
[1] 0.000113387
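
The problem states a known standard deviation of 20, whereas the transcript divides by the sample standard deviation. The sketch below is one way to run the test with the stated value. The helper `z_test` is hypothetical (base R has no built-in one-sample z-test), and it works from the summary figures n = 55 and mean 130 reported above.

```r
# hypothetical helper: two-sided one-sample z-test from summary statistics
z_test <- function(xbar, mu0, sigma, n) {
  z <- (xbar - mu0) / (sigma / sqrt(n))  # standardised distance from the hypothesised mean
  p <- 2 * pnorm(-abs(z))                # area in both tails beyond |z|
  c(z = z, p.value = p)
}

res <- z_test(xbar = 130, mu0 = 140, sigma = 20, n = 55)
res
res[["p.value"]] < 0.05   # TRUE means H0: mu = 140 is rejected at the 5% level
```

A two-tailed p-value can never exceed 1, which is a quick sanity check on the formula.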
> prop.test(43,100,p=0.5,conf.level=0.95)

1-sample proportions test with continuity correction

data: 43 out of 100, null probability 0.5
X-squared = 1.69, df = 1, p-value = 0.1936
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.3326536 0.5327873
sample estimates:
p
0.43
TEST OF HYPOTHESIS: T-TEST
1. An outbreak of salmonella-related illness was attributed to ice cream produced at a
certain factory. Scientists measured the level of Salmonella in 9 randomly
sampled batches of ice cream. The levels (in MPN/g) were: 0.593, 0.142, 0.329,
0.691, 0.231, 0.793, 0.519, 0.392, 0.418. Is there evidence that the mean level of
Salmonella in the ice cream is greater than 0.3 MPN/g?

Sol. Let mu be the mean of population


Null hypothesis is H0: mu=0.3
Alternative Hypothesis is H1:mu>0.3 [Right tail test]
Level Of Significance is 5%
Compute t value
CODE:

> x<-c(0.593,0.142,0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)


>x

[1] 0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418
> t.test(x,alternative="greater",mu=0.3)
One Sample t-test
data: x
t = 2.2051, df = 8, p-value = 0.02927
alternative hypothesis: true mean is greater than 0.3
95 percent confidence interval:
0.3245133 Inf
sample estimates:
mean of x
0.4564444
Conclusion:
t = 2.2051, p-value = 0.02927
Since the p-value 0.02927 is less than the 5% level of significance, we reject the null hypothesis.

We conclude that there is evidence that the mean level of Salmonella in
the ice cream is greater than 0.3 MPN/g.
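
The decision in a t-test comes from comparing the p-value with the level of significance (or, equivalently, the t statistic with the critical value), not from comparing t with the p-value. A minimal sketch that recomputes the figures reported by `t.test` by hand:

```r
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3
alpha <- 0.05
n <- length(x)

t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))        # test statistic
p_val <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # right-tail area, since H1: mu > 0.3
t_crit <- qt(1 - alpha, df = n - 1)                  # critical value for a right-tailed test

c(t = t_stat, p = p_val, critical = t_crit)
p_val < alpha      # reject H0 when TRUE
t_stat > t_crit    # the same decision, stated with the critical value
```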

2. Suppose that 10 volunteers have taken an intelligence test; here are the
results obtained. The average score of the entire population is 75 in the same
test. Is there any significant difference (at the 5% level of significance)
between the sample and population means, assuming that the variance of the
population is not known?

Scores: 65, 78, 88, 55, 48, 95, 66, 57, 79, 81
Sol: Let mu be the mean of population
Null hypothesis is H0: mu=75
Alternative Hypothesis is H1: mu≠75 [Two tail test]
Level Of Significance is 5%
Compute t value
> x<-c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
> t.test(x,mu=75)
One Sample t-test
data: x
t = -0.78303, df = 9, p-value = 0.4537
alternative hypothesis: true mean is not equal to 75
95 percent confidence interval:
60.22187 82.17813
sample estimates:
mean of x
71.2
Conclusion:
P-value = 0.4537 and t = -0.78303
Since the P-value 0.4537 is greater than 0.05, we fail to reject the null hypothesis.
Hence there is no significant difference between the sample and population means.

3. Comparing two independent sample means, taken from two populations
with unknown variance. The following data shows the heights of individuals of
two different countries with unknown population variances. Is there any
significant difference between the average heights of the two groups?

A: 175 168 168 190 156 181 182 175 174 179
B: 185 169 173 173 188 186 175 174 179 180

Sol: It is an independent two group test. We test the claim using two tail t test
Let mu1 and mu2 be the mean of groups A and B respectively.
Null hypothesis: H0: mu1=mu2 (there is no significant difference)
Alternative Hypothesis: H1: mu1≠mu2 (there is a significant difference)
LOS=5%

CODE-
> A<-c(175 ,168 ,168, 190, 156, 181, 182, 175, 174, 179)
> B<-c(185 ,169, 173, 173, 188, 186, 175, 174, 179, 180 )

> t.test(A,B)
Welch Two Sample t-test
Data: A and B
t = -0.94737, df = 15.981, p-value = 0.3576
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.008795 4.208795
sample estimates:
mean of x mean of y
174.8 178.2

Conclusion:
t = -0.94737, p-value = 0.3576
Since the p-value 0.3576 is greater than 0.05, we fail to reject the null hypothesis.

So there is no significant difference between the average heights
of the two groups.
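
`t.test(A,B)` runs the Welch version of the test by default, which does not assume equal variances and therefore reports a fractional number of degrees of freedom. A minimal sketch of that calculation, with variable names chosen here for illustration:

```r
A <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
B <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

# squared standard error of each sample mean
se2A <- var(A) / length(A)
se2B <- var(B) / length(B)

t_stat <- (mean(A) - mean(B)) / sqrt(se2A + se2B)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (se2A + se2B)^2 / (se2A^2 / (length(A) - 1) + se2B^2 / (length(B) - 1))

p_val <- 2 * pt(-abs(t_stat), df)   # two-tailed p-value
c(t = t_stat, df = df, p = p_val)
p_val < 0.05                        # FALSE means H0 is not rejected
```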
LINEAR REGRESSION AND CORRELATION
> x<-c(132,129,120,113.2,105,92,84,83.2,88.4,59,80,81.5,71,69.2)
> length(x)
[1] 14
> y<-c(46,48,51,52.1,54,52,59,58.7,61.6,64,61.4,54.6,58.8,58)
> length(y)
[1] 14
> lm(x~y)
Call:
lm(formula = x ~ y)

Coefficients:
(Intercept) y
303.584 -3.777
> plot(x,y,col='red',pch=15)
> abline(lm(y~x))
> cars
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
7 10 18
8 10 26
9 10 34
10 11 17
11 11 28
12 12 14
13 12 20
14 12 24
15 12 28
16 13 26
17 13 34
18 13 34
19 13 46
20 14 26
21 14 36
22 14 60
23 14 80
24 15 20
25 15 26
26 15 54
27 16 32
28 16 40
29 17 32
30 17 40
31 17 50
32 18 42
33 18 56
34 18 76
35 18 84
36 19 36
37 19 46
38 19 68
39 20 32
40 20 48
41 20 52
42 20 56
43 20 64
44 22 66
45 23 54
46 24 70
47 24 92
48 24 93
49 24 120
50 25 85
> scatter.smooth(x=cars$speed, y=cars$dist, main="Dist ~ Speed")
> x=cars$speed

>x
[1] 4 4 7 7 8 9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 16
16 17 17 17 18 18 18 18 19 19 19 20
[40] 20 20 20 20 22 23 24 24 24 24 25

> y=cars$dist
>y
[1] 2 10 4 22 16 10 18 26 34 17 28 14 20 24 28 26 34 34 46 26 36 60
80 20 26 54 32 40 32
[30] 40 50 42 56 76 84 36 46 68 32 48 52 56 64 66 54 70 92 93 120 85
> plot(x,y,col='red',pch=3)
> abline(lm(y~x))
Correlation

> cor(cars$speed, cars$dist)
[1] 0.8068949
> linearMod <- lm(dist ~ speed, data=cars)
> print(linearMod)
Call:

lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept) speed
-17.579 3.932
> lm(x~y)
Call:

lm(formula = x ~ y)
Coefficients:
(Intercept) y
8.2839 0.1656

> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept) x
-17.579 3.932
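
The two fits above, `lm(y~x)` and `lm(x~y)`, are different regression lines, and both are tied to the correlation coefficient. The sketch below uses the built-in `cars` data set and recovers the coefficients of `dist ~ speed` from the correlation, the standard deviations, and the means. The prediction at speed 15 is only an illustrative usage.

```r
fit <- lm(dist ~ speed, data = cars)   # 'cars' ships with base R (datasets package)

r <- cor(cars$speed, cars$dist)
slope <- r * sd(cars$dist) / sd(cars$speed)              # slope = r * s_y / s_x
intercept <- mean(cars$dist) - slope * mean(cars$speed)  # line passes through the means

all.equal(unname(coef(fit)), c(intercept, slope))

# the product of the two regression slopes equals r^2
slope_rev <- unname(coef(lm(speed ~ dist, data = cars))[2])
all.equal(slope * slope_rev, r^2)

predict(fit, newdata = data.frame(speed = 15))  # fitted stopping distance at speed 15
```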