Statistics and Probability With R Programming Language: Lab Report

This document provides a lab report on statistics and probability using R programming. It includes examples of simple operations in R like finding minimum, maximum, and length of a data list. It also demonstrates reading a data file, performing operations on the data like calculating summary statistics, plotting graphs, and performing hypothesis tests. Functions used include read.csv, mean, median, sd, plot, hist, and boxplot. The document contains examples of importing data, accessing and manipulating it to calculate measures and visualize the results.

Uploaded by

Ayush Anand Sagar


STATISTICS AND PROBABILITY WITH R PROGRAMMING LANGUAGE
Course Code – MAT1010

LAB REPORT

Under the guidance of : Prof Dr. SANTANU MANDAL

Done by: AYUSH ANAND SAGAR(17BES7003)

ECE-EMBEDDED SYSTEM
INDEX

Introduction to R programming and Simple Operations
Operations on Data files
Matrix operations on Data files
Random Sampling, Probability and Choose functions
Binomial Distribution
Pnorm, Qnorm and Rnorm functions
Histogram
Test of Hypothesis: Z-Test
Test of Hypothesis: T-Test
Linear Regression and Correlation

INTRODUCTION TO R PROGRAMMING AND
SIMPLE OPERATIONS IN R
1. Simple Operations
a) Enter the data {2,5,3,7,1,9,6} directly and store it in a variable x.
b) Find the number of elements in x, i.e. in the data list.
c) Find the last element of x.
d) Find the minimum element of x.
e) Find the maximum element of x.

1.
a) x<-c(2,5,3,7,1,9,6)
x
2 5 3 7 1 9 6
b) length(x)
7
c) x[length(x)]
6
d) min(x)
1
e) max(x)
9

2. Enter the data {1, 2, …. ,19,20} in a variable x

a) Find the 3rd element in the data list.
b) Find 3rd to 5th element in the data list.
c) Find 2nd, 5th, 6th, and 12th element in the list.
d) Print the data as {20, 19, …, 2, 1} without again entering the data.
2.
x<-c(1:20)
x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
a) x[3]
3
b) x[3:5]
3 4 5
c) x[c(2,5,6,12)]
2 5 6 12
d) rev(x) or x[20:1]
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

3.
a) Create a data list (4, 4, 4, 4, 3, 3, 3, 5, 5, 5) using ‘rep’ function.
b) Create a list (4, 6, 3, 4, 6, 3, …, 4, 6, 3) where there are 10 occurrences of 4, 6, and
3 in the given order.
c) Create a list (3, 1, 5, 3, 2, 3, 4, 5, 7,7, 7, 7, 7,7, 6, 5, 4, 3, 2, 1, 34, 21, 54) using
one-line command.
d) First create a list (2, 1, 3, 4). Then append this list at the end with another list (5, 7,
12, 6, -8). Check whether the number of elements in the augmented list is 11.

a) x<-c(rep(4,4),rep(3,3),rep(5,3))
x
[1] 4 4 4 4 3 3 3 5 5 5

b) x<-c(rep(c(4,6,3),10))
x
[1] 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3

c) x<-c(3,1,5,3,c(2:5),rep(7,6),c(6:1),34,21,54)
x
[1] 3 1 5 3 2 3 4 5 7 7 7 7 7 7 6 5 4 3 2 1 34 21 54

d)
> x<-c(2,1,3,4)
> y<-c(5,7,12,6,-8)
> append(x,y)
[1] 2 1 3 4 5 7 12 6 -8
> length(append(x,y))==11
[1] FALSE

The augmented list has 9 elements (4 + 5), not 11, so the check returns FALSE.
4.
(a) Print all numbers starting with 3 and ending with 7 with an increment
of 0.5. Store these numbers in x.
(b) Print all even numbers between 2 and 14 (both inclusive).
(c) Type 2*x and see what you get. Each element of x is multiplied by 2.

a) > x<-seq(3,7,0.5)
> x
[1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0

b) > seq(2,14,2)
[1] 2 4 6 8 10 12 14

c) > 2*x
[1] 6 7 8 9 10 11 12 13 14
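
As a small illustration of the sequence exercises above, the sketch below spells out the step with the named `by` argument and shows that multiplying a vector by a scalar works element by element. The variable name `evens` is only illustrative and does not appear in the original report.

```r
# Sequence from 3 to 7 in steps of 0.5 (the third positional argument of seq is 'by')
x <- seq(3, 7, by = 0.5)

# Even numbers from 2 to 14, both inclusive
evens <- seq(2, 14, by = 2)

# Arithmetic on a vector is applied to every element
doubled <- 2 * x

print(length(x))   # number of generated values
print(evens)
print(doubled)
```

Note that `x^2` squares each element instead of doubling it, so it is not an alternative to `2*x`.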

5. Few simple statistical measures:

(a) Enter data as 1,2, … ,10.
(b) Find sum of the numbers.
(c) Find mean, median.
(d) Find sum of squares of these values.
(e) Find the value of $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$. This is known as the mean deviation about the mean (MD).

(f) Check whether MD is less than or equal to the standard deviation.

5.
a) x<-c(1:10)
b) sum(x)
55
c) mean(x)
5.5
median(x)
5.5
d) sum(x^2)
385
e) md<-sum(abs(x-mean(x)))/length(x)
>md
2.5

f) md<=sd(x)
TRUE
OPERATIONS ON DATA FILES

1. Few simple statistical measures:
a) Enter data as 1,2, … ,10.
b) Find sum of the numbers.
c) Find mean, median.
d) Find sum of squares of these values.
e) Find the value of $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$. This is known as the mean deviation about the mean (MD).
f) Check whether MD is less than or equal to the standard deviation.
g) Find standard deviation using formula.

1.
a) x<-c(1:10)
b) sum(x)
55
c) mean(x)
5.5
median(x)
5.5
d) sum(x^2)
385
e) md<-sum(abs(x-mean(x)))/length(x)
>md
2.5
f) md<=sd(x)
TRUE
g) sqrt(sum((x-mean(x))^2)/(length(x)-1))
3.02765
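
The following sketch checks the formula-based standard deviation against R's built-in `sd`, which uses the sample formula with divisor n - 1. The names `sd_formula` and `sd_population` are illustrative only; the population version (divisor n) is included just to show the difference.

```r
x <- 1:10
n <- length(x)

# Mean deviation about the mean
md <- sum(abs(x - mean(x))) / n

# Sample standard deviation from the formula (divisor n - 1, as used by sd())
sd_formula <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Population version (divisor n), shown only for comparison
sd_population <- sqrt(sum((x - mean(x))^2) / n)

print(md)
print(sd_formula)
print(isTRUE(all.equal(sd_formula, sd(x))))  # formula agrees with sd()
print(md <= sd_formula)
```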
2. Reading a data file and working with it:
a) Read the file first and store it in a.
b) How many rows are there in this table? How many columns are there?
c) How to find the number of rows and number of columns by a single command?
d) What are the variables in the data file?
e) If the file is very large, naturally we cannot simply type `a', because it will cover the entire
screen and we won't be able to understand anything. So how to see the top or bottom few
lines in this file?
f) If the number of columns is too large, again we may face the same problem. So how to
see the first 5 rows and first 3 columns?
g) How to get 1st, 3rd, 6th, and 10th row and 2nd, 4th, and 5th column?
h) How to get values in a specific row or a column?

2.
a)
a<-read.csv('house_data_1.csv')
>a
Price FloorArea Rooms Age CentralHeating
1 52.00 1225 3 6.2 no
2 54.75 1230 3 7.5 no
3 57.50 1200 3 4.2 no
4 57.50 1000 2 8.8 no
5 59.75 1420 4 1.9 yes
6 62.50 1450 3 5.2 no
7 64.75 1380 4 6.6 yes
8 67.25 1510 4 2.3 no
9 67.50 1400 5 6.1 no
10 69.75 1550 6 9.2 no
11 70.00 1720 6 4.3 yes
12 75.50 1700 5 4.3 no
13 77.50 1660 6 1.0 yes
14 78.00 1800 7 7.0 yes
15 81.25 1830 6 3.6 yes
16 82.50 1790 6 1.7 yes
17 86.25 2010 6 1.2 yes
18 87.50 2000 6 0.0 yes
19 88.00 2100 8 2.3 yes
20 92.00 2240 7 0.7 yes
b)
> nrow(a)
[1] 20
> ncol(a)
[1] 5

c)
> c(nrow(a),ncol(a)) or dim(a)
[1] 20 5

d)
> names(a)
[1] "Price" "FloorArea" "Rooms" "Age" "CentralHeating"

e)
> head(a)
Price FloorArea Rooms Age CentralHeating
1 52.00 1225 3 6.2 no
2 54.75 1230 3 7.5 no
3 57.50 1200 3 4.2 no
4 57.50 1000 2 8.8 no
5 59.75 1420 4 1.9 yes
6 62.50 1450 3 5.2 no

> tail(a)
Price FloorArea Rooms Age CentralHeating
15 81.25 1830 6 3.6 yes
16 82.50 1790 6 1.7 yes
17 86.25 2010 6 1.2 yes
18 87.50 2000 6 0.0 yes
19 88.00 2100 8 2.3 yes
20 92.00 2240 7 0.7 yes

f)
> a[1:5,1:3]
Price FloorArea Rooms
1 52.00 1225 3
2 54.75 1230 3
3 57.50 1200 3
4 57.50 1000 2
5 59.75 1420 4

g)
> a[c(1,3,6,10),c(2,4,5)]
FloorArea Age CentralHeating
1 1225 6.2 no
3 1200 4.2 no
6 1450 5.2 no
10 1550 9.2 no

h)
> a[5]
CentralHeating
1 no
2 no
3 no
4 no
5 yes
6 no
7 yes
8 no
9 no
10 no
11 yes
12 no
13 yes
14 yes
15 yes
16 yes
17 yes
18 yes
19 yes
20 yes
> a[2,]
Price FloorArea Rooms Age CentralHeating
2 54.75 1230 3 7.5 no

>a
Price FloorArea Rooms Age CentralHeating
1 52.00 1225 3 6.2 no
2 54.75 1230 3 7.5 no
3 57.50 1200 3 4.2 no
4 57.50 1000 2 8.8 no
5 59.75 1420 4 1.9 yes
6 62.50 1450 3 5.2 no
7 64.75 1380 4 6.6 yes
8 67.25 1510 4 2.3 no
9 67.50 1400 5 6.1 no
10 69.75 1550 6 9.2 no
11 70.00 1720 6 4.3 yes
12 75.50 1700 5 4.3 no
13 77.50 1660 6 1.0 yes
14 78.00 1800 7 7.0 yes
15 81.25 1830 6 3.6 yes
16 82.50 1790 6 1.7 yes
17 86.25 2010 6 1.2 yes
18 87.50 2000 6 0.0 yes
19 88.00 2100 8 2.3 yes
20 92.00 2240 7 0.7 yes

3. Calculate simple statistical measures using the values in the data file.
a) Find means, medians, standard deviations of Price, Floor Area, Rooms,
and Age.
b) How many houses have central heating and how many don't have?
c) Plot Price vs. Floor, Price vs. Age, and Price vs. rooms, in separate graphs.
d) Draw histograms of Prices, FloorArea, and Age.
e) Draw box plots of Price, FloorArea, and Age.
f) Draw all the graphs in (c), (d), and (e) in the same graph paper.

3)
a)
> mean(a[,1])
[1] 71.5875
> mean(a[,2])
[1] 1610.75
> mean(a[,3])
[1] 5
> mean(a[,4])
[1] 4.205
> median(a[,1])
[1] 69.875
> median(a[,2])
[1] 1605
> median(a[,3])
[1] 5.5
> median(a[,4])
[1] 4.25
> sd(a[,1])
[1] 12.21094
> sd(a[,2])
[1] 331.9649
> sd(a[,3])
[1] 1.65434
> sd(a[,4])
[1] 2.786523

Alternatively

> names(a)
[1] "Price" "FloorArea" "Rooms" "Age" "CentralHeating"
> mean(a$Price)
[1] 71.5875
> mean(a$FloorArea)
[1] 1610.75
> mean(a$Rooms)
[1] 5
> mean(a$Age)
[1] 4.205

b)
> sum(a$CentralHeating=="yes")
[1] 11
> sum(a$CentralHeating=="no")
[1] 9

c)
> plot(a$Price,a$FloorArea)
> plot(a$Price,a$Age)
> plot(a$Price,a$Rooms)
d)
> hist(a$Price,freq=F)

> hist(a$FloorArea,freq=F)
> hist(a$Age,freq=F)

e)
> boxplot(a$Price)
> boxplot(a$FloorArea)

> boxplot(a$Age)

f)
> par(mfrow=c(3,3))
> plot(a$Price,a$FloorArea)
> plot(a$Price,a$Age)
> plot(a$Price,a$Rooms)
> hist(a$Price,freq=F)
> hist(a$FloorArea,freq=F)
> hist(a$Age,freq=F)
> boxplot(a$Price)
> boxplot(a$FloorArea)
> boxplot(a$Age)
MATRIX OPERATIONS ON DATA FILES

1. Augmenting the file and saving the resultant file:

a) Calculate the value per square foot area of each apartment and store it in a
vector named “PriceSqFt”.
b) Place this vector after the last column in the data file.
c) Save the augmented file under name “HouseInfo.txt”.
d) Read the file "HouseInfo.txt".

1)
a)
>a<-read.csv('house_data_1.csv')
>a
Price FloorArea Rooms Age CentralHeating
1 52.00 1225 3 6.2 no
2 54.75 1230 3 7.5 no
3 57.50 1200 3 4.2 no
4 57.50 1000 2 8.8 no
5 59.75 1420 4 1.9 yes
6 62.50 1450 3 5.2 no
7 64.75 1380 4 6.6 yes
8 67.25 1510 4 2.3 no
9 67.50 1400 5 6.1 no
10 69.75 1550 6 9.2 no
11 70.00 1720 6 4.3 yes
12 75.50 1700 5 4.3 no
13 77.50 1660 6 1.0 yes
14 78.00 1800 7 7.0 yes
15 81.25 1830 6 3.6 yes
16 82.50 1790 6 1.7 yes
17 86.25 2010 6 1.2 yes
18 87.50 2000 6 0.0 yes
19 88.00 2100 8 2.3 yes
20 92.00 2240 7 0.7 yes
> PriceSqFt<-a$Price/a$FloorArea
> PriceSqFt
[1] 0.04244898 0.04451220 0.04791667 0.05750000 0.04207746 0.04310345
0.04692029 0.04453642 0.04821429 0.04500000
[11] 0.04069767 0.04441176 0.04668675 0.04333333 0.04439891 0.04608939
0.04291045 0.04375000 0.04190476 0.04107143

b)
>a<-cbind(a,PriceSqFt)
>a
Price FloorArea Rooms Age CentralHeating PriceSqFt
1 52.00 1225 3 6.2 no 0.04244898
2 54.75 1230 3 7.5 no 0.04451220
3 57.50 1200 3 4.2 no 0.04791667
4 57.50 1000 2 8.8 no 0.05750000
5 59.75 1420 4 1.9 yes 0.04207746
6 62.50 1450 3 5.2 no 0.04310345
7 64.75 1380 4 6.6 yes 0.04692029
8 67.25 1510 4 2.3 no 0.04453642
9 67.50 1400 5 6.1 no 0.04821429
10 69.75 1550 6 9.2 no 0.04500000
11 70.00 1720 6 4.3 yes 0.04069767
12 75.50 1700 5 4.3 no 0.04441176
13 77.50 1660 6 1.0 yes 0.04668675
14 78.00 1800 7 7.0 yes 0.04333333
15 81.25 1830 6 3.6 yes 0.04439891
16 82.50 1790 6 1.7 yes 0.04608939
17 86.25 2010 6 1.2 yes 0.04291045
18 87.50 2000 6 0.0 yes 0.04375000
19 88.00 2100 8 2.3 yes 0.04190476
20 92.00 2240 7 0.7 yes 0.04107143

c)
> write.table(a,'HouseInfo.txt')

d)
> dir()
[1] "desktop.ini"
[2] "house_data_1.csv"
[3] "HouseInfo.txt"
[4] "lab_22dec_2018.txt"
[5] "lab_29dec_2018.txt"
[6] "WIN(2018-19)_MAT1004_ELA_G04_AP2018195000032_Reference
Material I_Hands on exercise on R Day 3.pdf"
> read.table('HouseInfo.txt')
Price FloorArea Rooms Age CentralHeating PriceSqFt
1 52.00 1225 3 6.2 no 0.04244898
2 54.75 1230 3 7.5 no 0.04451220
3 57.50 1200 3 4.2 no 0.04791667
4 57.50 1000 2 8.8 no 0.05750000
5 59.75 1420 4 1.9 yes 0.04207746
6 62.50 1450 3 5.2 no 0.04310345
7 64.75 1380 4 6.6 yes 0.04692029
8 67.25 1510 4 2.3 no 0.04453642
9 67.50 1400 5 6.1 no 0.04821429
10 69.75 1550 6 9.2 no 0.04500000
11 70.00 1720 6 4.3 yes 0.04069767
12 75.50 1700 5 4.3 no 0.04441176
13 77.50 1660 6 1.0 yes 0.04668675
14 78.00 1800 7 7.0 yes 0.04333333
15 81.25 1830 6 3.6 yes 0.04439891
16 82.50 1790 6 1.7 yes 0.04608939
17 86.25 2010 6 1.2 yes 0.04291045
18 87.50 2000 6 0.0 yes 0.04375000
19 88.00 2100 8 2.3 yes 0.04190476
20 92.00 2240 7 0.7 yes 0.04107143

> write.table(a,'HouseInfo.txt',sep='\t')

> read.table('HouseInfo.txt')
Price FloorArea Rooms Age CentralHeating PriceSqFt
1 52.00 1225 3 6.2 no 0.04244898
2 54.75 1230 3 7.5 no 0.04451220
3 57.50 1200 3 4.2 no 0.04791667
4 57.50 1000 2 8.8 no 0.05750000
5 59.75 1420 4 1.9 yes 0.04207746
6 62.50 1450 3 5.2 no 0.04310345
7 64.75 1380 4 6.6 yes 0.04692029
8 67.25 1510 4 2.3 no 0.04453642
9 67.50 1400 5 6.1 no 0.04821429
10 69.75 1550 6 9.2 no 0.04500000
11 70.00 1720 6 4.3 yes 0.04069767
12 75.50 1700 5 4.3 no 0.04441176
13 77.50 1660 6 1.0 yes 0.04668675
14 78.00 1800 7 7.0 yes 0.04333333
15 81.25 1830 6 3.6 yes 0.04439891
16 82.50 1790 6 1.7 yes 0.04608939
17 86.25 2010 6 1.2 yes 0.04291045
18 87.50 2000 6 0.0 yes 0.04375000
19 88.00 2100 8 2.3 yes 0.04190476
20 92.00 2240 7 0.7 yes 0.04107143
> read.delim('HouseInfo.txt')
Price FloorArea Rooms Age CentralHeating PriceSqFt
1 52.00 1225 3 6.2 no 0.04244898
2 54.75 1230 3 7.5 no 0.04451220
3 57.50 1200 3 4.2 no 0.04791667
4 57.50 1000 2 8.8 no 0.05750000
5 59.75 1420 4 1.9 yes 0.04207746
6 62.50 1450 3 5.2 no 0.04310345
7 64.75 1380 4 6.6 yes 0.04692029
8 67.25 1510 4 2.3 no 0.04453642
9 67.50 1400 5 6.1 no 0.04821429
10 69.75 1550 6 9.2 no 0.04500000
11 70.00 1720 6 4.3 yes 0.04069767
12 75.50 1700 5 4.3 no 0.04441176
13 77.50 1660 6 1.0 yes 0.04668675
14 78.00 1800 7 7.0 yes 0.04333333
15 81.25 1830 6 3.6 yes 0.04439891
16 82.50 1790 6 1.7 yes 0.04608939
17 86.25 2010 6 1.2 yes 0.04291045
18 87.50 2000 6 0.0 yes 0.04375000
19 88.00 2100 8 2.3 yes 0.04190476
20 92.00 2240 7 0.7 yes 0.04107143
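
A compact way to see the save-and-reload steps end to end is sketched below. It is an assumption-laden example: it writes a two-row stand-in data frame to a temporary file (via `tempfile`) instead of `HouseInfo.txt`, and the names `path` and `b` are illustrative.

```r
# Assumption: a two-row stand-in for the house data
a <- data.frame(Price = c(52.00, 54.75), FloorArea = c(1225, 1230))

# Add the derived column directly to the data frame
a$PriceSqFt <- a$Price / a$FloorArea

# Write a tab-separated file without row names, then read it back
path <- tempfile(fileext = ".txt")
write.table(a, path, sep = "\t", row.names = FALSE, quote = FALSE)
b <- read.delim(path)

print(names(b))
print(isTRUE(all.equal(b$PriceSqFt, a$PriceSqFt)))  # values survive the round trip

unlink(path)  # remove the temporary file
```

Writing with `row.names = FALSE` keeps the header and data rows the same width, so the file reads back the same way with either `read.delim` or `read.table(path, header = TRUE)`.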
2. Matrices and arrays
a) Matrices and arrays are represented as vectors with dimensions.
Create one matrix with the numbers 1 to 12 of order 3x4.
b) Create same matrix with matrix function.
c) Give name of rows of this matrix with A,B,C.
d) Transpose of the matrix.
e) Use functions cbind and rbind separately to create different matrices.
f) Use arbitrary numbers to create matrix.
g) Verify matrix multiplication.

2.a)
> m<-1:12
> dim(m)<-c(3,4)
>m
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12

b)
> rnames<-c("row1","row2","row3")
> cnames<-c("col1","col2","col3","col4")
> m<-matrix(c(1:12),nrow=3,ncol=4,dimnames=list(rnames,cnames))
>m
col1 col2 col3 col4
row1 1 4 7 10
row2 2 5 8 11
row3 3 6 9 12
> matrix(c(1:12),3,4)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12

c)
> m<-matrix(c(1:12),3,4,dimnames=list(c("A", "B", "C"),c("P", "Q", "R","S")))
>m
P Q R S
A 1 4 7 10
B 2 5 8 11
C 3 6 9 12

d)
>m
P Q R S
A 1 4 7 10
B 2 5 8 11
C 3 6 9 12
> t(m)
A B C
P 1 2 3
Q 4 5 6
R 7 8 9
S 10 11 12

e)
> rbind(A=c(1,2,3,4),B=c(5,6,7,8),C=c(9,10,11,12))
[,1] [,2] [,3] [,4]
A 1 2 3 4
B 5 6 7 8
C 9 10 11 12
> cbind(P=c(1,5,9),Q=c(2,6,10),R=c(3,7,11),S=c(4,8,12))
P Q R S
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12

f)
w<-t(m)
>w
A B C
P 1 2 3
Q 4 5 6
R 7 8 9
S 10 11 12

g)
> m%*%w

A B C
A 166 188 210
B 188 214 240
C 210 240 270
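
One way to verify the product is to recompute a single entry by hand: entry (A, B) of m %*% w is the sum of the products of row A of m with column B of w. The sketch below does this; the names `prod_mw` and `manual_AB` are illustrative.

```r
m <- matrix(1:12, nrow = 3, ncol = 4,
            dimnames = list(c("A", "B", "C"), c("P", "Q", "R", "S")))
w <- t(m)

# A 3x4 matrix times a 4x3 matrix gives a 3x3 matrix
prod_mw <- m %*% w

# Entry (A, B) recomputed as a row-by-column sum of products
manual_AB <- sum(m["A", ] * w[, "B"])

print(dim(prod_mw))
print(prod_mw["A", "B"])
print(manual_AB)
```

Note that `m * w` would fail here: `*` is element-wise and requires matching dimensions, whereas `%*%` is the matrix product.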
RANDOM SAMPLING, PROBABILITY AND
CHOOSE FUNCTION

3. Random sampling
a) In R, you can simulate these situations with the sample function. Pick five numbers
at random from the set 1:40.

b) Notice that the default behavior of sample is sampling without replacement. That is, the
samples will not contain the same number twice, and size obviously cannot be bigger than
the length of the vector to be sampled. If you want sampling with replacement, then you
need to add the argument replace=TRUE.

Sampling with replacement is suitable for modelling coin tosses or throws of a die. So,
for instance, simulate 10 coin tosses.

c) In fair coin-tossing, the probability of heads should equal the probability of tails, but the
idea of a random event is not restricted to symmetric cases. It could be equally well applied
to other cases, such as the successful outcome of a surgical procedure. Hopefully, there
would be a better than 50% chance of this. Simulate data with nonequal probabilities for the
outcomes (say, a 90% chance of success) by using the prob argument to sample.

d) The choose function can be used to calculate the following expression: $\binom{40}{5} = \frac{40!}{5!\,35!}$.

e) Find 5!

3.a)
> x<-1:40

>x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36 37
[38] 38 39 40
> sample(x)
[1] 18 30 10 36 2 15 23 35 5 29 16 38 17 24 7 21 31 14 26 6 11 3 33 37 28 22 32
1 19 12 8 27 39 13 40 9 25 38 4 34 20
> sample(x,5)
[1] 38 10 24 32 20
b)

> sample(x,5,replace=T)
[1] 20 14 26 26 29
> sample(x,5,replace=T)
[1] 35 32 22 1 6
> sample(x,5,replace=T)
[1] 20 14 26 26 29
> sample(x,5,replace=T)
[1] 7 7 15 3 16
> sample(x,5,replace=T)
[1] 22 14 38 38 39
> sample(x,5,replace=T)
[1] 38 28 3 19 36
> sample(x,5,replace=T)
[1] 14 14 40 27 24
> sample(x,5,replace=T)
[1] 30 9 33 14 27
> sample(x,5,replace=T)
[1] 3 12 27 22 27
> sample(x,5,replace=T)
[1] 32 39 18 1 13
> sample(x,5,replace=T)
[1] 38 6 1 16 4
c)
> sample(c("H","T"),10,replace=T)
[1] "H" "H" "H" "T" "H" "T" "H" "T" "H" "T"
> sample(c("H","T"),10,replace=T)
[1] "T" "T" "H" "T" "H" "H" "T" "T" "H" "T"
> sample(c("H","T"),10,replace=T)
[1] "T" "T" "H" "H" "T" "H" "T" "H" "H" "T"
> sample(c("H","T"),10,replace=T)
[1] "T" "T" "T" "H" "H" "H" "T" "H" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "H" "H" "T" "T" "H" "T" "H" "T" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "H" "T" "H" "T" "T" "T" "T" "T" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "H" "H" "T" "H" "H" "T" "T" "T" "H" "T"

> sample(c("H","T"),10,replace=T)
[1] "H" "T" "H" "H" "T" "H" "T" "H" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "H" "T" "T" "H" "H" "H" "H" "H" "T" "H"
> sample(c("H","T"),10,replace=T)
[1] "T" "T" "T" "H" "T" "H" "T" "H" "H" "H"
> sample(c("H","T"),10,replace=T)
[1] "T" "H" "T" "T" "T" "H" "T" "H" "T" "T"
> sample(c("H","T"),10,replace=T,prob=c(90,10))
[1] "H" "T" "H" "H" "H" "H" "H" "H" "H" "H"

d) choose(40,5)
[1] 658008
> factorial(40)/(factorial(5)*factorial(35))
[1] 658008

e) factorial(5)
[1] 120
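
Because `sample` returns different values on every call, a reproducible version of the sampling steps can be sketched with `set.seed`. The seed value, the sample size of 1000, and the names `draw`, `tosses`, and `outcomes` are assumptions made for illustration; no particular output is claimed.

```r
set.seed(1)  # assumption: any fixed seed makes the draws repeatable

# Five numbers from 1:40 without replacement (the default)
draw <- sample(1:40, 5)

# Ten fair coin tosses need replace = TRUE
tosses <- sample(c("H", "T"), 10, replace = TRUE)

# Unequal probabilities: roughly 90% successes in a large sample
outcomes <- sample(c("success", "failure"), 1000, replace = TRUE,
                   prob = c(0.9, 0.1))

print(draw)
print(tosses)
print(mean(outcomes == "success"))  # observed share of successes

# choose(n, k) agrees with the factorial formula up to floating-point error
print(isTRUE(all.equal(choose(40, 5),
                       factorial(40) / (factorial(5) * factorial(35)))))
```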
BINOMIAL DISTRIBUTION

> dbinom(3,5,0.95)
[1] 0.02143438

> dbinom(2,5,0.95)+dbinom(3,5,0.95)+dbinom(4,5,0.95)
[1] 0.2261891
> dbinom(c(0,1,2,3,4,5),5,0.95)
[1] 0.0000003125 0.0000296875 0.0011281250 0.0214343750 0.2036265625
0.7737809375
> dbinom(0:4,5,0.95)

[1] 0.0000003125 0.0000296875 0.0011281250 0.0214343750 0.2036265625

> dbinom(2:4,5,0.95)

[1] 0.001128125 0.021434375 0.203626563

> sum(dbinom(2:4,5,0.95))
[1] 0.2261891
> pbinom(3,5,0.95)
[1] 0.0225925
> pbinom(2:4,5,0.95)
[1] 0.001158125 0.022592500 0.226219063
> sum(pbinom(2:4,5,0.95))
[1] 0.2499697

Summing cumulative probabilities does not give P(2 <= X <= 4). The correct value is the sum of dbinom over 2:4 or, equivalently, a difference of pbinom values:

> sum(dbinom(2:4,5,0.95))
[1] 0.2261891
> pbinom(4,5,0.95)-pbinom(1,5,0.95)
[1] 0.2261891
> x<-0:5
> prob<-dbinom(x,5,0.95)
> plot(x,prob,type="h")
> plot(x,prob,type="h",main='Binomial Distribution')
> plot(x,prob,type="h",main='Binomial Distribution',xlab='No.of.ready terminals(x)',ylab='p(x)')
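
The relationship between `dbinom` (probability of exactly k successes) and `pbinom` (probability of at most k successes) can be checked directly. This is a minimal sketch using the same n = 5 and p = 0.95 as the transcript; the names `by_pmf` and `by_cdf` are illustrative.

```r
n <- 5
p <- 0.95

# P(2 <= X <= 4) as a sum of point probabilities
by_pmf <- sum(dbinom(2:4, n, p))

# The same probability as a difference of cumulative probabilities
by_cdf <- pbinom(4, n, p) - pbinom(1, n, p)

# pbinom is the running sum of dbinom
cdf_from_pmf <- cumsum(dbinom(0:n, n, p))

print(by_pmf)
print(isTRUE(all.equal(by_pmf, by_cdf)))
print(isTRUE(all.equal(cdf_from_pmf, pbinom(0:n, n, p))))
```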
PNORM, QNORM AND RNORM FUNCTIONS

IQs are normally distributed with a mean of 100 and an S.D. of 15.

1) What percentage of people have an IQ less than 125?
> pnorm(125,100,15,lower.tail=T)
[1] 0.9522096

2) What percentage of people have an IQ greater than 110?

> pnorm(110,100,15,lower.tail=F)
[1] 0.2524925

3) What percentage of people have an IQ between 110 and 125?

> round(pnorm(125,100,15,lower.tail=T)-pnorm(110,100,15,lower.tail=T),4)
[1] 0.2047

4) Find the 25th percentile of the standard normal distribution.

> qnorm(0.25)
[1] -0.6744898
Find the 25th percentile of a normal distribution with mean of 2 and S.D. of 3.
> qnorm(.25,2,3)
[1] -0.02346925

5) What IQ separates the lower 25% from the others (mean=100 and S.D.=15)?
> qnorm(0.25,100,15,lower.tail=T)
[1] 89.88265
6) What IQ separates the top 10% from the others (mean=100 and S.D.=15)?
> qnorm(0.10,100,15,lower.tail=F)
[1] 119.2233
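
The probability of falling between two values is the difference of two lower-tail probabilities (or, equivalently, of two upper-tail probabilities), and `qnorm` is the inverse of `pnorm`. The sketch below checks both facts for the IQ distribution; the variable names are illustrative.

```r
mu <- 100
sigma <- 15

# P(110 < IQ < 125) from lower tails
p_between <- pnorm(125, mu, sigma) - pnorm(110, mu, sigma)

# The same probability from upper tails
p_between_upper <- pnorm(110, mu, sigma, lower.tail = FALSE) -
  pnorm(125, mu, sigma, lower.tail = FALSE)

# qnorm undoes pnorm: the 25th percentile has 25% of the mass below it
q25 <- qnorm(0.25, mu, sigma)

print(p_between)
print(isTRUE(all.equal(p_between, p_between_upper)))
print(isTRUE(all.equal(pnorm(q25, mu, sigma), 0.25)))
```

Mixing a lower-tail value with an upper-tail value in the subtraction does not give an interval probability.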

*Generating random numbers from a normal distribution with mean 572 and sd 51
> rnorm(n=20, mean=572, sd=51)
[1] 513.7724 651.8332 508.8003 561.2989 589.0541 510.5020 579.1214 595.0203
629.1108 543.1042 523.8742 620.4255 583.5188 461.0772 557.2757 477.0051
561.9530 498.8606 505.4049 556.4619
> b<-rnorm(n=20, mean=572, sd=51)
>b
[1] 547.4588 650.9067 626.0564 617.3190 527.6772 548.8507 570.9685
599.9386 581.5112 630.2935 593.9666 535.7267 529.0297 627.5354 632.6571
483.1399 577.9614 497.0527 624.7555 541.8844
> mean(b)
[1] 577.2345
> sd(b)
[1] 49.02958
HISTOGRAM
>hist(b,freq=F)
>curve(dnorm(x,572,51),add=T)
> hist(b,main="Normal Distribution",freq=F)
> curve(dnorm(x,mean(b),sd(b)),add=T)
TEST OF HYPOTHESIS: Z-TEST

Test the hypothesis that the mean systolic blood pressure in a certain
population equals 140 mmHg. The standard deviation has a known value of 20
and a data set of 55 patients is available.
120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,109,
109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,148,
149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169
Sol:
> x<-c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,
109,109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,140,
148,149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,169)
>x
[1] 120 115 94 118 111 102 102 131 104 107 115 139 115 113 114 105 115 134
109 109 93 118 109 106 125 150 142 119 127 141 149 144 142 149 161 143 140
148 149 141 146 159 152 135 134 161 130 125 141 148 153 145 137 147 169
> length(x)
[1] 55

> mean(x)
[1] 130
> sd(x)

[1] 19.16691
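> n<-length(x)
> n
[1] 55
> z<-sqrt(n)*(mean(x)-140)/sd(x)
> z
[1] -3.869272
> pnorm(z)
[1] 5.458036e-05
> p<-2*pnorm(-abs(z))
> p
[1] 0.0001091607

The two-sided p-value is twice the tail area beyond |z|, so it is computed as 2*pnorm(-abs(z)); an expression such as 2*pnorm(abs(z)) exceeds 1 and is not a probability. Since the p-value is far below 0.05, the hypothesis that the mean equals 140 mmHg is rejected.
Note that the transcript uses the sample standard deviation sd(x). With the stated known value of 20, the statistic is sqrt(55)*(130-140)/20, about -3.708, and the decision is the same.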
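
Base R has no built-in one-sample Z-test, so the calculation with a known standard deviation can be wrapped in a small helper. The function `z_test` below is a hypothetical helper written for this illustration, not part of R or of the original report; it uses the known value of 20 from the problem statement.

```r
# Hypothetical helper: two-sided one-sample Z-test with known sigma
z_test <- function(x, mu0, sigma) {
  n <- length(x)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  p <- 2 * pnorm(-abs(z))   # area in both tails beyond |z|
  list(z = z, p.value = p)
}

x <- c(120,115,94,118,111,102,102,131,104,107,115,139,115,113,114,105,115,134,
       109,109,93,118,109,106,125,150,142,119,127,141,149,144,142,149,161,143,
       140,148,149,141,146,159,152,135,134,161,130,125,141,148,153,145,137,147,
       169)

res <- z_test(x, mu0 = 140, sigma = 20)
print(res$z)
print(res$p.value)
print(res$p.value < 0.05)   # TRUE means reject H0 at the 5% level
```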
> prop.test(43,100,p=0.5,conf.level=0.95)

1-sample proportions test with continuity correction

data: 43 out of 100, null probability 0.5
X-squared = 1.69, df = 1, p-value = 0.1936
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:

0.3326536 0.5327873
sample estimates:
p
0.43
TEST OF HYPOTHESIS: T-TEST
1. An outbreak of salmonella-related illness was attributed to ice cream produced at a
certain factory. Scientists measured the level of Salmonella in 9 randomly
sampled batches of ice cream. The levels (in MPN/g) were: 0.593, 0.142, 0.329,
0.691, 0.231, 0.793, 0.519, 0.392, 0.418. Is there evidence that the mean level of
Salmonella in ice cream is greater than 0.3 MPN/g?

Sol. Let mu be the mean of population

Null hypothesis is H0: mu=0.3
Alternative Hypothesis is H1:mu>0.3 [Right tail test]
Level Of Significance is 5%
Compute t value
CODE:

> x<-c(0.593,0.142,0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
> x
[1] 0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418
> t.test(x,alternative="greater",mu=0.3)
One Sample t-test
data: x
t = 2.2051, df = 8, p-value = 0.02927
alternative hypothesis: true mean is greater than 0.3
95 percent confidence interval:
0.3245133 Inf
sample estimates:
mean of x
0.4564444
Conclusion:
t = 2.2051, p-value = 0.02927
Since the p-value (0.02927) is less than the 5% level of significance, we reject the null hypothesis.

We conclude that there is evidence that the mean level of Salmonella in
ice cream is greater than 0.3 MPN/g.
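
To make the decision rule explicit, the sketch below recomputes the t statistic and the right-tail p-value by hand and compares them with `t.test`. The test statistic is never compared with the p-value; the p-value is compared with the significance level. The names `t_stat`, `p_val`, and `alpha` are illustrative.

```r
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3
alpha <- 0.05
n <- length(x)

# t statistic: distance of the sample mean from mu0 in standard-error units
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Right-tail p-value with n - 1 degrees of freedom
p_val <- pt(t_stat, df = n - 1, lower.tail = FALSE)

res <- t.test(x, alternative = "greater", mu = mu0)

print(t_stat)
print(p_val)
print(isTRUE(all.equal(unname(res$statistic), t_stat)))
print(p_val < alpha)   # TRUE means reject H0
```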

2. Suppose that 10 volunteers have taken an intelligence test; here are the
results obtained. The average score of the entire population is 75 in the same
test. Is there any significant difference (at the 5% level of significance, i.e. with
95% confidence) between the sample and population means, assuming that the
variance of the population is not known?

Scores: 65, 78, 88, 55, 48, 95, 66, 57, 79, 81
Sol: Let mu be the mean of population
Null hypothesis is H0: mu=75
Alternative Hypothesis is H1: mu≠75 [Two tail test]
Level Of Significance is 5%
Compute t value
> x<-c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
> t.test(x,mu=75)
One Sample t-test
data: x
t = -0.78303, df = 9, p-value = 0.4537
alternative hypothesis: true mean is not equal to 75
95 percent confidence interval:
60.22187 82.17813
sample estimates:
mean of x
71.2
Conclusion:
P-value = 0.4537 and t = -0.78303
Since the p-value (0.4537) is greater than the 5% level of significance, we fail to reject the null hypothesis.
Hence there is no significant difference between the sample and population means.

3. Comparing two independent sample means, taken from two populations
with unknown variance. The following data shows the heights of individuals of
two different countries with unknown population variances. Is there any
significant difference between the average heights of the two groups?

A: 175 168 168 190 156 181 182 175 174 179
B: 185 169 173 173 188 186 175 174 179 180

Sol: It is an independent two group test. We test the claim using a two tail t test.
Let mu1 and mu2 be the means of groups A and B respectively.
Null hypothesis: H0: mu1=mu2, i.e. no significant difference
Alternative Hypothesis: H1: mu1≠mu2 (there is a significant difference)
LOS=5%

CODE-
> A<-c(175 ,168 ,168, 190, 156, 181, 182, 175, 174, 179)
> B<-c(185 ,169, 173, 173, 188, 186, 175, 174, 179, 180 )

> t.test(A,B)
Welch Two Sample t-test
Data: A and B
t = -0.94737, df = 15.981, p-value = 0.3576
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.008795 4.208795
sample estimates:
mean of x mean of y
174.8 178.2

Conclusion:
t = -0.94737, p-value = 0.3576
Since the p-value (0.3576) is greater than the 5% level of significance, we fail to reject the null hypothesis.

So there is no significant difference between the average heights of the two groups.
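
The same p-value rule applies to the one-sample and two-sample tests above. The helper `decide` below is a hypothetical convenience function written for this sketch; it reads the `p.value` element that `t.test` returns and compares it with the significance level.

```r
# Hypothetical helper: turn a test result into a decision at level alpha
decide <- function(test, alpha = 0.05) {
  if (test$p.value < alpha) "reject H0" else "fail to reject H0"
}

scores <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
A <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
B <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)

one_sample <- t.test(scores, mu = 75)   # two-sided by default
two_sample <- t.test(A, B)              # Welch test by default

print(decide(one_sample))
print(decide(two_sample))
```

`t.test(A, B)` does not assume equal variances; passing `var.equal = TRUE` would give the pooled-variance test instead.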
LINEAR REGRESSION AND CORRELATION
> x<-c(132,129,120,113.2,105,92,84,83.2,88.4,59,80,81.5,71,69.2)
> length(x)
[1] 14
> y<-c(46,48,51,52.1,54,52,59,58.7,61.6,64,61.4,54.6,58.8,58)
> length(y)
[1] 14
> lm(x~y)
Call:
lm(formula = x ~ y)

Coefficients:
(Intercept) y
303.584 -3.777
> plot(x,y,col='red',pch=15)
> abline(lm(y~x))
> cars
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
7 10 18
8 10 26
9 10 34
10 11 17
11 11 28
12 12 14
13 12 20
14 12 24
15 12 28
16 13 26
17 13 34
18 13 34
19 13 46
20 14 26
21 14 36
22 14 60
23 14 80
24 15 20
25 15 26
26 15 54
27 16 32
28 16 40
29 17 32
30 17 40
31 17 50
32 18 42
33 18 56
34 18 76
35 18 84
36 19 36
37 19 46
38 19 68
39 20 32
40 20 48
41 20 52
42 20 56
43 20 64
44 22 66
45 23 54
46 24 70
47 24 92
48 24 93
49 24 120
50 25 85
> scatter.smooth(x=cars$speed, y=cars$dist, main="Dist ~ Speed")
> x=cars$speed

>x
[1] 4 4 7 7 8 9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 16
16 17 17 17 18 18 18 18 19 19 19 20
[40] 20 20 20 20 22 23 24 24 24 24 25

> y=cars$dist
>y
[1] 2 10 4 22 16 10 18 26 34 17 28 14 20 24 28 26 34 34 46 26 36 60
80 20 26 54 32 40 32
[30] 40 50 42 56 76 84 36 46 68 32 48 52 56 64 66 54 70 92 93 120 85
> plot(x,y,col='red',pch=3)
> abline(lm(y~x))
Correlation

> cor(cars$speed, cars$dist)
[1] 0.8068949
> linearMod <- lm(dist ~ speed, data=cars)
> print(linearMod)
Call:

lm(formula = dist ~ speed, data = cars)

Coefficients:
(Intercept) speed
-17.579 3.932
> lm(x~y)
Call:

lm(formula = x ~ y)
Coefficients:
(Intercept) y
8.2839 0.1656

> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept) x
-17.579 3.932
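
The link between the correlation and the fitted line can be checked on the built-in `cars` data: in simple linear regression the slope equals r times sd(y)/sd(x), and r squared equals the model's R-squared. The sketch below also makes one prediction; the speed of 15 and the variable names are chosen only for illustration.

```r
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)

r <- cor(cars$speed, cars$dist)

# Slope recovered from the correlation coefficient
slope_from_r <- r * sd(cars$dist) / sd(cars$speed)

# Predicted stopping distance at an illustrative speed of 15
pred_15 <- predict(fit, newdata = data.frame(speed = 15))

print(b)
print(isTRUE(all.equal(unname(b["speed"]), slope_from_r)))
print(isTRUE(all.equal(r^2, summary(fit)$r.squared)))
print(pred_15)
```

Swapping the roles of the variables, as in `lm(x~y)` versus `lm(y~x)`, fits a different line, which is why the two sets of coefficients above differ.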