t-Test for single mean raw data
> dataname=c(enter given values)
> t.test(dataname,mu=value of population mean µ)
[Note: To specify the type of tail, use
> t.test(dataname,mu=value of population mean µ,alternative="greater" or "two.sided" or "less")
If not specified, the test is two-sided by default.]
Problem 1: The heights of 10 males of a given locality are found to be
70, 67, 62, 68, 61, 68, 70, 64, 64, 66 inches. Is it reasonable to believe that the
average height is 64 inches? Test at the 5% level of significance.
Input:
> x=c(70,67,62,68,61,68,70,64,64,66)
> t.test(x,mu=64)
Output:
One Sample t-test
data: x
t = 2, df = 9, p-value = 0.07655
alternative hypothesis: true mean is not equal to 64
95 percent confidence interval:
63.73784 68.26216
sample estimates:
mean of x
66
Inference: p-value = 0.07655 > 0.05, hence we accept H0 at the 5% level of
significance (l.o.s.), i.e. the true mean may be taken as 64. Equivalently, the
95% confidence interval (63.73784, 68.26216) contains mu = 64, so we accept H0.
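As a cross-check, the statistic reported by t.test can be reproduced from its definition t = (mean - mu) / (s / sqrt(n)). The sketch below is only illustrative; the names mu0, t_stat and p_val are assumptions introduced here and do not appear in the original.

```r
# Heights of the 10 males (from the problem)
x <- c(70, 67, 62, 68, 61, 68, 70, 64, 64, 66)
mu0 <- 64                                      # hypothesised population mean
n <- length(x)

# One-sample t statistic and its two-sided p-value
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

t_stat
p_val
```

Both values should agree with the t and p-value lines printed by t.test(x, mu = 64).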
t-Test for difference of means raw data
> dataname1=c(Enter given X values)
> dataname2=c(Enter given Y values)
> t.test(dataname1,dataname2,var.equal=TRUE)
[Note: To specify the type of tail, use
> t.test(dataname1,dataname2,var.equal=TRUE,alternative="greater" or "two.sided" or "less")
If not specified, the test is two-sided by default.]
Problem: The following data represent the biological values of proteins from
cow's milk and buffalo's milk at a certain level. Examine whether the average
values of protein in the two samples are significantly different.
Cow's milk:     1.82 2.02 1.88 1.61 1.81 1.54
Buffalo's milk: 2.00 1.83 1.86 2.03 2.19 1.88
Input:
>Cowmilk=c(1.82,2.02,1.88,1.61,1.81,1.54)
>Buffalomilk=c(2.00,1.83,1.86,2.03,2.19,1.88)
> t.test(Cowmilk, Buffalomilk,var.equal=TRUE,alternative="two.sided")
Output:
Two Sample t-test
data: Cowmilk and Buffalomilk
t = -2.03, df = 10, p-value = 0.0698
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.38805424 0.01805424
sample estimates:
mean of x mean of y
1.780 1.965
Inference: p = 0.0698 > 0.05, hence we accept H0 at the 5% l.o.s., i.e. the true
difference in means may be taken as 0. Equivalently, the 95% confidence interval
(-0.38805424, 0.01805424) includes the value 0, so we accept H0.
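With var.equal=TRUE the test uses a pooled variance estimate. A minimal sketch of that calculation is given here as a check on the output; the names cow, buffalo, sp2, t_stat and p_val are illustrative assumptions, not part of the original.

```r
cow <- c(1.82, 2.02, 1.88, 1.61, 1.81, 1.54)
buffalo <- c(2.00, 1.83, 1.86, 2.03, 2.19, 1.88)
n1 <- length(cow)
n2 <- length(buffalo)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(cow) + (n2 - 1) * var(buffalo)) / (n1 + n2 - 2)

# Two-sample t statistic with n1 + n2 - 2 degrees of freedom
t_stat <- (mean(cow) - mean(buffalo)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_val <- 2 * pt(-abs(t_stat), df = n1 + n2 - 2)

t_stat
p_val
```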
Paired t-test
> dataname1=c(enter given x values)
> dataname2=c(enter given y values)
> t.test(dataname1,dataname2,paired=TRUE)
[Note: To specify the type of tail, use
> t.test(dataname1,dataname2,paired=TRUE,alternative="greater" or "two.sided" or "less")
If not specified, the test is two-sided by default.]
Problem: The following data relate to the marks obtained by 11 students before
and after intensive coaching. Do the data indicate that the students have
benefitted from the coaching?
Marks before coaching: 19 23 16 24 17 18 20 18 21 19 20
Marks after coaching:  17 24 20 24 20 22 20 20 18 22 19
Input:
> x=c(19,23,16,24,17,18,20,18,21,19,20)
> y=c(17,24,20,24,20,22,20,20,18,22,19)
> t.test (x, y, paired=TRUE)
Output
Paired t-test
data: x and y
t = -1.3772, df = 10, p-value = 0.1985
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.6179307 0.6179307
sample estimates:
mean of the differences
-1
Inference: p = 0.1985 > 0.05, hence we accept H0 at the 5% l.o.s., i.e. the true
mean difference may be taken as 0. Equivalently, the 95% confidence interval
(-2.6179307, 0.6179307) includes the value 0, so we accept H0.
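The paired t-test is the same as a one-sample t-test on the differences d = x - y with mu = 0. The sketch below illustrates this equivalence. It also computes a one-sided p-value, on the assumption that "benefitted" is read as "marks after coaching are higher", which corresponds to alternative="less" for x - y. The names res_paired, res_diff and p_one_sided are illustrative only.

```r
x <- c(19, 23, 16, 24, 17, 18, 20, 18, 21, 19, 20)   # before coaching
y <- c(17, 24, 20, 24, 20, 22, 20, 20, 18, 22, 19)   # after coaching
d <- x - y

res_paired <- t.test(x, y, paired = TRUE)   # paired test
res_diff <- t.test(d, mu = 0)               # one-sample test on differences

# The two statistics coincide
c(res_paired$statistic, res_diff$statistic)

# One-sided version: H1 is that the mean of (before - after) is below 0
p_one_sided <- t.test(x, y, paired = TRUE, alternative = "less")$p.value
p_one_sided
```

Because the observed t is negative, the one-sided p-value is half of the two-sided one.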
t-test for Sample Correlation Coefficient ‘r’
> dataname1=c(x1,x2,x3,...)
> dataname2=c(y1,y2,y3,...)
> cor.test(dataname1,dataname2)
[Note: To specify the type of tail, use
> cor.test(dataname1,dataname2,alternative="greater" or "two.sided" or "less")
If not specified, the test is two-sided by default.]
Problem: Test for significance of correlation for random variables X & Y
X: 12 15 17 19 10 8 6
Y: 25 14 16 13 5 11 9
Input:
> x=c(12,15,17,19,10,8,6)
> y=c(25,14,16,13,5,11,9)
> cor.test(x,y)
Output:
Pearson's product-moment correlation
data: x and y
t = 0.86354, df = 5, p-value = 0.4273
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5390406 0.8757331
sample estimates:
cor
0.3602557
Inference: p = 0.4273 > 0.05, hence we accept H0 at the 5% l.o.s., i.e. the true
correlation may be taken as 0. Equivalently, the 95% confidence interval
(-0.5390406, 0.8757331) includes the value 0, so we accept H0: the correlation
between X and Y is not significant.
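The t value printed by cor.test comes from the sample correlation r through t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. A short sketch of that computation follows; r, t_stat and p_val are names introduced here for illustration.

```r
x <- c(12, 15, 17, 19, 10, 8, 6)
y <- c(25, 14, 16, 13, 5, 11, 9)
n <- length(x)

r <- cor(x, y)                                  # Pearson correlation
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)       # test statistic
p_val <- 2 * pt(-abs(t_stat), df = n - 2)       # two-sided p-value

r
t_stat
p_val
```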
Small Sample Tests for difference of variances
F-test
> dataname1=c(x1,x2,x3,...)
> dataname2=c(y1,y2,y3,...)
> var.test(dataname1,dataname2,ratio=1,alternative="greater" or "two.sided" or "less")
Problem: Two random samples were drawn from two normal populations and the
following results were obtained:
Sample I:  16 17 18 19 20 21 22 24 26 27
Sample II: 19 22 23 25 26 28 29 30 31 32 35 36
Test the equality of the population variances.
Input:
>x=c(16,17,18,19,20,21,22,24,26,27)
>y=c(19,22,23,25,26,28,29,30,31,32,35,36)
> var.test(x,y,ratio=1,alternative="two.sided")
Output:
F test to compare two variances
data: x and y
F = 0.51678, num df = 9, denom df = 11, p-value = 0.331
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.1440338 2.0216761
sample estimates:
ratio of variances
0.5167785
[Note: In F-test, use only p-value for the inference]
Inference: p = 0.331 > 0.05, hence we accept H0 at the 5% l.o.s.; therefore the
population variances may be taken as equal.
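The F statistic is simply the ratio of the two sample variances, and the two-sided p-value doubles the smaller tail area of the F distribution. The following sketch reproduces both; f_stat, p_lower and p_two are illustrative names assumed here.

```r
x <- c(16, 17, 18, 19, 20, 21, 22, 24, 26, 27)
y <- c(19, 22, 23, 25, 26, 28, 29, 30, 31, 32, 35, 36)

f_stat <- var(x) / var(y)          # ratio of sample variances
df1 <- length(x) - 1               # numerator degrees of freedom
df2 <- length(y) - 1               # denominator degrees of freedom

p_lower <- pf(f_stat, df1, df2)            # lower-tail area
p_two <- 2 * min(p_lower, 1 - p_lower)     # two-sided p-value

f_stat
p_two
```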
Chi-Square Test for Independence of Attributes
> Chi=data.frame(b1=c(x1,x2,x3),b2=c(y1,y2,y3))
> chisq.test(Chi,correct=FALSE)
Problem: A random sample of students of a university was selected and asked
for their opinions about terrorism. The results are given below.
        Favored  Opposed
B.A       120      80
B.Com     130      70
B.Sc       70      30
P.G        80      20
Test the hypothesis that opinions are independent of the classes.
Input:
> Chi=data.frame(b1=c(120,130,70,80),b2=c(80,70,30,20))
> chisq.test(Chi,correct=FALSE)
Output:
Pearson's Chi-squared test
data: Chi
X-squared = 12.75, df = 3, p-value = 0.00521
Inference: p-value = 0.00521 < 0.05, so we reject H0 at the 5% level of significance.
Opinions are not independent of the classes.
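The statistic can be checked by hand: each expected frequency is (row total x column total) / grand total, and X-squared sums (O - E)^2 / E over all cells. The sketch below uses the counts exactly as entered in the Input above; obs, expected, chi_sq and p_val are illustrative names.

```r
# Observed counts as entered in the Input (columns: Favored, Opposed)
obs <- matrix(c(120, 130, 70, 80,
                80, 70, 30, 20), ncol = 2)

# Expected counts under independence
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

chi_sq <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_val <- pchisq(chi_sq, df, lower.tail = FALSE)

chi_sq
df
p_val
```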
Chi-Square Test for Goodness of Fit
> x=c(a,b,c)
> chisq.test(x,p=c(probability values))
Problem: The following table gives the number of aircraft accidents that occurred
during the various days of the week. Find whether the accidents are uniformly
distributed over the week.
Days:             Sun Mon Tue Wed Thu Fri Sat
No. of accidents:  14  16   8  12  11   9  14
Input:
>x=c(14,16,8,12,11,9,14)
>chisq.test(x,p=c(1/7,1/7,1/7,1/7,1/7,1/7,1/7))
Output
Chi-squared test for given probabilities
data: x
X-squared = 4.1667, df = 6, p-value = 0.6541
Inference: p-value = 0.6541 > 0.05, so we accept H0 at the 5% level of significance.
The accidents are uniformly distributed over the week.
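Under the uniform hypothesis every day has the same expected count, total / 7, so the statistic is easy to verify by hand. The sketch below also relies on the fact that chisq.test on a plain vector assumes equal probabilities when p is omitted; expected and chi_sq are illustrative names.

```r
x <- c(14, 16, 8, 12, 11, 9, 14)

# Expected count per day under a uniform distribution
expected <- sum(x) * rep(1 / 7, 7)

chi_sq <- sum((x - expected)^2 / expected)
p_val <- pchisq(chi_sq, df = length(x) - 1, lower.tail = FALSE)

chi_sq
p_val

# Same test without spelling out p (equal probabilities are the default)
chisq.test(x)$statistic
```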
Large Sample Test for Single Mean
> z.test = function (a, mu, var){
+ zeta = (mean(a) - mu) / (sqrt(var / length(a)))
+ return(zeta)}
> z.test(a, mu, value of variance)
Problem: The following is a random sample of heights (in cm) of 30 students. Test
whether the sample has come from a population with an average height of 150 cm.
141,143,148,152,155,154,150,160,161,144,147,152,150,151,149,130,142,141,162,
163,150,154,148,146,147,148,147,150,151,152.
Input:
> x=c(141,143,148,152,155,154,150,160,161,144,147,152,150,151,149,130,142,141,
+ 162,163,150,154,148,146,147,148,147,150,151,152)
> var(x)
[1] 47.14483
> length(x)
[1] 30
> mu=150
> z.test = function (x, mu, var){
+ zeta = (mean(x) - mu) / (sqrt(var / length(x)))
+ return(zeta)}
> z.test(x, mu, var(x))
Output
[1] -0.3190829
Inference: At the 5% l.o.s. the critical value is 1.96. Since |Zeta| = 0.3190829 < 1.96,
we accept H0. The sample has come from a population with an average height of 150 cm.
Note: If the variance is not given in the question, find it by using the command
var(x)
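The function above returns only the statistic. A small variant that also returns the two-sided p-value from the standard normal distribution is sketched here; the function name z_test_p and its argument names are hypothetical and chosen for this illustration only.

```r
# Hypothetical helper: z statistic plus two-sided p-value
z_test_p <- function(a, mu, variance) {
  z <- (mean(a) - mu) / sqrt(variance / length(a))
  c(z = z, p.value = 2 * pnorm(-abs(z)))
}

x <- c(141, 143, 148, 152, 155, 154, 150, 160, 161, 144, 147, 152, 150, 151,
       149, 130, 142, 141, 162, 163, 150, 154, 148, 146, 147, 148, 147, 150,
       151, 152)

res <- z_test_p(x, mu = 150, variance = var(x))
res
```

A p-value above 0.05 leads to the same decision as comparing |Zeta| with 1.96.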
Large Sample Test for Difference Mean
> z.test2sam = function(a, b, var.a, var.b){
+ n.a = length(a)
+ n.b = length(b)
+ zeta = (mean(a) - mean(b)) / (sqrt(var.a/n.a + var.b/n.b))
+ return(zeta)
+}
Problem: The following two independent samples are drawn from two populations.
The variance of population A is 5 and the variance of population B is 8.5.
Test whether the population means are equal at the 5% l.o.s.
A: 175, 168, 168, 190, 156, 181, 182, 175, 174, 179
B: 185, 169, 173, 173, 188, 186, 175, 174, 179, 180
Input:
> a = c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
> b = c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
> mean(a)
[1] 174.8
> mean(b)
[1] 178.2
> z.test2sam = function(a, b, var.a, var.b){
+ n.a = length(a)
+ n.b = length(b)
+ zeta = (mean(a) - mean(b)) / (sqrt(var.a/n.a + var.b/n.b))
+ return(zeta)
+}
> z.test2sam(a, b, 5, 8.5)
Output:
[1] -2.926254
Inference: At the 5% l.o.s. the critical value is 1.96. Since |Zeta| = 2.926254 > 1.96,
we reject H0. The population means are not equal.
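The two-sided decision rule compares the absolute value of Zeta with the critical value, not the signed statistic. The sketch below makes that comparison explicit and adds the matching p-value; z_crit, p_val and reject_h0 are names assumed for illustration.

```r
a <- c(175, 168, 168, 190, 156, 181, 182, 175, 174, 179)
b <- c(185, 169, 173, 173, 188, 186, 175, 174, 179, 180)
var_a <- 5      # known variance of population A
var_b <- 8.5    # known variance of population B

z <- (mean(a) - mean(b)) / sqrt(var_a / length(a) + var_b / length(b))

z_crit <- qnorm(1 - 0.05 / 2)      # two-sided critical value at 5%
p_val <- 2 * pnorm(-abs(z))        # two-sided p-value
reject_h0 <- abs(z) > z_crit       # TRUE means H0 is rejected

z
p_val
reject_h0
```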
Large sample test for Single Proportions
> x=x1
> n=y
> p=z
>prop.test(x,n,p)
Problem: In a sample of 1000 people from Maharashtra, 540 are rice eaters and the
rest are wheat eaters. Can we assume that rice and wheat are equally popular in
Maharashtra at the 5% level of significance?
Input:
> x=540
> n=1000
> p=0.5
> prop.test(x,n,p)
Output:
1-sample proportions test with continuity correction
data: x out of n, null probability p
X-squared = 6.241, df = 1, p-value = 0.01248
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5085147 0.5711742
sample estimates:
p
0.54
Inference: p = 0.01248 < 0.05, hence we reject H0 at the 5% l.o.s.; rice eaters and
wheat eaters are not equally popular in Maharashtra. Equivalently, the 95%
confidence interval (0.5085147, 0.5711742) does not include the value 0.5, so we
reject H0, i.e. the true proportion of rice eaters is not equal to 0.5.
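The X-squared value printed by prop.test for a single proportion can be reproduced from the continuity-corrected formula (|x - n*p0| - 0.5)^2 / (n*p0*(1 - p0)). A minimal sketch, with p0, chi_sq and p_val as illustrative names:

```r
x <- 540      # number of rice eaters
n <- 1000     # sample size
p0 <- 0.5     # proportion under H0

# Continuity-corrected chi-square statistic on 1 degree of freedom
chi_sq <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))
p_val <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

chi_sq
p_val
```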
Large sample test for Difference of Proportions
> x=c(x1,x2)
> n=c(n1,n2)
> prop.test(x,n)
Problem: In a random sample of 500 men from a particular district of U.P., 300 are
found to be smokers. In a sample of 1,000 men from another district, 550 are smokers.
Do the data indicate that the two districts are significantly different with respect
to the prevalence of smoking among men?
Input:
> x=c(300,550)
> n=c(500,1000)
> prop.test(x,n)
Output:
2-sample test for equality of proportions with continuity correction
data: x out of n
X-squared = 3.1931, df = 1, p-value = 0.07395
alternative hypothesis: two.sided
95 percent confidence interval:
-0.004364556 0.104364556
sample estimates:
prop 1 prop 2
0.60 0.55
Inference: p = 0.07395 > 0.05, hence we accept H0 at the 5% l.o.s.; the two districts
are not significantly different with respect to the prevalence of smoking among men.
Equivalently, the 95% confidence interval (-0.004364556, 0.104364556) includes the
value 0, so we accept H0, i.e. the true difference in proportions may be taken as 0.
Sign Test for one sample
> X=c(a,b,c,d)
> mo=hypothesised median
> sp=length(X[X>mo])
> sn=length(X[X<mo])
> n=sp+sn
> pv=min(1,2*min(pbinom(sp,n,0.5),pbinom(sn,n,0.5)))
[Note: Under H0 the number of plus signs follows a binomial distribution with
probability 0.5 (not 0.05, which is the level of significance). Observations equal
to mo are dropped. pv is the two-sided p-value.]
Problem: A test is conducted for 20 students in a school and the marks are given
below:
93,88,107,115,82,97,103,86,113,107,112,90,98,93,99,103,100,101,96,104.
Test the hypothesis that the median mark of the students in the school is 99.
Input:
> S=c(93,88,107,115,82,97,103,86,113,107,112,90,98,93,99,103,100,101,96,104)
> mo=99
> sp=length(S[S>mo])
> sn=length(S[S<mo])
> n=sp+sn
> pv=min(1,2*min(pbinom(sp,n,0.5),pbinom(sn,n,0.5)))
Output:
> sp
[1] 10
> pv
[1] 1
Inference: pv = 1 > 0.05, hence we accept H0 at the 5% level of significance. The
median mark of the students in the school is 99.
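The sign test is a binomial test on the number of plus signs, with success probability 0.5 under H0, so base R's binom.test can serve as a cross-check. This is a sketch rather than the method used above; the name res is illustrative.

```r
S <- c(93, 88, 107, 115, 82, 97, 103, 86, 113, 107,
       112, 90, 98, 93, 99, 103, 100, 101, 96, 104)
mo <- 99                 # hypothesised median

sp <- sum(S > mo)        # number of plus signs
sn <- sum(S < mo)        # number of minus signs (ties with mo are dropped)

# Exact two-sided binomial test with P(plus) = 0.5 under H0
res <- binom.test(sp, sp + sn, p = 0.5)
res$p.value
```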
Sign Test for Paired Samples
> x=c(x1,x2,x3)
> y=c(y1,y2,y3)
> d=x-y
> sp=length(d[d>0])
> sn=length(d[d<0])
> n=sp+sn
> pv=min(1,2*min(pbinom(sp,n,0.5),pbinom(sn,n,0.5)))
> sp
> pv
[Note: Under H0 the probability of a plus sign is 0.5 (not 0.05, which is the level
of significance). Pairs with d=0 are dropped. pv is the two-sided p-value.]
Problem: 17 students are selected for special training and a test is conducted for
them. Another test is conducted after completion of the training. The grades
obtained are given below. Test whether there is a difference in their talent
between the two tests.
Grades in test 1: 2, 3, 3, 3, 3, 3, 3, 3, 2, 3, 2, 2, 5, 2, 5, 3, 1
Grades in test 2: 4, 4, 5, 5, 3, 2, 5, 3, 1, 5, 5, 5, 4, 5, 5, 5, 5
Input:
> x=c(2,3,3,3,3,3,3,3,2,3,2,2,5,2,5,3,1)
> y=c(4,4,5,5,3,2,5,3,1,5,5,5,4,5,5,5,5)
> d=x-y
> sp=length(d[d>0])
> sn=length(d[d<0])
> n=sp+sn
> pv=min(1,2*min(pbinom(sp,n,0.5),pbinom(sn,n,0.5)))
Output:
> sp
[1] 3
> pv
[1] 0.05737305
Inference: pv = 0.05737305 > 0.05, hence we accept H0 at the 5% level of
significance, although only narrowly. There is no significant difference in their
talent between the two tests.
Wilcoxon Signed Rank Test for One Sample
> Z=c(a,b,c)
> wilcox.test(Z,alternative="greater" or "less" or "two.sided",mu=value of hypothesised median)
Problem: The following are the measurements of the breaking strength X of a
certain kind of 2-inch cotton ribbon. Test the null hypothesis that the population
median of X is 160 against the alternative that it is greater than 160.
163,165,160,189,161,171,158,151,169,162,163,139,172,165,148,166,172,163,187,
173
Input:
> Z=c(163,165,160,189,161,171,158,151,169,162,163,139,172,165,148,166,172,163,
+ 187,173)
> wilcox.test(Z,alter="greater",mu=160)
Output:
Wilcoxon signed rank test with continuity correction
data: Z
V = 146, p-value = 0.02095
alternative hypothesis: true location is greater than 160
Warning messages:
1: In wilcox.test.default(Z, alter = "greater", mu = 160) :
cannot compute exact p-value with ties
2: In wilcox.test.default(Z, alter = "greater", mu = 160) :
cannot compute exact p-value with zeroes
Inference: p-value = 0.02095 < 0.05, hence we reject H0 at the 5% level of
significance in favour of the one-sided alternative. The population median of X is
greater than 160.
Wilcoxon Signed Rank Test for Paired Data
>x=c(a,b,c,d)
>y=c(e,f,g,h)
> d=x-y
>wilcox.test(d,mu=0)
Problem: A sample of 12 pairs of twins is taken at random and their intelligence
scores are given below. Test whether the intelligence level of the twins is the same.
x: 86,79,77,68,91,72,77,91,70,71,88,87
y: 88,77,76,64,96,72,65,0,65,80,81,82
Input:
>x=c(86,79,77,68,91,72,77,91,70,71,88,87)
>y=c(88,77,76,64,96,72,65,0,65,80,81,82)
> d=x-y
>wilcox.test(d,mu=0)
Output
Wilcoxon signed rank test with continuity correction
data: d
V = 48.5, p-value = 0.1812
alternative hypothesis: true location is not equal to 0
Warning messages:
1: In wilcox.test.default(d, mu = 0) :
cannot compute exact p-value with ties
2: In wilcox.test.default(d, mu = 0) :
cannot compute exact p-value with zeroes
Inference: p = 0.1812 > 0.05, hence we accept H0 at the 5% level of significance.
The intelligence level of the twins is the same.
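The statistic V is the sum of the ranks of the positive differences, after zero differences are dropped and tied absolute values receive average ranks. The sketch below computes V by hand and compares it with wilcox.test, called both on the differences and directly on the pairs with paired=TRUE. The names d_nz, ranks and V are illustrative, and suppressWarnings is used only to silence the ties and zeroes warnings.

```r
x <- c(86, 79, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87)
y <- c(88, 77, 76, 64, 96, 72, 65, 0, 65, 80, 81, 82)
d <- x - y

d_nz <- d[d != 0]            # drop zero differences
ranks <- rank(abs(d_nz))     # average ranks for tied absolute values
V <- sum(ranks[d_nz > 0])    # sum of ranks of positive differences
V

res_diff <- suppressWarnings(wilcox.test(d, mu = 0))
res_pair <- suppressWarnings(wilcox.test(x, y, paired = TRUE))
c(res_diff$statistic, res_pair$statistic)
```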
Run Test
> install.packages('snpar')
> library('snpar')
>y = c(a,b,c)
> runs.test(y, exact = TRUE)
Problem: Test the randomness to the following data.
109,124,173,167,148,132,168,165,118,112,114,164,180,123,180,152
Input:
> X=c(109,124,173,167,148,132,168,165,118,112,114,164,180,123,180,152)
> runs.test(X)
Output:
Approximate runs rest
data: X
Runs = 8, p-value = 0.6048
alternative hypothesis: two.sided
Inference: p = 0.6048 > 0.05, hence we accept H0 at the 5% level of significance.
The given sample has been drawn randomly from the population.
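The number of runs can also be counted in base R. The sketch below assumes that runs are formed by marking each observation as above or below the sample median and that the p-value uses the usual normal approximation for the number of runs; under these assumptions it reproduces the figures in the output above. The names signs, n_runs, mu_r, var_r and p_val are illustrative.

```r
X <- c(109, 124, 173, 167, 148, 132, 168, 165,
       118, 112, 114, 164, 180, 123, 180, 152)

signs <- X > median(X)                 # TRUE = above median, FALSE = below
n_runs <- length(rle(signs)$lengths)   # number of runs of identical signs

n1 <- sum(signs)                       # observations above the median
n2 <- sum(!signs)                      # observations below the median
N <- n1 + n2

# Mean and variance of the number of runs under randomness
mu_r <- 2 * n1 * n2 / N + 1
var_r <- 2 * n1 * n2 * (2 * n1 * n2 - N) / (N^2 * (N - 1))

z <- (n_runs - mu_r) / sqrt(var_r)
p_val <- 2 * pnorm(-abs(z))            # two-sided p-value

n_runs
p_val
```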