Statistics Lab
8. Find descriptive statistics, frequency table, 2081-12-25
box graph, pie-chart.
SPSS Software
Introduction
SPSS, which stands for "Statistical Package for the Social Sciences," is a software program
used for statistical analysis and data management. It was originally developed by SPSS Inc.
and has been owned by IBM (International Business Machines Corporation) since 2009. It is
widely used in various fields, including social sciences, business, healthcare, and academic research.
SPSS provides a user-friendly interface that allows researchers and analysts to perform a wide
range of statistical analyses, data manipulation, and reporting without the need for advanced
programming skills. Some of the key features and capabilities of SPSS include:
Data Entry and Management: SPSS allows users to input, edit, and manage data
efficiently. It supports various data types, including numerical, categorical, and text
data.
Data Analysis: SPSS offers a comprehensive set of statistical tools for descriptive
statistics, hypothesis testing, regression analysis, correlation analysis, factor analysis,
and more.
Data Visualization: The software provides various data visualization options, including
charts, graphs, and plots, to help users understand and present their data effectively.
Reporting: SPSS allows users to generate customized reports and tables that summarize
the results of their analyses. These reports can be easily exported for further use or
publication.
Syntax Language: Advanced users can take advantage of SPSS syntax language to
automate and replicate analyses, making it a powerful tool for reproducible research.
Integration: SPSS can integrate with other software and data sources, enabling users to
import and export data from different formats and platforms.
Advanced Analytics: In addition to basic statistical analyses, SPSS offers advanced
analytical capabilities, including predictive modeling and machine learning techniques.
SPSS is commonly used in academic research, market research, survey analysis, healthcare
research, and various other fields where data analysis and statistical interpretation are essential.
While it has been widely adopted for its user-friendly interface, it also provides more advanced
features for experienced statisticians and analysts.
Question 1.
Find the confidence interval of the mean height for the following data, assuming a normal distribution.
78 55 68 48 65 76 57 55 65 75 51 61 68 67 76 78 71 56 57 67 58 51 50 58 50
77 55 48 70 55 58 70 56 52 74 61 69 76 61 68 78 56 78 57 66 66 74 66 48 73
71 70 62 74 76 50 69 75 65 48
Solution:
Working Expression:
A confidence interval (CI) is a range of values that is likely to include a population parameter with
a stated degree of confidence. It is usually reported with a confidence level such as 95%, meaning that
the population mean is expected to lie between the lower and upper limits of the interval.
The confidence interval of the mean, assuming a normal distribution of the given data, is as follows:
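As a cross-check on the SPSS result, the interval can be sketched in Python. This is a minimal sketch under stated assumptions, not the SPSS procedure itself: the function name `mean_confidence_interval` and the 95% level are assumptions, and the sketch uses the large-sample normal (z) critical value, whereas SPSS typically reports a t-based interval that is slightly wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Height data from Question 1 (60 observations)
heights = [
    78, 55, 68, 48, 65, 76, 57, 55, 65, 75, 51, 61, 68, 67, 76, 78, 71, 56, 57, 67, 58, 51, 50, 58, 50,
    77, 55, 48, 70, 55, 58, 70, 56, 52, 74, 61, 69, 76, 61, 68, 78, 56, 78, 57, 66, 66, 74, 66, 48, 73,
    71, 70, 62, 74, 76, 50, 69, 75, 65, 48,
]


def mean_confidence_interval(data, level=0.95):
    """Return (lower, upper) limits of a z-based confidence interval for the mean."""
    n = len(data)
    centre = mean(data)
    std_error = stdev(data) / sqrt(n)          # sample SD divided by sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return centre - z * std_error, centre + z * std_error


lower, upper = mean_confidence_interval(heights)
print(f"mean = {mean(heights):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Passing `level=0.99` gives the wider 99% interval from the same data.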
Question 2.
A developer of pig feed wishes to determine the relationship among the age of a pig
when it starts receiving a newly developed food supplement, the initial weight of the pig, and
the amount of weight it gains in a one-week period with the food supplement. The following
information is the result of a study of eight piglets.
Piglet   Initial weight (x1)   Initial age (x2)   Weight gain (Y)
1        39                    8                  7
2        52                    6                  6
3        49                    7                  8
4        46                    12                 10
5        61                    9                  9
6        35                    6                  5
7        25                    7                  3
8        55                    4                  4
a) Determine the least squares equation that best describes the variables.
c) Test the significance of the regression coefficients and the overall fit of the regression equation.
Solution:
Working Expression:
Determining the least squares equation means determining the regression equation. Regression
is a method that measures the average relationship between two or more variables in terms
of the original units of the data.
This question calls for multiple regression, which is used when there are three or more
variables. Here there is one dependent variable (weight gain) and two independent
variables (initial weight and initial age).
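The least squares fit for this question can be reproduced outside SPSS. The following is a minimal sketch, assuming the piglet table as given; the function name `fit_two_predictors` is hypothetical, and the sketch solves the normal equations for two predictors by hand rather than calling a statistics library.

```python
# Piglet data from Question 2
x1 = [39, 52, 49, 46, 61, 35, 25, 55]  # initial weight
x2 = [8, 6, 7, 12, 9, 6, 7, 4]         # initial age
y = [7, 6, 8, 10, 9, 5, 3, 4]          # weight gain


def fit_two_predictors(x1, x2, y):
    """Least squares fit of y = a + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Sums of squares and cross-products of the deviations
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    # Share of the total variation in y explained by the fit
    r_squared = (b1 * s1y + b2 * s2y) / sum(v * v for v in dy)
    return a, b1, b2, r_squared


a, b1, b2, r_squared = fit_two_predictors(x1, x2, y)
print(f"Y = {a:.3f} + {b1:.3f} X1 + {b2:.3f} X2, R^2 = {r_squared:.3f}")
```

The coefficients printed by this sketch can be compared with the B column of the SPSS coefficients table.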
Y = a + b1X1 + b2X2
Y = -4.192 + 0.105X1 + 0.807X2
From the coefficients table, holding the other variable fixed, weight gain increases by 0.105
per unit of initial weight (X1) and by 0.807 per unit of initial age (X2).
Model Summary
$$\text{S.E.} = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - 0.881}{\sqrt{8}} = 0.042$$
ANOVAa
Total 42.000 7
Interpretation:
Significance:
0.005<0.05
Since the p-value (0.005) is less than the conventional significance level of 0.05, there is
sufficient evidence to reject the null hypothesis. In other words, the regression of weight gain
on initial weight and initial age is significant overall.
Residuals Statistics
Interpretation:
Residual sum of squares: $SSE = \sum (y - \hat{y})^2$
Interpretation:
In this plot, three data points lie on the reference line while the others deviate from it,
which may indicate outliers. The more points that fall on the reference line, the better the
fit of the model. Changes in initial weight and age affect the deviation. The deviations
in the plot are small.
Coefficients (a)
                        Unstandardized Coefficients   Standardized Coefficients
Model                   B         Std. Error          Beta                        t        Sig.
1  (Constant)           -4.192    1.888                                           -2.220   .077
   initial weight(x1)   .105      .032                .501                        3.247    .023
   initial age(x2)      .807      .158                .786                        5.097    .004
a. Dependent Variable: weight gain(y)
Y = a + b1X1 + b2X2
Y = -4.192 + 0.105X1 + 0.807X2
When X1 = 35 and X2 = 35,
Y = -4.192 + 0.105 * 35 + 0.807 * 35
= 27.728
Question 3.
Sales      43  41  36  34  50
Expenses   10  22  13  19  17
Also calculate probable error of the data and interpret the result.
Solution:
Working Expression:
Pearson’s r varies between +1 and -1, where
Correlation:
                                 sales    expenses
sales      Pearson Correlation   1        -.073
           Sig. (2-tailed)                .907
           N                     5        5
expenses   Pearson Correlation   -.073    1
           Sig. (2-tailed)       .907
           N                     5        5
Interpretation:
Correlation = -0.073
Significance:
0.907 > 0.05
Since the p-value (0.907) is much greater than the conventional significance level of 0.05, we
fail to reject the null hypothesis. In other words, there is no significant correlation
between sales and expenses.
$$\text{P.E.}(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (-0.073)^2}{\sqrt{5}} \approx 0.3$$
|r| = 0.073
Here, |r| < P.E.(r)
Since the probable error of the correlation coefficient is greater than the absolute value of the
correlation coefficient, the correlation between sales and expenses is insignificant.
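The correlation coefficient and its probable error can be checked with a short script. This is a minimal sketch, assuming the sales and expenses values from the question; the function names `pearson_r` and `probable_error` are hypothetical helpers, and the constant 0.6745 is the usual probable-error factor.

```python
from math import sqrt

# Data from Question 3
sales = [43, 41, 36, 34, 50]
expenses = [10, 22, 13, 19, 17]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def probable_error(r, n):
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


r = pearson_r(sales, expenses)
pe = probable_error(r, len(sales))
print(f"r = {r:.3f}, P.E. = {pe:.3f}, |r| < P.E.: {abs(r) < pe}")
```

When |r| is smaller than the probable error, as here, the correlation is treated as insignificant.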
Question 4.
A computer operator is interested in how the data rate of internet users depends on the
bandwidth. The following results were gathered by the operator.
Bandwidth   17  35  41  19  25  20  10  15
Data rate   47  64  68  50  60  55  30  33
Solution:
Working Expressions:
linearly related variables. The coefficient of correlation is denoted by "r".
Regression:
Correlations
                                   band width   data rate
band width   Pearson Correlation   1            .902**
             Sig. (2-tailed)                    .002
             N                     8            8
data rate    Pearson Correlation   .902**       1
             Sig. (2-tailed)       .002
             N                     8            8
**. Correlation is significant at the 0.01 level (2-tailed).
Interpretation:
Finding the association between bandwidth and data rate:
Correlation = 0.902
There is a high degree of positive correlation between bandwidth and data rate.
Significance:
0.002 < 0.05
Since the p-value (0.002) is less than the conventional significance level of 0.05, there is
sufficient evidence to reject the null hypothesis. In other words, the correlation
between bandwidth and data rate is significant.
Model Summary
Hence 81.4% of the variation in data rate is explained by variation in bandwidth.
Coefficients (a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B         Std. Error          Beta                        t       Sig.
1  (Constant)     23.749    5.761                                           4.123   .006
   band width     1.192     .233                .902                        5.126   .002
a. Dependent Variable: data rate
Interpretation:
The regression equation of Y on X is given by
Y = a + bX
Y = 23.749 + 1.192 X
When X = 30,
Y = 23.749 + 1.192 * 30
= 59.509
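The fitted line and the prediction at X = 30 can be reproduced as follows. This is a minimal sketch, assuming the bandwidth and data rate values from the question; `fit_line` is a hypothetical helper that applies the ordinary least squares formulas for one predictor.

```python
# Data from Question 4
bandwidth = [17, 35, 41, 19, 25, 20, 10, 15]
data_rate = [47, 64, 68, 50, 60, 55, 30, 33]


def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # coefficient of determination
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_line(bandwidth, data_rate)
predicted = intercept + slope * 30
print(f"Y = {intercept:.3f} + {slope:.3f} X, R^2 = {r_squared:.3f}")
print(f"Predicted data rate at X = 30: {predicted:.2f}")
```

Because the sketch keeps unrounded coefficients, its prediction can differ slightly from a hand calculation that uses the rounded values 23.749 and 1.192.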
Composition of data rate
Question 5
The yield of treatment in different plots is as shown in the following plots. Carry out analysis
Solution:
Working Expression:
The problem can be solved using one-way analysis of variance. Consider the null and
alternative hypotheses:
H0: µ1 = µ2 = µ3 = µ4 (all treatment means are equal)
H1: Not all means are equal (at least one mean is different)
Formula:
Error Sum of square(SSE) = TSS- SSB
ANOVA
values
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   4265689.961      3    1421896.654   11.253   .001
Within Groups    1768941.150      14   126352.939
Total            6034631.111      17
Interpretation:
P-value = 0.001
So, P-value < α
Since the P-value is less than the level of significance, we reject the null
hypothesis. In other words, there is a significant difference between treatments.
Question 6
The following table gives the results of an experiment on four varieties of a crop in 5 blocks
of plots.
A 32 34 33 35 37
B 34 33 36 37 35
C 31 34 35 32 36
D 29 26 30 28 29
Solution:
Working Expression:
The two-way ANOVA compares the mean differences between groups that have been split on
two independent variables (called factors). The primary purpose of a two-way ANOVA is to
understand if there is an interaction between the two independent variables on the dependent
variable.
For treatments:
H0: µ1 = µ2 = µ3 = µ4 (all variety means are equal)
H1: Not all means are equal (at least one mean is different)
For blocks:
H0: µ1 = µ2 = µ3 = µ4 = µ5 (all block means are equal)
H1: Not all means are equal (at least one mean is different)
Formula:
Interpretation:
For treatment:
P-value = 0.00
So, P-value < α
Since the P-value is less than the level of significance, we reject the null
hypothesis. In other words, there is a significant difference between treatments.
For Block:
P-value = 0.00
So, P-value < α
Since the P-value is less than the level of significance, we reject the null
hypothesis. In other words, there is a significant difference between blocks.
Question 7
There are three brands of computer, namely A, B, and C, and their lifetimes are tabulated as follows.
Test whether the average lifetimes of the three brands are significantly different at the 5%
level of significance.
A 5 7 3 2 6 4 8 9
B 2 3 4 5 6
C 7 8 9 10 12 13 11
Solution:
Working Expression:
The problem can be solved using one-way analysis of variance. Consider the null and
alternative hypotheses:
H0: µ1 = µ2 = µ3 (all brand means are equal)
H1: Not all means are equal (at least one mean is different)
Formula:
ANOVA
values of brand
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   124.200          2    62.100        13.196   .000
Within Groups    80.000           17   4.706
Total            204.200          19
Interpretation:
P-value = 0.00
So, P-value < α
Since the P-value is less than the level of significance, we reject the null
hypothesis. In other words, there is a significant difference between brands.
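The one-way ANOVA quantities for the three brands can be computed by hand in a few lines. This is a minimal sketch, assuming the lifetimes from the question; the function names are hypothetical, and the p-value helper uses a closed form that holds only when the numerator degrees of freedom equal 2 (three groups), so it is not a general F-distribution routine.

```python
# Lifetimes from Question 7
brands = {
    "A": [5, 7, 3, 2, 6, 4, 8, 9],
    "B": [2, 3, 4, 5, 6],
    "C": [7, 8, 9, 10, 12, 13, 11],
}


def one_way_anova(groups):
    """Return (SSB, SSW, df_between, df_within, F) for a list of samples."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    # Between-groups sum of squares: group sizes times squared mean shifts
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: deviations from each group's own mean
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, df_between, df_within, f_stat


def p_value_numerator_df2(f_stat, df_within):
    """Upper-tail probability of F(2, df_within); valid only for numerator df = 2."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


ssb, ssw, df_b, df_w, f_stat = one_way_anova(list(brands.values()))
p_value = p_value_numerator_df2(f_stat, df_w)
print(f"SSB = {ssb:.3f}, SSW = {ssw:.3f}, F({df_b}, {df_w}) = {f_stat:.3f}, p = {p_value:.4f}")
```

The sums of squares and F value can be compared line by line with the SPSS ANOVA table for this question.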
Question 8
1 Ram 25 65 1 2
2 Shyam 22 55 1 3
3 Sita 23 43 2 3
4 Hari 21 45 1 1
5 Rita 24 40 2 2
6 Roshan 21 43 1 3
7 Sabita 21 48 2 3
8 Diya 25 52 2 2
9 depak 19 41 1 1
Solution:
Working expression:
Mean:
Median:
Formula: Median = Middle value (for an odd number of values) or (Sum of two middle values)
/ 2 (for an even number of values)
Mode:
Maximum:
Minimum:
Standard Deviation:
Percentile:
Formula: Percentile Rank = (Number of values below the data point) / (Total number of values)
* 100
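The descriptive measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch, assuming the third and fourth columns of the student table are age and weight (consistent with the means reported by SPSS); `describe` is a hypothetical helper, and the quartiles use the module's default "exclusive" method.

```python
from statistics import mean, median, mode, quantiles, stdev, variance

# Values read from the student table in Question 8 (column meaning is an assumption)
ages = [25, 22, 23, 21, 24, 21, 21, 25, 19]
weights = [65, 55, 43, 45, 40, 43, 48, 52, 41]


def describe(values):
    """Collect the basic descriptive statistics for one variable."""
    q1, q2, q3 = quantiles(values, n=4)  # 25th, 50th and 75th percentiles
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "std_dev": stdev(values),      # sample standard deviation (n - 1)
        "variance": variance(values),  # sample variance (n - 1)
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
    }


for name, column in (("weight", weights), ("age", ages)):
    print(name, describe(column))
```

The same helper works for any numeric column, so it can be reused for the other descriptive statistics questions.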
Statistics
                            weight of the student   age of the student
N             Valid         9                       9
              Missing       1                       1
Mean                        48.00                   22.33
Std. Error of Mean          2.703                   .687
Median                      45.00                   22.00
Mode                        43                      21
Std. Deviation              8.109                   2.062
Variance                    65.750                  4.250
Skewness                    1.262                   -.024
Std. Error of Skewness      .717                    .717
Kurtosis                    1.253                   -.983
Std. Error of Kurtosis      1.400                   1.400
Range                       25                      6
Minimum                     40                      19
Maximum                     65                      25
Sum                         432                     201
Percentiles   .35           40.00                   19.00
              .65           40.00                   19.00
              25            42.00                   21.00
              50            45.00                   22.00
              75            53.50                   24.50
age of the student
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     19       1           10.0      11.1            11.1
          21       3           30.0      33.3            44.4
          22       1           10.0      11.1            55.6
          23       1           10.0      11.1            66.7
          24       1           10.0      11.1            77.8
          25       2           20.0      22.2            100.0
          Total    9           90.0      100.0
Missing   System   1           10.0
Total              10          100.0
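A frequency table of this kind can be built with a counter. This is a minimal sketch, assuming the nine valid ages from the student table; `frequency_table` is a hypothetical helper, and it reproduces only the Frequency, Valid Percent, and Cumulative Percent columns because the missing case in the SPSS output is not part of the listed data.

```python
from collections import Counter

# Valid ages from Question 8
ages = [25, 22, 23, 21, 24, 21, 21, 25, 19]


def frequency_table(values):
    """Return rows of (value, frequency, valid percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((
            value,
            counts[value],
            round(100 * counts[value] / total, 1),
            round(100 * running / total, 1),
        ))
    return rows


table = frequency_table(ages)
for value, freq, valid_pct, cum_pct in table:
    print(f"{value:>5} {freq:>3} {valid_pct:>6} {cum_pct:>6}")
```

The frequencies in the second column are also the slice sizes a pie chart of the same variable would use.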
The pie chart below shows the distribution of education levels:
Question 9
Type C1 C2 C3
83 56 79
83 76 95
76 72 87
Test whether the average cost per computer differs significantly among the three types of
computer at the 5% level of significance.
Solution:
Working Expression:
The problem can be solved using one-way analysis of variance. Consider the null and
alternative hypotheses:
H0: µ1 = µ2 = µ3 (all type means are equal)
H1: Not all means are equal (at least one mean is different)
Formula:
ANOVA
values
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   561.556          2    280.778       4.380   .067
Within Groups    384.667          6    64.111
Total            946.222          8
Interpretation:
Level of significance (α) = 5% = 0.05
P-value = 0.067
So, P-value > α
Since the P-value is greater than the level of significance, there is not sufficient
evidence to reject the null hypothesis. This suggests that there is no significant difference in
cost per computer among the three types.
Question 10
The following table gives data on the performance of three different detergents at three different
water temperatures. The performance was obtained from whiteness readings based on specially
designed equipment for nine loads of washing.
Cold   45  43  55
Warm   37  40  56
Hot    42  44  46
Perform a two-way analysis of variance at the 0.05 level of significance.
Solution:
Working Expression:
The two-way ANOVA compares the mean differences between groups that have been split on
two independent variables (called factors). The primary purpose of a two-way ANOVA is to
understand if there is an interaction between the two independent variables on the dependent
variable.
For water temperature:
H0: µ1 = µ2 = µ3 (all water temperature means are equal)
H1: Not all means are equal (at least one mean is different)
For detergent:
H0: µ1 = µ2 = µ3 (all detergent means are equal)
H1: Not all means are equal (at least one mean is different)
Formula:
Grand Total (T) = ∑C1 + ∑C2 + ∑C3
Interpretation:
For water temperature:
P-value = 0.575
So, P-value > α
Since the P-value is greater than the level of significance, there is not sufficient
evidence to reject the null hypothesis. This suggests that there is no significant difference between
water temperatures.
For detergent:
P-value = 0.067
So, P-value > α
Since the P-value is greater than the level of significance, there is not sufficient
evidence to reject the null hypothesis. This suggests that there is no significant difference between
detergents.
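The two p-values for this question can be reproduced with a two-way ANOVA without replication. This is a minimal sketch, assuming the rows of the table are water temperatures and the columns are the three detergents; the function names are hypothetical, and the p-value helper uses a closed form that is valid only because each factor here has three levels (numerator degrees of freedom = 2).

```python
# Whiteness readings from Question 10: rows are water temperatures,
# columns are assumed to be the three detergents.
readings = {
    "Cold": [45, 43, 55],
    "Warm": [37, 40, 56],
    "Hot": [42, 44, 46],
}


def two_way_anova(table):
    """Return (F_rows, F_columns, df_error) for one observation per cell."""
    rows = list(table.values())
    r, c = len(rows), len(rows[0])
    grand_mean = sum(sum(row) for row in rows) / (r * c)
    row_means = [sum(row) / c for row in rows]
    col_means = [sum(row[j] for row in rows) / r for j in range(c)]
    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((v - grand_mean) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    df_error = (r - 1) * (c - 1)
    ms_error = ss_error / df_error
    f_rows = (ss_rows / (r - 1)) / ms_error
    f_cols = (ss_cols / (c - 1)) / ms_error
    return f_rows, f_cols, df_error


def p_value_numerator_df2(f_stat, df_error):
    """Upper-tail probability of F(2, df_error); valid only for numerator df = 2."""
    return (1 + 2 * f_stat / df_error) ** (-df_error / 2)


f_temp, f_det, df_error = two_way_anova(readings)
p_temp = p_value_numerator_df2(f_temp, df_error)
p_det = p_value_numerator_df2(f_det, df_error)
print(f"Water temperature: F = {f_temp:.3f}, p = {p_temp:.3f}")
print(f"Detergent:         F = {f_det:.3f}, p = {p_det:.3f}")
```

Both p-values exceed 0.05, which agrees with failing to reject either null hypothesis.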
Question 11
1 92 1 24 120 80 2 2 3 2
2 65 1 20 90 80 1 2 2 2
3 85 2 17 140 95 1 1 1 2
4 65 2 25 140 100 1 1 1 2
5 64 2 26 150 90 1 1 1 2
6 85 1 17 120 70 1 1 2 2
7 76 1 21 110 80 1 1 2 2
8 65 2 22 110 75 1 1 3 2
9 78 1 23 90 60 1 1 2 2
10 60 2 20 130 80 2 2 1 2
11 66 1 28 120 60 1 2 3 2
12 65 2 30 130 80 1 2 1 2
13 69 2 25 120 80 1 2 2 2
14 61 2 24 140 90 1 2 1 1
15 67 2 26 120 80 1 2 1 2
16 68 2 20 120 80 1 2 1 1
17 80 1 25 120 80 1 2 2 2
18 65 1 27 110 70 1 2 3 2
19 65 2 29 130 80 1 2 1 2
20 70 2 30 100 60 1 2 1 2
Calculate the descriptive statistics and make pie-chart and box-plots of the following data.
Solution:
Working Expression:
Statistics
N Valid 20 20 20 20
Missing 0 0 0 0
Mean 70.55 23.95 120.50 78.50
Median 66.50 24.50 120.00 80.00
Mode 65 20a 120 80
Std. Deviation 8.959 3.927 16.051 10.773
Variance 80.261 15.418 257.632 116.053
Skewness 1.123 -.196 -.259 -.129
Std. Error of Skewness .512 .512 .512 .512
Kurtosis .270 -.740 -.101 .134
Std. Error of Kurtosis .992 .992 .992 .992
Minimum 60 17 90 60
Maximum 92 30 150 100
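A box plot is drawn from a five-number summary, which can be computed directly. This is a minimal sketch, assuming the fourth column of the data table is BMI (its mean, median, minimum, and maximum match the second column of the statistics table); `five_number_summary` is a hypothetical helper, the quartiles use the `statistics` module's default "exclusive" method, and the 1.5 × IQR fences are the usual box plot convention rather than something stated in the source.

```python
from statistics import quantiles

# Fourth column of the Question 11 data table, assumed to be BMI
bmi = [24, 20, 17, 25, 26, 17, 21, 22, 23, 20, 28, 30, 25, 24, 26, 20, 25, 27, 29, 30]


def five_number_summary(values):
    """Return the quantities a box plot displays, plus the outlier fences."""
    q1, med, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return {
        "minimum": min(values),
        "q1": q1,
        "median": med,
        "q3": q3,
        "maximum": max(values),
        "lower_fence": q1 - 1.5 * iqr,  # points below this are drawn as outliers
        "upper_fence": q3 + 1.5 * iqr,  # points above this are drawn as outliers
    }


summary = five_number_summary(bmi)
outliers = [v for v in bmi if v < summary["lower_fence"] or v > summary["upper_fence"]]
print(summary)
print("outliers:", outliers)
```

Splitting the list by another column, such as a habit category, and calling the helper on each part gives the grouped box plots.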
The box plots below show BMI by alcohol-use habit:
The box plots below show blood pressure:
Statistics
category
N Valid 20
Missing 0
Mean 1.60
Std. Error of Mean .210
Median 1.00
Mode 1
Std. Deviation .940
Variance .884
Skewness 1.367
Std. Error of Skewness .512
Kurtosis .754
Std. Error of Kurtosis .992
Range 3
Minimum 1
Maximum 4
Sum 32
Percentiles .45 1.00
.95 1.00
25 1.00
50 1.00
75 2.00
category
Frequency   Percent   Valid Percent   Cumulative Percent