Statistics Lab

Lab report of statistics and probability using software.

Uploaded by shristirai348

Table of Contents

S.N | Title | Date | Signature
1 | Calculate the descriptive statistics of the data. | 2081-12-25 |
2 | Determine the least-squares equation that best describes the variables, calculate the standard error, test the significance of the regression coefficients and the overall fit of the regression equation, conduct the residual analysis, and estimate the dependent variable when the independent variable is 35. | 2081-12-25 |
3 | Calculate Karl Pearson's correlation coefficient and the probable error of the data, and interpret the result. | 2081-12-25 |
4 | Determine the association between band width and data rate, fit a regression model to describe the given data, interpret the estimated data rate when band width is 30, and find what percentage of the variation in data rate is explained by the variation in band width. | 2081-12-25 |
5 | Calculation of one-way ANOVA for the given data. | 2081-12-25 |
6 | Calculation of two-way ANOVA for the given data. | 2081-12-25 |
7 | There are three brands of computer, namely A, B and C, and their lifetimes are tabulated as follows. Test whether the average lifetime of the three brands of computer is significantly different at the 5% level of significance. | 2081-12-25 |
8 | Find descriptive statistics, frequency table, box plot and pie chart. | 2081-12-25 |
9 | Perform a two-way analysis of variance using a 0.05 level of significance. | 2081-12-25 |
10 | The following table gives data on the performance of three different detergents at three different water temperatures. The performance was obtained as whiteness readings based on specially designed equipment for nine loads of washing. Perform a two-way analysis of variance using a 0.05 level of significance. | 2081-12-25 |
11 | Calculate the descriptive statistics and make pie charts and box plots. | 2081-12-25 |
SPSS Software
Introduction

SPSS, which stands for "Statistical Package for the Social Sciences," is a software program
used for statistical analysis and data management. It was originally developed by SPSS Inc. and
was acquired in 2009 by IBM (International Business Machines Corporation). It is widely used in
various fields, including social sciences, business, healthcare, and academic research.

SPSS provides a user-friendly interface that allows researchers and analysts to perform a wide
range of statistical analyses, data manipulation, and reporting without the need for advanced
programming skills. Some of the key features and capabilities of SPSS include:

• Data Entry and Management: SPSS allows users to input, edit, and manage data
efficiently. It supports various data types, including numerical, categorical, and text
data.
• Data Analysis: SPSS offers a comprehensive set of statistical tools for descriptive
statistics, hypothesis testing, regression analysis, correlation analysis, factor analysis,
and more.
• Data Visualization: The software provides various data visualization options, including
charts, graphs, and plots, to help users understand and present their data effectively.
• Reporting: SPSS allows users to generate customized reports and tables that summarize
the results of their analyses. These reports can be easily exported for further use or
publication.
• Syntax Language: Advanced users can take advantage of the SPSS syntax language to
automate and replicate analyses, making it a powerful tool for reproducible research.
• Integration: SPSS can integrate with other software and data sources, enabling users to
import and export data from different formats and platforms.
• Advanced Analytics: In addition to basic statistical analyses, SPSS offers advanced
analytical capabilities, including predictive modeling and machine learning techniques.

SPSS is commonly used in academic research, market research, survey analysis, healthcare
research, and various other fields where data analysis and statistical interpretation are essential.
While it has been widely adopted for its user-friendly interface, it also provides more advanced
features for experienced statisticians and analysts.

Question 1.

Find the confidence interval of the mean, assuming a normal distribution, for the following height data.

78 55 68 48 65 76 57 55 65 75 51 61 68 67 76 78 71 56 57 67 58 51 50 58 50
77 55 48 70 55 58 70 56 52 74 61 69 76 61 68 78 56 78 57 66 66 74 66 48 73
71 70 62 74 76 50 69 75 65 48

Also calculate the descriptive statistics of the data.

Solution:

Working Expression:

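A confidence interval (CI) is a range of values that is likely to contain a population parameter
with a stated degree of confidence. For the mean it is usually reported at a confidence level such as
95%, giving a lower and an upper limit between which the population mean is expected to lie. For a
large sample (here n = 60), it is computed from the sample mean, the sample standard deviation s and
the standard normal critical value:

$$\bar{x} \pm z_{\alpha/2} \, \frac{s}{\sqrt{n}}$$

The confidence interval of the mean, assuming a normal distribution, for the given data is as follows: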
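The interval can also be reproduced outside SPSS. The sketch below assumes a large-sample normal (z) interval, x̄ ± z·s/√n with the sample standard deviation s, and uses only Python's standard library; `mean_ci` is an illustrative helper name, not part of any package.

```python
import math
from statistics import NormalDist, mean, stdev

# Height data from Question 1 (60 observations)
heights = [
    78, 55, 68, 48, 65, 76, 57, 55, 65, 75, 51, 61, 68, 67, 76, 78, 71, 56, 57, 67,
    58, 51, 50, 58, 50, 77, 55, 48, 70, 55, 58, 70, 56, 52, 74, 61, 69, 76, 61, 68,
    78, 56, 78, 57, 66, 66, 74, 66, 48, 73, 71, 70, 62, 74, 76, 50, 69, 75, 65, 48,
]

def mean_ci(data, confidence=0.95):
    """Large-sample normal (z) confidence interval for the population mean."""
    n = len(data)
    x_bar = mean(data)
    std_error = stdev(data) / math.sqrt(n)          # sample SD uses the n - 1 divisor
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return x_bar - z * std_error, x_bar + z * std_error

lower, upper = mean_ci(heights)
print(f"n = {len(heights)}, mean = {mean(heights):.3f}")
print(f"95% CI for the mean height: ({lower:.2f}, {upper:.2f})")
```

SPSS's Explore procedure generally bases its interval on the t distribution (about 2.00 for 59 degrees of freedom rather than 1.96), so its limits would be marginally wider than this z-based sketch.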

Question 2.

A developer of pig feed wishes to determine what relationship exists among the age of a pig
when it starts receiving a newly developed food supplement, the initial weight of the pig, and
the amount of weight it gains in a one-week period with the food supplement. The following
information is the result of a study of eight piglets.

Piglet | Initial weight (X1) | Initial age (X2) | Weight gain (Y)
1 | 39 | 8 | 7
2 | 52 | 6 | 6
3 | 49 | 7 | 8
4 | 46 | 12 | 10
5 | 61 | 9 | 9
6 | 35 | 6 | 5
7 | 25 | 7 | 3
8 | 55 | 4 | 4

a) Determine the least-squares equation that best describes the variables.

b) Calculate the standard error.

c) Test the significance of the regression coefficients and the overall fit of the regression equation.

d) Conduct the residual analysis.

e) Estimate the dependent variable when the independent variable is 35.

Solution:

Working Expression:

Finding the least-squares equation means finding the regression equation. Regression is a method
that measures the average relationship between two or more variables in terms of the original units
of the data.

The regression equation of Y on X is given by

Y = a + bX, where a is the constant, b is the regression coefficient, Y is the dependent variable and
X is the independent variable.

The regression equation of X on Y is given by

X = a + bY, where a is the constant, b is the regression coefficient, Y is the independent variable and
X is the dependent variable.

This question requires multiple regression, which is used when a dependent variable is related to
two or more independent variables. Here there is one dependent variable (weight gain, Y) and two
independent variables (initial weight, X1, and initial age, X2).
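As a cross-check on the SPSS coefficients, the least-squares fit can be reproduced by hand. This sketch assumes the usual normal equations written in deviation (centred) form and solves the resulting 2 × 2 system with Cramer's rule; `fit_two_predictors` is an illustrative name, and SPSS itself uses its own matrix routine rather than this code.

```python
# Least-squares fit of Y = a + b1*X1 + b2*X2 via the centred normal equations.
x1 = [39, 52, 49, 46, 61, 35, 25, 55]  # initial weight
x2 = [8, 6, 7, 12, 9, 6, 7, 4]         # initial age
y = [7, 6, 8, 10, 9, 5, 3, 4]          # weight gain

def fit_two_predictors(x1, x2, y):
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Corrected sums of squares and cross-products
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    # Solve [s11 s12; s12 s22] [b1; b2] = [s1y; s2y] by Cramer's rule
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2  # plane passes through the point of means
    return a, b1, b2

a, b1, b2 = fit_two_predictors(x1, x2, y)
n = len(y)
y_mean = sum(y) / n
fitted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
sst = sum((w - y_mean) ** 2 for w in y)
sse = sum((w - f) ** 2 for w, f in zip(y, fitted))
r_squared = 1 - sse / sst
se_estimate = (sse / (n - 3)) ** 0.5  # n - k - 1 degrees of freedom, k = 2
print(f"Y = {a:.3f} + {b1:.3f} X1 + {b2:.3f} X2")
print(f"R^2 = {r_squared:.3f}, SE of estimate = {se_estimate:.3f}")
```

With this data the sketch prints coefficients that round to the SPSS values (-4.192, 0.105, 0.807), together with R² = 0.881 and a standard error of the estimate of 0.999.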

Coefficients(a)

Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 (Constant) | -4.192 | 1.888 | | -2.220 | .077
Initial weight (X1) | .105 | .032 | .501 | 3.247 | .023
Initial age (X2) | .807 | .158 | .786 | 5.097 | .004

a. Dependent Variable: weight gain (Y)

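The least-squares equation that best describes the variables is:

Y = a + b1X1 + b2X2

Y = -4.192 + 0.105X1 + 0.807X2

Each coefficient measures a partial effect: holding initial age (X2) constant, each additional unit of
initial weight (X1) increases the expected weight gain by 0.105, and holding initial weight (X1)
constant, each additional unit of initial age (X2) increases it by 0.807.

Setting the other predictor to zero, the fitted equation reduces to:

Y = 0.105X1 - 4.192 (weight gain on initial weight)

Y = 0.807X2 - 4.192 (weight gain on initial age)

These are slices of the fitted plane rather than separate simple regression lines, so they should not
be used on their own for prediction.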

The least-squares plane always passes through the point of means $(\bar{X}_1, \bar{X}_2, \bar{Y}) = (45.25, 7.375, 6.5)$.

Model Summary(b)

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .939(a) | .881 | .834 | .999

a. Predictors: (Constant), initial age (X2), initial weight (X1)
b. Dependent Variable: weight gain (Y)

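The standard error of the estimate is 0.999 (Model Summary), i.e. the square root of the residual
mean square, √0.998. Using r² = R² = 0.881, the standard error of the correlation coefficient is:

$$S.E. = \frac{1 - r^2}{\sqrt{n}} = \frac{1 - 0.881}{\sqrt{8}} \approx 0.042$$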
ANOVA(a)

Model | Sum of Squares | df | Mean Square | F | Sig.
1 Regression | 37.009 | 2 | 18.505 | 18.539 | .005(b)
Residual | 4.991 | 5 | .998 | |
Total | 42.000 | 7 | | |

a. Dependent Variable: weight gain (Y)
b. Predictors: (Constant), initial age (X2), initial weight (X1)

Let us consider the null and alternative hypotheses for the overall fit:

Null hypothesis (H0): β1 = β2 = 0 (the regression equation does not explain weight gain).

Alternative hypothesis (H1): at least one of β1, β2 is not zero.

Interpretation:
Significance:

0.005 < 0.05

Since the p-value (0.005) is less than the conventional significance level of 0.05, we reject the
null hypothesis. In other words, the regression equation as a whole fits the data significantly
(R² = 0.881).

For the individual coefficients (Coefficients table), initial weight (p = 0.023) and initial age
(p = 0.004) are both significant at the 5% level, while the constant (p = 0.077) is not.

Residuals Statistics(a)

 | Minimum | Maximum | Mean | Std. Deviation | N
Predicted Value | 4.07 | 10.31 | 6.50 | 2.299 | 8
Residual | -1.075 | 1.409 | .000 | .844 | 8
Std. Predicted Value | -1.055 | 1.656 | .000 | 1.000 | 8
Std. Residual | -1.076 | 1.411 | .000 | .845 | 8

a. Dependent Variable: weight gain(y)

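Interpretation:
The residual sum of squares is

$$SSE = \sum (y - \hat{y})^2 = 4.991$$

The residuals have mean 0 and range from -1.075 to 1.409, and every standardized residual lies
between -1.076 and 1.411.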
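The residual analysis can be sketched in a few lines: refit the plane by least squares, then compute each raw residual and its standardized form (residual divided by the standard error of the estimate, which matches how the SPSS "Std. Residual" column is scaled). The data are those of the piglet table; the refit repeats the centred normal equations so that the block runs on its own.

```python
# Residual analysis for Y = a + b1*X1 + b2*X2 on the piglet data.
x1 = [39, 52, 49, 46, 61, 35, 25, 55]
x2 = [8, 6, 7, 12, 9, 6, 7, 4]
y = [7, 6, 8, 10, 9, 5, 3, 4]
n = len(y)

# Refit by least squares (centred normal equations, Cramer's rule)
m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
d1 = [v - m1 for v in x1]
d2 = [v - m2 for v in x2]
dy = [v - my for v in y]
s11 = sum(u * u for u in d1)
s22 = sum(v * v for v in d2)
s12 = sum(u * v for u, v in zip(d1, d2))
s1y = sum(u * w for u, w in zip(d1, dy))
s2y = sum(v * w for v, w in zip(d2, dy))
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
a = my - b1 * m1 - b2 * m2

predicted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
residuals = [obs - pred for obs, pred in zip(y, predicted)]
sse = sum(e * e for e in residuals)
se_estimate = (sse / (n - 3)) ** 0.5          # sqrt of residual mean square
std_residuals = [e / se_estimate for e in residuals]

for i, (p, e, z) in enumerate(zip(predicted, residuals, std_residuals), start=1):
    print(f"piglet {i}: predicted {p:6.3f}  residual {e:6.3f}  std. residual {z:6.3f}")
print(f"SSE = {sse:.3f}")
```

A common rule of thumb treats standardized residuals beyond ±2 as potential outliers; none of the eight piglets comes close to that threshold.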

Interpretation:
In the residual plot, three points lie on the reference line while the others deviate from it. The
closer the points lie to the line, the more accurate the fitted equation. The deviations reflect the
variation in weight gain that initial weight and initial age do not explain; they are small overall,
and since no standardized residual exceeds ±2, none of the points is an outlier.

Composition of weight gain


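Estimating the dependent variable when the independent variables are 35:

The regression equation of Y on X1 and X2 is given by

Y = a + b1X1 + b2X2

Y = -4.192 + 0.105X1 + 0.807X2

When X1 = 35 and X2 = 35,

Y = -4.192 + 0.105 × 35 + 0.807 × 35

= 27.728

Hence the estimated weight gain when both independent variables are 35 is

Y = 27.728

An initial age of 35 lies far outside the ages observed in the study (4 to 12), so this estimate is an
extrapolation and should be treated with caution.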
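The point estimate is plain arithmetic on the fitted equation. This sketch plugs the rounded SPSS coefficients into a small helper and also flags predictor values outside the ranges observed in the study; `predict_weight_gain` and the range constants are illustrative names introduced here.

```python
# Point estimate from the fitted plane Y = a + b1*X1 + b2*X2, using the
# rounded coefficients from the SPSS Coefficients table.
A, B1, B2 = -4.192, 0.105, 0.807

# Ranges observed in the eight-piglet study, used to flag extrapolation.
WEIGHT_RANGE = (25, 61)
AGE_RANGE = (4, 12)

def predict_weight_gain(initial_weight, initial_age):
    """Return the estimated weight gain and whether it is an extrapolation."""
    estimate = A + B1 * initial_weight + B2 * initial_age
    inside = (WEIGHT_RANGE[0] <= initial_weight <= WEIGHT_RANGE[1]
              and AGE_RANGE[0] <= initial_age <= AGE_RANGE[1])
    return estimate, not inside

estimate, extrapolated = predict_weight_gain(35, 35)
print(f"Estimated weight gain: {estimate:.3f}")
if extrapolated:
    print("Warning: predictor values lie outside the observed data range.")
```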

Question 3.

Calculate Karl Pearson's correlation coefficient from the following data.

Sales | 43 | 41 | 36 | 34 | 50
Expenses | 10 | 22 | 13 | 19 | 17

Also calculate the probable error and interpret the result.

Solution:

Working Expression:

Karl Pearson's coefficient of correlation is a widely used method that measures the degree of
linear relationship between two variables. It is denoted by r:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

Pearson's r varies between -1 and +1, where

r = 0: there is no linear correlation between sales and expenses.

r = +1: there is perfect positive correlation between sales and expenses.

r = -1: there is perfect negative correlation between sales and expenses.

$$P.E.(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}$$

If |r| < P.E.(r), there is no evidence of correlation (the correlation coefficient is insignificant).

If |r| > 6 × P.E.(r), there is evidence of correlation (the correlation coefficient is significant).

Let us consider the null and alternative hypotheses as follows:

Null hypothesis (H0): ρ = 0 (there is no correlation between sales and expenses).

Alternative hypothesis (H1): ρ ≠ 0 (there is a correlation between sales and expenses).

Correlations

 | | Sales | Expenses
Sales | Pearson Correlation | 1 | -.073
 | Sig. (2-tailed) | | .907
 | N | 5 | 5
Expenses | Pearson Correlation | -.073 | 1
 | Sig. (2-tailed) | .907 |
 | N | 5 | 5

Interpretation:
Correlation = -0.073;

There is a very low degree of negative correlation between sales and expenses.

Significance:

0.907 > 0.05

Since the p-value (0.907) is much greater than the conventional significance level of 0.05, we fail
to reject the null hypothesis. In other words, there is no significant correlation between sales and
expenses.

$$P.E.(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (-0.073)^2}{\sqrt{5}} \approx 0.30$$

|r| = 0.073

Here, |r| < P.E.(r).

Since the probable error of the correlation coefficient is greater than the absolute value of the
correlation coefficient, the correlation coefficient between sales and expenses is insignificant.
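Both r and its probable error follow directly from the formulas in the working expression. This stdlib-only sketch computes them for the sales and expenses data and applies the |r| < P.E. and |r| > 6 P.E. rules; `pearson_r` and `probable_error` are illustrative helper names.

```python
import math

sales = [43, 41, 36, 34, 50]
expenses = [10, 22, 13, 19, 17]

def pearson_r(x, y):
    """Karl Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

r = pearson_r(sales, expenses)
pe = probable_error(r, len(sales))
if abs(r) < pe:
    verdict = "insignificant"
elif abs(r) > 6 * pe:
    verdict = "significant"
else:
    verdict = "inconclusive"  # between P.E. and 6 P.E., the rule gives no verdict
print(f"r = {r:.3f}, P.E. = {pe:.3f}, correlation is {verdict}")
```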

Question 4.

A computer operator is interested in knowing how the data rate of internet users depends on band
width. The following results were gathered by the operator.

Band width | 17 | 35 | 41 | 19 | 25 | 20 | 10 | 15
Data rate | 47 | 64 | 68 | 50 | 60 | 55 | 30 | 33

a) Is there any association between band width and data rate?

b) Fit the regression model to describe the given data and interpret the estimated data rate when
band width is 30.

c) What percentage of the variation in data rate is explained by the variation in band width?

Solution:

Working Expressions:

Karl Pearson's coefficient of correlation is a widely used method that measures the degree of
linear relationship between two variables. It is denoted by r.

Pearson's r varies between -1 and +1, where

r = 0: there is no linear correlation between band width and data rate.

r = +1: there is perfect positive correlation between band width and data rate.

r = -1: there is perfect negative correlation between band width and data rate.

Let us consider the null and alternative hypotheses as follows:

Null hypothesis (H0): ρ = 0 (there is no correlation between band width and data rate).

Alternative hypothesis (H1): ρ ≠ 0 (there is a correlation between band width and data rate).

Regression:

The regression equation of Y on X is given by

Y = a + bX, where a is the constant, b is the regression coefficient, Y is the dependent variable and
X is the independent variable.

The regression equation of X on Y is given by

X = a + bY, where a is the constant, b is the regression coefficient, Y is the independent variable and
X is the dependent variable.

Correlations

 | | Band width | Data rate
Band width | Pearson Correlation | 1 | .902**
 | Sig. (2-tailed) | | .002
 | N | 8 | 8
Data rate | Pearson Correlation | .902** | 1
 | Sig. (2-tailed) | .002 |
 | N | 8 | 8

**. Correlation is significant at the 0.01 level (2-tailed).
Interpretation:
Finding the association between band width and data rate:

Correlation = 0.902;

There is a high degree of positive correlation between band width and data rate.

Significance:

0.002 < 0.05

Since the p-value (0.002) is less than the conventional significance level of 0.05, we reject the
null hypothesis. In other words, there is a significant positive association between band width and
data rate.

Model Summary(b)

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .902(a) | .814 | .783 | 6.436

a. Predictors: (Constant), band width
b. Dependent Variable: data rate
Interpretation:

Coefficient of determination (r²) = 0.814

Hence 81.4% of the variation in data rate is explained by the variation in band width; the remaining
18.6% is due to other factors.
Coefficients(a)

Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 (Constant) | 23.749 | 5.761 | | 4.123 | .006
Band width | 1.192 | .233 | .902 | 5.126 | .002

a. Dependent Variable: data rate
Interpretation:
The regression equation of Y on X is given by

Y = a + bX

where a is the constant, b is the regression coefficient, Y is the dependent variable and X is the
independent variable. The estimated regression equation of data rate (Y) on band width (X) is

Y = 23.749 + 1.192X

so each additional unit of band width increases the expected data rate by 1.192.

When X = 30,

Y = 23.749 + 1.192 × 30

= 59.509
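All three parts of this question (association, fitted line and r²) come from the same corrected sums of squares. This stdlib-only sketch computes them and the estimate at a band width of 30; `simple_regression` is an illustrative helper name.

```python
import math

band_width = [17, 35, 41, 19, 25, 20, 10, 15]
data_rate = [47, 64, 68, 50, 60, 55, 30, 33]

def simple_regression(x, y):
    """Least-squares fit of y = a + b*x, plus Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

a, b, r = simple_regression(band_width, data_rate)
print(f"Y = {a:.3f} + {b:.3f} X")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"Estimated data rate at X = 30: {a + b * 30:.2f}")
```

Using the unrounded coefficients gives about 59.52 at X = 30; the hand calculation above gives 59.509 because it uses the coefficients rounded to three decimals.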

Composition of band width

Composition of data rate

Question 5

The yields of four treatments (T1 to T4) in different plots are shown in the following layout. Carry
out an analysis of variance.

T4 1401 | T3 2536 | T3 2459 | T1 2537 | T3 2827 | T1 2069
T2 2211 | T1 1797 | T4 1170 | T4 1516 | T4 2104 | T3 2385
T2 3366 | T1 2104 | T2 2591 | T3 2460 | T4 1077 | T2 2544

Solution:

Working Expression:

The problem can be solved using one-way analysis of variance. Let us consider the null and
alternative hypotheses:

H0: µ1 = µ2 = µ3 = µ4 (there is no significant difference between the treatment means)

H1: not all means are equal (at least one treatment mean is different)

Formula (k treatments, ni plots under treatment i, N plots in total):

Grand total (T) = ∑X1 + ∑X2 + ... + ∑Xk

Correction factor (CF) = T^2/N

Total sum of squares (TSS) = ∑X1^2 + ∑X2^2 + ... + ∑Xk^2 - CF

Sum of squares between samples (SSB) = [(∑X1)^2/n1 + (∑X2)^2/n2 + ... + (∑Xk)^2/nk] - CF

Error sum of squares (SSE) = TSS - SSB
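The correction-factor formulas above translate directly into code. In this sketch the plot yields are grouped by treatment label (T1 has 4 plots, T2 has 4, T3 has 5 and T4 has 5), and `one_way_anova` is an illustrative helper name. Only the F statistic is computed; turning it into the p-value shown by SPSS would need an F-distribution function (for example from SciPy), which this stdlib-only sketch leaves out.

```python
# One-way ANOVA by the correction-factor method (unequal group sizes allowed).
groups = {
    "T1": [2537, 2069, 1797, 2104],
    "T2": [2211, 3366, 2591, 2544],
    "T3": [2536, 2459, 2827, 2385, 2460],
    "T4": [1401, 1170, 1516, 2104, 1077],
}

def one_way_anova(groups):
    values = [x for g in groups.values() for x in g]
    N, k = len(values), len(groups)
    T = sum(values)                                   # grand total
    cf = T ** 2 / N                                   # correction factor
    tss = sum(x ** 2 for x in values) - cf            # total sum of squares
    ssb = sum(sum(g) ** 2 / len(g) for g in groups.values()) - cf
    sse = tss - ssb                                   # within-groups (error)
    df_b, df_w = k - 1, N - k
    f = (ssb / df_b) / (sse / df_w)
    return ssb, sse, tss, df_b, df_w, f

ssb, sse, tss, df_b, df_w, f = one_way_anova(groups)
print(f"SSB = {ssb:.3f} (df {df_b}), SSE = {sse:.3f} (df {df_w}), TSS = {tss:.3f}")
print(f"F = {f:.3f}")
```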

ANOVA
values

Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 4265689.961 | 3 | 1421896.654 | 11.253 | .001
Within Groups | 1768941.150 | 14 | 126352.939 | |
Total | 6034631.111 | 17 | | |

Interpretation:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.001

So, P-value < α

Since the p-value is less than the level of significance, we reject the null hypothesis. In other
words, there is a significant difference between the treatments.

Question 6

The following table gives the results of an experiment on four varieties of a crop in five blocks of
plots.

Treatment | Block 1 | Block 2 | Block 3 | Block 4 | Block 5
A | 32 | 34 | 33 | 35 | 37
B | 34 | 33 | 36 | 37 | 35
C | 31 | 34 | 35 | 32 | 36
D | 29 | 26 | 30 | 28 | 29

Solution:

Working Expression:

The two-way ANOVA compares the mean differences between groups that have been split on
two independent variables (called factors), here treatment (rows) and block (columns). With only
one observation per treatment-block cell, the interaction cannot be separated from the error, so
the analysis tests the two main effects.

Let us consider the null and alternative hypotheses for rows (treatments):

H0: µ1 = µ2 = µ3 = µ4 (there is no significant difference between treatments)

H1: not all treatment means are equal (at least one mean is different)

Let us consider the null and alternative hypotheses for columns (blocks):

H0: µ1 = µ2 = µ3 = µ4 = µ5 (there is no significant difference between blocks)

H1: not all block means are equal (at least one mean is different)

Formula (r = 4 rows, c = 5 columns, N = rc = 20 observations):

Grand total (T) = ∑C1 + ∑C2 + ∑C3 + ∑C4 + ∑C5

Correction factor (CF) = T^2/N

Total sum of squares (TSS) = ∑C1^2 + ∑C2^2 + ∑C3^2 + ∑C4^2 + ∑C5^2 - CF, where ∑Cj^2 is the sum of
the squared observations in column j

Sum of squares due to rows (SSR) = [(∑R1)^2 + (∑R2)^2 + (∑R3)^2 + (∑R4)^2]/c - CF

Sum of squares due to columns (SSC) = [(∑C1)^2 + (∑C2)^2 + (∑C3)^2 + (∑C4)^2 + (∑C5)^2]/r - CF

Error sum of squares (SSE) = TSS - SSR - SSC
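The row, column and error sums of squares above can be checked with a short sketch of a two-way ANOVA without replication (one observation per cell). Rows are the treatments A to D and columns the five blocks; `two_way_anova` is an illustrative helper name, and only the F statistics (not p-values) are computed.

```python
# Two-way ANOVA without replication: rows = treatments, columns = blocks.
data = {
    "A": [32, 34, 33, 35, 37],
    "B": [34, 33, 36, 37, 35],
    "C": [31, 34, 35, 32, 36],
    "D": [29, 26, 30, 28, 29],
}

def two_way_anova(rows):
    table = list(rows.values())
    r, c = len(table), len(table[0])
    N = r * c
    T = sum(sum(row) for row in table)
    cf = T ** 2 / N                                        # correction factor
    tss = sum(x ** 2 for row in table for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf     # rows (treatments)
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    ssc = sum(t ** 2 for t in col_totals) / r - cf         # columns (blocks)
    sse = tss - ssr - ssc
    df_r, df_c, df_e = r - 1, c - 1, (r - 1) * (c - 1)
    mse = sse / df_e
    return {
        "SSR": ssr, "SSC": ssc, "SSE": sse, "TSS": tss,
        "F_rows": (ssr / df_r) / mse, "F_cols": (ssc / df_c) / mse,
    }

result = two_way_anova(data)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```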

Tests of Between-Subjects Effects
Dependent Variable: value

Source | Type III Sum of Squares | df | Mean Square | F | Sig.
Corrected Model | 155.700(a) | 7 | 22.243 | 9.048 | .001
Intercept | 21516.800 | 1 | 21516.800 | 8752.597 | .000
Treatments | 134.000 | 3 | 44.667 | 18.169 | .000
Block | 21.700 | 4 | 5.425 | 2.207 | .130
Error | 29.500 | 12 | 2.458 | |
Total | 21702.000 | 20 | | |
Corrected Total | 185.200 | 19 | | |

a. R Squared = .841 (Adjusted R Squared = .748)
Interpretation:

For treatments:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.000

So, P-value < α

Since the p-value is less than the level of significance, we reject the null hypothesis. In other
words, there is a significant difference between treatments.

For blocks:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.130

So, P-value > α

Since the p-value is greater than the level of significance, we fail to reject the null hypothesis. In
other words, there is no significant difference between blocks.

Question 7

There are three brands of computer, namely A, B and C, and their lifetimes are tabulated as follows.
Test whether the average lifetime of the three brands of computer is significantly different at the 5%
level of significance.

A 5 7 3 2 6 4 8 9

B 2 3 4 5 6

C 7 8 9 10 12 13 11

Solution:

Working Expression:

The problem can be solved using one-way analysis of variance. Let us consider the null and
alternative hypotheses:

H0: µA = µB = µC (there is no significant difference in average lifetime between the brands)

H1: not all means are equal (at least one brand's mean lifetime is different)

Formula (n1 = 8, n2 = 5, n3 = 7 computers, N = 20):

Grand total (T) = ∑X1 + ∑X2 + ∑X3

Correction factor (CF) = T^2/N

Total sum of squares (TSS) = ∑X1^2 + ∑X2^2 + ∑X3^2 - CF

Sum of squares between samples (SSB) = [(∑X1)^2/n1 + (∑X2)^2/n2 + (∑X3)^2/n3] - CF

Error sum of squares (SSE) = TSS - SSB
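The same sums of squares can be written with group means instead of the correction factor: SSB = Σ ni(x̄i - x̄)² and SSW = Σ(x - x̄i)². This sketch uses that equivalent form on the lifetime data, which also makes it easy to see which brand stands apart.

```python
# One-way ANOVA via group means:
# SSB = sum n_i * (mean_i - grand_mean)^2, SSW = sum (x - mean_i)^2.
lifetimes = {
    "A": [5, 7, 3, 2, 6, 4, 8, 9],
    "B": [2, 3, 4, 5, 6],
    "C": [7, 8, 9, 10, 12, 13, 11],
}

all_values = [x for g in lifetimes.values() for x in g]
N, k = len(all_values), len(lifetimes)
grand_mean = sum(all_values) / N

group_means = {name: sum(g) / len(g) for name, g in lifetimes.items()}
ssb = sum(len(g) * (group_means[name] - grand_mean) ** 2 for name, g in lifetimes.items())
ssw = sum((x - group_means[name]) ** 2 for name, g in lifetimes.items() for x in g)
f = (ssb / (k - 1)) / (ssw / (N - k))

print("group means:", {name: round(m, 3) for name, m in group_means.items()})
print(f"SSB = {ssb:.3f}, SSW = {ssw:.3f}, F({k - 1}, {N - k}) = {f:.3f}")
```

The group means (A 5.5, B 4.0, C 10.0) suggest that brand C drives the difference, although a post hoc test would be needed to confirm which pairs differ.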

ANOVA
values of brand

Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 124.200 | 2 | 62.100 | 13.196 | .000
Within Groups | 80.000 | 17 | 4.706 | |
Total | 204.200 | 19 | | |

Interpretation:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.000

So, P-value < α

Since the p-value is less than the level of significance, we reject the null hypothesis. In other
words, there is a significant difference in average lifetime between the brands.

Question 8

Roll number | Name | Age | Weight | Gender | Education level
1 | Ram | 25 | 65 | 1 | 2
2 | Shyam | 22 | 55 | 1 | 3
3 | Sita | 23 | 43 | 2 | 3
4 | Hari | 21 | 45 | 1 | 1
5 | Rita | 24 | 40 | 2 | 2
6 | Roshan | 21 | 43 | 1 | 3
7 | Sabita | 21 | 48 | 2 | 3
8 | Diya | 25 | 52 | 2 | 2
9 | depak | 19 | 41 | 1 | 1

Solution:

Working expression:

• Mean:

Formula: Mean (x̄) = (Sum of all values) / (Number of values)

• Median:

Formula: Median = middle value of the ordered data (for an odd number of values), or (sum of the two
middle values) / 2 (for an even number of values)

• Mode:

Formula: Mode = value(s) that occur most frequently in the dataset

• Maximum:

Formula: Maximum = largest value in the dataset

• Minimum:

Formula: Minimum = smallest value in the dataset

• Standard Deviation:

Formula: s = √(Σ(xi - x̄)² / (n - 1)), the sample standard deviation reported by SPSS (the population
form divides by N instead)

• Percentile:

Formula: Percentile rank = (Number of values below the data point) / (Total number of values) × 100
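The formulas above map onto Python's standard `statistics` module. In this sketch, `describe` is an illustrative helper; `statistics.stdev` and `statistics.variance` use the n - 1 divisor, and `statistics.quantiles(..., method="exclusive")` uses the (n + 1)p positioning rule, which for this data reproduces the SPSS 25th, 50th and 75th percentiles (42.0, 45.0, 53.5 for weight).

```python
import statistics

weights = [65, 55, 43, 45, 40, 43, 48, 52, 41]
ages = [25, 22, 23, 21, 24, 21, 21, 25, 19]

def describe(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.stdev(values),    # sample SD, n - 1 divisor
        "variance": statistics.variance(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        # Quartiles by the (n + 1)p rule ("exclusive" method)
        "quartiles": statistics.quantiles(values, n=4, method="exclusive"),
    }

for label, values in [("weight", weights), ("age", ages)]:
    print(label, describe(values))
```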

Statistics

 | Weight of the student | Age of the student
N Valid | 9 | 9
N Missing | 1 | 1
Mean | 48.00 | 22.33
Std. Error of Mean | 2.703 | .687
Median | 45.00 | 22.00
Mode | 43 | 21
Std. Deviation | 8.109 | 2.062
Variance | 65.750 | 4.250
Skewness | 1.262 | -.024
Std. Error of Skewness | .717 | .717
Kurtosis | 1.253 | -.983
Std. Error of Kurtosis | 1.400 | 1.400
Range | 25 | 6
Minimum | 40 | 19
Maximum | 65 | 25
Sum | 432 | 201
Percentile .35 | 40.00 | 19.00
Percentile .65 | 40.00 | 19.00
Percentile 25 | 42.00 | 21.00
Percentile 50 | 45.00 | 22.00
Percentile 75 | 53.50 | 24.50
weight of the student

 | Frequency | Percent | Valid Percent | Cumulative Percent
Valid 40 | 1 | 10.0 | 11.1 | 11.1
41 | 1 | 10.0 | 11.1 | 22.2
43 | 2 | 20.0 | 22.2 | 44.4
45 | 1 | 10.0 | 11.1 | 55.6
48 | 1 | 10.0 | 11.1 | 66.7
52 | 1 | 10.0 | 11.1 | 77.8
55 | 1 | 10.0 | 11.1 | 88.9
65 | 1 | 10.0 | 11.1 | 100.0
Total | 9 | 90.0 | 100.0 |
Missing System | 1 | 10.0 | |
Total | 10 | 100.0 | |

age of the student

 | Frequency | Percent | Valid Percent | Cumulative Percent
Valid 19 | 1 | 10.0 | 11.1 | 11.1
21 | 3 | 30.0 | 33.3 | 44.4
22 | 1 | 10.0 | 11.1 | 55.6
23 | 1 | 10.0 | 11.1 | 66.7
24 | 1 | 10.0 | 11.1 | 77.8
25 | 2 | 20.0 | 22.2 | 100.0
Total | 9 | 90.0 | 100.0 |
Missing System | 1 | 10.0 | |
Total | 10 | 100.0 | |

Composition of weight and age

The pie chart below shows the distribution of education levels:
Question 9

Carry out an analysis of variance on the following data.

Type C1 | Type C2 | Type C3
83 | 56 | 79
83 | 76 | 95
76 | 72 | 87

Test whether the average cost per computer is significantly different among the three types of
computer at the 5% level of significance.

Solution:

Working Expression:

The problem can be solved using one-way analysis of variance. Let us consider the null and
alternative hypotheses:

H0: µ1 = µ2 = µ3 (there is no significant difference in average cost between the three types)

H1: not all means are equal (at least one mean is different)

Formula (k = 3 types, n = 3 observations per type, N = 9):

Grand total (T) = ∑X1 + ∑X2 + ∑X3

Correction factor (CF) = T^2/N

Total sum of squares (TSS) = ∑X1^2 + ∑X2^2 + ∑X3^2 - CF

Sum of squares between samples (SSB) = [(∑X1)^2 + (∑X2)^2 + (∑X3)^2]/n - CF

Error sum of squares (SSE) = TSS - SSB
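Instead of a p-value, the decision can also be made by comparing F with a tabulated critical value. This sketch computes F by the correction-factor formulas and compares it with F(0.05; 2, 6) = 5.14, a value taken from standard F tables rather than computed here.

```python
# One-way ANOVA for the three computer types, with the decision made by
# comparing F to the tabulated 5% critical value F(2, 6) = 5.14.
costs = {
    "C1": [83, 83, 76],
    "C2": [56, 76, 72],
    "C3": [79, 95, 87],
}
F_CRIT = 5.14  # F(0.05; 2, 6) from standard F tables

values = [x for g in costs.values() for x in g]
N, k = len(values), len(costs)
cf = sum(values) ** 2 / N                              # correction factor
tss = sum(x * x for x in values) - cf
ssb = sum(sum(g) ** 2 / len(g) for g in costs.values()) - cf
ssw = tss - ssb
f = (ssb / (k - 1)) / (ssw / (N - k))
decision = "reject H0" if f > F_CRIT else "fail to reject H0"
print(f"F = {f:.3f} vs F_crit = {F_CRIT} -> {decision}")
```

Since F ≈ 4.38 is below 5.14, the critical-value approach agrees with the p-value (0.067) in the SPSS table.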

ANOVA
values

Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 561.556 | 2 | 280.778 | 4.380 | .067
Within Groups | 384.667 | 6 | 64.111 | |
Total | 946.222 | 8 | | |

Interpretation:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.067

So, P-value > α

Since the p-value is greater than the level of significance, there is not sufficient evidence to
reject the null hypothesis. This suggests that there is no significant difference in the average cost
per computer among the three types.

Question 10

Water temperature | Detergent A | Detergent B | Detergent C
Cold | 45 | 43 | 55
Warm | 37 | 40 | 56
Hot | 42 | 44 | 46

The table above gives data on the performance of three different detergents at three different water
temperatures. The performance was obtained as whiteness readings based on specially designed
equipment for nine loads of washing.

Perform a two-way analysis of variance using a 0.05 level of significance.

Solution:

Working Expression:

The two-way ANOVA compares the mean differences between groups that have been split on
two independent variables (called factors), here water temperature (rows) and detergent (columns).
With only one observation per cell, the interaction cannot be separated from the error, so the
analysis tests the two main effects.

Let us consider the null and alternative hypotheses for rows (water temperatures):

H0: µ1 = µ2 = µ3 (there is no significant difference between water temperatures)

H1: not all means are equal (at least one mean is different)

Let us consider the null and alternative hypotheses for columns (detergents):

H0: µ1 = µ2 = µ3 (there is no significant difference between detergents)

H1: not all means are equal (at least one mean is different)

Formula (r = 3 rows, c = 3 columns, N = 9 observations):

Grand total (T) = ∑C1 + ∑C2 + ∑C3

Correction factor (CF) = T^2/N

Total sum of squares (TSS) = ∑C1^2 + ∑C2^2 + ∑C3^2 - CF, where ∑Cj^2 is the sum of the squared
observations in column j

Sum of squares due to rows (SSR) = [(∑R1)^2 + (∑R2)^2 + (∑R3)^2]/c - CF

Sum of squares due to columns (SSC) = [(∑C1)^2 + (∑C2)^2 + (∑C3)^2]/r - CF

Error sum of squares (SSE) = TSS - SSR - SSC
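For the detergent data the same two-way layout can be checked against a tabulated critical value. This sketch computes the row (water temperature) and column (detergent) sums of squares by the formulas above and compares each F with F(0.05; 2, 4) = 6.94, a value taken from standard F tables rather than computed here.

```python
# Two-way ANOVA without replication for the detergent data
# (rows = water temperature: cold, warm, hot; columns = detergent A, B, C).
y = [
    [45, 43, 55],
    [37, 40, 56],
    [42, 44, 46],
]
F_CRIT = 6.94  # F(0.05; 2, 4) from standard F tables

r, c = len(y), len(y[0])
N = r * c
T = sum(map(sum, y))
cf = T ** 2 / N                                   # correction factor
row_tot = [sum(row) for row in y]
col_tot = [sum(col) for col in zip(*y)]
tss = sum(v * v for row in y for v in row) - cf
ss_temp = sum(t * t for t in row_tot) / c - cf    # rows
ss_det = sum(t * t for t in col_tot) / r - cf     # columns
sse = tss - ss_temp - ss_det
mse = sse / ((r - 1) * (c - 1))
f_temp = (ss_temp / (r - 1)) / mse
f_det = (ss_det / (c - 1)) / mse

for label, f in [("water temperature", f_temp), ("detergent", f_det)]:
    decision = "reject H0" if f > F_CRIT else "fail to reject H0"
    print(f"{label}: F = {f:.3f} -> {decision}")
```

Both F values (about 0.64 and 5.74) fall below 6.94, matching the conclusion drawn from the SPSS p-values.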

Tests of Between-Subjects Effects
Dependent Variable: values

Source | Type III Sum of Squares | df | Mean Square | F | Sig.
Corrected Model | 246.667(a) | 4 | 61.667 | 3.190 | .144
Intercept | 18496.000 | 1 | 18496.000 | 956.690 | .000
WaterTemperature | 24.667 | 2 | 12.333 | .638 | .575
Detergent | 222.000 | 2 | 111.000 | 5.741 | .067
Error | 77.333 | 4 | 19.333 | |
Total | 18820.000 | 9 | | |
Corrected Total | 324.000 | 8 | | |

a. R Squared = .761 (Adjusted R Squared = .523)

Interpretation:

For water temperature:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.575

So, P-value > α

Since the p-value is greater than the level of significance, there is not sufficient evidence to
reject the null hypothesis. This suggests that there is no significant difference between water
temperatures.

For detergent:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.067

So, P-value > α

Since the p-value is greater than the level of significance, there is not sufficient evidence to
reject the null hypothesis. This suggests that there is no significant difference between detergents
at the 5% level.

Question 11

sn age gender bmi bp_sy bp_dy occu lit smoke alco

1 92 1 24 120 80 2 2 3 2

2 65 1 20 90 80 1 2 2 2

3 85 2 17 140 95 1 1 1 2

4 65 2 25 140 100 1 1 1 2

5 64 2 26 150 90 1 1 1 2

6 85 1 17 120 70 1 1 2 2

7 76 1 21 110 80 1 1 2 2

8 65 2 22 110 75 1 1 3 2

9 78 1 23 90 60 1 1 2 2

10 60 2 20 130 80 2 2 1 2

11 66 1 28 120 60 1 2 3 2

12 65 2 30 130 80 1 2 1 2

13 69 2 25 120 80 1 2 2 2

14 61 2 24 140 90 1 2 1 1

15 67 2 26 120 80 1 2 1 2

16 68 2 20 120 80 1 2 1 1

17 80 1 25 120 80 1 2 2 2

18 65 1 27 110 70 1 2 3 2

19 65 2 29 130 80 1 2 1 2

20 70 2 30 100 60 1 2 1 2

Calculate the descriptive statistics and make pie charts and box plots of the data above.

Solution:

Working Expression:

A descriptive statistic is a summary statistic that quantitatively describes or summarizes features
of a collection of information, while descriptive statistics is the process of using and analysing
those statistics.
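The main rows of the SPSS Statistics table can be reproduced with Python's `statistics` module. In this sketch, `summarize` is an illustrative helper; where several modes exist it reports the smallest, mirroring the SPSS footnote ("Multiple modes exist. The smallest value is shown"). Skewness and kurtosis are left out because the standard library has no functions for them.

```python
import statistics

age = [92, 65, 85, 65, 64, 85, 76, 65, 78, 60, 66, 65, 69, 61, 67, 68, 80, 65, 65, 70]
bmi = [24, 20, 17, 25, 26, 17, 21, 22, 23, 20, 28, 30, 25, 24, 26, 20, 25, 27, 29, 30]
bp_sy = [120, 90, 140, 140, 150, 120, 110, 110, 90, 130,
         120, 130, 120, 140, 120, 120, 120, 110, 130, 100]
bp_dy = [80, 80, 95, 100, 90, 70, 80, 75, 60, 80,
         60, 80, 80, 90, 80, 80, 80, 70, 80, 60]

def summarize(values):
    modes = statistics.multimode(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": min(modes),          # smallest mode, as SPSS reports
        "n_modes": len(modes),
        "std_dev": statistics.stdev(values),      # n - 1 divisor
        "variance": statistics.variance(values),
        "min": min(values),
        "max": max(values),
    }

for name, values in [("age", age), ("bmi", bmi), ("bp_sy", bp_sy), ("bp_dy", bp_dy)]:
    s = summarize(values)
    print(name, {k: round(v, 3) for k, v in s.items()})
```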

Statistics

 | Age | Body mass index | Systolic blood pressure | Diastolic blood pressure
N Valid | 20 | 20 | 20 | 20
N Missing | 0 | 0 | 0 | 0
Mean | 70.55 | 23.95 | 120.50 | 78.50
Median | 66.50 | 24.50 | 120.00 | 80.00
Mode | 65 | 20(a) | 120 | 80
Std. Deviation | 8.959 | 3.927 | 16.051 | 10.773
Variance | 80.261 | 15.418 | 257.632 | 116.053
Skewness | 1.123 | -.196 | -.259 | -.129
Std. Error of Skewness | .512 | .512 | .512 | .512
Kurtosis | .270 | -.740 | -.101 | .134
Std. Error of Kurtosis | .992 | .992 | .992 | .992
Minimum | 60 | 17 | 90 | 60
Maximum | 92 | 30 | 150 | 100

a. Multiple modes exist. The smallest value is shown.

The pie chart is shown below:

The box plots of BMI by alcohol-use habit are shown below:

The box plots of BMI by gender are shown below:

The box plots of blood pressure are shown below:

Statistics
category

N Valid | 20
N Missing | 0
Mean | 1.60
Std. Error of Mean | .210
Median | 1.00
Mode | 1
Std. Deviation | .940
Variance | .884
Skewness | 1.367
Std. Error of Skewness | .512
Kurtosis | .754
Std. Error of Kurtosis | .992
Range | 3
Minimum | 1
Maximum | 4
Sum | 32
Percentile .45 | 1.00
Percentile .95 | 1.00
Percentile 25 | 1.00
Percentile 50 | 1.00
Percentile 75 | 2.00

category

 | Frequency | Percent | Valid Percent | Cumulative Percent
Valid 60-69 | 13 | 65.0 | 65.0 | 65.0
70-79 | 3 | 15.0 | 15.0 | 80.0
80-89 | 3 | 15.0 | 15.0 | 95.0
90-99 | 1 | 5.0 | 5.0 | 100.0
Total | 20 | 100.0 | 100.0 |
