Statistics Lab

Lab report of statistics and probability using software.

Uploaded by

shristirai348


S.N  Title  Date  Signature

1.  Calculate the descriptive statistics of the data.  2081-12-25

2.  Determine the least-squares equation that best describes the variables, calculate the standard error, test the significance of the regression coefficients and the overall fit of the regression equation, conduct a residual analysis, and estimate the dependent variable when the independent variable is 35.  2081-12-25

3.  Calculate Karl Pearson's correlation coefficient and the probable error of the data, and interpret the result.  2081-12-25

4.  Determine the association between bandwidth and data rate, fit a regression model to describe the given data, interpret the estimated data rate when the bandwidth is 30, and find the percentage of variation in data rate explained by the variation in bandwidth.  2081-12-25

5.  Calculate a one-way ANOVA for the given data.  2081-12-25

6.  Calculate a two-way ANOVA for the given data.  2081-12-25

7.  There are three brands of computer, namely A, B and C, and their lifetimes are tabulated as follows. Test whether the average lifetime of the three brands of computer is significantly different at the 5% level of significance.  2081-12-25

8.  Find the descriptive statistics, frequency table, box plot and pie chart.  2081-12-25

9.  Perform a two-way analysis of variance using a 0.05 level of significance.  2081-12-25

10. The following table gives data on the performance of three different detergents at three different water temperatures. Performance was measured as a whiteness reading, obtained with specially designed equipment, for nine loads of washing. Perform a two-way analysis of variance using a 0.05 level of significance.  2081-12-25

11. Calculate the descriptive statistics and make a pie chart and box plots.  2081-12-25

SPSS Software
Introduction

SPSS, which stands for "Statistical Package for the Social Sciences," is a software program used for statistical analysis and data management. It was first released in 1968 and has been owned by IBM (International Business Machines Corporation) since 2009, when IBM acquired SPSS Inc. It is widely used in various fields, including social sciences, business, healthcare, and academic research.

SPSS provides a user-friendly interface that allows researchers and analysts to perform a wide
range of statistical analyses, data manipulation, and reporting without the need for advanced
programming skills. Some of the key features and capabilities of SPSS include:

• Data Entry and Management: SPSS allows users to input, edit, and manage data efficiently. It supports various data types, including numerical, categorical, and text data.
• Data Analysis: SPSS offers a comprehensive set of statistical tools for descriptive statistics, hypothesis testing, regression analysis, correlation analysis, factor analysis, and more.
• Data Visualization: The software provides various data visualization options, including charts, graphs, and plots, to help users understand and present their data effectively.
• Reporting: SPSS allows users to generate customized reports and tables that summarize the results of their analyses. These reports can be easily exported for further use or publication.
• Syntax Language: Advanced users can take advantage of the SPSS syntax language to automate and replicate analyses, making it a powerful tool for reproducible research.
• Integration: SPSS can integrate with other software and data sources, enabling users to import and export data from different formats and platforms.
• Advanced Analytics: In addition to basic statistical analyses, SPSS offers advanced analytical capabilities, including predictive modeling and machine learning techniques.

SPSS is commonly used in academic research, market research, survey analysis, healthcare
research, and various other fields where data analysis and statistical interpretation are essential.
While it has been widely adopted for its user-friendly interface, it also provides more advanced
features for experienced statisticians and analysts.

Question 1.

Find the confidence interval of the mean, assuming a normal distribution, for the following height data.

78 55 68 48 65 76 57 55 65 75 51 61 68 67 76 78 71 56 57 67 58 51 50 58 50
77 55 48 70 55 58 70 56 52 74 61 69 76 61 68 78 56 78 57 66 66 74 66 48 73
71 70 62 74 76 50 69 75 65 48

Also calculate the descriptive statistics of the data.

Solution:

Working Expression:

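A confidence interval (CI) is a range of values that is likely to contain a population parameter with a stated degree of confidence. It is usually expressed as a percentage. For example, a 95% confidence interval for the mean gives lower and upper limits between which the population mean is expected to lie with 95% confidence. Assuming normality, the interval for the mean is

$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$

The confidence interval of the mean for the given data, assuming a normal distribution, is as follows: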
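The interval and a few descriptive statistics can be cross-checked outside SPSS with Python's standard `statistics` module. This minimal sketch assumes the large-sample normal (z) interval, x̄ ± z·s/√n. SPSS's Explore procedure typically uses the t distribution instead (t ≈ 2.001 for 59 degrees of freedom), so its interval would be marginally wider. The function name `mean_confidence_interval` is our own.

```python
import statistics
from statistics import NormalDist

# Heights from Question 1 (60 observations), laid out as in the question.
heights = [
    78, 55, 68, 48, 65, 76, 57, 55, 65, 75, 51, 61, 68, 67, 76, 78, 71, 56, 57, 67, 58, 51, 50, 58, 50,
    77, 55, 48, 70, 55, 58, 70, 56, 52, 74, 61, 69, 76, 61, 68, 78, 56, 78, 57, 66, 66, 74, 66, 48, 73,
    71, 70, 62, 74, 76, 50, 69, 75, 65, 48,
]


def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation (z) confidence interval for the population mean."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)                      # sample SD, divisor n - 1
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    margin = z * s / n ** 0.5
    return mean, mean - margin, mean + margin


mean, lower, upper = mean_confidence_interval(heights)
print(f"n = {len(heights)}, mean = {mean:.3f}, median = {statistics.median(heights)}")
print(f"sd = {statistics.stdev(heights):.3f}, min = {min(heights)}, max = {max(heights)}")
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```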

Question 2.

A developer of food for pigs wishes to determine what relationship exists among the age of a pig when it starts receiving a newly developed food supplement, the initial weight of the pig, and the amount of weight it gains in a one-week period with the food supplement. The following information is the result of a study of eight piglets.

Piglet   Initial weight (X1)   Initial age (X2)   Weight gain (Y)
1        39                    8                  7
2        52                    6                  6
3        49                    7                  8
4        46                    12                 10
5        61                    9                  9
6        35                    6                  5
7        25                    7                  3
8        55                    4                  4

a) Determine the least-squares equation that best describes the variables.

b) Calculate the standard error.

c) Test the significance of the regression coefficients and the overall fit of the regression equation.

d) Conduct a residual analysis.

e) Estimate the dependent variable when the independent variable is 35.

Solution:

Working Expression:

Determining the least-squares equation means determining the regression equation. Regression is a method that measures the average relationship between two or more variables in terms of the original units of the data.

The regression equation of Y on X is given by

Y = a + bX, where a is the constant (intercept), b is the regression coefficient, Y is the dependent variable and X is the independent variable.

The regression equation of X on Y is given by

X = a + bY, where a is the constant, b is the regression coefficient, Y is the independent variable and X is the dependent variable.

The given question uses the multiple regression technique, which is used when a dependent variable is related to two or more independent variables. Here we have one dependent variable (weight gain, Y) and two independent variables (initial weight, X1, and initial age, X2). The least-squares estimates of a, b1 and b2 minimize the residual sum of squares $\sum (Y - a - b_1X_1 - b_2X_2)^2$.
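The least-squares fit with two predictors can be reproduced by solving the normal equations directly. The sketch below is an illustration that uses only the Python standard library; it is not how SPSS computes the fit internally. It works with corrected sums of squares and cross-products and solves the resulting 2 × 2 system by Cramer's rule. From the fit it derives R², the standard error of the estimate, the overall F statistic and the standardized residuals. The helper name `fit_two_predictors` is hypothetical.

```python
import math

# Question 2 data: eight piglets.
x1 = [39, 52, 49, 46, 61, 35, 25, 55]  # initial weight
x2 = [8, 6, 7, 12, 9, 6, 7, 4]         # initial age
y = [7, 6, 8, 10, 9, 5, 3, 4]          # weight gain


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Corrected sums of squares and cross-products.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    # Solve [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y] by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2  # the plane passes through the point of means
    return a, b1, b2


a, b1, b2 = fit_two_predictors(x1, x2, y)
fitted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
residuals = [w - f for w, f in zip(y, fitted)]

n, k = len(y), 2                       # observations, predictors
mean_y = sum(y) / n
sst = sum((w - mean_y) ** 2 for w in y)
sse = sum(e ** 2 for e in residuals)   # residual sum of squares
ssr = sst - sse                        # regression sum of squares
r_squared = ssr / sst
se_estimate = math.sqrt(sse / (n - k - 1))
f_stat = (ssr / k) / (sse / (n - k - 1))
std_residuals = [e / se_estimate for e in residuals]

print(f"Y = {a:.3f} + {b1:.3f} X1 + {b2:.3f} X2")
print(f"R^2 = {r_squared:.3f}, SE of estimate = {se_estimate:.3f}, F = {f_stat:.3f}")
print("standardized residuals:", [round(z, 3) for z in std_residuals])
```

On the piglet data this should agree with the SPSS output to rounding: intercept about -4.192, coefficients about 0.105 and 0.807, R² about 0.881, F about 18.54, and standardized residuals between about -1.076 and 1.411.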

Coefficients(a)
                            Unstandardized Coefficients   Standardized Coefficients
Model                       B          Std. Error         Beta          t          Sig.
1  (Constant)               -4.192     1.888                            -2.220     .077
   initial weight (x1)      .105       .032               .501          3.247      .023
   initial age (x2)         .807       .158               .786          5.097      .004
a. Dependent Variable: weight gain (y)

The least-squares equation that best describes the variables is:

Y = a + b1X1 + b2X2

Y = -4.192 + 0.105X1 + 0.807X2

Holding initial age (X2) constant, each one-unit increase in initial weight (X1) raises the predicted weight gain by 0.105. Holding initial weight (X1) constant, each one-unit increase in initial age (X2) raises it by 0.807.

From the table above, the relationship between weight gain and initial weight, with initial age (X2) held at zero, is:

Y = 0.105X1 - 4.192

Similarly, the relationship between weight gain and initial age, with initial weight (X1) held at zero, is:

Y = 0.807X2 - 4.192

The least-squares plane always passes through the point of means $(\bar{X}_1, \bar{X}_2, \bar{Y})$, here (45.25, 7.375, 6.5).

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .939(a)   .881       .834                .999
a. Predictors: (Constant), initial age (x2), initial weight (x1)
b. Dependent Variable: weight gain (y)

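The standard error is computed with the large-sample formula for the standard error of a correlation coefficient, with r² replaced by R² = 0.881:

$S.E. = \frac{1 - R^2}{\sqrt{n}} = \frac{1 - 0.881}{\sqrt{8}} \approx 0.042$

The standard error of the estimate, which measures the typical size of a residual, is reported in the Model Summary as 0.999.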

ANOVA(a)
Model             Sum of Squares   df   Mean Square   F        Sig.
1  Regression     37.009           2    18.505        18.539   .005(b)
   Residual       4.991            5    .998
   Total          42.000           7
a. Dependent Variable: weight gain (y)
b. Predictors: (Constant), initial age (x2), initial weight (x1)

Let us state the null and alternative hypotheses for the overall fit:

Null hypothesis (H0): β1 = β2 = 0, i.e. weight gain is not linearly related to initial weight and initial age.

Alternative hypothesis (H1): at least one of β1, β2 is non-zero.

Interpretation:
Significance:

0.005 < 0.05

Since the p-value (0.005) is less than the conventional significance level of 0.05, we reject the null hypothesis. The regression equation as a whole is a significant fit. For the individual coefficients in the Coefficients table, initial weight (p = 0.023) and initial age (p = 0.004) are both significant at the 5% level, while the constant (p = 0.077) is not.

Residuals Statistics(a)
                       Minimum   Maximum   Mean    Std. Deviation   N
Predicted Value        4.07      10.31     6.50    2.299            8
Residual               -1.075    1.409     .000    .844             8
Std. Predicted Value   -1.055    1.656     .000    1.000            8
Std. Residual          -1.076    1.411     .000    .845             8

a. Dependent Variable: weight gain(y)

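Interpretation:
Residual sum of squares: $SSE = \sum (y - \hat{y})^2 = 4.991$. The residuals range from -1.075 to 1.409 with mean 0. All standardized residuals lie between -1.076 and 1.411, so no observation lies more than two standard errors from its fitted value.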

Interpretation:
In this plot, three points lie on the reference line while the others deviate from it. The closer the points lie to the line, the better the model fits the data. The deviations here are small. This is consistent with the standardized residuals in the Residuals Statistics table, which all lie within ±2.

Composition of weight gain


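Estimating the dependent variable when the independent variables are 35:

The fitted regression equation of Y on X1 and X2 is

Y = a + b1X1 + b2X2

Y = -4.192 + 0.105X1 + 0.807X2

When X1 = 35 and X2 = 35,

Y = -4.192 + 0.105 * 35 + 0.807 * 35

= -4.192 + 3.675 + 28.245 = 27.728

The estimated weight gain when both independent variables are 35 is Y = 27.728. An initial age of 35 lies well outside the observed range of ages (4 to 12), so this estimate is an extrapolation.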
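The prediction is plain arithmetic. This small sketch uses the rounded coefficients from the SPSS Coefficients table and makes the negative sign of the constant explicit. The function name `predict_weight_gain` is illustrative.

```python
# Rounded coefficients read from the SPSS Coefficients table (Question 2).
CONSTANT, B_WEIGHT, B_AGE = -4.192, 0.105, 0.807


def predict_weight_gain(initial_weight, initial_age):
    """Fitted weight gain Y = a + b1*X1 + b2*X2 (hypothetical helper)."""
    return CONSTANT + B_WEIGHT * initial_weight + B_AGE * initial_age


print(round(predict_weight_gain(35, 35), 3))  # 27.728
```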

Question 3.

Calculate Karl Pearson's correlation coefficient from the following data.

Sales      43   41   36   34   50

Expenses   10   22   13   19   17

Also calculate probable error of the data and interpret the result.

Solution:

Working Expression:

Karl Pearson’s coefficient of correlation is a widely used measure of the degree of linear relationship between two variables. It is denoted by r and computed as

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

Pearson’s r varies between -1 and +1, where

r = 0: there is no linear correlation between sales and expenses.

r = +1: there is perfect positive correlation between sales and expenses.

r = -1: there is perfect negative correlation between sales and expenses.

$P.E.(r) = 0.675 \times \frac{1 - r^2}{\sqrt{n}}$

If |r| < P.E.(r), there is no evidence of correlation (the correlation coefficient is insignificant).

If |r| > 6 × P.E.(r), there is evidence of correlation (the correlation coefficient is significant).

Otherwise, the result is inconclusive.
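The correlation coefficient and probable error can be checked with a short standard-library sketch. It uses the conventional constant 0.6745, which the text rounds to 0.675. It applies the rule of thumb stated above and labels the region between P.E. and 6 × P.E. as inconclusive. The function names are our own.

```python
import math

# Question 3 data.
sales = [43, 41, 36, 34, 50]
expenses = [10, 22, 13, 19, 17]


def pearson_r(x, y):
    """Karl Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def judge(r, pe):
    """Rule of thumb: |r| < P.E. -> not significant; |r| > 6 P.E. -> significant."""
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"


r = pearson_r(sales, expenses)
pe = probable_error(r, len(sales))
print(f"r = {r:.3f}, P.E. = {pe:.3f}, {judge(r, pe)}")
```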

Let us state the null and alternative hypotheses:

Null hypothesis (H0): ρ = 0, i.e. there is no correlation between sales and expenses.

Alternative hypothesis (H1): ρ ≠ 0, i.e. sales and expenses are correlated.

Correlations
                                sales    expenses
sales      Pearson Correlation  1        -.073
           Sig. (2-tailed)               .907
           N                    5        5
expenses   Pearson Correlation  -.073    1
           Sig. (2-tailed)      .907
           N                    5        5

Interpretation:
Correlation r = -0.073;

There is a very weak (low-degree) negative correlation between sales and expenses.

Significance:

0.907 > 0.05

Since the p-value (0.907) is much greater than the conventional significance level of 0.05, we fail to reject the null hypothesis. In other words, there is no significant correlation between sales and expenses.
$P.E.(r) = 0.675 \times \frac{1 - r^2}{\sqrt{n}} = 0.675 \times \frac{1 - (-0.073)^2}{\sqrt{5}} \approx 0.30$

|r| = 0.073

Here, |r| < P.E.(r).

Since the probable error of the correlation coefficient is greater than the absolute value of the correlation coefficient, the correlation between sales and expenses is insignificant.

Question 4.

A computer operator wants to know how the data rate of internet users depends on bandwidth. The following results were gathered by the operator.

Bandwidth   17   35   41   19   25   20   10   15

Data rate   47   64   68   50   60   55   30   33

a) Is there any association between bandwidth and data rate?

b) Fit a regression model to describe the given data, and interpret the estimated data rate when the bandwidth is 30.

c) What percentage of the variation in data rate is explained by the variation in bandwidth?

Solution:

Working Expressions:

Karl Pearson’s coefficient of correlation is a widely used measure of the degree of linear relationship between two variables. It is denoted by r.

Pearson’s r varies between -1 and +1, where

r = 0: there is no linear correlation between bandwidth and data rate.

r = +1: there is perfect positive correlation between bandwidth and data rate.

r = -1: there is perfect negative correlation between bandwidth and data rate.

Let us state the null and alternative hypotheses:

Null hypothesis (H0): ρ = 0, i.e. there is no correlation between bandwidth and data rate.

Alternative hypothesis (H1): ρ ≠ 0, i.e. bandwidth and data rate are correlated.

Regression:

The regression equation Y on X is given by

Y= a+ bX where a is constant, b is regression coefficient, Y is dependent and X is

independent variable.

The regression equation X on Y is given by

X = a +bY where a is constant, b is regression coefficient, Y is independent and X is dependent

variable.
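A minimal standard-library sketch of the simple regression and correlation for this question follows. It computes a, b, r and r² from corrected sums, then evaluates the fitted line at a bandwidth of 30. The function name `simple_regression` is hypothetical.

```python
import math

# Question 4 data.
bandwidth = [17, 35, 41, 19, 25, 20, 10, 15]
data_rate = [47, 64, 68, 50, 60, 55, 30, 33]


def simple_regression(x, y):
    """Least-squares line y = a + b x, plus Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx            # slope (regression coefficient)
    a = my - b * mx          # intercept: the line passes through (mean x, mean y)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


a, b, r = simple_regression(bandwidth, data_rate)
estimate = a + b * 30
print(f"Y = {a:.3f} + {b:.3f} X, r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"estimated data rate at bandwidth 30: {estimate:.2f}")
```

With unrounded coefficients the estimate is about 59.52. The small gap from 59.509 comes from using the three-decimal coefficients in the hand calculation.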

Correlations
                                  band width   data rate
band width   Pearson Correlation  1            .902**
             Sig. (2-tailed)                   .002
             N                    8            8
data rate    Pearson Correlation  .902**       1
             Sig. (2-tailed)      .002
             N                    8            8
**. Correlation is significant at the 0.01 level (2-tailed).
Interpretation:
Finding the association between bandwidth and data rate:

Correlation r = 0.902;

There is a high degree of positive correlation between bandwidth and data rate.

Significance:

0.002 < 0.05

Since the p-value (0.002) is less than the conventional significance level of 0.05, we reject the null hypothesis. In other words, there is a significant positive association between bandwidth and data rate.

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .902(a)   .814       .783                6.436
a. Predictors: (Constant), band width
b. Dependent Variable: data rate
Interpretation:

Coefficient of determination (r²) = 0.814

Hence 81.4% of the variation in data rate is explained by the variation in bandwidth under the given information.

Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B         Std. Error          Beta          t         Sig.
1  (Constant)     23.749    5.761                             4.123     .006
   band width     1.192     .233                .902          5.126     .002
a. Dependent Variable: data rate
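Interpretation:
The regression equation of Y on X is given by

Y = a + bX

where a is the constant (intercept), b is the regression coefficient, Y is the dependent variable (data rate) and X is the independent variable (bandwidth).

The estimated regression equation of Y on X is

Y = 23.749 + 1.192X

When X = 30,

Y = 23.749 + 1.192 * 30

= 59.509

So the estimated data rate when the bandwidth is 30 is about 59.51. Each one-unit increase in bandwidth raises the estimated data rate by 1.192.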

Composition of band width

Composition of data rate

Question 5

The yields of the treatments in different plots are shown below. Carry out an analysis of variance.

T4 1401   T3 2536   T3 2459   T1 2537   T3 2827   T1 2069

T2 2211   T1 1797   T4 1170   T4 1516   T4 2104   T3 2385

T2 3366   T1 2104   T2 2591   T3 2460   T4 1077   T2 2544

Solution:

Working Expression:

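The problem can be solved using one-way analysis of variance. Let's state the null and alternative hypotheses:

H0: µ1 = µ2 = µ3 = µ4 (there is no significant difference between the treatment means)

H1: not all means are equal (at least one treatment mean is different)

Formula (k groups, n_i observations in group i, N observations in total, T_i the total of group i):

Grand total: $T = \sum_{i=1}^{k} T_i$

Correction factor: $CF = \frac{T^2}{N}$

Total sum of squares: $TSS = \sum_{i}\sum_{j} X_{ij}^2 - CF$

Sum of squares between samples: $SSB = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - CF$

Error sum of squares: $SSE = TSS - SSB$

Test statistic: $F = \frac{SSB/(k-1)}{SSE/(N-k)}$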
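The one-way ANOVA arithmetic above can be sketched in Python. The grouping below assumes the plot layout is read as treatment-yield pairs. That reading gives 4, 4, 5 and 5 plots for T1 to T4, which is consistent with the 3 and 14 degrees of freedom in the SPSS table. The sketch computes the F statistic only. A p-value would need the F distribution (for example `scipy.stats.f.sf`), which is not used here. The helper name `one_way_anova` is our own.

```python
# Question 5 yields grouped by treatment (assumed reading of the plot layout).
groups = {
    "T1": [2537, 2069, 1797, 2104],
    "T2": [2211, 3366, 2591, 2544],
    "T3": [2536, 2459, 2827, 2385, 2460],
    "T4": [1401, 1170, 1516, 2104, 1077],
}


def one_way_anova(groups):
    """Return (SSB, SSE, df_between, df_within, F) for a one-way layout."""
    values = [v for g in groups.values() for v in g]
    n_total, k = len(values), len(groups)
    cf = sum(values) ** 2 / n_total                     # correction factor T^2 / N
    tss = sum(v * v for v in values) - cf               # total sum of squares
    ssb = sum(sum(g) ** 2 / len(g) for g in groups.values()) - cf
    sse = tss - ssb                                     # within-group (error) SS
    df_b, df_w = k - 1, n_total - k
    f_stat = (ssb / df_b) / (sse / df_w)
    return ssb, sse, df_b, df_w, f_stat


ssb, sse, df_b, df_w, f_stat = one_way_anova(groups)
print(f"SSB = {ssb:.3f} (df {df_b}), SSE = {sse:.3f} (df {df_w}), F = {f_stat:.3f}")
```

Calling `one_way_anova` with the Question 7 lifetimes ({"A": [5, 7, 3, 2, 6, 4, 8, 9], "B": [2, 3, 4, 5, 6], "C": [7, 8, 9, 10, 12, 13, 11]}) should likewise give SSB = 124.2, SSE = 80.0 and F ≈ 13.196, matching the SPSS table in that question.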

ANOVA
values
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   4265689.961      3    1421896.654   11.253   .001
Within Groups    1768941.150      14   126352.939
Total            6034631.111      17

Interpretation:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.001

So, P-value < α

Since the p-value is less than the level of significance, we reject the null hypothesis. In other words, there is a significant difference between the treatments.

Question 6

The following table gives the results of an experiment on four varieties of a crop in five blocks of plots.

Treatment Block1 Block2 Block3 Block4 Block5

A 32 34 33 35 37

B 34 33 36 37 35

C 31 34 35 32 36

D 29 26 30 28 29

Solution:

Working Expression:

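The two-way ANOVA compares the mean differences between groups that have been split on two independent variables (called factors). Here the factors are treatment (rows) and block (columns). With replicated data, a two-way ANOVA can also test whether the two factors interact. Here, however, each treatment appears only once in each block, so the analysis tests the main effect of each factor, and the treatment × block interaction cannot be separated from the error term.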

Let's state the null and alternative hypotheses for rows (treatments):

H0: µA = µB = µC = µD (there is no significant difference between treatments)

H1: not all treatment means are equal (at least one mean is different)

Let's state the null and alternative hypotheses for columns (blocks):

H0: µ1 = µ2 = µ3 = µ4 = µ5 (there is no significant difference between blocks)

H1: not all block means are equal (at least one mean is different)

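Formula (r rows, c columns, N = rc observations; R_i is the total of row i and C_j is the total of column j; here r = 4 treatments and c = 5 blocks):

Grand total: $T = \sum_{j=1}^{c} C_j$

Correction factor: $CF = \frac{T^2}{N}$

Total sum of squares: $TSS = \sum_{i}\sum_{j} X_{ij}^2 - CF$

Sum of squares due to rows: $SSR = \frac{1}{c}\sum_{i=1}^{r} R_i^2 - CF$

Sum of squares due to columns: $SSC = \frac{1}{r}\sum_{j=1}^{c} C_j^2 - CF$

Error sum of squares: $SSE = TSS - SSR - SSC$

Test statistics: $F_R = \frac{SSR/(r-1)}{SSE/((r-1)(c-1))}$ and $F_C = \frac{SSC/(c-1)}{SSE/((r-1)(c-1))}$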
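For a two-way classification with one observation per cell, the formulas above translate directly into code. This sketch uses a hypothetical helper, `two_way_anova_no_replication`. It returns the sums of squares and the F ratios for rows and columns, but not p-values.

```python
def two_way_anova_no_replication(table):
    """Two-way ANOVA, one observation per cell: rows and columns as factors."""
    r, c = len(table), len(table[0])
    cf = sum(sum(row) for row in table) ** 2 / (r * c)   # correction factor T^2 / N
    tss = sum(v * v for row in table for v in row) - cf
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    ssr = sum(t * t for t in row_totals) / c - cf        # rows: divide by no. of columns
    ssc = sum(t * t for t in col_totals) / r - cf        # columns: divide by no. of rows
    sse = tss - ssr - ssc
    mse = sse / ((r - 1) * (c - 1))                      # error mean square
    f_rows = (ssr / (r - 1)) / mse
    f_cols = (ssc / (c - 1)) / mse
    return ssr, ssc, sse, f_rows, f_cols


# Question 6: rows = treatments A-D, columns = blocks 1-5.
crop = [
    [32, 34, 33, 35, 37],
    [34, 33, 36, 37, 35],
    [31, 34, 35, 32, 36],
    [29, 26, 30, 28, 29],
]
ssr, ssc, sse, f_rows, f_cols = two_way_anova_no_replication(crop)
print(f"SSR = {ssr:.3f}, SSC = {ssc:.3f}, SSE = {sse:.3f}")
print(f"F(treatments) = {f_rows:.3f}, F(blocks) = {f_cols:.3f}")
```

The same helper applied to the Question 10 table ([[45, 43, 55], [37, 40, 56], [42, 44, 46]], rows = water temperature, columns = detergent) should reproduce the sums of squares 24.667 and 222.000 reported there.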

Tests of Between-Subjects Effects
Dependent Variable: value
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   155.700(a)                7    22.243        9.048      .001
Intercept         21516.800                 1    21516.800     8752.597   .000
Treatments        134.000                   3    44.667        18.169     .000
Block             21.700                    4    5.425         2.207      .130
Error             29.500                    12   2.458
Total             21702.000                 20
Corrected Total   185.200                   19
a. R Squared = .841 (Adjusted R Squared = .748)

Interpretation:

For treatments:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.000

So, P-value < α

Since the p-value is less than the level of significance, we reject the null hypothesis. In other words, there is a significant difference between treatments.

For blocks:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.130

So, P-value > α

Since the p-value is greater than the level of significance, we fail to reject the null hypothesis. In other words, there is no significant difference between blocks.

Question 7

There are three brands of computer, namely A, B and C, and their lifetimes are tabulated as follows. Test whether the average lifetimes of the three brands of computer differ significantly at the 5% level of significance.

A 5 7 3 2 6 4 8 9

B 2 3 4 5 6

C 7 8 9 10 12 13 11

Solution:

Working Expression:

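The problem can be solved using one-way analysis of variance. Let's state the null and alternative hypotheses:

H0: µA = µB = µC (there is no significant difference between the average lifetimes)

H1: not all means are equal (at least one brand's mean lifetime is different)

Formula (k groups, n_i observations in group i, N observations in total, T_i the total of group i):

Grand total: $T = \sum_{i=1}^{k} T_i$

Correction factor: $CF = \frac{T^2}{N}$

Total sum of squares: $TSS = \sum_{i}\sum_{j} X_{ij}^2 - CF$

Sum of squares between samples: $SSB = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - CF$

Error sum of squares: $SSE = TSS - SSB$

Test statistic: $F = \frac{SSB/(k-1)}{SSE/(N-k)}$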

ANOVA
values of brand
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   124.200          2    62.100        13.196   .000
Within Groups    80.000           17   4.706
Total            204.200          19

Interpretation:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.000

So, P-value < α

Since the p-value is less than the level of significance, we reject the null hypothesis. In other words, there is a significant difference between the average lifetimes of the brands.

Question 8

Roll number   Name     Age   Weight   Gender   Education level
1             Ram      25    65       1        2
2             Shyam    22    55       1        3
3             Sita     23    43       2        3
4             Hari     21    45       1        1
5             Rita     24    40       2        2
6             Roshan   21    43       1        3
7             Sabita   21    48       2        3
8             Diya     25    52       2        2
9             Depak    19    41       1        1

Solution:

Working expression:

• Mean:

Formula: Mean (x̄) = (Sum of all values) / (Number of values)

• Median:

Formula: Median = middle value (for an odd number of values), or (sum of the two middle values) / 2 (for an even number of values)

• Mode:

Formula: Mode = value(s) that occur most frequently in the dataset

• Maximum:

Formula: Maximum = largest value in the dataset

• Minimum:

Formula: Minimum = smallest value in the dataset

• Standard deviation:

Formula: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$ (the sample standard deviation, which is what SPSS reports)

• Percentile:

Formula: Percentile rank = (Number of values below the data point) / (Total number of values) × 100
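Most of the summary measures in the SPSS tables below can be reproduced with Python's `statistics` module. This sketch assumes Python 3.8+. `statistics.stdev` and `statistics.variance` use the n - 1 divisor, as SPSS does. `statistics.quantiles` with its default "exclusive" method happens to match the SPSS quartiles for these data. Skewness and kurtosis are omitted. The helpers `describe` and `frequency_table` are our own, and the percentages are of the 9 valid cases (SPSS's Valid Percent).

```python
import statistics
from collections import Counter

# Question 8 data (9 students).
weights = [65, 55, 43, 45, 40, 43, 48, 52, 41]
ages = [25, 22, 23, 21, 24, 21, 21, 25, 19]


def describe(data):
    """Summary measures comparable to the SPSS Frequencies output."""
    return {
        "n": len(data),
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "std_dev": statistics.stdev(data),             # divisor n - 1
        "variance": statistics.variance(data),
        "range": max(data) - min(data),
        "minimum": min(data),
        "maximum": max(data),
        "sum": sum(data),
        "quartiles": statistics.quantiles(data, n=4),  # default method="exclusive"
    }


def frequency_table(data):
    """Value -> (frequency, percent, cumulative percent) over the valid cases."""
    total, cumulative, table = len(data), 0, {}
    for value, freq in sorted(Counter(data).items()):
        cumulative += freq
        table[value] = (freq, 100 * freq / total, 100 * cumulative / total)
    return table


for name, column in (("weight", weights), ("age", ages)):
    print(name, describe(column))
    print(name, frequency_table(column))
```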

Statistics
                          weight of the student   age of the student
N          Valid          9                       9
           Missing        1                       1
Mean                      48.00                   22.33
Std. Error of Mean        2.703                   .687
Median                    45.00                   22.00
Mode                      43                      21
Std. Deviation            8.109                   2.062
Variance                  65.750                  4.250
Skewness                  1.262                   -.024
Std. Error of Skewness    .717                    .717
Kurtosis                  1.253                   -.983
Std. Error of Kurtosis    1.400                   1.400
Range                     25                      6
Minimum                   40                      19
Maximum                   65                      25
Sum                       432                     201
Percentiles   .35         40.00                   19.00
              .65         40.00                   19.00
              25          42.00                   21.00
              50          45.00                   22.00
              75          53.50                   24.50

weight of the student
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     40       1           10.0      11.1            11.1
          41       1           10.0      11.1            22.2
          43       2           20.0      22.2            44.4
          45       1           10.0      11.1            55.6
          48       1           10.0      11.1            66.7
          52       1           10.0      11.1            77.8
          55       1           10.0      11.1            88.9
          65       1           10.0      11.1            100.0
          Total    9           90.0      100.0
Missing   System   1           10.0
Total              10          100.0

age of the student
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     19       1           10.0      11.1            11.1
          21       3           30.0      33.3            44.4
          22       1           10.0      11.1            55.6
          23       1           10.0      11.1            66.7
          24       1           10.0      11.1            77.8
          25       2           20.0      22.2            100.0
          Total    9           90.0      100.0
Missing   System   1           10.0
Total              10          100.0

Composition of weight and age

The image below is a pie chart showing the distribution of education levels:

Question 9

Carry out the analysis of variance of the following data.

Type   C1   C2   C3

       83   56   79

       83   76   95

       76   72   87

Test whether the average cost per computer differs significantly among the three types of computer at the 5% level of significance.

Solution:

Working Expression:

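The problem can be solved using one-way analysis of variance. Let's state the null and alternative hypotheses:

H0: µ1 = µ2 = µ3 (there is no significant difference in the average cost of types C1, C2 and C3)

H1: not all means are equal (at least one type's mean cost is different)

Formula (k groups, n_i observations in group i, N observations in total, T_i the total of group i):

Grand total: $T = \sum_{i=1}^{k} T_i$

Correction factor: $CF = \frac{T^2}{N}$

Total sum of squares: $TSS = \sum_{i}\sum_{j} X_{ij}^2 - CF$

Sum of squares between samples: $SSB = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - CF$

Error sum of squares: $SSE = TSS - SSB$

Test statistic: $F = \frac{SSB/(k-1)}{SSE/(N-k)}$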

ANOVA
values
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   561.556          2    280.778       4.380   .067
Within Groups    384.667          6    64.111
Total            946.222          8

Interpretation:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.067

So, P-value > α

Since the p-value is greater than the level of significance, there is not sufficient evidence to reject the null hypothesis. This suggests that there is no significant difference in the average cost per computer among the three types.

Question 10

Water temperature   Detergent A   Detergent B   Detergent C

Cold                45            43            55

Warm                37            40            56

Hot                 42            44            46

The table above gives data on the performance of three different detergents at three different water temperatures. Performance was measured as a whiteness reading, obtained with specially designed equipment, for nine loads of washing.

Perform a two-way analysis of variance using a 0.05 level of significance.

Solution:

Working Expression:

The two-way ANOVA compares the mean differences between groups that have been split on two independent variables (called factors). Here the factors are water temperature (rows) and detergent (columns). With a single observation per cell, only the two main effects can be tested; the interaction cannot be separated from the error term.

Let's state the null and alternative hypotheses for rows (water temperature):

H0: µCold = µWarm = µHot (there is no significant difference between water temperatures)

H1: not all means are equal (at least one mean is different)

Let's state the null and alternative hypotheses for columns (detergents):

H0: µA = µB = µC (there is no significant difference between detergents)

H1: not all means are equal (at least one mean is different)

Formula (r rows, c columns, N = rc observations; R_i is the total of row i and C_j is the total of column j; here r = 3 water temperatures and c = 3 detergents):

Grand total: $T = \sum_{j=1}^{c} C_j$

Correction factor: $CF = \frac{T^2}{N}$

Total sum of squares: $TSS = \sum_{i}\sum_{j} X_{ij}^2 - CF$

Sum of squares due to rows: $SSR = \frac{1}{c}\sum_{i=1}^{r} R_i^2 - CF$

Sum of squares due to columns: $SSC = \frac{1}{r}\sum_{j=1}^{c} C_j^2 - CF$

Error sum of squares: $SSE = TSS - SSR - SSC$

Tests of Between-Subjects Effects
Dependent Variable: values
Source             Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model    246.667(a)                4    61.667        3.190     .144
Intercept          18496.000                 1    18496.000     956.690   .000
WaterTemperature   24.667                    2    12.333        .638      .575
Detergent          222.000                   2    111.000       5.741     .067
Error              77.333                    4    19.333
Total              18820.000                 9
Corrected Total    324.000                   8
a. R Squared = .761 (Adjusted R Squared = .523)

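Interpretation:

For water temperature:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.575

So, P-value > α

Since the p-value is greater than the level of significance, we fail to reject the null hypothesis. Water temperature has no significant effect on the whiteness reading.

For detergent:

Confidence level (1 - α) = 95%

Level of significance (α) = 5% = 0.05

P-value = 0.067

So, P-value > α

Since the p-value is greater than the level of significance, we fail to reject the null hypothesis. At the 5% level, there is no significant difference between the detergents.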

Question 11

sn age gender bmi bp_sy bp_dy occu lit smoke alco

1 92 1 24 120 80 2 2 3 2

2 65 1 20 90 80 1 2 2 2

3 85 2 17 140 95 1 1 1 2

4 65 2 25 140 100 1 1 1 2

5 64 2 26 150 90 1 1 1 2

6 85 1 17 120 70 1 1 2 2

7 76 1 21 110 80 1 1 2 2

8 65 2 22 110 75 1 1 3 2

9 78 1 23 90 60 1 1 2 2

10 60 2 20 130 80 2 2 1 2

11 66 1 28 120 60 1 2 3 2

12 65 2 30 130 80 1 2 1 2

13 69 2 25 120 80 1 2 2 2

14 61 2 24 140 90 1 2 1 1

15 67 2 26 120 80 1 2 1 2

16 68 2 20 120 80 1 2 1 1

17 80 1 25 120 80 1 2 2 2

18 65 1 27 110 70 1 2 3 2

19 65 2 29 130 80 1 2 1 2

20 70 2 30 100 60 1 2 1 2

Calculate the descriptive statistics and make a pie chart and box plots of the data above.

Solution:

Working Expression:

A descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of data, while descriptive statistics is the process of using and analysing those statistics.
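Here is a short sketch of the same summaries for the Question 11 data, again using only the standard library. `statistics.multimode` (Python 3.8+) lists every mode, which shows why SPSS flags multiple modes for BMI. The 10-year age bands are our assumption about how the "category" variable was coded. They were chosen because they reproduce the frequency table at the end of this question.

```python
import statistics
from collections import Counter

# Question 11 columns (20 respondents).
data = {
    "age": [92, 65, 85, 65, 64, 85, 76, 65, 78, 60, 66, 65, 69, 61, 67, 68, 80, 65, 65, 70],
    "bmi": [24, 20, 17, 25, 26, 17, 21, 22, 23, 20, 28, 30, 25, 24, 26, 20, 25, 27, 29, 30],
    "bp_sy": [120, 90, 140, 140, 150, 120, 110, 110, 90, 130,
              120, 130, 120, 140, 120, 120, 120, 110, 130, 100],
    "bp_dy": [80, 80, 95, 100, 90, 70, 80, 75, 60, 80, 60, 80, 80, 90, 80, 80, 80, 70, 80, 60],
}

for name, values in data.items():
    print(
        f"{name}: mean={statistics.fmean(values):.2f} median={statistics.median(values)} "
        f"modes={statistics.multimode(values)} sd={statistics.stdev(values):.3f} "
        f"min={min(values)} max={max(values)}"
    )


def age_band(age):
    """Hypothetical 10-year banding, e.g. 65 -> '60-69'."""
    lower = age // 10 * 10
    return f"{lower}-{lower + 9}"


bands = Counter(age_band(a) for a in data["age"])
for band in sorted(bands):
    print(band, bands[band], f"{100 * bands[band] / len(data['age']):.1f}%")
```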

Statistics
                          age      body mass index   systolic blood pressure   diastolic blood pressure
N         Valid           20       20                20                        20
          Missing         0        0                 0                         0
Mean                      70.55    23.95             120.50                    78.50
Median                    66.50    24.50             120.00                    80.00
Mode                      65       20(a)             120                       80
Std. Deviation            8.959    3.927             16.051                    10.773
Variance                  80.261   15.418            257.632                   116.053
Skewness                  1.123    -.196             -.259                     -.129
Std. Error of Skewness    .512     .512              .512                      .512
Kurtosis                  .270     -.740             -.101                     .134
Std. Error of Kurtosis    .992     .992              .992                      .992
Minimum                   60       17                90                        60
Maximum                   92       30                150                       100
a. Multiple modes exist. The smallest value is shown

The image below is the pie chart:

The image below shows box plots of BMI by alcohol use:

The image below shows box plots of BMI by gender:

The image below shows box plots of blood pressure:

Statistics
category
N          Valid          20
           Missing        0
Mean                      1.60
Std. Error of Mean        .210
Median                    1.00
Mode                      1
Std. Deviation            .940
Variance                  .884
Skewness                  1.367
Std. Error of Skewness    .512
Kurtosis                  .754
Std. Error of Kurtosis    .992
Range                     3
Minimum                   1
Maximum                   4
Sum                       32
Percentiles   .45         1.00
              .95         1.00
              25          1.00
              50          1.00
              75          2.00

category
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   60-69     13          65.0      65.0            65.0
        70-79     3           15.0      15.0            80.0
        80-89     3           15.0      15.0            95.0
        90-99     1           5.0       5.0             100.0
        Total     20          100.0     100.0

Task 02: Example of Analysing Data and Residual Volatility and Estimating ARCH and GARCH Models
No ratings yet
Task 02: Example of Analysing Data and Residual Volatility and Estimating ARCH and GARCH Models
12 pages
Module 3 - Statistical Inference-1
No ratings yet
Module 3 - Statistical Inference-1
19 pages
Syllabus FML
No ratings yet
Syllabus FML
3 pages
Demand Forecasting for Engineers
No ratings yet
Demand Forecasting for Engineers
3 pages
Grouped Data Histogram
No ratings yet
Grouped Data Histogram
16 pages
CENTRAL TENDENCY Topical Past Papers
No ratings yet
CENTRAL TENDENCY Topical Past Papers
35 pages
Business Statistics in Practices Chap - 08
No ratings yet
Business Statistics in Practices Chap - 08
30 pages
Library Service Quality Analysis
No ratings yet
Library Service Quality Analysis
2 pages
Assignment Solutions 8
No ratings yet
Assignment Solutions 8
3 pages
Credit Score Fraud Detection with XGBoost
No ratings yet
Credit Score Fraud Detection with XGBoost
2 pages
Unit V - R Programming Notes
No ratings yet
Unit V - R Programming Notes
12 pages

Statistics Lab


Table of Contents

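S.N Title Date Signature

1. Calculate the descriptive statistics of the data. 2081-12-25

2. Determine the least-squares equation that best describes the variables, calculate the standard error, test the significance of the regression coefficients and the overall fit of the regression equation, conduct the residual analysis, and estimate the dependent variable when the independent variable is 35. 2081-12-25

3. Calculate Karl Pearson's correlation coefficient and the probable error of the data, and interpret the result. 2081-12-25

4. Determine the association between bandwidth and data rate, fit a regression model to describe the given data, estimate the data rate when bandwidth is 30, and find the percentage of variation in data rate explained by the variation in bandwidth. 2081-12-25

5. Calculation of one-way ANOVA for the given data. 2081-12-25

6. Calculation of two-way ANOVA for the given data. 2081-12-25

7. There are three brands of computer, namely A, B and C, and their lifetimes are tabulated as follows. Test whether the average lifetimes of the three brands are significantly different at the 5% level of significance. 2081-12-25

8. Find the descriptive statistics, frequency table, box plot and pie chart. 2081-12-25

9. Perform a two-way analysis of variance using a 0.05 level of significance. 2081-12-25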

10. The following table gives the data on the 2081-12-25

11. Calculate the descriptive statistics and make 2081-12-25

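Also calculate the descriptive statistics of the data.

b) Calculate the standard error.

c) Test the significance of the regression coefficients and the overall fit of the regression equation.

d) Conduct the residual analysis.

e) Estimate the dependent variable when the independent variable is 35.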

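The regression equation of Y on X is given by

Y = a + bX, where a is the constant (intercept), b is the regression coefficient (slope), Y is the dependent variable and X is the independent variable.

The regression equation of X on Y is given by

X = a + bY, where a is the constant, b is the regression coefficient, Y is the independent variable and X is the dependent variable. (The constants a and b of this line are in general different from those of the Y on X line.)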
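As a minimal sketch (standard library only, with a small made-up data set rather than the lab's data), both least-squares lines can be computed from the corrected sums of squares S_xx, S_yy and the cross-product S_xy. The helper name `regression_lines` is hypothetical.

```python
from statistics import fmean


def regression_lines(x, y):
    """Least-squares coefficients of Y on X and of X on Y.

    Returns ((a_yx, b_yx), (a_xy, b_xy)) for
    Y = a_yx + b_yx * X  and  X = a_xy + b_xy * Y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length (at least 2 points)")
    x_bar, y_bar = fmean(x), fmean(y)
    # Corrected sums of squares and cross-products
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b_yx = s_xy / s_xx            # slope of Y on X
    b_xy = s_xy / s_yy            # slope of X on Y
    a_yx = y_bar - b_yx * x_bar   # both lines pass through (x_bar, y_bar)
    a_xy = x_bar - b_xy * y_bar
    return (a_yx, b_yx), (a_xy, b_xy)


# Illustrative data only (not the lab's data set)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
(a_yx, b_yx), (a_xy, b_xy) = regression_lines(x, y)
print(f"Y = {a_yx:.2f} + {b_yx:.2f}X")  # Y = 2.20 + 0.60X
print(f"X = {a_xy:.2f} + {b_xy:.2f}Y")  # X = -1.00 + 1.00Y
```

Both slopes share S_xy, so b_yx × b_xy = S_xy² / (S_xx S_yy) = r², which is a quick check on a hand calculation (here 0.6 × 1.0 = 0.6).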

Coefficients

Model                    Unstandardized B   Std. Error   Standardized Beta   t        Sig.
1  (Constant)            -4.192             1.888                            -2.220   .077
   initial weight (x1)     .105              .032        .501                 3.247   .023
   initial age (x2)        .807              .158        .786                 5.097   .004

a. Dependent Variable: weight gain (y)

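The least-squares equation that best describes the variables is:

Y = -4.192 + 0.105 X1 + 0.807 X2

where Y is weight gain, X1 is initial weight and X2 is initial age.

When X1 (initial weight) is held constant, Y increases by 0.807 for each one-unit increase in X2 (initial age); when X2 is held constant, Y increases by 0.105 for each one-unit increase in X1.

Also, the regression equation of weight gain on initial age is: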

The least-squares line always passes through the point of means (X̄, Ȳ); for the two-predictor model the fitted plane passes through (X̄1, X̄2, Ȳ).
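The coefficients above were produced by SPSS. As a sketch of what ordinary least squares does for two predictors, the centred normal equations can be solved directly by Cramer's rule; the intercept b0 = Ȳ - b1·X̄1 - b2·X̄2 is what forces the fitted plane through the point of means. The function name and the data below are hypothetical (the data are constructed so that y = 1 + 2·x1 + 3·x2 exactly), and the sketch does not compute standard errors.

```python
from statistics import fmean


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the centred normal equations."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    # Cramer's rule for: s11*b1 + s12*b2 = s1y ; s12*b1 + s22*b2 = s2y
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2  # plane passes through (mean x1, mean x2, mean y)
    return b0, b1, b2


# Hypothetical data, built so that y = 1 + 2*x1 + 3*x2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [9, 8, 19, 18, 26]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"y = {b0:.3f} + {b1:.3f} x1 + {b2:.3f} x2")  # y = 1.000 + 2.000 x1 + 3.000 x2
```

Evaluating the fitted equation at the means (x1 = 3, x2 = 3) returns the mean of y (16), which is the property stated above.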

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .939   .881       .834                .999
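Every figure in the model summary can be recovered from the ANOVA sums of squares (Regression 37.009, Residual 4.991), assuming n = 8 observations (the residual statistics report N = 8) and k = 2 predictors. A small sketch with a hypothetical helper name:

```python
import math


def model_summary(ss_reg, ss_res, n, k):
    """R, R^2, adjusted R^2 and standard error of the estimate from ANOVA sums of squares."""
    ss_tot = ss_reg + ss_res
    r2 = ss_reg / ss_tot                            # coefficient of determination
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalises extra predictors
    se_est = math.sqrt(ss_res / (n - k - 1))        # sqrt of the residual mean square
    return math.sqrt(r2), r2, adj_r2, se_est


# Sums of squares from this lab's ANOVA output; n = 8, k = 2
r, r2, adj_r2, se = model_summary(37.009, 4.991, n=8, k=2)
print(f"R = {r:.3f}, R^2 = {r2:.3f}, adj R^2 = {adj_r2:.3f}, SE = {se:.3f}")
# R = 0.939, R^2 = 0.881, adj R^2 = 0.834, SE = 0.999
```

The output matches the SPSS table, so about 88.1% of the variation in weight gain is explained by initial weight and initial age together.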

ANOVA

Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression    37.009           2    18.505        18.539   .005
   Residual       4.991           5      .998
   Total         42.000           7

a. Dependent Variable: weight gain (y)

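Let us consider the null and alternative hypotheses as follows:

Null hypothesis (H0): β1 = β2 = 0, i.e. initial weight and initial age together do not explain weight gain (the regression is not significant).

Alternative hypothesis (H1): at least one of β1 and β2 is non-zero (the regression is significant).

Since the ANOVA gives F = 18.539 with Sig. = .005 < 0.05, H0 is rejected at the 5% level: the regression equation fits the data significantly. Both slope coefficients are also individually significant at the 5% level (Sig. = .023 for initial weight and .004 for initial age).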
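The overall F test can be reproduced from the ANOVA table. Because the regression here has exactly 2 degrees of freedom, the p-value has a closed form, P(F > f) = (df2 / (df2 + 2f))^(df2/2); that shortcut is valid only when the numerator df equals 2 and is used here to avoid a statistics library. Function names are hypothetical.

```python
def anova_f(ss_reg, ss_res, df_reg, df_res):
    """F = MS_regression / MS_residual."""
    return (ss_reg / df_reg) / (ss_res / df_res)


def f_sf_df1_2(f, df2):
    """P(F > f) for an F(2, df2) variable. Closed form valid ONLY for numerator df = 2."""
    return (df2 / (df2 + 2 * f)) ** (df2 / 2)


f = anova_f(37.009, 4.991, 2, 5)
p = f_sf_df1_2(f, 5)
print(f"F = {f:.3f}, p = {p:.4f}")
print("Reject H0" if p < 0.05 else "Do not reject H0")
```

The result is close to SPSS's F = 18.539 and Sig. = .005; the small difference in the third decimal comes from the rounded sums of squares.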

Residuals Statistics

                  Minimum   Maximum   Mean   Std. Deviation   N
Predicted Value   4.07      10.31     6.50   2.299            8
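A residual analysis compares observed and predicted values. The sketch below computes residuals, their mean (zero for a least-squares fit with an intercept) and standardized residuals e / SE, where SE is the standard error of the estimate. The observed and predicted values are hypothetical, and the |z| > 2 flag is a common rule of thumb rather than something taken from the lab. The last line cross-checks the table above: the standard deviation of the predicted values equals sqrt(SSR / (n - 1)) = sqrt(37.009 / 7).

```python
import math


def residual_analysis(observed, predicted, k):
    """Residuals, their mean, and standardized residuals e / SE for a model with k predictors."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in residuals)
    se_est = math.sqrt(sse / (n - k - 1))  # standard error of the estimate
    standardized = [e / se_est for e in residuals]
    return residuals, sum(residuals) / n, standardized


# Hypothetical observed/predicted values for a one-predictor fit (not the lab's data)
obs = [3, 5, 7, 9]
pred = [3.5, 4.5, 7.5, 8.5]
res, mean_res, z = residual_analysis(obs, pred, k=1)
# Rule of thumb (assumption): |standardized residual| > 2 deserves a closer look
flagged = [i for i, zi in enumerate(z) if abs(zi) > 2]

# Cross-check with the lab's output: SD of predicted values = sqrt(SSR / (n - 1))
sd_pred = math.sqrt(37.009 / 7)
print(res, mean_res, flagged, round(sd_pred, 3))
```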


Figure: Composition of weight gain

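Estimating the dependent variable when the independent variable is 35:

The regression equation of Y on X1 and X2 is given by

Y = -4.192 + 0.105 X1 + 0.807 X2

When X1 = 35 and X2 = 35,

Y = -4.192 + 0.105(35) + 0.807(35) = -4.192 + 3.675 + 28.245 = 27.728

Hence the estimated weight gain is about 27.73 when both independent variables equal 35.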
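The same estimate as a short sketch, using the rounded B coefficients from the SPSS Coefficients table; the function name is hypothetical.

```python
# B coefficients copied from the Coefficients table of this lab (rounded to 3 decimals)
B0, B1, B2 = -4.192, 0.105, 0.807


def predict_weight_gain(x1, x2):
    """Estimated weight gain from initial weight x1 and initial age x2."""
    return B0 + B1 * x1 + B2 * x2


y_hat = predict_weight_gain(35, 35)
print(f"Estimated weight gain: {y_hat:.3f}")  # Estimated weight gain: 27.728
```

Because the coefficients are rounded, SPSS's own prediction from unrounded values may differ slightly in the last decimal.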

Calculate Karl Pearson's correlation coefficient from the following data.

Karl Pearson's coefficient of correlation is an extensively used mathematical method for measuring the degree and direction of the linear relationship between two variables. Pearson's r varies between -1 and +1, where

r = 0: there is no linear correlation between sales and expenses.

r = +1: there is perfect positive correlation between sales and expenses.

r = -1: there is perfect negative correlation between sales and expenses.
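Pearson's r is r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]. A minimal sketch with illustrative data (not the lab's sales and expenses figures); on Python 3.10+ the standard library's `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative data only
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))           # perfect positive: 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))           # perfect negative: -1.0
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```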

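If |r| < P.E.(r), there is no evidence of correlation (the correlation coefficient is insignificant).

If |r| > 6 × P.E.(r), there is evidence of correlation (the correlation coefficient is significant).

If P.E.(r) ≤ |r| ≤ 6 × P.E.(r), nothing definite can be concluded.

Here P.E.(r) = 0.6745 (1 - r²) / √n is the probable error of r, where n is the number of pairs of observations.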
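The probable-error rule can be sketched as follows; the function names and the (r, n) values are illustrative assumptions, not results from the lab.

```python
import math


def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def interpret_r(r, n):
    """Apply the probable-error rule to a correlation coefficient r from n pairs."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"  # P.E. <= |r| <= 6 * P.E.


# Illustrative values only
for r, n in [(0.9, 10), (-0.2, 10), (0.5, 10)]:
    print(r, n, round(probable_error(r, n), 4), interpret_r(r, n))
```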

Let us consider the null and alternative hypotheses as follows:

Null hypothesis (H0): ρ = 0, i.e. there is no correlation between sales and expenses in the population.

Alternative hypothesis (H1): ρ ≠ 0, i.e. there is a significant correlation between sales and expenses.
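A common way to test H0: ρ = 0 is the t statistic t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. The sketch below computes it for illustrative values (not the lab's result); H0 is rejected when |t| exceeds the two-tailed critical value for n - 2 df read from a t table. The function name is hypothetical.

```python
import math


def t_statistic_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), compared with a t distribution on n - 2 df."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if not -1 < r < 1:
        raise ValueError("|r| must be strictly less than 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Illustrative values only
r, n = 0.6, 27
t = t_statistic_for_r(r, n)
df = n - 2
print(f"t = {t:.3f} on {df} df")
```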

There is a low degree of negative correlation between sales and expenses.

a) Is there any association between bandwidth and data rate?

Karl Pearson's coefficient of correlation is an extensively used mathematical method for measuring the degree and direction of the linear relationship between two variables.

Pearson's r varies between -1 and +1, where

r = 0: there is no linear correlation between bandwidth and data rate.

r = +1: there is perfect positive correlation between bandwidth and data rate.

r = -1: there is perfect negative correlation between bandwidth and data rate.

Let us consider the null and alternative hypotheses as follows:

Null hypothesis (H0): ρ = 0, i.e. there is no correlation between bandwidth and data rate in the population.

Alternative hypothesis (H1): ρ ≠ 0, i.e. there is a significant correlation between bandwidth and data rate.

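The regression equation of Y (data rate) on X (bandwidth) is given by

Y = a + bX, where a is the constant (intercept), b is the regression coefficient (slope), Y is the dependent variable and X is the independent variable.

The regression equation of X on Y is given by

X = a + bY, where a is the constant, b is the regression coefficient, Y is the independent variable and X is the dependent variable. (The constants a and b of this line are in general different from those of the Y on X line.)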


Coefficient of determination (r²) = 0.814

Hence about 81.4% of the variation in data rate is explained by the variation in bandwidth under the given information.
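Turning r² into the explained and unexplained percentages, and recovering the size of r, takes one line each. This sketch uses the reported r² = 0.814; the sign of r is not recoverable from r² alone and must be taken from the sign of the fitted slope.

```python
import math

r_squared = 0.814  # coefficient of determination reported for bandwidth vs data rate
explained_pct = 100 * r_squared
unexplained_pct = 100 - explained_pct
abs_r = math.sqrt(r_squared)  # magnitude only; sign follows the regression slope
print(f"Explained: {explained_pct:.1f}%, unexplained: {unexplained_pct:.1f}%, |r| = {abs_r:.3f}")
# Explained: 81.4%, unexplained: 18.6%, |r| = 0.902
```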