Final
Q.1 Attempt any three (03) of the following questions (not more than 10 lines each). [3 x 3 = 09 marks]
[30 minutes]
a) Suppose a social researcher has drawn four point estimates, x̅1 = 235, x̅2 = 230, x̅3 = 232, and x̅4 = 231, from the
same population, whose mean μ is 240. How would you advise the social researcher to identify the good point estimator
among the four given point estimators? Justify your recommended point estimator.
Required:
Help the social researcher identify the good point estimator among the four given point estimators, and justify your
recommendation.
Solution:
Margin of Error or Sampling Error = e1 = x̅1 − μ = 235 – 240 = – 5
Margin of Error or Sampling Error = e2 = x̅2 − μ = 230 – 240 = – 10
Margin of Error or Sampling Error = e3 = x̅3 − μ = 232 – 240 = – 8
Margin of Error or Sampling Error = e4 = x̅4 − μ = 231 – 240 = – 9
The computed sampling errors show that e1 = – 5 has the smallest magnitude (|e1| = 5), i.e. x̅1 lies closest to μ.
We can therefore conclude that x̅1 = 235 is the good and efficient point estimator among the four.
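The comparison above can be sketched in Python: compute each sampling error e_i = x̅_i − μ and pick the estimate with the smallest absolute error. The dictionary keys and variable names are illustrative labels, not part of the original solution.

```python
# Point estimates from the question and the stated population mean.
estimates = {"x1": 235, "x2": 230, "x3": 232, "x4": 231}
mu = 240

# Sampling error e_i = x_bar_i - mu for each estimate.
errors = {name: value - mu for name, value in estimates.items()}

# The preferred estimate is the one with the smallest absolute sampling error.
best = min(errors, key=lambda name: abs(errors[name]))

print(errors)
print(best)
```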
b) The sizes of four finite populations A, B, C and D are given in the following table:
Population        A            B            C            D
Population size   NA = 2,500   NB = 3,000   NC = 2,000   ND = 4,000
A total of n = 50 samples is to be collected from populations A, B, C and D collectively.
Required:
Use proportional allocation in stratified random sampling to determine nA = ?, nB = ?, nC = ?, nD = ? respectively.
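A minimal sketch of proportional allocation, n_h = n · N_h / N, follows. It assumes the unlabeled value 4,000 in the table is the size of population D; the variable names are illustrative. Rounding each stratum separately does not always sum back to n, so the total is worth checking.

```python
# Stratum sizes from the table (4,000 is assumed to be N_D); n is the total sample size.
strata = {"A": 2500, "B": 3000, "C": 2000, "D": 4000}
n = 50

N = sum(strata.values())  # combined population size

# Proportional allocation: n_h = n * N_h / N.
exact = {h: n * N_h / N for h, N_h in strata.items()}

# Round to whole sampling units and confirm the rounded shares still add up to n.
allocation = {h: round(v) for h, v in exact.items()}

print(exact)
print(allocation, sum(allocation.values()))
```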
c) Discuss the effect of sampling error (margin of error) on the interval estimate at several confidence levels, with a suitable
real-life example.
Solution:
An interval estimate of a population parameter is a range of values, stated mathematically, within which the population parameter
is expected to lie at a given confidence level β. The width of the interval estimate increases with the confidence level β: when the
confidence level is raised the interval becomes wider, and when it is lowered the interval becomes narrower. The following three
cases use different confidence levels β with the same good point estimator to show the effect on the width of the interval.
Case No. 1: β1 = 90%
90% Confidence Interval of population mean = Prob.[340 hrs ≤ μ ≤ 655 hrs], width = 315 hrs
Case No. 2: β2 = 85%
85% Confidence Interval of population mean = Prob.[345 hrs ≤ μ ≤ 644 hrs], width = 299 hrs
Case No. 3: β3 = 95%
95% Confidence Interval of population mean = Prob.[330 hrs ≤ μ ≤ 665 hrs], width = 335 hrs
In the above examples, Case No. 3 shows that at the 95% confidence level the interval for the population mean μ is wider than in
Cases 1 and 2.
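The effect of the confidence level on interval width can also be checked numerically. The sketch below uses a z-interval for the mean with a hypothetical standard deviation and sample size (σ = 100 hrs, n = 25); these values are assumptions for illustration and are not meant to reproduce the three intervals quoted above.

```python
from statistics import NormalDist

def margin_of_error(confidence, sigma, n):
    """Half-width z_{alpha/2} * sigma / sqrt(n) of a z-interval for the mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / n ** 0.5

# Hypothetical values, chosen only for illustration.
sigma, n = 100.0, 25

# Full interval width (twice the margin of error) at each confidence level.
widths = {c: 2 * margin_of_error(c, sigma, n) for c in (0.85, 0.90, 0.95)}
for c, w in widths.items():
    print(f"{c:.0%} interval width: {w:.1f} hrs")
```

With everything else fixed, the width grows as the confidence level rises, which is the pattern the three cases describe.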
d) Explain the relationship between the required sample size (n) and the sampling error (margin of error). Also discuss the effect on
the standard error in inferential statistics.
Solution:
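The margin of error (e) is inversely proportional to the square root of the sample size (n) when other factors are held constant.
We know that the Z-statistic is the ratio of the margin of error to the standard error. Mathematically,
$Z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$, where $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ if the population is large, and $e = \bar{x} - \mu$.
Hence
$\pm Z_{\alpha/2} = \dfrac{e}{\sigma_{\bar{x}}}$
$\dfrac{\sigma}{\sqrt{n}} = \dfrac{e}{Z_{\alpha/2}}$
so that $e = Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$: increasing n reduces both the standard error $\sigma/\sqrt{n}$ and the margin of error.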
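A short numerical check of the relationship e = Z·σ/√n: with Z and σ fixed, quadrupling the sample size halves the margin of error. The 95% confidence level and σ = 10 below are assumed values for illustration only.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for an assumed 95% confidence level
sigma = 10.0                     # hypothetical population standard deviation

def margin(n):
    # e = z * sigma / sqrt(n)
    return z * sigma / n ** 0.5

# Each sample size is four times the previous one.
margins = {n: margin(n) for n in (25, 100, 400)}
for n, e in margins.items():
    print(n, round(e, 3))
```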
Required:
How large must the samples (n1 and n2) be if one wishes to be at least 90% and 96% confident that the estimate is within 1.5% of
the true percentage?
Solution:
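For 90% Confidence Interval
Marginal error (sampling error) $e = \hat{p} - p = 1.5\% = 0.015$
Sample proportion $\hat{p} = 0.5$
$\hat{q} = 0.5$
Level of significance $\alpha = 10\%$
$\alpha/2 = 5\% = 0.05$
Confidence Level $\beta = 90\%$
$Z_{0.05} = \pm 1.64485$
Now the required sample size is
$n = \left[\dfrac{Z_{\alpha/2}}{e}\right]^2 \hat{p}\hat{q} \implies n_1 = \left[\dfrac{1.64485}{0.015}\right]^2 (0.5)(0.5)$
$\implies n_1 \approx 3{,}006$ samples (3,007 if rounded up to the next whole unit)
For 96% Confidence Interval
Marginal error (sampling error) $e = \hat{p} - p = 1.5\% = 0.015$
Sample proportion $\hat{p} = 0.5$
$\hat{q} = 0.5$
Level of significance $\alpha = 4\%$
$\alpha/2 = 2\% = 0.02$
Confidence Level $\beta = 96\%$
$Z_{0.02} = \pm 2.05375$
Now the required sample size is
$n = \left[\dfrac{Z_{\alpha/2}}{e}\right]^2 \hat{p}\hat{q} \implies n_2 = \left[\dfrac{2.05375}{0.015}\right]^2 (0.5)(0.5)$
$\implies n_2 \approx 4{,}687$ samples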
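The two sample sizes can be recomputed with a few lines of Python. This is a sketch of the formula n = (Z/e)² p̂q̂ used above; the function name is illustrative, and the critical values come from the standard library rather than a printed table, so the last decimals may differ slightly.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(confidence, e, p_hat=0.5):
    """n = (z_{alpha/2} / e)^2 * p_hat * q_hat, before rounding."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z / e) ** 2 * p_hat * (1 - p_hat)

n1 = sample_size_for_proportion(0.90, 0.015)
n2 = sample_size_for_proportion(0.96, 0.015)

# Unrounded value, then the value rounded up to a whole number of units.
print(round(n1, 1), math.ceil(n1))
print(round(n2, 1), math.ceil(n2))
```

Rounding up is the conservative choice when the requirement is "at least" a given confidence level.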
Q.2b Write the significant differences between the Simple Random Sampling Procedure and the Systematic Random Sampling Procedure; also
write their advantages: [02 marks]
[10 Minutes]
Solution:
Simple Random Sampling Procedure:
- Each data point has an equal probability of being chosen.
- It is best used for smaller data sets and can produce more representative results.
Systematic Random Sampling Procedure:
- A data point is chosen at each predetermined interval.
- It is easier to execute than simple random sampling, but it can produce skewed results if the data set exhibits patterns.
- It is also more easily manipulated.
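The two procedures can be contrasted in a small Python sketch. The frame of 100 numbered units, the sample size of 10 and the function names are all hypothetical; the systematic version uses the interval K = N/n with a random start inside the first interval.

```python
import random

def simple_random_sample(population, n, rng):
    # Every unit has the same chance of selection.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Sampling interval K = N / n (integer division), random start in the first interval.
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

rng = random.Random(0)            # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical frame of 100 numbered units

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)

print(sorted(srs))
print(sys_sample)
```

Because the systematic sample always steps by the same interval, any periodic pattern in the frame that matches K would be carried into the sample.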
Q.3 Consider the sample heights (in cm) of men from two different populations A and B with the same demographics: [09 marks]
[20 Minutes]
Population – A (25 observations)
54 64 56 51 45 65
65 72 88 56 59 93
45 54 72 48 45 67
43 69 60 81 65 54
73
Population – B (32 observations)
73 45 56 72 90 55 49 59
73 56 78 78 68 80 61 65
46 57 72 56 59 50 84 69
48 77 66 91 86
58 63 86
Required:
a) Construct an interval estimate of the difference between the population means (μ1 − μ2) at the 94% confidence level.
b) Test the hypothesis about the difference between the means of populations A and B at the 3% level of significance. Compute the
p-value and identify the type of statistical error.
Solution:
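a) $\bar{x}_A = 61.76 = \bar{x}_2$
$\bar{x}_B = 66.44 = \bar{x}_1$
$(\bar{x}_1 - \bar{x}_2) = 4.68$
$\sigma_A = 13.49 = \sigma_2$
$\sigma_B = 12.76 = \sigma_1$
$n_A = 25 = n_2$
$n_B = 32 = n_1$
$\beta = 94\%$
$\alpha = 6\% = 0.06$
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{5.09 + 7.28} = 3.52$
Hence
94% Confidence Interval for the difference $(\mu_1 - \mu_2) = \text{Prob.}[(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)}]$
= Prob.[(4.68) ± (1.88079)(3.52)]
= Prob.[4.68 ± 6.62]
= Prob.[−1.94 ≤ (μ1 − μ2) ≤ 11.30]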
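The interval in part a) can be recomputed from the summary statistics stated in the solution. The sketch below takes those means, standard deviations and sample sizes as given inputs rather than recomputing them from the raw data; the function name is illustrative. Small differences from the hand calculation come from rounding the standard error to 3.52.

```python
from statistics import NormalDist

def diff_means_ci(x1, x2, s1, s2, n1, n2, confidence):
    """z-interval for mu1 - mu2 from summary statistics."""
    se = (s1 ** 2 / n1 + s2 ** 2 / n2) ** 0.5           # standard error of the difference
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    diff = x1 - x2
    return diff - z * se, diff + z * se, se

# Population B is sample 1 and population A is sample 2, as in the solution.
lower, upper, se = diff_means_ci(66.44, 61.76, 12.76, 13.49, 32, 25, 0.94)

print(round(se, 3), round(lower, 2), round(upper, 2))
```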
b) Testing of Hypothesis
Step-9: P-value (Actual Rejection Region): the p-value is obtained at $Z_{Cal} = -1.3301$. The probability below −1.3301 is 0.091646;
for a non-directional (two-tailed) test the p-value is twice this, p-value = 0.183291 = 18.3291%.
Step-11: Types of Statistical Error: only a Type-II statistical error can occur, because the calculated value falls in the acceptance
region: $|Z_{Cal}| < Z_{Tab}$, i.e. 1.3301 < 2.17008.
Step-12: Post Test Normal Curve
Step-13: Decision: Since the p-value (18.3291%) is greater than the given level of significance (3%), we do not reject the null
hypothesis of equal population means. If the null hypothesis is in fact false, this decision is a Type-II statistical error.
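The test in part b) can be sketched as a two-tailed z-test. The inputs are the sample means and the rounded standard error 3.52 from part a); the variable names are illustrative, and the computed statistic may differ from the tabulated value in the third decimal because of rounding.

```python
from statistics import NormalDist

x_a, x_b = 61.76, 66.44  # sample means of populations A and B
se = 3.52                # standard error of the difference, from part a)
alpha = 0.03             # level of significance

z_cal = (x_a - x_b) / se
p_value = 2 * NormalDist().cdf(-abs(z_cal))  # two-tailed p-value
z_tab = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-tailed test

# Reject H0 only if the statistic falls beyond the critical value.
reject_h0 = abs(z_cal) > z_tab

print(round(z_cal, 4), round(p_value, 4), round(z_tab, 4), reject_h0)
```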
Page 5 of 10 Statistical Inference (QTM-204) BS-4A (A & F) Spring - 2023
Q.4a What is the purpose of using Analysis of Variance (ANOVA) in statistical data analysis? [02 marks]
[10 minutes]
Solution:
Analysis of Variance (ANOVA) is a statistical method for testing hypotheses about more than two unknown population means. The Z-test
is restricted to comparing at most two population means, whereas ANOVA tests whether there is a significant difference among more
than two means. ANOVA is also known as the F-ratio test, because its test statistic is the ratio of two variances. The type of ANOVA
test used depends on a number of factors, and it is applied to experimental data. It is simple to use, can be computed by hand when
there is no access to statistical software, and is best suited for small samples. With many experimental designs, the sample sizes
have to be the same for the various factor-level combinations.
Q.4b The following data represent the final grades obtained by 5 students in Statistics, Economics, Chemistry, and Physics:
[08 marks]
[20 minutes]
Table
Subject
Students Statistics Economics Chemistry Physics
A 75 65 73 80
B 45 67 Absent 56
C 67 Absent 62 67
D 67 54 83 Absent
E Absent 73 54 69
Required:
Use a 0.05 level of significance to test the hypothesis that
i) The courses are of equal difficulty;
ii) The students have equal ability.
Solution:
The following steps of hypothesis testing lead to the required conclusions under the given information.
Step-11: P-value (Actual Rejection Region):
1) The p-value is obtained at $f_{Cal_R} = 1.684$. From the F-table, the probability beyond 1.684 at d.f. (4, 12) is approximately
0.217884 or 21.7884%. Hence the actual probability of the rejection region is 21.7884% instead of 5%.
2) The p-value is obtained at $f_{Cal_C} = 0.270351$. From the F-table, the probability beyond 0.270351 at d.f. (2, 6) is approximately
0.845575 or 84.5575%. Hence the actual probability of the rejection region is 84.5575% instead of 5%.
Step-12: Power of Statistical Test: the p-value suggests that the expected power of the statistical test for the unknown population
may be $\beta_R = 1 - (\text{p-value}) = 1 - 0.217884 = 0.782116$ or 78.2116%.
Step-13: Decisions:
a) Accept $H_0'$ and conclude that there is no significant difference among the subjects (the courses are of equal difficulty) at the
5% level of significance, since the computed p-value is greater than 5%.
b) Accept $H_0''$ and conclude that there is no significant difference in the students' performance across the offered courses at the
5% level of significance, since the computed p-value is greater than 5%.
Q.5 The following data set gives the gender and grades of 60 business students: [08 marks]
[20 minutes]
Gender Grades Gender Grades Gender Grades Gender Grades
Female A Female B Female A Female B
Female C Male A Female C Male A
Male D Male A Male D Male C
Male B Male B Female A Female B
Female C Female D Male A Female B
Male D Female C Male B Female C
Female D Male D Female C Male B
Male C Male C Male D Male A
Male C Female D Female D Male C
Female C Male D Female C Female B
Male D Female D Male B Female D
Female A Male B Female D Male D
Male C Male A Female C Female C
Female B Female B Male A Male D
Female A Female A Male A Female D
Required:
a) Construct a contingency table (cross table) and fill it in.
A B C D Total
Female
Male
Total
b) Fill in above table of observed frequencies for gender and grade.
c) Calculate the sample Chi-Square value.
d) State the null and alternative hypothesis.
e) Perform all steps of testing of hypothesis
Solution:
The following steps of hypothesis testing lead to the required conclusions under the given information.
Step-1: Null Hypothesis: H0 : $p_A = p_B = p_C = p_D$
Observed frequencies:
          A     B     C     D     Total
Female    6     7     10    8     31
Male      8     6     6     9     29
Total     14    13    16    17    60
Expected frequencies:
$e_1 = \dfrac{(31)(14)}{60} = 7.233$     $e_5 = \dfrac{(29)(14)}{60} = 6.767$
$e_2 = \dfrac{(31)(13)}{60} = 6.717$     $e_6 = \dfrac{(29)(13)}{60} = 6.283$
$e_3 = \dfrac{(31)(16)}{60} = 8.267$     $e_7 = \dfrac{(29)(16)}{60} = 7.733$
$e_4 = \dfrac{(31)(17)}{60} = 8.783$     $e_8 = \dfrac{(29)(17)}{60} = 8.217$
$o_i$    $e_i$    $(o_i - e_i)$    $(o_i - e_i)^2$    $(o_i - e_i)^2 / e_i$
6      7.233    -1.233    1.5211    0.2103
7      6.717     0.283    0.0803    0.0120
10     8.267     1.733    3.0044    0.3634
8      8.783    -0.783    0.6136    0.0699
8      6.767     1.233    1.5211    0.2248
6      6.283    -0.283    0.0803    0.0128
6      7.733    -1.733    3.0044    0.3885
9      8.217     0.783    0.6136    0.0747
$\chi^2_{Cal}$ ≈ 1.356
Step-9: P-value (Actual Rejection Region): the p-value is obtained at $\chi^2_{Cal} = 1.356$. From the $\chi^2$ table at 3 d.f., the
probability beyond 1.356 is approximately 0.716 or 71.6%. Hence the actual probability of the rejection region is 71.6%.
Step-10: Power of Statistical Test: the p-value suggests that the expected power of the statistical test for the unknown population
may be β = 1 − (p-value) = 1 − 0.716 = 0.284 or 28.4%.
Step-11: Types of Statistical Error: only a Type-II statistical error can occur, because the calculated value falls in the acceptance
region: $\chi^2_{Cal} < \chi^2_{Tab}$, i.e. 1.356 < 7.815.
Step-12: Post Test Chi-Square Curve
Step-13: Decision: Since the p-value (71.6%) is greater than the 5% level of significance, we do not reject the null hypothesis. If
the null hypothesis is in fact false, this decision is a Type-II statistical error.
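The chi-square statistic can be recomputed directly from the observed counts, which gives a result to compare against the hand-computed table. In this sketch the Male row and the column totals are those printed in the solution, and the Female row is tallied from the 60 listed records. The p-value uses the closed-form upper tail of the chi-square distribution that holds for 3 degrees of freedom only, so the standard library is enough.

```python
import math

observed = [
    [6, 7, 10, 8],  # Female: grades A, B, C, D (tallied from the data listing)
    [8, 6, 6, 9],   # Male:   grades A, B, C, D
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Chi-square statistic: sum of (o - e)^2 / e with e = row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / total
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is valid only when df == 3.
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)

print(round(chi2, 4), df, round(p_value, 4))
```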
- : * * * * * Good Luck * * * * * : -
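t-Statistic: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
Z-score: $z = \dfrac{x - \mu}{\sigma}$
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
For Possible Sample Size: $N^2$
For Possible Sample Size: $\dfrac{N!}{(N-n)!}$
Finite Population Multiplier: $\sqrt{\dfrac{N-n}{N-1}}$
$\hat{p} = \dfrac{x}{n}$
$K = \dfrac{N}{n}$
$(1-\alpha)\%$ C.I. for $\mu = \text{Prob.}\left(\bar{x} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right)$
$(1-\alpha)\%$ C.I. for $\mu = \text{Prob.}\left(\bar{x} \pm t_{(\alpha/2,\,v)}\,\dfrac{s}{\sqrt{n}}\right)$
$(1-\alpha)\%$ C.I. for $p = \text{Prob.}\left(\hat{p} \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}\right)$
$(1-\alpha)\%$ C.I. for $(p_1 - p_2) = \text{Prob.}\left[(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\,\sigma_{(\hat{p}_1 - \hat{p}_2)}\right]$
$(1-\alpha)\%$ C.I. for $(\mu_1 - \mu_2) = \text{Prob.}\left[(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)}\right]$
$n = \left[\dfrac{Z_{\alpha/2}\,\sigma}{e}\right]^2$
$n = \dfrac{Z_{\alpha/2}^2}{e^2}\,\hat{p}\hat{q}$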