Master of Public Health
Biostatistics - Assignment 1
Instructions:
To complete the assignment, please type your answers into this document.
Submit your assignment document when you have completed it, by uploading it to Moodle.
o Kindly refer to the section on Guidance with doing assignments in the orientation
module for further assistance.
o Ensure that you include the assignment cover sheet; without it, your assignment will not be
accepted.
You do not need to show your working, unless asked for in the question.
Under no circumstances are you allowed to collaborate with anyone else on your submission.
Such plagiarism is considered academic misconduct and may result in a student being asked to
leave the programme.
If you have any problems submitting your assignment, please contact your assigned Course
Coordinator.
Question 1 [3 marks]
For each of the following, state whether the described variable is an example of nominal, ordinal
or ratio data.
a) The height of a 3-year-old child in centimetres [1 mark]
Answer:
b) The percentile height of a 3-year-old child [1 mark]
Answer:
c) The eye colour of a 3-year-old child [1 mark]
Answer:
1|Page
Question 2 [5 marks]
The table below shows the number of patient visits by month in an outpatient clinic during the last
year. You are hoping to use this information to decide whether you have adequate staffing for the
clinic.
Month Number of Visits
January 320
February 410
March 454
April 328
May 560
June 494
July 2800
August 596
September 475
October 391
November 430
December 458
a) What is the mean number of patient visits per month? [1 mark]
Answer:
b) You take the mean figure above to the hospital director to support your case that you need
more staff. The director, however, disputes this claim based on another measure of central
tendency. What is (i) the name and (ii) the value of the statistic the director is referring to?
[2 marks]
Name of the statistic:
Value of the statistic:
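The mean and the median of the monthly counts can be computed directly from the table as a check on parts (a) and (b). The sketch below uses Python's standard-library statistics module; the variable names are illustrative only. It also recomputes both statistics with July left out, to show which measure is sensitive to that single extreme month.

```python
from statistics import mean, median

# Monthly outpatient visits, January to December, copied from the table above
visits = [320, 410, 454, 328, 560, 494, 2800, 596, 475, 391, 430, 458]

visit_mean = mean(visits)      # arithmetic mean: total visits / 12 months
visit_median = median(visits)  # middle value; averages the 6th and 7th sorted values

print(f"Mean visits per month:   {visit_mean:.1f}")
print(f"Median visits per month: {visit_median:.1f}")

# Effect of the July outlier: drop it and recompute both statistics
without_july = [v for v in visits if v != 2800]
print(f"Mean without July:   {mean(without_july):.1f}")
print(f"Median without July: {median(without_july):.1f}")
```

With July removed, the mean drops substantially while the median barely moves, which is the behaviour part (c) asks about.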
c) Why are the values of the two statistics so different? [1 mark]
Answer:
d) Give one reason why the recorded number of visits in July might be much higher than in the rest
of the year. [1 mark]
Answer:
Question 3 [6 marks]
Two researchers are studying the effects of a new drug on recovery time after hospital admission
for asthma. Both take a sample of 25 children who are admitted with complications of asthma,
give them all the new drug (in addition to usual therapy), and note how long they stay in hospital.
The first researcher calculates the average stay to be 4 days (standard deviation = 2 days). The
second researcher finds an average length of stay of 5 days (standard error = 0.4 days).
a) In words, explain the difference between the two different measures of variability that the
researchers used. [2 marks]
Answer:
b) A third researcher performed a similar study on a sample size of 64 and found the standard
deviation for the length of stay to be 2 days. What is the standard error in this case? [1 mark]
Answer:
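The standard error of a mean is conventionally calculated as SE = SD / √n. A minimal Python sketch of that formula follows (the function name standard_error is illustrative, not from any library). It computes the standard error for the third researcher, back-calculates the standard deviation implied by the second researcher's figures (SD = SE × √n), and shows how the standard error changes with sample size for a fixed standard deviation.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: the standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

# Third researcher: SD = 2 days, n = 64
se_third = standard_error(2.0, 64)
print(f"SE with n = 64: {se_third:.2f} days")

# Second researcher reported SE = 0.4 days with n = 25; rearranging SE = SD / sqrt(n)
# gives the implied standard deviation SD = SE * sqrt(n)
implied_sd_second = 0.4 * math.sqrt(25)
print(f"Implied SD for the second researcher: {implied_sd_second:.1f} days")

# For a fixed SD, the SE shrinks as n grows (and is below the SD once n > 1)
for n in (1, 4, 25, 64, 100):
    print(n, standard_error(2.0, n))
```

The implied standard deviation for the second researcher works out to 2 days, the same as the first researcher's, so the two reported variability figures differ mainly because they describe different quantities.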
c) For any study with a sample size greater than 1, which one of the following is true? [1 mark]
A. The standard error is always greater than the standard deviation.
B. The standard error is always less than the standard deviation.
C. The standard error is always equal to the standard deviation.
D. The standard error is always greater than or equal to the standard deviation.
E. None of the above are true.
Answer:
d) As the sample size of a study increases, which one of the following is true? [1 mark]
A. The standard deviation decreases.
B. The standard deviation increases.
C. The standard error decreases.
D. The standard error increases.
E. Both the standard error and standard deviation decrease.
Answer:
e) “Statistics calculated from large samples are more likely to have values close to the unknown
population values than those calculated from small samples.” Is this statement true or false?
[1 mark]
Answer:
Question 4 [4 marks]
The following figure shows the distribution of differences between the heights of mothers and
their adult daughters:
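[Figure: distribution of the differences between the heights of mothers and their adult daughters; horizontal axis from -4 to 4]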
a) What type of plot is this? [1 mark]
Answer:
b) What are the mean and standard deviation of the data shown in this figure? [2 marks]
Mean:
Standard Deviation:
c) What type of plot is this? [1 mark]
Answer:
Question 5 [5 marks]
The following figure shows the completion times of a school cross-country race in 2018:
a) Describe the distribution of the values. [2 marks]
Answer:
The same race was completed by children the previous year (2017). In that case, the completion
times were normally distributed with a mean of 22 minutes, and a standard deviation of 4
minutes.
b) In 2017, what proportion of children completed the race in more than 22 minutes? [1 mark]
Answer:
c) In 2017, what proportion of children completed the race between 18 and 26 minutes? [1 mark]
Answer:
d) In 2017, approximately 2.5% of completion times were below what value (to the nearest
minute)? [1 mark]
Answer:
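For a normal distribution, the proportions in parts (b) to (d) can be read from the cumulative distribution function. The sketch below uses statistics.NormalDist from Python's standard library (Python 3.8 or later), with the 2017 mean of 22 minutes and standard deviation of 4 minutes given above.

```python
from statistics import NormalDist

# 2017 completion times: normal with mean 22 minutes and SD 4 minutes
times_2017 = NormalDist(mu=22, sigma=4)

p_over_22 = 1 - times_2017.cdf(22)                    # P(T > 22)
p_18_to_26 = times_2017.cdf(26) - times_2017.cdf(18)  # P(18 < T < 26), i.e. within 1 SD
cutoff_2_5 = times_2017.inv_cdf(0.025)                # time below which 2.5% of runners fall

print(f"P(T > 22)        = {p_over_22:.3f}")
print(f"P(18 < T < 26)   = {p_18_to_26:.3f}")
print(f"2.5th percentile = {cutoff_2_5:.2f} minutes")
```

The familiar 68-95-99.7 rule gives the same answers approximately: about 68% of values lie within one standard deviation of the mean (18 to 26 minutes), and about 2.5% lie more than roughly two standard deviations below it (22 - 2 × 4 = 14 minutes). The exact 2.5th percentile uses 1.96 standard deviations (about 14.2 minutes), which still rounds to 14.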
Question 6 [10 marks]
It is known that in a given population, 40% of people are regular users of insecticide-treated nets
(ITNs). One hundred people from this population attend an educational program, after which 52
report regular ITN use.
a) Using GraphPad (p59 of e-book) or any other software, calculate the probability that at least
52 out of the 100 people who underwent the educational program used an ITN regularly, if
they have the same probability of regular ITN use as the rest of the population. [3 marks]
Answer:
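If GraphPad is not available, the same one-sided binomial probability can be calculated exactly by summing the binomial probability mass function from 52 to 100. The sketch below is one way to do this in Python using math.comb (Python 3.8 or later); the helper name binomial_upper_tail is illustrative. It assumes, as the question states, that each attendee independently has a 40% probability of regular ITN use.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summing the probability mass function."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 100 programme attendees; population rate of regular ITN use is 40%; 52 reported regular use
p_at_least_52 = binomial_upper_tail(52, 100, 0.40)
print(f"P(X >= 52 | n = 100, p = 0.40) = {p_at_least_52:.4f}")
```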
b) What can you conclude based on the above result? [2 marks]
Answer:
c) Is your conclusion above reached with certainty? Why or why not? [2 marks]
Answer:
d) You read two papers each describing the effects of educational programs on ITN use, and
comparing ITN use before and after the program. One found a significant increase in ITN use
(p=0.04) and the other found a significant decrease in ITN use (p=0.02). Suggest three possible
reasons for the different results. [3 marks]
Answer:
Question 7 [6 marks]
Large-scale studies suggest the mean blood cholesterol level in children aged 10 to 14 years is 150
mg/dL (standard deviation 30 mg/dL). In order to determine if cholesterol levels have a familial
link, you identify 100 men who have high cholesterol [designated “high-risk”] and have a child
aged 10-14 years, and measure the cholesterol level in the child.
You find that the mean cholesterol level of the 100 children is 170 mg/dL.
a) What is (i) the standard error and (ii) 95% confidence interval for the average cholesterol level
for children with high-risk fathers? (Please show any formulas you are using.) [4 marks]
Answer:
b) Can you reject the null hypothesis that the average cholesterol level for children with high-risk
fathers is 150 mg/dL? Explain your choice. [2 marks]
Answer:
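The formulas for part (a) are SE = σ / √n and 95% CI = x̄ ± 1.96 × SE. The sketch below applies them in Python. It assumes, as the question implies, that the population standard deviation of 30 mg/dL also applies to children of high-risk fathers (no sample standard deviation is given), and it takes the normal critical value (about 1.96) from statistics.NormalDist. It also checks whether the null value of 150 mg/dL falls inside the interval, which is one way to approach part (b).

```python
import math
from statistics import NormalDist

sigma = 30.0         # population SD of cholesterol in children aged 10-14 (mg/dL)
n = 100              # children of high-risk fathers
sample_mean = 170.0  # observed mean cholesterol (mg/dL)
null_mean = 150.0    # population mean under the null hypothesis (mg/dL)

se = sigma / math.sqrt(n)             # SE = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval
ci_low = sample_mean - z_crit * se
ci_high = sample_mean + z_crit * se

# Does the interval contain the null value, and how many SEs away is the sample mean?
null_in_ci = ci_low <= null_mean <= ci_high
z_stat = (sample_mean - null_mean) / se

print(f"SE = {se:.1f} mg/dL")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f}) mg/dL")
print(f"150 inside CI? {null_in_ci}; z = {z_stat:.2f}")
```

Under these assumptions the standard error is 3 mg/dL and the interval runs from about 164.1 to 175.9 mg/dL, which excludes 150 mg/dL.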
Question 8 [8 marks]
You are investigating whether polio vaccination rates have changed over a 10-year period in a
particular country. You obtain a random sample of 60 districts in the country, and find that the
vaccination rate has decreased in 20 of them, and increased in 40 of them. (There were no districts
in which the vaccination rate had not changed.)
a) What proportion of observed districts showed an increase in vaccination rates? [1 mark]
Answer: You have:
60 districts in total
40 districts with an increased vaccination rate
The proportion (P) of districts with an increased vaccination rate is given by:
$$P = \frac{\text{Number of districts with increased vaccination rates}}{\text{Total number of districts}}$$
Substituting the values:
$$P = \frac{40}{60}$$
Simplifying this fraction:
$$P = \frac{2}{3}$$
The proportion of observed districts that showed an increase in vaccination rates is therefore
2/3, or approximately 0.667 (66.7%).
b) Assuming across the country there had been no real change in vaccination rates over the 10-
year period, what would be the probability of observing a district with a positive change?
[1 mark]
Answer: If we assume that there has been no real change in vaccination rates across the
country over the 10-year period, then we can think of the observed changes in each district as
a result of random chance. Under this assumption, we would expect the probability of
observing an increase in vaccination rates to be equal to the probability of observing a
decrease.
Since there are only two outcomes (increase or decrease) and they are equally likely under
the assumption of no real change, the probability of observing a district with a positive
change is:
$$P(\text{positive change}) = \frac{1}{2}$$
So, the probability of observing a district with a positive change in vaccination rates is 1/2,
or 0.5 (50%).
c) Using a binomial test, you find the probability of observing 40 or more positive changes (or 20
or fewer negative changes), when the true vaccination rate has not changed, is 0.0135. What
can you conclude about the change in vaccination rates? [2 marks]
Answer: To draw a conclusion about the change in vaccination rates, we need to interpret the
result of the binomial test.
Null Hypothesis (H0):
There is no real change in vaccination rates over the 10-year period. The probability of
observing an increase in any given district is 0.5.
Alternative Hypothesis (H1):
There is a real change in vaccination rates over the 10-year period. The probability of
observing an increase in any given district is not 0.5.
Binomial Test Result:
The binomial test provides a probability (p-value) of 0.0135 for observing 40 or more
districts with positive changes (or equivalently, 20 or fewer districts with negative changes)
under the null hypothesis.
Conclusion:
The p-value (0.0135) represents the probability of obtaining the observed result (or
something more extreme) if the null hypothesis were true.
A common significance level (α) used in hypothesis testing is 0.05.
Since the p-value (0.0135) is less than the significance level (0.05), we reject the null
hypothesis.
Interpretation:
Rejecting the null hypothesis suggests that it is unlikely the observed changes (40 districts
with positive changes out of 60) are due to random chance. Therefore, we conclude that there
is statistically significant evidence to suggest that the vaccination rates have indeed changed
over the 10-year period in the sampled districts.
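The probability of 0.0135 quoted in part (c) can be checked with an exact binomial calculation, sketched below in Python with math.comb (the helper name binomial_pmf is illustrative). The calculation suggests that 0.0135 is the two-sided value: it counts splits at least as extreme as 40 increases to 20 decreases in either direction (40 or more increases, or 20 or fewer increases). The single upper tail, P(X ≥ 40), is about half of that.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, observed_increases = 60, 40

# Upper tail: 40 or more increases when each district is a fair coin flip (p = 0.5)
upper_tail = sum(binomial_pmf(k, n, 0.5) for k in range(observed_increases, n + 1))

# Lower tail: splits equally extreme in the other direction (20 or fewer increases).
# Because Binomial(60, 0.5) is symmetric, the two tails are equal.
lower_tail = sum(binomial_pmf(k, n, 0.5) for k in range(0, n - observed_increases + 1))
two_sided = upper_tail + lower_tail

print(f"One-sided P(X >= 40) = {upper_tail:.4f}")
print(f"Two-sided p-value    = {two_sided:.4f}")
```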
d) What is the observed significance level for the test that the vaccination rate has not changed?
[1 mark]
Answer: The observed significance level, also known as the p-value, for the test that the
vaccination rate has not changed is the probability of obtaining the observed result (or
something more extreme) under the null hypothesis. In this case, the null hypothesis is that
there is no change in vaccination rates, meaning the probability of observing an increase in
any given district is 0.5.
The binomial test provided a p-value of 0.0135 for observing 40 or more districts with
positive changes (or equivalently, 20 or fewer districts with negative changes) under the null
hypothesis.
Thus, the observed significance level for the test that the vaccination rate has not changed is:
$$\text{p-value} = 0.0135$$
This p-value is the probability of observing the given data (40 increases out of 60
districts), or more extreme data, assuming the null hypothesis is true.
e) If you had only observed 6 districts, and found that 4 had a positive change and 2 had a
negative change, would you expect to draw the same conclusions as when you observed 60
districts with a similar pattern? Explain your choice.
[2 marks]
Answer: No, you would not necessarily draw the same conclusions if you observed only 6
districts with 4 showing a positive change and 2 showing a negative change compared to the
case where you observed 60 districts with 40 showing a positive change and 20 showing a
negative change. Here's why:
Statistical Significance and Sample Size
1. Sample Size and Reliability:
o Larger Sample Size (60 districts): With a larger sample size, the results are more
reliable and less likely to be due to random chance. The observed proportion (40 out
of 60 districts showing a positive change) gives a clearer picture of the underlying
trend.
o Smaller Sample Size (6 districts): With a smaller sample size, the results are more
susceptible to random variation and less reliable. Observing 4 out of 6 districts with
a positive change might not provide enough evidence to draw a firm conclusion
about the overall trend.
2. P-Value Calculation:
o For the larger sample, the p-value (0.0135) indicated a statistically significant result,
leading to the rejection of the null hypothesis.
o For the smaller sample, we need to calculate the exact p-value using the binomial
distribution:
Calculating the P-Value for 6 Districts
Null Hypothesis (H0):
The probability of observing a positive change in any given district is 0.5.
Observed Outcome:
4 out of 6 districts showed a positive change.
The binomial probability $P(X \geq 4)$, where $X$ is the number of positive changes in 6 trials
(districts), can be calculated using the binomial distribution:
$$P(X \geq 4) = P(X = 4) + P(X = 5) + P(X = 6)$$
Using the binomial probability formula:
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
Where:
$n = 6$
$k$ is the number of positive changes (4, 5, or 6)
$p = 0.5$
Let's calculate each probability:
$$P(X = 4) = \binom{6}{4} (0.5)^4 (0.5)^2 = 15 \times 0.015625 = 0.234375$$
$$P(X = 5) = \binom{6}{5} (0.5)^5 (0.5)^1 = 6 \times 0.015625 = 0.09375$$
$$P(X = 6) = \binom{6}{6} (0.5)^6 (0.5)^0 = 1 \times 0.015625 = 0.015625$$
Adding these probabilities gives us the cumulative probability:
$$P(X \geq 4) = 0.234375 + 0.09375 + 0.015625 = 0.34375$$
Interpretation:
P-Value for 6 Districts: 0.34375
Significance Level (α): 0.05
The p-value (0.34375) is much higher than the significance level (0.05). Therefore, we fail to
reject the null hypothesis.
Conclusion:
With a sample size of 6 districts, the evidence is not strong enough to conclude that there is a
statistically significant change in vaccination rates. The high p-value indicates that the
observed pattern (4 out of 6 districts with positive changes) could easily be due to random
chance.
In contrast, with a sample size of 60 districts, the p-value (0.0135) was low enough to reject
the null hypothesis, indicating a significant change in vaccination rates.
Thus, the conclusions would differ due to the impact of sample size on the reliability and
statistical significance of the results.
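The six-district arithmetic in part (e) can be verified exactly with Python's fractions module, as sketched below. Note that P(X ≥ 4) = 0.34375 is a one-sided tail probability. The 60-district figure of 0.0135 appears to include both tails, so the strictly comparable two-sided value for 6 districts also adds P(X ≤ 2), giving 0.6875. Either value is far above 0.05, so the conclusion in part (e) does not change.

```python
from fractions import Fraction
from math import comb

n, observed_increases = 6, 4

# Exact P(X = k) with p = 1/2 is comb(6, k) / 2**6; Fractions avoid rounding error
pmf = {k: Fraction(comb(n, k), 2**n) for k in range(n + 1)}

# One-sided: P(X >= 4)
one_sided = sum(pmf[k] for k in range(observed_increases, n + 1))
# Two-sided: add the equally extreme lower tail, P(X <= 2)
two_sided = one_sided + sum(pmf[k] for k in range(0, n - observed_increases + 1))

print(f"One-sided P(X >= 4) = {one_sided} = {float(one_sided)}")
print(f"Two-sided p-value   = {two_sided} = {float(two_sided)}")
```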
f) Are your observed results possible if there has not been a change in vaccination rates in the
country’s population? [Yes or No] [1 mark]
Answer: Yes. To determine whether the observed results are possible if there has not been a
change in vaccination rates in the country's population, we can evaluate the likelihood of
observing such results under the assumption of no change. This is typically done using the p-
value from the binomial test.
Recap of Observed Results:
Larger Sample Size (60 districts): 40 districts with positive changes, 20 districts with negative
changes.
P-value: 0.0135
Interpretation of P-value:
A p-value of 0.0135 indicates the probability of observing 40 or more positive changes (or
equivalently, 20 or fewer negative changes) out of 60 districts under the null hypothesis that
there has been no real change in vaccination rates.
Possible Outcomes Under the Null Hypothesis:
1. P-value (0.0135) Interpretation:
o The p-value represents the probability of obtaining the observed results (or more
extreme results) purely by chance if the null hypothesis is true.
o A p-value of 0.0135 means there is a 1.35% chance of observing such a result (or
something more extreme) if there has truly been no change in vaccination rates.
2. Significance Level Comparison:
o Common significance level (α): 0.05
o Since the p-value (0.0135) is less than the significance level (0.05), we reject the null
hypothesis, indicating that the observed results are statistically significant and
unlikely to be due to random chance alone.
Conclusion:
The observed results (40 out of 60 districts with positive changes) are possible under the null
hypothesis, but they are highly unlikely. The low p-value (0.0135) suggests that it is
improbable for such a result to occur if there has been no real change in vaccination rates.
Therefore, we conclude that there is statistically significant evidence to suggest that
vaccination rates have changed over the 10-year period in the sampled districts.
Consideration for Smaller Sample (6 districts):
As discussed earlier, with a smaller sample size (6 districts, 4 with positive changes, and 2
with negative changes), the p-value is much higher (0.34375), indicating that the observed
result could easily occur by random chance, and we would not reject the null hypothesis in
that case.
In summary, while the observed results are theoretically possible if there has been no change
in vaccination rates, the low p-value in the larger sample strongly indicates that there has
likely been a real change in vaccination rates in the country's population.
Question 9 [3 marks]
Zika virus infection in pregnant women is associated with microcephaly (small head) in the baby,
although this only occurs in a small proportion of cases. You want to investigate whether babies
born to Zika-infected mothers, who do not have microcephaly, still have a lower birthweight. A
sample of 100 non-microcephalic babies from Zika-infected mothers shows the average
birthweight to be 2500 grams.
It is known that in the population, the average birthweight of all babies is 3000 grams (standard
deviation = 500 grams), and is normally distributed.
a) Calculate a Z score for your sample birthweight of 2500 grams. Indicate the formula you use
and show your working. [3 marks]
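Because the 2500 g figure is the mean of a sample of 100 babies, one reading of part (a) compares it with the population mean using the standard error rather than the individual-level standard deviation: Z = (x̄ - μ) / (σ / √n). The sketch below applies this formula in Python; it assumes the population standard deviation of 500 g is the appropriate σ.

```python
import math

pop_mean = 3000.0     # population mean birthweight (g)
pop_sd = 500.0        # population SD of birthweight (g)
n = 100               # non-microcephalic babies of Zika-infected mothers
sample_mean = 2500.0  # observed mean birthweight (g)

# For a sample mean, the relevant spread is the standard error, sigma / sqrt(n)
se = pop_sd / math.sqrt(n)
z = (sample_mean - pop_mean) / se

print(f"SE = {se:.0f} g, Z = {z:.1f}")
```

This gives SE = 500 / √100 = 50 g and Z = (2500 - 3000) / 50 = -10. Dividing by the standard deviation of 500 g instead would give a Z score of -1, which describes a single baby weighing 2500 g rather than the mean of 100 babies.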
END
TOTAL SCORE: / 50