Midterm exam review
HAP 602
Population vs. Sample – Example
• NH primary exit poll
- 59% of Republican men supported T vs. 51% of women supported H
- “These are results from a survey of 2,129 voters as they exited randomly
selected voting sites in New Hampshire on Jan. 23, 2024. The poll was
conducted by Edison Research for the National Election Pool…”
• Q. What is “population” here?
• Q. What is “sample” here?
Population vs. Sample – Examples (3)
• In a Northwest city, all primary care physicians were sent surveys about their
perceptions relating to patient use of preventive services. Surveys were
returned by 55% of the physicians.
- Q. Identify the population and the sample in the study.
Hypotheses
Hypotheses – Example
• The problem
- In the 1970s, 20–29 year old men in the U.S. had a mean body weight of 170
pounds. We test whether mean body weight in the population differs now vs.
1970s
- Research question: Did mean body weight of young U.S. adults 20-29 years
old change over time (from 1970s to now)?
• Null hypothesis?
• Alternative hypothesis?
Hypotheses – Example (2)
• Younger nurses are more likely to complain about adverse working conditions
- H0?
- H1?
• Paid sick leave increases the probability that workers use more primary care
- H0?
- H1?
Descriptive vs. Inferential – Examples (2)
• 1,000 30-year old women were followed for 20 years. At the end of the study,
75% of unmarried women were still alive, and 86% of married women were
still alive.
- Q. What parts of the statement include descriptive statistics?
- Q. Can any inferences be drawn from this statement?
Variables
Variables – Examples
• Q. Identify the following as qualitative or quantitative variables. For
quantitative variables, indicate whether they are continuous or discrete.
- Minutes in the waiting room (quantitative)
- Gender of participants in a smoking cessation program (qualitative)
- Number of patients each nurse cares for (quantitative)
- Salary ($) of nurse practitioners in a hospital system (quantitative)
- Transportation method in getting to the hospital (qualitative)
Levels of Measurement
Levels of Measurement – Example
- Name of favorite candy bar (Nominal)
- Why? The word “name”
- Weight of luggage (lbs) (Ratio)
- Why? Has zero value
- Year of birth (Interval)
- Why? Data have equal intervals but no true zero point
- Number of children in a family (Ratio)
- Why? Because ….
- Jersey numbers for a football team (Nominal)
- Why? Jersey numbers are just labels, the same as zip codes
Levels of Measurement – Example 2
- A rating scale asking about satisfaction with physicians (low/middle/high) (Ordinal)
- Why? Used to order, rank, or give preference among the subjects.
- Annual salary ($) of hospital administrators (Ratio)
- Why?
- Zip codes (Nominal)
- Why? It indicates the name of an area rather than a numerical value.
- # of people dying from Covid-19 from each state (Ratio)
- Why?
- Gender of participants in a smoking cessation program (Nominal)
- Why? The key word is “gender”
Two-Sample T Test
Two-Sample T Test – Example Practice
• Open “SAT” excel file from Blackboard (Content > Week2)
• You may need to load “Analysis ToolPak” in Excel (see this instruction)
• The data show SAT scores of selected students (N=321 each) at 2 large high
schools in Virginia. One group participates in a newly-developed curriculum,
while the other participates in a more traditional curriculum
• Determine whether there is a significant difference in the mean SAT scores
between the two schools at a 5% level of significance
• Hypotheses?
• Results?
Two-Sample T Test – Example Practice
• Determine whether there is a significant difference in the mean SAT scores
between the two schools at a 5% level of significance
• Hypotheses?
- H0: Mean SAT for traditional curriculum = Mean SAT for new curriculum
- Ha: Mean SAT for traditional curriculum ≠ Mean SAT for new curriculum
• Results?
- The t statistic (3.094) is more extreme than the t critical value (1.96) for a two-sided test
- Thus, we reject the null
- There is a statistically significant difference in mean SAT scores between the two schools
Distribution of Sample Means – Practice
- Assume there are 3 numbers: 10, 20, 30
- Q. Now, take all possible samples with n = 2
(10, 10), (10, 20), (10, 30) …
- Q. Take the mean of each sample
• (10, 10) = 10
• (10, 20) = 15
• (10, 30) = 20
- Q. Now draw a frequency histogram of the sample means
[Bar chart: sample means (10, 15, 20) for Sample 1, Sample 2, Sample 3]
Z-Score & Probability
Z-Score & Probability – Example
- Suppose a normal distribution with a mean of 4.43 and a standard deviation of 1.32
- What is the probability of observing a random observation greater than 6.00?
- Q. P(X > 6.00 | x̄ = 4.43 and s = 1.32) = ?
- Z-score = (X − x̄) / s = (6.00 − 4.43) / 1.32 = 1.19
- z: z-score
- X: individual value
- x̄: sample mean
- s: standard deviation
Continue next slide
Z-Score & Probability – Example
- Q. P(X > 6.00 | x̄ = 4.43 and s = 1.32) = ?
- First, find the z-score of 6.00 given the mean & standard deviation: z = (6.00 − 4.43) / 1.32 = 1.19
- Then, find the area between the mean and the z-score of 1.19 using the z-distribution table (textbook Table B.1.): 0.3830
Continue next slide
Z-Score & Probability – Example
- Q. P(X > 6.00 | x̄ = 4.43 and s = 1.32) = ?
- The area between the mean and the z-score of 1.19: 0.3830
- The total area under the curve is always 1 (see slide 11), so the right half of the curve is 0.5
- So, now we subtract 0.3830 from 0.5 to get the area of the small upper tail under the curve: 0.5 − 0.3830 = 0.1170
Continue next slide
Z-Score & Probability – Example
- Q. P(X > 6.00 | x̄ = 4.43 and s = 1.32) = ?
- A. The probability of observing a random observation greater than 6.00 is 0.117 (11.7%)
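To double-check the table lookup, here is a minimal Python sketch (using scipy, which is outside the course's Excel workflow) that reproduces the z-score and the 11.7% tail probability from the example above.

```python
# Hypothetical check of the worked example: P(X > 6.00) when the mean is 4.43 and the SD is 1.32.
from scipy.stats import norm

mean, sd, cutoff = 4.43, 1.32, 6.00
z = (cutoff - mean) / sd            # z-score: (6.00 - 4.43) / 1.32 ~ 1.19
p_upper = 1 - norm.cdf(z)           # area to the right of z under the standard normal curve

print(f"z = {z:.2f}, P(X > 6.00) = {p_upper:.3f}")   # ~0.117, i.e., about 11.7%
```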
Z-Score & Probability – Practice
- Suppose a normal distribution with a mean of 3.5 and a standard deviation of 1.5
- What is the probability of observing a random observation smaller than 5?
- Q. P(X < 5.00 | x̄ = 3.5 and s = 1.5) = ?
- Z-score = (5.00 − 3.5) / 1.5 = 1
- Now look at the table to get the area under the curve between the mean and z = 1
- Area under the curve = 0.3413
- Probability of observing a random observation smaller than 5 = 0.5 + 0.3413 = 0.8413 (84.13%)
- Where did we get the 0.5 and the 0.3413?
- z: z-score
- X: individual value
- x̄: sample mean
- s: standard deviation
Z-Score & Probability – Practice (2)
- Suppose a normal distribution with a mean of 7.2 and a standard deviation of 2.4
- What is the probability of observing a random observation between 8.5 and 10?
- Q. P(8.5 < X < 10.00 | x̄ = 7.2 and s = 2.4) = ?
- First, we have to get the z-score for both values (8.5 and 10)
- Z-score for 8.5 = (8.5 − 7.2) / 2.4 = 0.54; table area = 0.2054
- Z-score for 10 = (10 − 7.2) / 2.4 = 1.17; table area = 0.3790
- z: z-score; X: individual value; x̄: sample mean; s: standard deviation
- The area between 8.5 and 10 = 0.3790 − 0.2054 = 0.1736 (17.36%)
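A similar sketch for the "between two values" case above; it computes both z-scores without rounding and should give roughly 0.17.

```python
# Sketch of Practice (2): P(8.5 < X < 10) for a normal distribution with mean 7.2 and SD 2.4.
from scipy.stats import norm

mean, sd = 7.2, 2.4
z_low = (8.5 - mean) / sd     # ~0.54
z_high = (10.0 - mean) / sd   # ~1.17
p_between = norm.cdf(z_high) - norm.cdf(z_low)   # area between the two z-scores

print(f"z_low = {z_low:.2f}, z_high = {z_high:.2f}, P(8.5 < X < 10) = {p_between:.3f}")  # ~0.17
```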
Z-Score & Probability – Practice (3)
- Suppose you have healthcare utilization records of all Mason students (N=3,000) as shown on the right
- The column headed “Mean” presents the mean score for each variable
- The column headed “SD” presents the standard deviation for each variable

Variable | Mean | SD
Outpatient visits / year | 3.2 | 2.0
Hospitalization / year | 0.4 | 0.8
Emergency department visits / year | 1.3 | 1.2
Z-Score & Probability – Practice (3)
- Suppose you randomly selected three students with annual outpatient visits as:
- Student A: 4
- Student B: 5
- Student C: 1
- Assuming a normal distribution,
1) How many SDs above or below the mean is each student?
2) Which student’s outpatient utilization is the farthest from the mean?

Variable | Mean | SD
Outpatient visits / year | 3.2 | 2.0
Hospitalization / year | 0.4 | 0.8
Emergency department visits / year | 1.3 | 1.2

- Here we have to compute the z-score for each student
- Where did we get the 2? It is the SD of outpatient visits per year

Student | Z-score
Student A: 4 | z = (4 − 3.2) / 2.0 = 0.4
Student B: 5 | z = (5 − 3.2) / 2.0 = 0.9
Student C: 1 | z = (1 − 3.2) / 2.0 = −1.1 (farthest from the mean)
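A quick sketch computing the three students' z-scores from the table values (mean 3.2, SD 2.0), confirming that Student C is the farthest from the mean.

```python
# Z-scores for the three students' annual outpatient visits (mean 3.2, SD 2.0 from the table above).
mean, sd = 3.2, 2.0
visits = {"Student A": 4, "Student B": 5, "Student C": 1}

for student, x in visits.items():
    z = (x - mean) / sd
    print(f"{student}: z = {z:+.1f}")   # A: +0.4, B: +0.9, C: -1.1 (largest |z|, farthest from the mean)
```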
Type I & II Errors
Type I & II Errors – Example
• The FDA has to decide how restrictive to be when approving new drugs
• There always exists a risk of incorrect decision
- Type I error; ineffective/unsafe drug is approved
- Type II error; effective/safe drug is rejected
• Both errors are bad
- Type I error -> huge safety issue
- Type II error -> companies lose $, patients don’t have access to effective
drugs
Type I & II Errors – Example
• The problem is that there is “trade-off” between the two errors
- With tight regulation, the risk of type II error increases
- With permissive regulation, the risk of type I error increases
• Regulators need to balance social welfare and potential harm by finding the
optimal regulation
Type I & II Errors – Role of Sample Size
• The trade-off between the two errors is NOT
inevitable
• Central limit theorem: the means of random samples of size n are distributed approximately normally with mean μ and variance σ²/n
• Thus, the larger the sample size, the smaller the variance of the sampling distribution of the mean
• With a bigger sample size, both errors can be lowered
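A small illustration of this point: with an arbitrary, hypothetical population SD of 10, the standard error of the mean σ/√n shrinks as n grows.

```python
# The standard error of the mean sigma/sqrt(n) shrinks as the sample size grows,
# which is why both error rates can be reduced with a larger n. (sigma = 10 is a made-up value.)
import math

sigma = 10
for n in (25, 100, 400, 1600):
    se = sigma / math.sqrt(n)
    print(f"n = {n:5d} -> standard error of the mean = {se:.2f}")
```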
Rejection of Null Hypothesis
• We reject the null when the value of the
test statistic is beyond the critical value of
the statistical test
• The critical value is the value of the test statistic at the level of significance (alpha, α) that we pre-selected
• Note that the area under the nonrejection region > the area under the rejection region
- Meaning, we reject the null only if the test statistic is unlikely to be obtained “by chance”
Rejection of Null Hypothesis – One-tailed Test
• The rejection region is typically set as α = 0.05 (5%)
- Meaning that the nonrejection region is 95%
- The corresponding z-score can be pre-determined based on the standard normal distribution
- Find the z-score that has 95% of the area under the curve below it (z-distribution table)
• If the value of Z > 1.645, then we reject
the null (otherwise, we fail to reject the
null)
Rejection of Null Hypothesis – Two-tailed Test
• Rejection and nonrejection regions for a two-tailed test at α = 0.05 (0.025 in each tail)
• We split the probability between the two ends of the distribution when performing a two-tailed test
- Find the z-score that has 97.5% of the area under the curve below it (z-distribution table)
• We reject the null if Z > 1.96 or Z < -1.96
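The critical values quoted above can also be looked up programmatically; this sketch uses scipy's inverse normal CDF (norm.ppf) instead of the printed z-table.

```python
# Where the critical values 1.645 (one-tailed) and 1.96 (two-tailed) come from.
from scipy.stats import norm

alpha = 0.05
print(norm.ppf(1 - alpha))       # ~1.645: z with 95% of the area below it (one-tailed cutoff)
print(norm.ppf(1 - alpha / 2))   # ~1.960: z with 97.5% of the area below it (two-tailed cutoff)
```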
Sample size calculation
Sample size calculation
• The sample size calculation formula varies by outcome measure, number of groups, and study design
• For a two-group comparison with a continuous outcome, a standard formula for the minimum required n per group is 2σ²(Zα/2 + Zβ)² / Δ², where Δ is the expected difference in means (effect size) and Zα/2 and Zβ come from the table below
• We need a larger minimum sample size when…
- the standard deviation (σ) is larger
- α is lower or power is higher (stricter thresholds)
- the expected effect size (Δ) is smaller

• Z-scores for α & β
Significance level (α): 5% -> 1.96 | 1% -> 2.58 | 0.1% -> 3.29
Power (1 − β): 80% -> 0.84 | 85% -> 1.04 | 90% -> 1.29 | 95% -> 1.64
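A minimal sketch of that per-group formula; the σ and Δ inputs below are made-up illustration values, not numbers from the slides.

```python
# Minimum n per group for comparing two means: n ~ 2 * sigma^2 * (z_alpha2 + z_beta)^2 / delta^2.
# sigma (SD) and delta (expected difference in means) are hypothetical illustration values.
import math

def n_per_group(sigma, delta, z_alpha2=1.96, z_beta=0.84):   # defaults: alpha = 0.05, power = 0.80
    return math.ceil(2 * (sigma ** 2) * (z_alpha2 + z_beta) ** 2 / delta ** 2)

print(n_per_group(sigma=10, delta=5))     # larger SD or smaller delta -> larger required n
print(n_per_group(sigma=10, delta=2.5))
```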
Sample size calculation – Example
• Randomized controlled trial in a few states
- On behavioral response to health insurance uptake on marketplace
- To test whether alternative messaging increases insurance uptake among
uninsured people
• Traditional campaign: “buy insurance, it could save your $ and your life”
• Alternative campaign: “buy insurance, you save your $ AND you can protect
others who are not able to buy it” (because healthy people’s uptake would
lower the community-rated premium)
Sample size calculation – Example
• General research idea; comparing insurance uptake (%) between traditional
campaign group vs. alternative campaign group (where each individual will
be randomly assigned to either group)
• How big should my sample be? We can’t request millions of dollars from the funder without justification
• Set α & power (1 − β): typically 0.05 for α & 0.8 for power
• Predict the effect size; literature on the impact of the traditional campaign (uptake of about 8%)
- I expect the intervention would increase uptake by 0.8 ppt -> 38K people
needed
Sample size calculation – Example
• How big should my sample be?
- I expect the intervention would increase uptake by 0.8 ppt -> 38K people
needed
• What if…
- If the actual effect were smaller (0.4 ppt) -> 147K
- With a lower α (0.01) & unchanged expected effect (+0.8 ppt) -> 56K
- With higher power (0.9) & unchanged expected effect (+0.8 ppt) -> 50K
• Assuming that the experiment costs $10 per person, expected budget can
change by $1 million
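The headcounts above can be reproduced approximately with a common two-proportion sample-size formula, assuming a baseline uptake of 8%; which exact formula was used in the original calculation is an assumption here.

```python
# Total sample size for comparing two proportions:
# n per group = (z_alpha2 + z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2.
import math

def total_n(p1, p2, z_alpha2=1.96, z_beta=0.84):
    per_group = (z_alpha2 + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return 2 * math.ceil(per_group)

print(total_n(0.08, 0.088))                  # ~38K  (baseline 8%, +0.8 ppt effect)
print(total_n(0.08, 0.084))                  # ~147K (+0.4 ppt effect)
print(total_n(0.08, 0.088, z_alpha2=2.58))   # ~56K  (alpha = 0.01)
print(total_n(0.08, 0.088, z_beta=1.29))     # ~50K  (power = 0.90)
```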
Sample Size Calculation – Implication
• Even randomized controlled trials (the “gold standard”) cannot detect a statistically significant effect with an insufficient sample size
• If research findings were not significant, ask…
- Whether hypotheses, methods, analysis, and interpretation were correct
- And whether sample size is big enough
• When reading research articles, always ask: “Is the statistical power strong enough to detect a statistically significant effect?” (= “Is the sample big enough?”)
One-Sample Z test
Z-Test Statistic – Example
• Z = (x̄ − μ) / SEM, where SEM = σ / √n
• Compare the mean math score of the sample to the population (whole students) mean

 | Size | Mean | Standard deviation
Sample | 25 | 100 | 5.0
Population | 1,000 | 99 | 2.5

- H0?
- H1?
- Z = ?
- Conclusion (at α = 0.05)?

- x̄: mean of the sample
- μ: mean of the population
- SEM (standard error of the mean): the value we would expect by chance, given all the variability that surrounds the selection of all possible sample means from a population
- σ: standard deviation of the population
- n: size of the sample

Solution
- H0: the sample mean equals the population mean (x̄ = μ)
- H1: the sample mean differs from the population mean (x̄ ≠ μ) (two-sided test)
- Z = (100 − 99) / (2.5 / √25) = 1 / 0.5 = 2
- Conclusion (at α = 0.05)? Continue next slide
Z-Test Statistic – Example
- H0: x̄ = μ (the sample mean equals the population mean)
- H1: x̄ ≠ μ (two-sided test)
- Conclusion (at α = 0.05)? 2 > 1.96
- The Z-test statistic (2) is more extreme than the critical value (1.96) given α = 0.05 -> The null is rejected
- The sample mean is statistically significantly
different from the population mean
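A sketch reproducing the example's z statistic and its two-sided p-value in Python.

```python
# One-sample z test from the example: sample mean 100 (n = 25) vs. population mean 99, population SD 2.5.
import math
from scipy.stats import norm

x_bar, mu, sigma, n = 100, 99, 2.5, 25
sem = sigma / math.sqrt(n)          # standard error of the mean = 0.5
z = (x_bar - mu) / sem              # = 2.0
p_two_sided = 2 * (1 - norm.cdf(abs(z)))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.4f}")   # z = 2.00, p ~ 0.0455 < 0.05 -> reject H0
```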
Z-Test Statistic – Practice
• Z = (x̄ − μ) / (σ / √n)
• See if the mean age of the sample is bigger than the population mean

 | Size | Mean | Standard deviation
Sample | 100 | 45 | 5
Population | 100,000 | 42 | 8

- H0?
- H1?
- Z = ?
- Conclusion (at α = 0.05)? (The Z critical value for a one-sided test with α = 5% is 1.645)
(Independent/Unpaired) Two-Sample T test
Independent T-Test – Steps
• State null & research hypotheses
• Set the level of significance (type I error, ); usually 5% (or 0.05)
• Determine degrees of freedom (df);
• Compute t statistic , where is variance
- Numerator shows the mean difference between group 1 and group 2
- Denominator; The amount of variance within and between the two groups
• Compare t statistic to the critical value (given df); textbook Table B.2 (page
379)
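A sketch of the same steps in Python; the two score lists are hypothetical, and scipy's ttest_ind with equal_var=True corresponds to the pooled-variance test described above.

```python
# Pooled-variance (equal-variance) two-sample t test on two hypothetical groups of scores.
from scipy import stats

group1 = [85, 90, 88, 75, 95, 80, 84, 91]
group2 = [78, 82, 80, 74, 79, 85, 77, 81]

t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)    # df = n1 + n2 - 2 = 14
t_crit = stats.t.ppf(0.975, df=len(group1) + len(group2) - 2)         # two-tailed critical value at alpha = 0.05

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, critical value = {t_crit:.3f}")
# Reject H0 if |t| > critical value (equivalently, if p < 0.05).
```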
Independent T-Test – Example
• Open “NursingHome” excel file from Blackboard (“Week4”)
• The data show patients’ satisfaction scores from two nursing facilities owned
by a corporation
• At a 5% level of significance, test whether there is a statistically significant
difference in satisfaction between the two facilities
- Note that, when a nondirectional (two-tailed) test is used, the t value is
represented as an absolute value; you should consider “both ends”
- P value: the probability of obtaining the current t statistic “by chance” -> reject the null when the P value < α (pre-determined)
Independent T Test – Example
• At a 5% level of significance, test whether there is a statistically significant
difference in satisfaction between the two facilities
• You can do it in 2 ways
1. Compute the t statistic and compare it to the critical value given df (= 10 + 10 − 2 = 18) using the t table: reject the null if the t statistic is more extreme than the critical value (2.1)
2. Use “Data Analysis > t-test” tool in Excel
- If you have not done so yet, you should load the “Analysis ToolPak” in Excel
Independent T Test – Practice
• Open “SAT” excel file from Blackboard (“Week4”)
• The data show SAT scores of selected students at 2 large high schools in
Virginia. One participates in a newly-developed curriculum, while the other
participates in a more traditional curriculum
• Determine whether there is a significant difference in the mean SAT scores
between the two schools
- At a 5% level of significance
- At a 1% level of significance
• State your hypotheses
• Interpret statistical test results
Independent T Test – Practice (2)
• The table on the right shows 16 students’ systolic blood pressure (mmHg) in two groups
• Determine whether there is a significant difference in blood pressure between the two groups
- At a 5% level of significance
- At a 1% level of significance

Group 1 | Group 2
140 | 116
135 | 124
145 | 151
120 | 134
122 | 136
134 | 142
150 | 128
115 | 115
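For checking your Excel answer, a sketch that runs the equal-variance t test on the blood pressure data above.

```python
# Two-sample t test on the systolic blood pressure data from the table above.
from scipy import stats

group1 = [140, 135, 145, 120, 122, 134, 150, 115]
group2 = [116, 124, 151, 134, 136, 142, 128, 115]

t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)   # df = 8 + 8 - 2 = 14
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# Compare p to 0.05 and to 0.01 to answer the two parts of the practice question.
```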
Statistical Significance
Statistical Significance – Interpretation
• Assume a researcher conducted a
study on caffeine’s behavioral effect
on 150 students
- For two groups: high-caffeine vs. no
caffeine group
- And conducted independent-
sample t test
- On 4 outcome measures
• Q. On which outcome(s) does
caffeine have significant effect?
Statistical Significance – Interpretation
• Assume a researcher conducted a
study on caffeine’s behavioral effect
on 150 students
- For two groups: high-caffeine vs. no
caffeine group
• Q. On which outcome(s) does
caffeine have significant effect?
- A. Self-rated irritability & self-reported outbursts
Statistical Significance – p-value
• P-value (probability value)
- The probability of obtaining a test statistic as large as (or larger than) the one observed if the null hypothesis were true
- In most studies, the null hypothesis indicates the status quo (no difference or no relationship) in the population
- If the test statistic is extreme (lies in the rejection region), the p-value should be small
- If the p-value is small enough (compared to the level of significance, α), we may reject the null hypothesis and say that the results are statistically significant
Statistical Significance – Correct Interpretation
• Q. Regarding the effect of caffeine on self-
rated irritability, which is the correct
interpretation?
- There is only a 5% chance that the mean score
for the high-caffeine group is equal to the mean
score for the no-caffeine group in the
population
- There is only a 5% chance that the null
hypothesis is true
- If the null hypothesis were true, there is only a
5% probability that we would have obtained a t
statistic this large (or larger)
(Dependent/Paired) Two-sample t-test
Dependent T Test
• Other names: t test for paired samples, t test for correlated samples
• Used when a single group of the same subjects/individuals is being studied
under two conditions (before- and after-treatment)
• Primarily depends on differences between individual scores
• t statistic: t = ΣD / √[(nΣD² − (ΣD)²) / (n − 1)]
- D: the difference between each individual’s score from point 1 (before) to point 2 (after)
- ΣD: the sum of all the differences between the pairs of scores
- ΣD²: the sum of the squared differences between the pairs of scores
- n: the number of “pairs” of observations
Dependent T Test – Steps
• State hypotheses; for a two-tailed test, H0: mean difference = 0 and H1: mean difference ≠ 0
• Set the level of risk (the level of significance/Type I error): typically 0.05
• Determine the degrees of freedom (df)= n – 1 (where n is the number of pairs
of observations)
• Compare the obtained t statistic and the critical value (Table B.2. or t table); if
your obtained value is larger (more extreme) than the critical value, reject
the null hypothesis
Dependent T Test – Example
• Open “InsuranceRating” excel file from Blackboard (“Week4”)
• The data contains ratings provided by seven users of an insurance program
before and after the company made some administrative changes in the
process of filing for claims
• At a 5% level of significance, test whether there was a statistically significant
improvement in ratings after the change was made
• State your null and research hypotheses
• Conclusion?
• Tip: when running one-tail test, put “after” column first in “Variable 1 range”
Dependent T Test – Example
• State your null and research hypotheses (one-tailed test)
• Conclusion?
- The p-value for the one-tail test (0.02196) is lower than 0.05
-> Reject the null (there was a statistically significant improvement)

t-Test: Paired Two Sample for Means
 | Post-Change Scores | Pre-Change Scores
Mean | 29.14286 | 23.28571
Variance | 146.8095 | 103.2381
Observations | 7 | 7
Pearson Correlation | 0.864686
Hypothesized Mean Difference | 0
df | 6
t Stat | 2.542712
P(T<=t) one-tail | 0.02196
t Critical one-tail | 1.94318
P(T<=t) two-tail | 0.04392
t Critical two-tail | 2.446912
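The t statistic in the Excel output can also be re-derived from the summary statistics alone; the sketch below uses only the reported means, variances, and Pearson correlation.

```python
# Re-deriving t Stat ~ 2.543 from the paired-sample summary statistics in the Excel output above.
import math

mean_post, mean_pre = 29.14286, 23.28571
var_post, var_pre = 146.8095, 103.2381
r, n = 0.864686, 7

# Variance of the paired differences: var(D) = var1 + var2 - 2 * r * sd1 * sd2
var_diff = var_post + var_pre - 2 * r * math.sqrt(var_post * var_pre)
se_diff = math.sqrt(var_diff / n)                 # standard error of the mean difference
t_stat = (mean_post - mean_pre) / se_diff

print(f"t = {t_stat:.3f}")   # ~2.543, matching the Excel output up to rounding (df = 6)
```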
Dependent T Test – Practice
• Open “OECDmortality” excel file from Blackboard (“Week4”)
• The data contains standardized all-cause mortality (per 100k population) of
31 OECD countries in 2019 & 2020 (taken from here)
• At a 5% level of significance, test whether there was a statistically significant
change in mortality after the Covid-19 pandemic
• State your null and research hypotheses
• Conclusion?
Reliability & Validity
Reliability – Cronbach’s alpha
- α = (k / (k − 1)) × (1 − Σs²_item / s²_total)
- k is the # of items
- Σs²_item: sum of the variances of each item
- s²_total: variance of the “overall score”
- α > 0.7 is considered “acceptable” & α < 0.5 is poor/unacceptable

ID | Item 1 | Item 2 | Item 3
1 | 3 | 5 | 2
2 | 4 | 4 | 3
3 | 3 | 4 | 4
4 | 3 | 3 | 3
5 | 3 | 4 | 3
6 | 4 | 5 | 5
7 | 2 | 5 | 5
8 | 3 | 4 | 4
9 | 3 | 5 | 4
10 | 3 | 3 | 2

• The data show 10 patients’ satisfaction scores (1 (lowest) to 5 (highest)) for each item (doctor, nurse, location)
• You expect that people scoring high on doctor would also score high on nurse/location
Reliability – Cronbach’s alpha
- α = (k / (k − 1)) × (1 − Σs²_item / s²_total)
- k is the # of items
- Σs²_item: sum of the variances of each item
- s²_total: variance of the “overall (total) score”

ID | Item 1 | Item 2 | Item 3 | Total
1 | 3 | 5 | 2 | 10
2 | 4 | 4 | 3 | 11
3 | 3 | 4 | 4 | 11
4 | 3 | 3 | 3 | 9
5 | 3 | 4 | 3 | 10
6 | 4 | 5 | 5 | 14
7 | 2 | 5 | 5 | 12
8 | 3 | 4 | 4 | 11
9 | 3 | 5 | 4 | 12
10 | 3 | 3 | 2 | 8

• First, get “total scores” by summing up each person’s scores
• Then, get Σs²_item and s²_total (use Excel’s “var.s” function)
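A sketch that computes Cronbach's alpha for the data above using sample variances (the same as Excel's var.s).

```python
# Cronbach's alpha for the 10-patient, 3-item satisfaction data above (sample variances, ddof=1).
import numpy as np

scores = np.array([
    [3, 5, 2], [4, 4, 3], [3, 4, 4], [3, 3, 3], [3, 4, 3],
    [4, 5, 5], [2, 5, 5], [3, 4, 4], [3, 5, 4], [3, 3, 2],
])

k = scores.shape[1]                          # number of items = 3
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of each person's total score
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"Cronbach's alpha = {alpha:.2f}")   # ~0.39 for these data (below the 0.7 "acceptable" threshold)
```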
Validity vs. Reliability
• Think of throwing a dart
- Let’s say score 10 is “true value” and
your dart is “sample measurement”
- Your goal is getting all 10 or very close
to 10
• Valid test is always reliable
• Reliable test can be invalid
(“consistently wrong”)
Validity – Cohen’s Kappa ()
- A metric used to assess the agreement between two measures/raters (can also be used for interrater reliability)
- It takes into account the possibility of the agreement occurring “by chance”
- κ = (P_o − P_e) / (1 − P_e)
- P_o is the observed agreement among measures
- P_e is the hypothetical probability of “chance agreement”
- If the two measures are in complete agreement, κ = 1
- If there is no agreement other than what would be expected by chance, κ = 0
- κ > 0.6 is considered substantial agreement
- κ > 0.8 is considered almost perfect agreement
Validity – Cohen’s Kappa
• Observed agreement (P_o) = (25 + 25) / 100 = 0.50
• Chance agreement of “yes” = (30/100) × (70/100) = 0.21
• Chance agreement of “no” = (70/100) × (30/100) = 0.21
• Overall chance agreement (P_e) = 0.21 + 0.21 = 0.42
• κ = (0.50 − 0.42) / (1 − 0.42) ≈ 0.14

 | Test B yes | Test B no | Total
Test A Yes | 25 | 5 | 30
Test A No | 45 | 25 | 70
Total | 70 | 30 | 100
Validity – Practice
• Observed agreement (P_o) = (25 + 25) / 100 = 0.50
• Chance agreement of “yes” = (50/100) × (50/100) = 0.25
• Chance agreement of “no” = (50/100) × (50/100) = 0.25
• Overall chance agreement (P_e) = 0.25 + 0.25 = 0.50
• κ = (0.50 − 0.50) / (1 − 0.50) = 0

 | Test B yes | Test B no | Total
Test A Yes | 25 | 25 | 50
Test A No | 25 | 25 | 50
Total | 50 | 50 | 100
Validity – Example (Table 9)
• Cohen’s Kappa (κ)
• Observed agreement (P_o) = (88 + 192) / 320 = 0.875
• Chance agreement of “yes” = (113/320) × (103/320) = 0.114
• Chance agreement of “no” = (207/320) × (217/320) = 0.439
• Overall chance agreement (P_e) = 0.114 + 0.439 = 0.553
• κ = (0.875 − 0.553) / (1 − 0.553) ≈ 0.72

 | Self report yes | Self report no | Total
Saliva Yes | 88 | 25 | 113
Saliva No | 15 | 192 | 207
Total | 103 | 217 | 320
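A sketch that reproduces the κ ≈ 0.72 result from the 2×2 table above.

```python
# Cohen's kappa for the saliva test vs. self-report table (Table 9 example).
yes_yes, yes_no = 88, 25     # saliva "yes" row
no_yes, no_no = 15, 192      # saliva "no" row
n = yes_yes + yes_no + no_yes + no_no                          # 320

p_o = (yes_yes + no_no) / n                                    # observed agreement = 0.875
p_yes = ((yes_yes + yes_no) / n) * ((yes_yes + no_yes) / n)    # chance agreement on "yes" ~ 0.114
p_no = ((no_yes + no_no) / n) * ((yes_no + no_no) / n)         # chance agreement on "no"  ~ 0.439
p_e = p_yes + p_no

kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}")   # ~0.72 -> substantial agreement
```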
ANOVA – Example (“CognitiveScore” File)
• The file contains cognitive scores (continuous from 0 to 15) of 25 people in 3
groups
• Test if mean cognitive scores were different between the groups
• Go to “Data” > “Data Analysis” > “Anova: Single factor” in Excel
ANOVA – Example (“CognitiveScore” File)
• Degrees of freedom (df)
- Between-group: df = # of groups − 1
- Within-group: df = # of all data points − # of groups
• Sum of squares (SS)
- Between-group: Σ n_group × (group mean − grand mean)², the variation of group means around the grand mean
- Within-group: Σ (each data point − its group mean)², the variation of data points around their own group mean
• Mean squares (MS) = SS / df
• F = MS between-group / MS within-group; F crit is the critical F value given the degrees of freedom & alpha
-> compare F to F crit
ANOVA – Example (“CognitiveScore” File)
• In this example,
- H0: μ₁ = μ₂ = μ₃ (all group means are equal)
- H1: μ₁ ≠ μ₂ or μ₁ ≠ μ₃ or μ₂ ≠ μ₃ (at least one group mean differs)
- P-value = 0.006 (smaller than alpha)
- You can reject the null hypothesis (not all the group means are equal)
ANOVA – Treatment Effect Example 1
• 9 individuals with treatment effect scores (range 0–10) in groups A, B, and C (based on drugs)
- A: 3, 3, 3 (mean = 3)
- B: 5, 5, 5 (mean = 5)
- C: 7, 7, 7 (mean = 7)
[Bar chart: treatment effect scores by group (a, b, c)]
• Run one-way ANOVA and see if there is any significant difference in group means
• Within-group variation is ZERO
• The F-stat from Excel (65535; F is effectively infinite here because the within-group variance is 0) is bigger than the critical value (5.14)
-> Significantly different group means
• Then, the question is how much the means of the groups are dispersed.
ANOVA – Treatment Effect Example 2
• 9 individuals in groups A, B, and C (based on regimens) – same group means as Example 1
- A: 1, 3, 5 (mean = 3)
- B: 3, 5, 7 (mean = 5)
- C: 5, 7, 9 (mean = 7)
[Bar chart: treatment effect scores by group (a, b, c)]
- Run one-way ANOVA and see if there is any significant difference in group means
• Within-group variation is NOT zero
• Then, the question is how big the between-group variation is relative to the within-group variation.
• The F-stat from Excel (3) is smaller than the critical value (5.14)
-> We fail to reject the null; the group means are not significantly different (because of the relatively large within-group variation)
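A sketch that reproduces Example 2's F statistic and critical value with scipy's one-way ANOVA.

```python
# One-way ANOVA for Treatment Effect Example 2 (within-group variation is not zero).
from scipy import stats

a = [1, 3, 5]
b = [3, 5, 7]
c = [5, 7, 9]

f_stat, p_value = stats.f_oneway(a, b, c)
f_crit = stats.f.ppf(0.95, dfn=2, dfd=6)   # critical F for df_between = 2, df_within = 6, alpha = 0.05

print(f"F = {f_stat:.2f}, critical F = {f_crit:.2f}, p = {p_value:.3f}")
# F = 3.00 < 5.14, so we fail to reject the null here.
```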
ANOVA – Treatment Effect Example 3
• 9 individuals in groups A, B, and C (based on regimens) – two identical groups
- A: 3, 3, 3 (mean = 3)
- B: 3, 3, 3 (mean = 3)
- C: 5, 6, 7 (mean = 6)
[Bar chart: treatment effect scores by group (a, b, c)]
• Run one-way ANOVA and see if there is any significant difference in group means
• Are the mean differences between A vs. C or B vs. C significant?
• The F-stat from Excel (27) is bigger than the critical value (5.14)
-> Not all group means are statistically the same
ANOVA – Treatment Effect Example 4
• 9 individuals in groups A, B, and C (based on regimens) – two identical groups (same as Example 3) & within-group variation in group C is larger
- A: 3, 3, 3 (mean = 3)
- B: 3, 3, 3 (mean = 3)
- C: 3, 3, 9 (mean = 5)
[Bar chart: treatment effect scores by group (a, b, c)]
• Run one-way ANOVA and see if there is any significant difference in group means
• Q. Among the 4 examples, in which ones do you expect significant results?