Two-Sample Tests
Learning Outcomes
In this session, you learn:
• How to use hypothesis testing to compare the difference between:
  • The means of two independent populations
  • The means of two related populations
Two-Sample Tests
• Population means, independent samples. Example: Group 1 vs. Group 2
• Population means, related samples. Example: same group before vs. after treatment
Difference Between Two Means
Goal: Test a hypothesis or form a confidence interval for the difference between two population means, μ1 – μ2
The point estimate for the difference is X̄1 – X̄2
Population means, independent samples:
• σ1 and σ2 unknown, assumed equal
• σ1 and σ2 unknown, not assumed equal
Difference Between Two Means: Independent Samples
• Different data sources
  • Unrelated
  • Independent: the sample selected from one population has no effect on the sample selected from the other population
• σ1 and σ2 unknown, assumed equal: use Sp to estimate the unknown σ. Use a pooled-variance t test.
• σ1 and σ2 unknown, not assumed equal: use S1 and S2 to estimate the unknown σ1 and σ2. Use a separate-variance t test.
Hypothesis Tests for Two Population Means
Two Population Means, Independent Samples
• Lower-tail test: H0: μ1 ≥ μ2, H1: μ1 < μ2, i.e., H0: μ1 – μ2 ≥ 0, H1: μ1 – μ2 < 0
• Upper-tail test: H0: μ1 ≤ μ2, H1: μ1 > μ2, i.e., H0: μ1 – μ2 ≤ 0, H1: μ1 – μ2 > 0
• Two-tail test: H0: μ1 = μ2, H1: μ1 ≠ μ2, i.e., H0: μ1 – μ2 = 0, H1: μ1 – μ2 ≠ 0
Hypothesis tests for μ1 – μ2
Two Population Means, Independent Samples
• Lower-tail test: H0: μ1 – μ2 ≥ 0, H1: μ1 – μ2 < 0. Rejection region of area α in the lower tail: reject H0 if tSTAT < -tα
• Upper-tail test: H0: μ1 – μ2 ≤ 0, H1: μ1 – μ2 > 0. Rejection region of area α in the upper tail: reject H0 if tSTAT > tα
• Two-tail test: H0: μ1 – μ2 = 0, H1: μ1 – μ2 ≠ 0. Rejection regions of area α/2 in each tail: reject H0 if tSTAT < -tα/2 or tSTAT > tα/2
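The three rejection rules can be written as one small function. This is a minimal sketch, not part of the original slides: the name `reject_h0` and the `tail` labels are illustrative assumptions, and the critical value is assumed to be looked up separately (for example, from a t table) for the test's degrees of freedom.

```python
def reject_h0(t_stat: float, t_crit: float, tail: str) -> bool:
    """Apply the rejection rule for a t test.

    t_crit is the positive critical value: t_alpha for a one-tail test,
    t_alpha/2 for the two-tail test. (Hypothetical helper, for illustration.)
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be the positive critical value")
    if tail == "lower":
        return t_stat < -t_crit
    if tail == "upper":
        return t_stat > t_crit
    if tail == "two":
        return t_stat < -t_crit or t_stat > t_crit
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


# Values from the NYSE/NASDAQ pooled-variance example in these slides:
# tSTAT = 2.040, critical values ±2.0154 (two-tail, α = 0.05, d.f. = 44).
print(reject_h0(2.040, 2.0154, "two"))  # True: 2.040 > 2.0154
```

Passing the positive critical value and letting the function negate it keeps the one-tail and two-tail cases symmetric.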
Hypothesis tests for µ1 - µ2 with σ1 and σ2 unknown and assumed equal
Assumptions:
▪ Samples are randomly and independently drawn
▪ Populations are normally distributed or both sample sizes are at least 30
▪ Population variances are unknown but assumed equal
Hypothesis tests for µ1 - µ2 with σ1 and σ2 unknown and assumed equal
(continued)
• The pooled variance is:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
• The test statistic is:
$$t_{STAT} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
• Where tSTAT has d.f. = (n1 + n2 – 2)
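The pooled variance and test statistic can be computed directly from summary statistics. The sketch below uses only the Python standard library; the function names are illustrative assumptions, and the inputs are the NYSE/NASDAQ summary statistics from the worked example in these slides.

```python
import math


def pooled_variance(n1, s1, n2, s2):
    """Pooled variance Sp^2 from sample sizes and sample standard deviations."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))


def pooled_t_stat(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Return (tSTAT, degrees of freedom) for the pooled-variance t test."""
    sp2 = pooled_variance(n1, s1, n2, s2)
    std_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_stat = ((mean1 - mean2) - hypothesized_diff) / std_error
    return t_stat, n1 + n2 - 2


# NYSE: n = 21, mean = 3.27, S = 1.30; NASDAQ: n = 25, mean = 2.53, S = 1.16
t_stat, df = pooled_t_stat(3.27, 1.30, 21, 2.53, 1.16, 25)
print(f"Sp^2 = {pooled_variance(21, 1.30, 25, 1.16):.4f}")
print(f"tSTAT = {t_stat:.3f}, d.f. = {df}")
```

The results should agree with the slide values of 1.5021 for the pooled variance and 2.040 for tSTAT with 44 degrees of freedom.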
Confidence interval for µ1 - µ2 with σ1 and σ2 unknown and assumed equal
The confidence interval for μ1 – μ2 is:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$$
Where tα/2 has d.f. = n1 + n2 – 2
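As a rough illustration of the interval formula, the sketch below computes the two endpoints from summary statistics. The function name is an assumption made for this example, and the critical value tα/2 is passed in rather than computed (here 2.0154, the value the slides use for 44 d.f. at α = 0.05).

```python
import math


def pooled_confidence_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2, assuming equal population variances.

    t_crit is t_alpha/2 with n1 + n2 - 2 degrees of freedom, supplied by the caller.
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin


# NYSE/NASDAQ summary statistics from the example in these slides.
low, high = pooled_confidence_interval(3.27, 1.30, 21, 2.53, 1.16, 25, t_crit=2.0154)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With these inputs the endpoints should round to the (0.009, 1.471) interval reported in the example.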
Pooled-Variance t Test Example
You are a financial analyst for a brokerage firm. Is there a
difference in dividend yield between stocks listed on the NYSE
& NASDAQ? You collect the following data:
                  NYSE    NASDAQ
Number              21        25
Sample mean       3.27      2.53
Sample std dev    1.30      1.16
Assuming both populations are approximately normal with equal variances, is there a difference in mean yield (α = 0.05)?
Pooled-Variance t Test Example: Calculating the Test Statistic
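(continued)
H0: μ1 - μ2 = 0, i.e., μ1 = μ2
H1: μ1 - μ2 ≠ 0, i.e., μ1 ≠ μ2
The test statistic is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = 2.040$$
where the pooled variance is:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)1.30^2 + (25 - 1)1.16^2}{(21 - 1) + (25 - 1)} = 1.5021$$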
Pooled-Variance t Test Example: Hypothesis Test Solution
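H0: μ1 - μ2 = 0, i.e., μ1 = μ2
H1: μ1 - μ2 ≠ 0, i.e., μ1 ≠ μ2
α = 0.05
df = 21 + 25 - 2 = 44
Critical Values: t = ± 2.0154
[Figure: t distribution with rejection regions of area .025 in each tail, below -2.0154 and above 2.0154; the test statistic 2.040 falls in the upper rejection region]
Test Statistic:
$$t = \frac{3.27 - 2.53}{\sqrt{1.5021\left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = 2.040$$
Decision: Reject H0 at α = 0.05
Conclusion: There is evidence of a difference in means.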
Pooled-Variance t Test Example: Confidence Interval for µ1 - µ2
Since we rejected H0, can we be 95% confident that µNYSE > µNASDAQ?
95% Confidence Interval for µNYSE - µNASDAQ:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = 0.74 \pm 2.0154 \times 0.3628 = (0.009, 1.471)$$
Since 0 is less than the entire interval, we can be 95% confident that µNYSE > µNASDAQ.
Hypothesis tests for µ1 - µ2 with σ1 and σ2 unknown, not
assumed equal
Assumptions:
▪ Samples are randomly and independently drawn
▪ Populations are normally distributed or both sample sizes are at least 30
▪ Population variances are unknown and cannot be assumed to be equal
Hypothesis tests for µ1 - µ2 with σ1 and σ2 unknown and not assumed
equal (continued)
The test statistic is:
$$t_{STAT} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
tSTAT has d.f. ν:
$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(\dfrac{S_1^2}{n_1}\right)^2}{n_1 - 1} + \dfrac{\left(\dfrac{S_2^2}{n_2}\right)^2}{n_2 - 1}}$$
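The separate-variance statistic and its degrees of freedom are easier to check in code than by hand. The following is a minimal sketch using only the standard library; the function name is an illustrative assumption, and the inputs are the NYSE/NASDAQ summary statistics used in these slides.

```python
import math


def separate_variance_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Return (tSTAT, nu) for the separate-variance t test.

    nu is the (generally non-integer) degrees of freedom from the formula above.
    """
    v1 = s1**2 / n1  # S1^2 / n1
    v2 = s2**2 / n2  # S2^2 / n2
    t_stat = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, nu


t_stat, nu = separate_variance_t(3.27, 1.30, 21, 2.53, 1.16, 25)
print(f"tSTAT = {t_stat:.3f}, nu = {nu:.2f}, d.f. used = {math.floor(nu)}")
```

Rounding ν down to an integer, as the slides do, gives a slightly larger critical value and therefore a slightly more conservative test.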
Separate-Variance t Test Example
You are a financial analyst for a brokerage firm. Is there a
difference in dividend yield between stocks listed on the NYSE
& NASDAQ? You collect the following data:
                  NYSE    NASDAQ
Number              21        25
Sample mean       3.27      2.53
Sample std dev    1.30      1.16
Assuming both populations are approximately normal with unequal variances, is there a difference in mean yield (α = 0.05)?
Separate-Variance t Test Example: Calculating the Test Statistic
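(continued)
H0: μ1 - μ2 = 0, i.e., μ1 = μ2
H1: μ1 - μ2 ≠ 0, i.e., μ1 ≠ μ2
The test statistic is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(3.27 - 2.53) - 0}{\sqrt{\dfrac{1.30^2}{21} + \dfrac{1.16^2}{25}}} = 2.019$$
The degrees of freedom are:
$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(\dfrac{S_1^2}{n_1}\right)^2}{n_1 - 1} + \dfrac{\left(\dfrac{S_2^2}{n_2}\right)^2}{n_2 - 1}} = \frac{\left(\dfrac{1.30^2}{21} + \dfrac{1.16^2}{25}\right)^2}{\dfrac{\left(\dfrac{1.30^2}{21}\right)^2}{20} + \dfrac{\left(\dfrac{1.16^2}{25}\right)^2}{24}} = 40.57$$
Use degrees of freedom = 40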
Separate-Variance t Test Example: Hypothesis Test Solution
H0: μ1 - μ2 = 0, i.e., μ1 = μ2
H1: μ1 - μ2 ≠ 0, i.e., μ1 ≠ μ2
α = 0.05
df = 40
Critical Values: t = ± 2.021
[Figure: t distribution with rejection regions of area .025 in each tail, below -2.021 and above 2.021; the test statistic 2.019 falls just inside the non-rejection region]
Test Statistic: t = 2.019
Decision: Fail to reject H0 at α = 0.05
Conclusion: There is insufficient evidence of a difference in means.
Related Populations: The Paired Difference Test
Tests Means of 2 Related Populations
• Paired or matched samples
• Repeated measures (before/after)
• Use difference between paired values:
Di = X1i - X2i
• Eliminates Variation Among Subjects
• Assumptions:
• Both Populations Are Normally Distributed
• Or, if not Normal, use large samples
Related Populations: The Paired Difference Test (continued)
The ith paired difference is Di, where Di = X1i - X2i
The point estimate for the paired difference population mean μD is D̄:
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$
The sample standard deviation is SD:
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
n is the number of pairs in the paired sample
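The point estimate and sample standard deviation of the differences can be sketched with Python's standard `statistics` module. The helper name `paired_differences` is an assumption for illustration; the data are the complaint counts from the paired example later in these slides, where the difference is taken as After minus Before.

```python
import statistics


def paired_differences(x1, x2):
    """Return D_i = x1_i - x2_i for each matched pair."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    return [a - b for a, b in zip(x1, x2)]


# Complaints per salesperson (C.B., T.F., M.H., R.K., M.O.)
before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]

d = paired_differences(after, before)  # the example uses (2) - (1), i.e. After - Before
d_bar = statistics.mean(d)             # point estimate of the mean difference
s_d = statistics.stdev(d)              # sample standard deviation (n - 1 denominator)
print(d, d_bar, round(s_d, 2))
```

`statistics.stdev` divides by n - 1, which matches the formula for SD above; `statistics.pstdev` would divide by n and is not the right choice here.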
The Paired Difference Test: Finding tSTAT
• The test statistic for μD is:
$$t_{STAT} = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}}$$
• Where tSTAT has n - 1 d.f.
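A short sketch of the paired test statistic, assuming the differences have already been computed. The function name is illustrative, and the differences are those of the complaints example in these slides.

```python
import math
import statistics


def paired_t_stat(differences, hypothesized_mean=0.0):
    """Return (tSTAT, degrees of freedom) for the paired t test."""
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two pairs")
    d_bar = statistics.mean(differences)
    s_d = statistics.stdev(differences)  # n - 1 denominator
    t_stat = (d_bar - hypothesized_mean) / (s_d / math.sqrt(n))
    return t_stat, n - 1


# Differences (After - Before) from the complaints example.
t_stat, df = paired_t_stat([-2, -14, -1, 0, -4])
print(f"tSTAT = {t_stat:.2f}, d.f. = {df}")
```

Working from the unrounded SD gives a value of about -1.66, in line with the example's hand calculation.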
The Paired Difference Test: Possible Hypotheses
Paired Samples
• Lower-tail test: H0: μD ≥ 0, H1: μD < 0. Rejection region of area α in the lower tail: reject H0 if tSTAT < -tα
• Upper-tail test: H0: μD ≤ 0, H1: μD > 0. Rejection region of area α in the upper tail: reject H0 if tSTAT > tα
• Two-tail test: H0: μD = 0, H1: μD ≠ 0. Rejection regions of area α/2 in each tail: reject H0 if tSTAT < -tα/2 or tSTAT > tα/2
Where tSTAT has n - 1 d.f.
The Paired Difference Confidence Interval
The confidence interval for μD is:
$$\bar{D} \pm t_{\alpha/2}\frac{S_D}{\sqrt{n}}$$
where
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}}$$
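The interval can be sketched the same way. The function name is an assumption for this example, and the critical value tα/2 is supplied by the caller (4.604 is the value the slides use for 4 d.f. at the 99% level).

```python
import math
import statistics


def paired_confidence_interval(differences, t_crit):
    """Confidence interval for mu_D; t_crit is t_alpha/2 with n - 1 d.f."""
    n = len(differences)
    d_bar = statistics.mean(differences)
    margin = t_crit * statistics.stdev(differences) / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# Differences (After - Before) from the complaints example in these slides.
low, high = paired_confidence_interval([-2, -14, -1, 0, -4], t_crit=4.604)
print(f"99% CI: ({low:.2f}, {high:.2f})")
print("interval contains 0:", low <= 0 <= high)
```

Because this sketch keeps SD unrounded, the endpoints can differ in the second decimal place from a hand calculation that rounds SD to 5.67.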
Paired Difference Test: Example
• Assume you send your salespeople to a “customer
service” training workshop. Has the training made a
difference in the number of complaints? You collect the
following data:
Number of Complaints:
Salesperson    Before (1)    After (2)    Difference, Di = (2) - (1)
C.B.                    6            4             -2
T.F.                   20            6            -14
M.H.                    3            2             -1
R.K.                    0            0              0
M.O.                    4            0             -4
Sum                                               -21
$$\bar{D} = \frac{\sum D_i}{n} = \frac{-21}{5} = -4.2$$
$$S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = 5.67$$
Paired Difference Test: Solution
• Has the training made a difference in the number of complaints (at
the 0.01 level)?
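H0: μD = 0
H1: μD ≠ 0
α = .01, D̄ = -4.2
t0.005 = ± 4.604
d.f. = n - 1 = 4
[Figure: t distribution with rejection regions of area α/2 in each tail, below -4.604 and above 4.604; the test statistic -1.66 falls in the non-rejection region]
Test Statistic:
$$t_{STAT} = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{-4.2 - 0}{5.67 / \sqrt{5}} = -1.66$$
Decision: Do not reject H0 (tSTAT is not in the rejection region)
Conclusion: There is not a significant change in the number of complaints.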
The Paired Difference Confidence Interval -- Example
The confidence interval for μD is:
$$\bar{D} \pm t_{\alpha/2}\frac{S_D}{\sqrt{n}}$$
D̄ = -4.2, SD = 5.67
99% CI for μD:
$$-4.2 \pm 4.604\,\frac{5.67}{\sqrt{5}} = (-15.87, 7.47)$$
Since this interval contains 0, we cannot be 99% confident that μD ≠ 0.
Session Summary
In this session we discussed:
• Comparing two independent samples
  • Performing the pooled-variance t test for the difference in two means
  • Performing the separate-variance t test for the difference in two means
  • Forming confidence intervals for the difference between two means
• Comparing two related samples (paired samples)
  • Performing the paired t test for the mean difference
  • Forming the confidence interval for the mean difference