Problem set 1
Basic summary statistics for math test scores and the number of computers per student, by
district, are reported below:
sum math_scr comp_stu
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    math_scr |       420    653.3426     18.7542      605.4      709.5
    comp_stu |       420    .1359266    .0649558          0   .4208333
1. Using this table, report the following statistics.
a) Mean math test score (X̄) = 653.3426
b) Variance of math test score (s_X²) = (18.7542)² = 351.72
c) Variance of mean math test score (s_X²/n) = 351.72 / 420 = 0.8374
d) Standard error of mean test score (s_X/√n) = √0.8374 = 0.9151
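The arithmetic in parts (b) to (d) can be checked with a short script. This is only a sketch: the variable names are illustrative, and the inputs are copied from the summary table above.

```python
import math

# Values copied from the Stata summary table for math_scr.
n = 420
mean_score = 653.3426
sd_score = 18.7542

variance = sd_score ** 2                      # sample variance of test scores
variance_of_mean = variance / n               # variance of the sample mean
standard_error = math.sqrt(variance_of_mean)  # standard error of the mean

print(f"variance         = {variance:.2f}")
print(f"variance of mean = {variance_of_mean:.4f}")
print(f"standard error   = {standard_error:.4f}")
```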
2. Estimate a 95% confidence interval for the mean math test score in this sample.
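Confidence interval = [x̄ ± m]
where m = margin of error = t* × SE
t* for a 95% confidence interval = 1.984 (the t-table value for df = 100, which is slightly
conservative here; with df = 419 the critical value is about 1.966)
Therefore, m = 1.984 × (18.7542 / √420) = 1.8156
Finally, confidence interval = [653.3426 ± 1.8156] = [651.53, 655.16]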
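A minimal sketch of the same interval, carrying unrounded values through to the end. The critical value 1.984 is the one used in the answer; the variable names are illustrative.

```python
import math

# Summary statistics for math_scr from the table.
n, mean_score, sd_score = 420, 653.3426, 18.7542
t_star = 1.984  # critical value used in the text (t table, df = 100)

standard_error = sd_score / math.sqrt(n)
margin = t_star * standard_error
lower, upper = mean_score - margin, mean_score + margin

print(f"margin of error = {margin:.4f}")
print(f"95% CI = [{lower:.2f}, {upper:.2f}]")
```

Rounding only at the last step avoids small discrepancies that appear when intermediate results are truncated.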
3. Some education analysts have speculated that access to computers can increase math
achievement. Let’s compare the average math test scores of computer-intensive districts
(defined as having a computer per student ratio above the median) with the rest of the
districts.
Computer-intensive districts
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    math_scr |       210      657.81    18.91115      605.4      709.5

All other districts
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    math_scr |       210    648.8752    17.53241      612.5      703.6
Assuming independent random samples, indicate the value of:
a. Mean difference in test scores between computer-intensive districts and all other
districts
= 657.81 - 648.8752 = 8.93
b. Standard error of mean difference:
SE(X̄_A - X̄_B) = √(s_A²/N_A + s_B²/N_B)
= √(18.91115²/210 + 17.53241²/210)
= 1.78
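The standard error of the difference can be computed directly from the two summary tables. A minimal sketch, assuming independent samples as the question states; the helper name `se_difference` is illustrative.

```python
import math

def se_difference(sd_a, n_a, sd_b, n_b):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)

# Group A: computer-intensive districts; group B: all other districts.
mean_a, sd_a, n_a = 657.81, 18.91115, 210
mean_b, sd_b, n_b = 648.8752, 17.53241, 210

difference = mean_a - mean_b
se = se_difference(sd_a, n_a, sd_b, n_b)

print(f"difference = {difference:.4f}")
print(f"SE         = {se:.4f}")
```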
4. Test the null hypothesis that the two means are the same (against the alternative that they
are different). State your conclusions in both statistical terms and policy terms.
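H0: µ_A - µ_B = 0
HA: µ_A - µ_B ≠ 0
Degrees of freedom = 210 - 1 = 209 (the conservative choice, min(N_A, N_B) - 1)
t statistic = [(X̄_A - X̄_B) - 0] / SE
= 8.9348 / 1.7795
= 5.02
P value = 2P(T > |t|)
= 2P(T > 5.02) ≈ 0 (less than 0.001)
Since the P value is smaller than any conventional significance level, we reject the null
hypothesis in favour of the alternative hypothesis that the two means are different. In policy
terms, computer-intensive districts score about 9 points higher on average, and a gap this
large is very unlikely to be due to sampling variation alone. The test establishes an
association only; it does not by itself show that access to computers increases math achievement.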
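The test statistic and an approximate p-value can be reproduced as follows. This sketch uses the standard normal distribution in place of the t distribution, which is an assumption made here for convenience (the Python standard library has no t distribution); with about 209 degrees of freedom the two are very close. Carrying unrounded values through gives a t statistic slightly different from one computed with inputs rounded at each step.

```python
import math

def two_sided_p_normal(t_stat):
    """Two-sided p-value under the standard normal approximation."""
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2)).
    return math.erfc(abs(t_stat) / math.sqrt(2))

# Group A: computer-intensive districts; group B: all other districts.
mean_a, sd_a, n_a = 657.81, 18.91115, 210
mean_b, sd_b, n_b = 648.8752, 17.53241, 210

se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
t_stat = (mean_a - mean_b - 0) / se   # null hypothesis: difference = 0
p_value = two_sided_p_normal(t_stat)

print(f"t = {t_stat:.2f}, approximate two-sided p = {p_value:.2e}")
```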
5. Can we conclude from these results that increasing the number of computers per student
is an effective way to increase math achievement? Explain.
No. These results alone cannot establish a causal link between the number of computers per
student and math achievement. Computer-intensive districts may differ from other districts in
other factors, such as family income, the student-teacher ratio, or the method of instruction,
that may themselves have led to higher math scores.
6. Suppose an NGO donates a large number of computers to 100 schools in India. Using the
concept of the counterfactual, how would you define the effect of this program on math
test scores a year after the program was implemented?
The effect of the program is the difference between the average math score of the 100 schools
a year after receiving the computers and the average score those same schools would have had
at that time without the computers (the counterfactual). It is not possible to observe the
counterfactual, as we cannot observe the same schools both with and without the computers at
the same time. One way to mimic the counterfactual is to conduct a randomised controlled
experiment. Under this experiment, the math scores of the 100 schools that received the
computers (treatment group) would be compared with those of schools that did not receive them
(control group). However, the comparison is valid only if the control and treatment schools are
comparable, on average, on all other characteristics, which random assignment is designed to ensure.
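The counterfactual idea can be illustrated with a small simulation. Everything in this sketch is hypothetical: the pool of 200 schools, the score distribution, and the 5-point program effect are invented for illustration and are not estimates from any data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 5.0   # hypothetical program effect, in test-score points
N_SCHOOLS = 200     # hypothetical pool: 100 treated, 100 control

# Potential outcomes: y0 = score without computers, y1 = score with them.
y0 = [random.gauss(650, 19) for _ in range(N_SCHOOLS)]
y1 = [score + TRUE_EFFECT for score in y0]

# Randomly assign half of the schools to receive computers.
schools = list(range(N_SCHOOLS))
random.shuffle(schools)
treated, control = schools[:100], schools[100:]

# Effect defined with the counterfactual (never observable in practice).
effect_on_treated = statistics.mean(y1[i] - y0[i] for i in treated)

# What an experiment can compute: treated mean minus control mean.
experimental_estimate = (
    statistics.mean(y1[i] for i in treated)
    - statistics.mean(y0[i] for i in control)
)

print(f"effect using counterfactual = {effect_on_treated:.2f}")
print(f"experimental estimate       = {experimental_estimate:.2f}")
```

The experimental estimate differs from the true effect only by sampling noise, because randomisation makes the control group a stand-in for the treated schools' unobserved counterfactual.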
Part II
Using results from the Stata review in lab, the gender2009.dta dataset, and any additional
commands you need, create a “do-file” (call it ps1part2.do) to give you the information
necessary to answer the following questions. You need to show your work in answering
questions (1)-(3), i.e. write down the formulas you are using, and use the numbers from the
Stata output to compute the relevant confidence interval or test statistic. Submit your
Stata output with your answers.
1. Construct a 95% confidence interval for the proportion of men in the sample.
sum gender
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      gender |       950    .5136842     .500076          0          1
95% confidence interval = [x̄ ± m]
where m = margin of error = t* × SE
t* for a 95% confidence interval = 1.984
SE = √(0.500076² / 950) = 0.0162
Therefore, m = (1.984)(0.0162) = 0.0321
Finally, confidence interval = [0.5136842 ± 0.0321] = [0.481, 0.546]
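For a 0/1 variable, the standard error based on Stata's reported standard deviation is nearly identical to the textbook formula for a proportion, √(p(1 - p)/n). The sketch below compares the two and rebuilds the interval; the variable names are illustrative and 1.984 is the critical value used in the answer.

```python
import math

n = 950
p_hat = 0.5136842   # sample mean of the 0/1 gender variable
sd = 0.500076       # Std. Dev. reported by Stata (n - 1 denominator)
t_star = 1.984      # critical value used in the text

se_from_sd = sd / math.sqrt(n)
se_binomial = math.sqrt(p_hat * (1 - p_hat) / n)  # textbook proportion formula

margin = t_star * se_from_sd
lower, upper = p_hat - margin, p_hat + margin

print(f"SE from sd       = {se_from_sd:.5f}")
print(f"SE from p(1 - p) = {se_binomial:.5f}")
print(f"95% CI = [{lower:.3f}, {upper:.3f}]")
```

The two standard errors differ only because Stata divides by n - 1 rather than n when computing the standard deviation.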
2. Test the null hypothesis that the average wage in the population is equal to $13/hour
(against the alternative hypothesis that it is not equal to $13/hour) using a 5%
significance level. In doing so, indicate:
a. Null hypothesis
b. Alternative hypothesis
c. Test statistic used
d. p-value of this test and interpretation of the p-value.
. generate wage=salary/(hours*weeks)
. sum wage
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        wage |       950    12.41145    8.906282   .1748252   81.73029
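H0: µ = 13
HA: µ ≠ 13
α = 0.05
SE = √(8.906282² / 950) = 0.289
t statistic = (12.41145 - 13) / 0.289
= -2.04
P value = 2P(T > |t|) = 2P(T > 2.04)
= between 0.02 and 0.05 (from the t table)
The P value is the probability of observing a sample mean at least this far from $13/hour if
the population mean were in fact $13/hour. Since the P value < α (0.05), we reject the null
hypothesis that the average wage in the population is $13/hour.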
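A sketch of the one-sample test, using the summary statistics for `wage`. The p-value uses the standard normal approximation, an assumption made here because the Python standard library has no t distribution; with 949 degrees of freedom the difference is negligible.

```python
import math

def two_sided_p_normal(t_stat):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

n, mean_wage, sd_wage = 950, 12.41145, 8.906282
mu_0 = 13.0    # hypothesised population mean ($/hour)
alpha = 0.05   # significance level

standard_error = sd_wage / math.sqrt(n)
t_stat = (mean_wage - mu_0) / standard_error
p_value = two_sided_p_normal(t_stat)
reject_null = p_value < alpha

print(f"SE = {standard_error:.3f}, t = {t_stat:.2f}, p = {p_value:.3f}")
print("reject H0" if reject_null else "fail to reject H0")
```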
3. What is the difference in the average wage between men and women? Is this
difference statistically significant at the 5% significance level? In doing so, indicate:
a. Null hypothesis
b. Alternative hypothesis
c. Test statistic used
d. p-value of this test and interpretation of the p-value.
. sum wage if gender==1
    Variable |       Obs        Mean   Std. dev.        Min        Max
-------------+---------------------------------------------------------
        wage |       488    14.01115    10.12239         .5   81.73029
. sum wage if gender==0
    Variable |       Obs        Mean   Std. dev.        Min        Max
-------------+---------------------------------------------------------
        wage |       462    10.72173    7.034033   .1748252   42.89773
Difference in average wage = 14.01115 - 10.72173 = 3.29
H0: µ_A - µ_B = 0
HA: µ_A - µ_B ≠ 0
where A denotes men (gender = 1) and B denotes women (gender = 0).
Now,
SE(X̄_A - X̄_B) = √(s_A²/N_A + s_B²/N_B)
= √(10.12239² / 488 + 7.034033² / 462) = 0.563
t statistic = 3.29 / 0.563 = 5.84
P value (for t = 5.84) ≈ 0, which is less than α = 0.05.
We reject the null hypothesis in favour of the alternative hypothesis. Therefore, the
difference in the average wage between men and women is statistically significant at the 5%
significance level.
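The wage comparison can be checked the same way. The sketch below also computes the Welch-Satterthwaite degrees of freedom, which the answer above does not use; it is shown here only as one common way to choose the degrees of freedom when the two groups have unequal variances.

```python
import math

# Group A: gender == 1; group B: gender == 0 (values from the Stata output).
mean_a, sd_a, n_a = 14.01115, 10.12239, 488
mean_b, sd_b, n_b = 10.72173, 7.034033, 462

var_term_a = sd_a ** 2 / n_a   # s_A^2 / N_A
var_term_b = sd_b ** 2 / n_b   # s_B^2 / N_B

difference = mean_a - mean_b
se = math.sqrt(var_term_a + var_term_b)
t_stat = difference / se

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df = (var_term_a + var_term_b) ** 2 / (
    var_term_a ** 2 / (n_a - 1) + var_term_b ** 2 / (n_b - 1)
)

print(f"difference = {difference:.2f}, SE = {se:.3f}, t = {t_stat:.2f}")
print(f"Welch degrees of freedom = {welch_df:.0f}")
```

With degrees of freedom this large, the t distribution is practically indistinguishable from the standard normal, so a t statistic near 5.8 corresponds to a p-value that is effectively zero.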
4. Based on your results from question 3, would it be correct to say that there is less
than a 5% chance that the average wages of women and men are the same in the
population?
No. The p-value is not the probability that the null hypothesis is true. Rejecting the null
hypothesis at the 5% level means that, if the average wages of men and women were the same in
the population, a difference at least as large as the one observed would occur in fewer than
5% of samples.
5. Does the result of question 3 provide evidence to conclude that there is gender
discrimination in wages, i.e. that women earn less simply because they are
women? Explain briefly.
The data show a statistically significant wage gap, which is consistent with gender
discrimination but does not prove it. It is difficult to establish a causal link between gender
and wages from these data alone. For example, the difference could also result from women
taking maternity and childcare leave. A randomised controlled trial would provide more
conclusive evidence of a causal relationship.
Do File
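* Set the working directory and open the log
cd "\\sipaxafsc\users\kv2373\Desktop\PS 1 6501"
log using Lab1_log, replace
use "gender2009"
* Question 1: proportion of men
sum gender
* Question 2: hourly wage
generate wage=salary/(hours*weeks)
sum wage
* Question 3: wage by gender
sum wage if gender==1
sum wage if gender==0
log close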
Log File
--------------------------------------------------------------------------------
name: <unnamed>
log: \\sipaxafsc\users\kv2373\Desktop\PS 1 6501\Lab1_log.smcl
log type: smcl
opened on: 6 Feb 2022, 20:35:53
. use "gender2009"
end of do-file
. do "C:\Users\kv2373\AppData\Local\Temp\126\STD3890_000000.tmp"
. sum gender
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
gender | 950 .5136842 .500076 0 1
.
end of do-file
. do "C:\Users\kv2373\AppData\Local\Temp\126\STD3890_000000.tmp"
. generate wage=salary/(hours*weeks)
end of do-file
. do "C:\Users\kv2373\AppData\Local\Temp\126\STD3890_000000.tmp"
. sum wage if gender==1
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
wage | 488 14.01115 10.12239 .5 81.73029
end of do-file
. do "C:\Users\kv2373\AppData\Local\Temp\126\STD3890_000000.tmp"
. sum wage if gender==0
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
wage | 462 10.72173 7.034033 .1748252 42.89773
end of do-file