Problemset 1 2022 2

The document presents a statistical analysis of math test scores and computer access across districts, revealing a mean math score of 653.34 and a significant difference in scores between computer-intensive and non-intensive districts. It also discusses the challenges in establishing a causal link between computer access and math achievement, and provides a detailed analysis of gender wage differences, concluding that there is evidence of gender discrimination in wages. Additionally, it outlines the steps for constructing confidence intervals and hypothesis testing using Stata software.

Uploaded by Kumar Vivek
Copyright © All Rights Reserved
Problem set 1

Basic summary statistics for math test scores and the number of computers per student, by
district, are reported below:

sum math_scr comp_stu

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    math_scr |        420    653.3426     18.7542      605.4      709.5
    comp_stu |        420    .1359266    .0649558          0   .4208333

1. Using this table, report the following statistics.

a) Mean math test score ($\bar{X}$) = 653.3426
b) Variance of math test score ($s_X^2$) = $(18.7542)^2$ = 351.72
c) Variance of the mean math test score ($s_{\bar{X}}^2 = s_X^2 / n$) = 351.72 / 420 = 0.8374
d) Standard error of the mean math test score ($s_{\bar{X}}$) = √0.8374 = 0.9151
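The arithmetic in a) to d) can be reproduced from the two numbers in the Stata table. A minimal Python sketch, used only as a calculator; the variable names are illustrative and not part of the Stata output:

```python
import math

# Summary statistics for math_scr taken from the Stata table above
n = 420
mean = 653.3426
sd = 18.7542

variance = sd ** 2                # b) sample variance s_X^2
var_of_mean = variance / n        # c) variance of the sample mean, s_X^2 / n
se_mean = math.sqrt(var_of_mean)  # d) standard error, identical to sd / sqrt(n)

print(f"variance       = {variance:.2f}")     # prints 351.72
print(f"var of mean    = {var_of_mean:.4f}")  # prints 0.8374
print(f"standard error = {se_mean:.4f}")      # prints 0.9151
```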

2. Estimate a 95% confidence interval for the mean math test score in this sample.

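Confidence interval = [x̄ ± m]

where m = margin of error = t* × SE.
t* for a 95% confidence interval = 1.984 (the t-table value for 100 degrees of freedom; with n - 1 = 419 degrees of freedom the exact value is about 1.966, so 1.984 gives a slightly wider, conservative interval)
Therefore, m = (1.984)(0.9151) = 1.8156
Finally, confidence interval = [653.3426 ± 1.8156] = [651.53, 655.16]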
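The same interval can be checked in a few lines. This sketch assumes the critical value 1.984 used in the text rather than computing an exact t quantile (the Python standard library has no t distribution):

```python
import math

n, mean, sd = 420, 653.3426, 18.7542
t_star = 1.984  # critical value used in the text (assumption: t-table row for 100 df)

se = sd / math.sqrt(n)   # standard error of the mean
margin = t_star * se     # margin of error m = t* x SE
lower, upper = mean - margin, mean + margin

print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # prints 95% CI: [651.53, 655.16]
```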

3. Some education analysts have speculated that access to computers can increase math
achievement. Let’s compare the average math test scores of computer-intensive districts
(defined as having a computer per student ratio above the median) with the rest of the
districts.

Computer-intensive districts

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    math_scr |        210      657.81    18.91115      605.4      709.5

All other districts

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    math_scr |        210    648.8752    17.53241      612.5      703.6

Assuming independent random samples, indicate the value of:

a. Mean difference in test scores between computer-intensive districts (A) and all other districts (B)
= 657.81 - 648.8752 = 8.93
b. Standard error of the mean difference:
$SE(\bar{X}_A - \bar{X}_B) = \sqrt{s_A^2 / N_A + s_B^2 / N_B}$
= √(18.91115² / 210 + 17.53241² / 210)
= 1.78
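A short sketch that recomputes the difference in means and its standard error from the two tables (A = computer-intensive districts, B = all other districts; the variable names are illustrative):

```python
import math

# Summary statistics from the two Stata tables above
mean_a, sd_a, n_a = 657.81, 18.91115, 210    # computer-intensive districts
mean_b, sd_b, n_b = 648.8752, 17.53241, 210  # all other districts

diff = mean_a - mean_b
# Standard error of a difference between two independent sample means
se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)

print(f"difference = {diff:.2f}, SE = {se_diff:.2f}")  # prints difference = 8.93, SE = 1.78
```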

4. Test the null hypothesis that the two means are the same (against the alternative that they
are different). State your conclusions in both statistical terms and policy terms.

H0: µA - µB = 0
HA: µA - µB ≠ 0
Degrees of freedom (conservative rule, min(NA, NB) - 1) = 210 - 1 = 209
t statistic = [(X̄A - X̄B) - 0] / SE
= 8.93 / 1.78
= 5.02
P value = 2P(T > 5.02) ≈ 0 (far below 0.001)

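Since the p-value is below any conventional significance level (1%, 5% or 10%), we reject the null hypothesis in favour of the alternative hypothesis that the two means differ. In policy terms, computer-intensive districts score about 8.9 points higher in math, on average, than the other districts. This is consistent with the speculation that computer access raises math achievement, but on its own it does not show that computers cause the higher scores (see question 5).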
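The test statistic and p-value can be sketched as follows. Assumption: the standard normal distribution (`statistics.NormalDist`) stands in for the t distribution with 209 degrees of freedom, which is a close approximation at this sample size but gives a slightly smaller p-value:

```python
import math
from statistics import NormalDist

diff = 657.81 - 648.8752
se_diff = math.sqrt(18.91115 ** 2 / 210 + 17.53241 ** 2 / 210)

t_stat = (diff - 0) / se_diff  # hypothesized difference under H0 is 0
# Two-sided p-value using the normal approximation to t
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
```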

5. Can we conclude from these results that increasing the number of computers per student
is an effective way to increase math achievement? Explain.

No. It is difficult to establish a causal link between the number of computers per student and math achievement from this comparison, because computer-intensive districts may also differ in other factors, such as family income, the student-teacher ratio or methods of instruction, that raise math scores.
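A small simulation can make the confounding argument concrete. Everything here is hypothetical: the data-generating process, the income variable and all coefficients are made up for illustration and are not estimated from the district data. Income raises both computers per student and math scores, while computers have no effect at all, yet the median split used in question 3 still shows a large gap:

```python
import random
from statistics import median

# Hypothetical process (illustrative only): income drives BOTH computers and scores;
# computers per student have NO effect on math scores.
rng = random.Random(7)
districts = []
for _ in range(420):
    income = rng.gauss(0, 1)                                      # standardized district income
    comp_stu = max(0.0, 0.136 + 0.05 * income + rng.gauss(0, 0.03))
    math_scr = 653 + 8 * income + rng.gauss(0, 15)                # no comp_stu term
    districts.append((comp_stu, math_scr))

# Same comparison as question 3: split at the median computers-per-student ratio
cutoff = median(c for c, _ in districts)
high = [s for c, s in districts if c > cutoff]
low = [s for c, s in districts if c <= cutoff]
gap = sum(high) / len(high) - sum(low) / len(low)

print(f"gap in mean scores (high minus low computer districts): {gap:.2f}")
```

The gap here comes entirely from income, which is the kind of omitted factor the answer above warns about.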
6. Suppose an NGO donates a large number of computers to 100 schools in India. Using the
concept of the counterfactual, how would you define the effect of this program on math
test scores a year after the program was implemented?

The effect of the program is the difference between the math scores of the 100 schools a year after receiving the computers and the scores the same schools would have had at that time had they not received them (the counterfactual). The counterfactual cannot be observed, because a school cannot both receive and not receive the computers at the same time.
One way to mimic the counterfactual is a randomised controlled experiment. Schools would be randomly assigned either to receive the computers (treatment group) or not (control group), and a year later the average math scores of the two groups would be compared. Random assignment makes the treatment and control groups similar, on average, on all other characteristics, so the control group's scores stand in for what the treated schools would have scored without the computers.
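The counterfactual definition and the role of random assignment can be sketched with potential outcomes. All numbers are hypothetical: a pool of 200 schools, baseline scores drawn around 650, and an assumed true effect of +5 points. For each school only one of the two outcomes is "observed", yet the treatment-control difference in means is centred on the true effect:

```python
import random

# Hypothetical potential outcomes for 200 schools (made-up numbers, not from the data):
# y0 = math score a year later WITHOUT computers, y1 = score WITH computers.
rng = random.Random(42)
y0 = [rng.gauss(650, 18) for _ in range(200)]
y1 = [score + 5 for score in y0]  # assumed true effect: +5 points for every school

def avg(xs):
    return sum(xs) / len(xs)

# The effect is defined by the counterfactual comparison y1 - y0 for the SAME schools
true_effect = avg([a - b for a, b in zip(y1, y0)])

def randomized_estimate(rng, n_treat=100):
    """Randomly give computers to n_treat schools; return the observed difference in means."""
    treated = set(rng.sample(range(len(y0)), n_treat))
    treated_scores = [y1[i] for i in treated]                              # y1 seen for treated
    control_scores = [y0[i] for i in range(len(y0)) if i not in treated]  # y0 seen for controls
    return avg(treated_scores) - avg(control_scores)

one_draw = randomized_estimate(rng)
average_draw = avg([randomized_estimate(rng) for _ in range(2000)])
print(f"true effect: {true_effect:.2f}")
print(f"one randomized estimate: {one_draw:.2f}")
print(f"average over 2000 randomizations: {average_draw:.2f}")
```

A single randomized estimate differs from 5 because of chance imbalance between the groups, but on average over many random assignments it recovers the effect.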
Part II

Using results from the Stata review in lab, the gender2009.dta dataset, and any additional
commands you need, create a “do-file” (call it ps1part2.do) to give you the information
necessary to answer the following questions. You need to show your work in answering
questions (1)-(3), i.e. write down the formulas you are using, and use the numbers from the
Stata output to compute the relevant confidence interval or test statistic. Submit your
Stata output with your answers.

1. Construct a 95% confidence interval for the proportion of men in the sample.
sum gender

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      gender |        950    .5136842     .500076          0          1

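95% confidence interval = [x̄ ± m]

where m = margin of error = t* × SE.
t* for a 95% confidence interval = 1.984 (as above; conservative for 949 degrees of freedom)
SE = √(0.500076² / 950) = 0.01622 (the proportion formula √(p̂(1 - p̂)/n) gives almost exactly the same value)
Therefore, m = (1.984)(0.01622) = 0.0322
Finally, confidence interval = [0.5136842 ± 0.0322] = [0.481, 0.546]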
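Because gender is a 0/1 variable, its mean is the share of men and its standard error can be computed either from Stata's standard deviation or from the textbook proportion formula. A minimal sketch comparing the two, again assuming the critical value 1.984 used in the text:

```python
import math

n = 950
p_hat = 0.5136842  # mean of the 0/1 variable gender, i.e. the share of men
sd = 0.500076      # standard deviation of gender reported by Stata
t_star = 1.984     # critical value used in the text

se = sd / math.sqrt(n)                            # SE as computed in the text
se_binomial = math.sqrt(p_hat * (1 - p_hat) / n)  # textbook SE of a proportion
margin = t_star * se
lower, upper = p_hat - margin, p_hat + margin

print(f"SE = {se:.4f} (proportion formula: {se_binomial:.4f})")
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")  # prints 95% CI: [0.481, 0.546]
```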

2. Test the null hypothesis that the average wage in the population is equal to $13/hour
(against the alternative hypothesis that it is not equal to $13/hour) using a 5%
significance level. In doing so, indicate:

a. Null hypothesis

b. Alternative hypothesis

c. Test statistic used

d. p-value of this test and interpretation of the p-value.


. generate wage = salary/(hours*weeks)

. sum wage

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        wage |        950    12.41145    8.906282   .1748252   81.73029

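a. H0: µ = 13
b. HA: µ ≠ 13
α = 0.05
SE = √(8.906282² / 950) = 0.289
c. Test statistic: t = (X̄ - 13) / SE = (12.41145 - 13) / 0.289 = -2.04
d. P value = 2P(T > 2.04) ≈ 0.04 (between 0.02 and 0.05)
The p-value means that if the average wage in the population really were $13/hour, there would be only about a 4% chance of drawing a sample whose mean wage is at least as far from $13 as $12.41 is.
Since the p-value < α (0.05), we reject the null hypothesis that the average wage in the population is $13/hour.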
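A sketch of the one-sample test using the `sum wage` output. Assumption: the standard normal distribution approximates the t distribution with 949 degrees of freedom, which is very close at this sample size:

```python
import math
from statistics import NormalDist

n, x_bar, s = 950, 12.41145, 8.906282  # from `sum wage`
mu_0 = 13.0                            # hypothesized mean wage under H0

se = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / se
# Two-sided p-value; normal approximation to t with 949 df
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"SE = {se:.3f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```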

3. What is the difference in the average wage between men and women? Is this
difference statistically significant at the 5% significance level? In doing so, indicate:

a. Null hypothesis

b. Alternative hypothesis

c. Test statistic used

d. p-value of this test and interpretation of the p-value.

. sum wage if gender==1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |        488    14.01115    10.12239         .5   81.73029

. sum wage if gender==0

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |        462    10.72173    7.034033   .1748252   42.89773

Difference in average wage = 14.01115 - 10.72173 = 3.29

a. H0: µA - µB = 0, where A = men and B = women
b. HA: µA - µB ≠ 0
Now,
$SE(\bar{X}_A - \bar{X}_B) = \sqrt{s_A^2 / N_A + s_B^2 / N_B}$
= √(10.12239² / 488 + 7.034033² / 462) = 0.563

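c. Test statistic: t = [(X̄A - X̄B) - 0] / SE = 3.29 / 0.563 = 5.84 (conservative degrees of freedom = min(488, 462) - 1 = 461)

d. P value = 2P(T > 5.84) ≈ 0, which is less than α = 0.05. If men and women earned the same average wage in the population, a sample gap of $3.29/hour or more would almost never occur through sampling variation alone.

We reject the null hypothesis in favour of the alternative hypothesis. Therefore, the difference in the average wage between men and women is statistically significant at the 5% significance level.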
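The whole calculation for question 3 fits in a short sketch (A = men, B = women). As before, the normal distribution is assumed as an approximation to t, here with 461 conservative degrees of freedom:

```python
import math
from statistics import NormalDist

mean_a, sd_a, n_a = 14.01115, 10.12239, 488  # men (gender == 1)
mean_b, sd_b, n_b = 10.72173, 7.034033, 462  # women (gender == 0)

diff = mean_a - mean_b
se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
t_stat = (diff - 0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))  # two-sided, normal approximation

print(f"difference = {diff:.2f}, SE = {se:.3f}, t = {t_stat:.2f}, p = {p_value:.1e}")
```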

4. Based on your results from question 3, would it be correct to say that there is less
than a 5% chance that the average wages of women and men are the same in the
population?

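No. A p-value is calculated assuming the null hypothesis is true, so it cannot be the probability that the null hypothesis is true. The result says that if the average wages of men and women were the same in the population, a difference as large as the one observed would arise from sampling error less than 5% of the time (here, far less). This lets us reject the null hypothesis, but it does not assign a probability to the null hypothesis itself.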
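A simulation can illustrate what the 5% refers to. The sketch below makes the null hypothesis true by construction: men's and women's wages are drawn from normal distributions with a hypothetical common mean of $12/hour and the sample standard deviations from question 3 (normal wages are a simplification, since real wages are skewed). About 5% of samples still reject at the 5% level, while a gap as large as $3.29 essentially never appears:

```python
import math
import random
from statistics import NormalDist

def mean_var(xs):
    """Sample mean and sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def unequal_var_t(x, y):
    """Two-sample t statistic with unequal variances (as in question 3) and the raw gap."""
    mx, vx = mean_var(x)
    my, vy = mean_var(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y)), mx - my

rng = random.Random(2022)
crit = NormalDist().inv_cdf(0.975)  # about 1.96
n_sims = 2000
rejections = 0
gaps_as_large = 0
for _ in range(n_sims):
    # H0 true by construction: hypothetical common mean wage of $12/hour
    men = [rng.gauss(12, 10.12239) for _ in range(488)]
    women = [rng.gauss(12, 7.034033) for _ in range(462)]
    t_stat, gap = unequal_var_t(men, women)
    if abs(t_stat) > crit:
        rejections += 1
    if abs(gap) >= 3.29:
        gaps_as_large += 1

rejection_rate = rejections / n_sims
print(f"share of samples rejecting H0 at 5%: {rejection_rate:.3f}")
print(f"samples with a gap of at least $3.29: {gaps_as_large}")
```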

5. Does the result of question 3 provide evidence to conclude that there is gender
discrimination in wages, i.e. that women earn less simply because they are
women? Explain briefly.

The data show a statistically significant wage gap between men and women, which is some evidence consistent with gender discrimination in wages. However, it is difficult to establish a causal link between gender and wages from this comparison alone. For example, the difference could also reflect the fact that women are more likely to take maternity and childcare leave, or differences in occupation and experience. Since gender itself cannot be randomly assigned, more conclusive evidence would come from comparing men and women who are otherwise similar on these factors, or from an experiment that randomises only the perceived gender of otherwise identical job applicants.
Do File

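* Problem Set 1, Part II: gender2009.dta
cd "\\sipaxafsc\users\kv2373\Desktop\PS 1 6501"

log using Lab1_log, replace

use "gender2009", clear

* Q1: share of men (gender == 1)
sum gender

* Q2: hourly wage and its summary statistics
generate wage = salary/(hours*weeks)
sum wage

* Q3: wages of men and women
sum wage if gender==1
sum wage if gender==0

log close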

Log File

--------------------------------------------------------------------------------
      name:  <unnamed>
       log:  \\sipaxafsc\users\kv2373\Desktop\PS 1 6501\Lab1_log.smcl
  log type:  smcl
 opened on:   6 Feb 2022, 20:35:53

. use "gender2009"

end of do-file

. do "C:\Users\kv2373\AppData\Local\Temp\126\STD3890_000000.tmp"

. sum gender

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      gender |        950    .5136842     .500076          0          1

.
end of do-file

. do "C:\Users\kv2373\AppData\Local\Temp\126\STD3890_000000.tmp"

. generate wage=salary/(hours*weeks)

end of do-file

. do "C:\Users\kv2373\AppData\Local\Temp\126\STD3890_000000.tmp"

. sum wage if gender==1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |        488    14.01115    10.12239         .5   81.73029

end of do-file

. do "C:\Users\kv2373\AppData\Local\Temp\126\STD3890_000000.tmp"

. sum wage if gender==0

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |        462    10.72173    7.034033   .1748252   42.89773

end of do-file
