Inference using difference-in-differences with clustered data requires care. Previous research has shown that t tests based on a cluster-robust variance estimator (CRVE) severely over-reject when there are few treated clusters, that different variants of the wild cluster bootstrap can over-reject or under-reject severely, and that procedures based on randomization show promise. We demonstrate that randomization inference (RI) procedures based on estimated coefficients, such as the one proposed by Conley and Taber (2011), fail whenever the treated clusters are atypical. We propose an RI procedure based on t statistics which fails only when the treated clusters are atypical and few in number. We also propose a bootstrap-based alternative to randomization inference, which mitigates the discrete nature of RI p-values when the number of clusters is small.
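To make the t-statistic-based randomization inference idea concrete, here is a minimal numpy sketch under simplifying assumptions not taken from the paper: outcomes are already aggregated to cluster means, the statistic is a two-sample t statistic on cluster means rather than a regression t statistic, and the RI distribution enumerates every alternative assignment of the same number of treated clusters. Function names and the setup are illustrative.

```python
import itertools
import numpy as np

def t_stat(y, treated):
    """Two-sample t statistic comparing treated and control cluster means."""
    y1, y0 = y[treated], y[~treated]
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return (y1.mean() - y0.mean()) / se

def ri_pvalue(y, treated):
    """Randomization-inference p-value based on t statistics: compare the
    actual t statistic against those from every alternative assignment of
    the same number of treated clusters."""
    G, k = len(y), int(treated.sum())
    t_obs = t_stat(y, treated)
    placebo = []
    for combo in itertools.combinations(range(G), k):
        mask = np.zeros(G, dtype=bool)
        mask[list(combo)] = True
        placebo.append(t_stat(y, mask))
    placebo = np.asarray(placebo)
    # p-value: share of assignments (including the actual one) whose
    # t statistic is at least as extreme as the observed one
    return np.mean(np.abs(placebo) >= np.abs(t_obs))
```

Note the discreteness the abstract refers to: with, say, 10 clusters and 3 treated, there are only C(10, 3) = 120 possible assignments, so the smallest attainable p-value is 1/120 and p-values move in steps of that size.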
We examine decision making in the context of one-sided matching, where individuals simultaneously submit several applications to vacancies, each match has an exogenous probability of forming, but each applicant can fill only one vacancy. In these environments, individuals choose among interdependent, rival, uncertain outcomes. We design an experiment in which individuals choose a varying number of interdependent lotteries from a fixed set. We find that: 1) with few choices, subjects make both safer and riskier choices; 2) subjects behave in a manner inconsistent with expected-utility-maximizing behavior. We discuss these findings in the context of college application decisions.
Many empirical projects are well suited to a linear difference-in-differences research design. While estimation is straightforward, reliable inference can be a challenge. Past research has not only demonstrated that estimated standard errors are biased dramatically downwards in models with a group-clustered design, but has also suggested a number of bootstrap-based improvements to the inference procedure. In this paper, I first demonstrate, using Monte Carlo experiments, that these bootstrap-based procedures and traditional cluster-robust standard errors perform poorly with fewer than eleven clusters, a setting faced in many empirical applications. With few clusters, the wild cluster bootstrap-t procedure results in p-values that are not point identified. I then introduce two easy-to-implement alternative procedures that involve the wild bootstrap. Further Monte Carlo simulations provide evidence that using a 6-point distribution with the wild bootstrap can improve the reliability of inference.
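As a rough illustration of the kind of procedure these abstracts discuss, the following is a minimal numpy sketch of a restricted (null-imposed) wild cluster bootstrap-t, drawing cluster-level weights from Webb's 6-point distribution (values ±√1.5, ±1, ±√0.5, each with probability 1/6). The function names, toy regression setup, and implementation details are illustrative assumptions, not code from any of the papers above.

```python
import numpy as np

def crve_se(X, resid, cluster, k):
    """Cluster-robust 'sandwich' standard error for coefficient k."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        score = X[cluster == g].T @ resid[cluster == g]
        meat += np.outer(score, score)
    V = XtX_inv @ meat @ XtX_inv
    return np.sqrt(V[k, k])

def wild_cluster_boot_p(y, X, cluster, k=1, reps=999, rng=None):
    """Restricted wild cluster bootstrap-t p-value for H0: beta_k = 0."""
    rng = np.random.default_rng(rng)
    # unrestricted fit: observed t statistic with a CRVE standard error
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    t_hat = beta[k] / crve_se(X, y - X @ beta, cluster, k)
    # restricted fit imposing the null (drop regressor k)
    Xr = np.delete(X, k, axis=1)
    br = np.linalg.lstsq(Xr, y, rcond=None)[0]
    fit_r, u_r = Xr @ br, y - Xr @ br
    # Webb's 6-point weight distribution
    webb = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                     np.sqrt(0.5), 1.0, np.sqrt(1.5)])
    groups = np.unique(cluster)
    t_boot = np.empty(reps)
    for b in range(reps):
        # one weight per cluster, applied to all residuals in that cluster
        w = rng.choice(webb, size=len(groups))
        yb = fit_r + w[np.searchsorted(groups, cluster)] * u_r
        bb = np.linalg.lstsq(X, yb, rcond=None)[0]
        t_boot[b] = bb[k] / crve_se(X, yb - X @ bb, cluster, k)
    # symmetric p-value: share of bootstrap t's at least as extreme
    return np.mean(np.abs(t_boot) >= np.abs(t_hat))
```

With Rademacher (±1) weights and G clusters there are only 2^G distinct bootstrap samples, which is one way the "not point identified" p-value problem arises when G is small; the 6-point distribution yields 6^G, mitigating that discreteness.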
The cluster-robust variance estimator (CRVE) relies on the number of clusters being large. The precise meaning of 'large' is ambiguous, but a shorthand 'rule of 42' has emerged in the literature. We show that this rule depends crucially on the assumption of equal-sized clusters. Monte Carlo evidence suggests that rejection frequencies at the five percent level can be more than twice the desired size when a dataset has 50 clusters with sizes proportional to the populations of the US states. In contrast, using a cluster wild bootstrap procedure on the same dataset usually results in very accurate rejection frequencies. We also show that, when the test regressor is a dummy variable, both conventional and bootstrap tests perform badly when the proportion of clusters treated is very small or very large. A third set of simulations uses placebo laws to examine whether similar results hold in a difference-in-differences framework.