What is the difference between latest event time and the earliest event time?

of two-way fixed effects DD estimates requires both a parallel trends assumption and treatment effects that are constant over time. I show how to decompose the difference between two specifications, and provide a new analysis of models that include time-varying controls.

Introduction

Difference-in-differences [DD] is both the most common and the oldest quasi-experimental research design, dating back to Snow’s [1855] analysis of a London cholera outbreak. A DD estimate is the difference between the change in outcomes before and after a treatment [difference one] in a treatment versus control group [difference two]: $(\bar{y}_{TREAT}^{POST} - \bar{y}_{TREAT}^{PRE}) - (\bar{y}_{CONTROL}^{POST} - \bar{y}_{CONTROL}^{PRE})$. That simple quantity also equals the estimated coefficient on the interaction of a treatment group dummy and a post-treatment period dummy in the following regression: $y_{it} = \gamma + \gamma_{i\cdot} TREAT_i + \gamma_{\cdot t} POST_t + \beta^{2x2} \, TREAT_i \times POST_t + u_{it}$ [1]. The elegance of DD makes it clear which comparisons generate the estimate, what leads to bias, and how to test the design. The expression in terms of sample means connects the regression to potential outcomes and shows that, under a common trends assumption, a two-group/two-period [2x2] DD identifies the average treatment effect on the treated. Almost all econometrics textbooks and survey articles describe this structure, and recent methodological extensions build on it.
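The equality between the difference in sample means and the interaction coefficient can be checked numerically. The sketch below is illustrative only: the data are simulated, and the variable names and coefficient values (including the "true" effect of 2.0) are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Simulated (hypothetical) 2x2 design: each of the four cells has 50 observations.
treat = np.repeat([0, 1], n // 2)   # treatment-group dummy
post = np.tile([0, 1], n // 2)      # post-period dummy
y = 1.0 + 0.5 * treat + 0.3 * post + 2.0 * treat * post + rng.normal(size=n)

def cell_mean(g, p):
    """Mean outcome for group g (0/1) in period p (0/1)."""
    return y[(treat == g) & (post == p)].mean()

# Difference one (post minus pre) within each group, then difference two across groups.
dd_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# OLS of y on a constant, TREAT, POST, and TREAT x POST.
X = np.column_stack([np.ones(n), treat, post, treat * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
dd_ols = coef[3]                    # coefficient on the interaction term

print(dd_means, dd_ols)             # the two numbers agree up to rounding error
```

Because the regression is saturated in the two dummies, the agreement is exact in any sample with four non-empty cells; it does not depend on the simulated effect size.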

Most DD applications diverge from this 2x2 setup, though, because treatments usually occur at different times. Local governments change policy. Jurisdictions hand down legal rulings. Natural disasters strike across seasons. Firms lay off workers. In this case researchers estimate a regression with dummies for cross-sectional units [$\alpha_{i\cdot}$] and time periods [$\alpha_{\cdot t}$], and a treatment dummy [$D_{it}$]: $y_{it} = \alpha_{i\cdot} + \alpha_{\cdot t} + \beta^{DD} D_{it} + e_{it}$ [2]. In contrast to our substantial understanding of canonical 2x2 DD, we know relatively little about the two-way fixed effects DD when treatment timing varies. We do not know precisely how it compares mean outcomes across groups. We typically rely on general descriptions of the identifying assumption like “interventions must be as good as random, conditional on time and group fixed effects” [Bertrand et al., 2004, p. 250]. We have limited understanding of the treatment effect parameter that regression DD identifies. Finally, we often cannot evaluate how and why alternative specifications change estimates.

This paper shows that the two-way fixed effects DD estimator in [2] [TWFEDD] is a weighted average of all possible 2x2 DD estimators that compare timing groups to each other [the DD decomposition]. Some use units treated at a particular time as the treatment group and untreated units as the control group. Some compare units treated at two different times, using the later-treated group as a control before its treatment begins and then the earlier-treated group as a control after its treatment begins. The weights on the 2x2 DDs are proportional to timing group sizes and the variance of the treatment dummy in each pair, which is highest for units treated in the middle of the panel.
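To make the decomposition concrete, here is a minimal sketch on a simulated balanced panel with two timing groups and a never-treated group. Everything in it is an assumption for illustration: the panel dimensions, the treatment dates, the constant effect of 2.0, and in particular the weight expressions, which are written in an algebraically simplified form for balanced panels and are checked numerically in the code rather than quoted from the text above.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
T = 10
t = np.arange(T)
# Hypothetical design: first treated period per unit; np.inf marks never-treated units.
start = np.array([3.0] * 4 + [7.0] * 3 + [np.inf] * 5)
D = (t[None, :] >= start[:, None]).astype(float)        # units x periods dummy
y = (rng.normal(size=(len(start), 1))                    # unit effects
     + rng.normal(size=(1, T))                           # period effects
     + 2.0 * D + rng.normal(size=D.shape))               # constant effect + noise

# TWFE coefficient via Frisch-Waugh-Lovell: regress y on the double-demeaned dummy.
D_tilde = D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()
beta_twfe = (D_tilde * y).sum() / (D_tilde ** 2).sum()
var_D = (D_tilde ** 2).mean()                            # variance of the demeaned dummy

def dd_2x2(treated, control, before, after):
    """Change in mean outcome for treated units minus the change for controls."""
    def change(rows):
        return y[np.ix_(rows, after)].mean() - y[np.ix_(rows, before)].mean()
    return change(treated) - change(control)

dates = sorted(set(start[np.isfinite(start)]))
share = {k: np.mean(start == k) for k in dates}          # group size as a share of units
d_bar = {k: np.mean(t >= k) for k in dates}              # share of periods spent treated
never = np.isinf(start)

terms = []                                               # (label, 2x2 DD, weight)
for k in dates:                                          # timing group vs. never-treated
    w = share[k] * never.mean() * d_bar[k] * (1 - d_bar[k]) / var_D
    terms.append((f"{k:g} vs never", dd_2x2(start == k, never, t < k, t >= k), w))
for k, l in itertools.combinations(dates, 2):            # k is treated earlier than l
    gap = d_bar[k] - d_bar[l]
    mid = (t >= k) & (t < l)                             # only k is treated in this window
    pair = share[k] * share[l] / var_D
    terms.append((f"{k:g} vs {l:g} (later group as control)",
                  dd_2x2(start == k, start == l, t < k, mid), pair * gap * (1 - d_bar[k])))
    terms.append((f"{l:g} vs {k:g} (earlier group as control)",
                  dd_2x2(start == l, start == k, mid, t >= l), pair * gap * d_bar[l]))

weights = np.array([w for _, _, w in terms])
dds = np.array([dd for _, dd, _ in terms])
for label, dd, w in terms:
    print(f"{label:38s} DD = {dd:7.3f}  weight = {w:.3f}")
print("weighted average:", weights @ dds, " TWFE:", beta_twfe)
```

In this sketch the weights sum to one and the weighted average of the four 2x2 DDs reproduces the TWFE coefficient. The sketch assumes a balanced panel with no always-treated units.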

I first use this DD decomposition to show that TWFEDD estimates a variance-weighted average of treatment effect parameters, sometimes with “negative weights” [Borusyak and Jaravel, 2017, de Chaisemartin and D’Haultfœuille, 2020, Sun and Abraham, 2020]. When treatment effects do not change over time, TWFEDD yields a variance-weighted average of cross-group treatment effects and all weights are positive. Negative weights only arise when average treatment effects vary over time. The DD decomposition shows why: when already-treated units act as controls, changes in their outcomes are subtracted and these changes may include time-varying treatment effects. This does not imply a failure of the design in the sense of non-parallel trends in counterfactual outcomes, but it does suggest caution when using TWFE estimators to summarize treatment effects.
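A small noise-free example illustrates the mechanism. It is a sketch under assumed values: two timing groups, untreated outcomes fixed at zero (so parallel trends hold trivially), and a treatment effect that grows by one unit per treated period. None of these numbers come from the paper.

```python
import numpy as np

T = 10
t = np.arange(T)

def outcome(start):
    """Untreated outcome is 0; the effect is 1, 2, 3, ... from the start period on."""
    return np.where(t >= start, t - start + 1, 0).astype(float)

y_early, y_late = outcome(2), outcome(6)     # hypothetical treatment dates
pre, mid, post = t < 2, (t >= 2) & (t < 6), t >= 6

# Early group as treatment, not-yet-treated late group as control (pre vs. mid).
dd_early = ((y_early[mid].mean() - y_early[pre].mean())
            - (y_late[mid].mean() - y_late[pre].mean()))

# Late group as treatment, already-treated early group as control (mid vs. post).
dd_late = ((y_late[post].mean() - y_late[mid].mean())
           - (y_early[post].mean() - y_early[mid].mean()))

true_att_late = y_late[post].mean()          # average effect on the late group

print(dd_early, dd_late, true_att_late)      # 2.5 -1.5 2.5
```

The first comparison recovers the early group's average effect (2.5). The second subtracts the early group's still-growing effect (a change of 4.0) from the late group's true average effect of 2.5, so it yields -1.5 and has the wrong sign even though untreated outcomes are perfectly parallel.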

Next I use the DD decomposition to define “common trends” when one is interested in using TWFEDD to identify the variance-weighted treatment effect parameter. Each 2x2 DD relies on pairwise common trends in untreated potential outcomes so the overall assumption is an average of these terms using the variance-based decomposition weights. The extent to which a given timing group’s differential trend biases the overall estimate equals the difference between the total weight on 2x2 DDs where it is the treatment group and the total weight on 2x2 DDs where it is the control group. Because units treated near the beginning or the end of the panel have the lowest treatment variance they can get more weight as controls than treatments. In designs without untreated units they always do.
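The claim about early- and late-treated groups can be illustrated with a small calculation. The sketch below assumes a design with three equal-sized timing groups and no untreated units, and it assumes that, in a balanced panel, each timing-pair weight is proportional to the product of the two group shares, the gap in time spent treated, and the length of the relevant comparison window. Those weight expressions and all numbers are illustrative assumptions.

```python
import itertools
import numpy as np

# Hypothetical design: groups treated for 90%, 50%, and 10% of the panel.
d_bar = np.array([0.9, 0.5, 0.1])    # share of periods each group spends treated
share = np.array([1 / 3, 1 / 3, 1 / 3])

as_treatment = np.zeros(3)           # total weight on 2x2 DDs where the group is treated
as_control = np.zeros(3)             # total weight on 2x2 DDs where it is the control

for k, l in itertools.combinations(range(3), 2):   # group k is treated earlier than l
    pair = share[k] * share[l] * (d_bar[k] - d_bar[l])
    w_early = pair * (1 - d_bar[k])  # k is the treatment group, l the control
    w_late = pair * d_bar[l]         # l is the treatment group, k the control
    as_treatment[k] += w_early
    as_control[l] += w_early
    as_treatment[l] += w_late
    as_control[k] += w_late

net = (as_treatment - as_control) / as_treatment.sum()   # normalize weights to sum to 1
print(net)                           # approximately [-0.25, 0.5, -0.25]
```

The groups treated near the start and the end of the panel each carry a net weight of -0.25, meaning they matter more as controls than as treatment groups, while the group treated in the middle carries +0.5. The net weights sum to zero because every 2x2 DD has exactly one treatment group and one control group.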

Finally, I develop simple tools to describe the TWFEDD estimator and evaluate why estimates change across specifications. Plotting the 2x2 DDs against their weights displays heterogeneity in the components of the weighted average and shows which terms and timing groups matter most. Summing the weights on the timing comparisons quantifies “how much” of the variation comes from timing [a common question in practice], and provides practical guidance on how well the TWFEDD estimator works compared to alternative estimators [Sun and Abraham, 2020, Borusyak and Jaravel, 2017, Callaway and Sant’Anna, 2020, Imai and Kim, 2021, Strezhnev, 2018, Ben-Michael et al., 2019]. Comparing TWFEDD estimates across specifications in a Oaxaca-Blinder-Kitagawa decomposition measures how much of the change in the overall estimate comes from the 2x2 DDs [consistent with confounding or within-group heterogeneity], the weights [changing estimand], or the interaction of the two. Scattering the 2x2 DDs or the weights from different specifications shows which specific terms drive these differences. I also provide the first detailed analysis of specifications with time-varying controls, which can address bias but also change the sources of identification to include comparisons between units with the same treatment but different covariates.
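The three-way split of a change in the overall estimate can be sketched as follows. This uses one standard form of the Oaxaca-Blinder-Kitagawa identity, with the baseline specification as the reference. The 2x2 DDs and weights are made-up numbers for illustration, not estimates from the paper.

```python
import numpy as np

# Hypothetical 2x2 DDs and weights under a baseline and an alternative specification.
dd_base = np.array([-4.0, -1.0, 2.0])
s_base = np.array([0.5, 0.3, 0.2])
dd_alt = np.array([-3.5, -2.0, 1.0])
s_alt = np.array([0.4, 0.4, 0.2])

beta_base = s_base @ dd_base            # overall estimate = weighted average of 2x2 DDs
beta_alt = s_alt @ dd_alt

due_to_dds = s_base @ (dd_alt - dd_base)             # 2x2 DDs change, weights held fixed
due_to_weights = (s_alt - s_base) @ dd_base          # weights change, 2x2 DDs held fixed
interaction = (s_alt - s_base) @ (dd_alt - dd_base)  # both change together

print(beta_alt - beta_base, due_to_dds + due_to_weights + interaction)
```

The three components add up to the total change by construction, so the identity is an accounting tool: it attributes the change between specifications to its sources but does not by itself say which specification is less biased.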

To demonstrate these methods I replicate Stevenson and Wolfers [2006], who study the effect of unilateral divorce laws on female suicide rates. The TWFEDD estimates suggest that unilateral divorce leads to 3 fewer suicides per million women. More than a third of the identifying variation comes from treatment timing and the rest comes from comparisons to states whose reform status does not change during the sample period. Event-study estimates show that the treatment effects grow over time, though, which biases many of the timing comparisons. The TWFEDD estimate [−3.08] is therefore a misleading summary of the average post-treatment effect [about −5]. Much of the sensitivity across specifications comes from changes in weights, or a small number of 2x2 DDs, and need not indicate bias.

My results show how and why the TWFEDD estimator can fail to identify interpretable treatment effect parameters and suggest that practitioners should be careful when relying on it in designs with treatment timing variation. Fortunately, recent research has developed simple flexible estimators that address the problems I describe [e.g. Callaway and Sant’Anna, 2020], enabling applied researchers to make better use of variation in treatment timing.

Section snippets

The difference-in-differences decomposition theorem

When units experience treatment at different times, one cannot estimate equation [1] because the post-period dummy is not defined for control observations. Nearly all work that exploits variation in treatment timing uses the two-way fixed effects regression in Eq. [2] [Cameron and Trivedi, 2005, p. 738]. Researchers clearly recognize that differences in when units received treatment contribute to identification, but have not been able to describe how these comparisons are made.

Theory: What parameter does DD identify and under what assumptions?

Theorem 1 relates the regression DD coefficient to sample averages, which makes it simple to analyze its statistical properties by writing $\hat{\beta}^{DD}$ in terms of potential outcomes [Holland, 1986, Rubin, 1974]. Define $Y_{it}(k)$ as the outcome of unit $i$ in period $t$ when it is treated at $t_i = k$, and use $Y_{it}(t_i)$ to denote treated potential outcomes under unit $i$’s actual treatment date. $Y_{it}(0)$ is the untreated potential outcome. If t
