class: center, middle, inverse, title-slide #
PPOL561 | Accelerated Statistics for Public Policy II
Week 8
Regression Discontinuity
###
Prof. Eric Dunford ◆ Georgetown University ◆ McCourt School of Public Policy ◆
eric.dunford@georgetown.edu
---
layout: true

<div class="slide-footer"><span> PPOL561 | Accelerated Statistics for Public Policy II           Week 8 <!-- Week of the Footer Here -->              Regression Discontinuity <!-- Title of the lecture here --> </span></div>

---
class: outline

# Outline for Today

![:space 10]

- **Basic Regression Discontinuity model**
- **More flexible Regression Discontinuity models**
- **Windows and bins**
- **Limitations and diagnostics**

---
class: newsection

# Regression Discontinuity

---

![:space 25]

`$$Y_i = \beta_0 + \beta_1T_i + \epsilon_i$$`

![:space 5]

- `\(T_i\)` is the **Treatment** variable. We seek to understand the effect of the treatment.
- Concern is that allocation of the treatment `\(T_i\)` is correlated with unobserved features in `\(\epsilon_i\)`

---

![:space 10]

We've learned a number of solutions for overcoming **_endogeneity_**

![:space 5]

![:text_color steelblue](Controls, Fixed effects, & DiD)

<br>

![:text_color steelblue](Instrumental Variables)

<br>

![:text_color steelblue](Experiments)

---

![:space 10]

We've learned a number of solutions for overcoming **_endogeneity_**

![:space 5]

![:text_color steelblue](Controls, Fixed effects, & DiD) → ![:text_color orangered](Cannot guarantee exogeneity)

<br>

![:text_color steelblue](Instrumental Variables) → ![:text_color orangered](Difficult to locate a good instrument)

<br>

![:text_color steelblue](Experiments) → ![:text_color orangered](Expensive, infeasible and/or immoral [at times])

---

### Regression Discontinuity (RD)

![:space 5]

- Treatment ( `\(T_i\)` ) is assigned according to a **rule**

--

- An **assignment variable** determines whether someone receives the treatment.
  - People with values above some **_cutoff_** receive the treatment;
  - People with values of the assignment variable below the **_cutoff_** do not receive the treatment.

--

- Any bump up/down in the dependent variable around the cutoff will **_reflect the causal effect of the treatment_**.
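---

### In-Practice

![:space 3]

A minimal sketch of the assignment rule, assuming simulated data. The names `sim` and `cutoff`, the cutoff of 0, and the simulated jump of 1 are hypothetical and chosen only for illustration.

```r
library(dplyr)

set.seed(561)

# Hypothetical setup: cutoff at 0 and 1,000 simulated units
cutoff <- 0
n_obs  <- 1000

sim <- tibble(x = rnorm(n_obs)) %>%
  mutate(
    # The rule: units at or above the cutoff receive the treatment
    t = as.numeric(x >= cutoff),
    # Outcome with a simulated jump of 1 at the cutoff
    y = 0.5 + 1 * t + 0.4 * (x - cutoff) + rnorm(n_obs, sd = 0.5)
  )

# Regress the outcome on treatment and the centered assignment variable
fit <- lm(y ~ t + I(x - cutoff), data = sim)
coef(fit)
```

If the rule is the only thing that changes at the cutoff, the coefficient on `t` should land close to the simulated jump.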
---

### Regression Discontinuity (RD)

![:space 5]

`\(C\)` is the value of the **_cutoff_**

![:space 5]

`$$Y_i = \beta_0 + \beta_1T_i + \beta_2 (X_{i} - C) + \epsilon_i$$`

![:space 5]

`$$T_i = 1~\text{if}~ X_{i} \ge C$$`
`$$T_i = 0~\text{if}~X_{i} < C$$`

---

![:space 15]

<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---

### Example

![:space 5]

Examine the effect of drinking on academic performance (grades) at a school where the drinking age is rigorously enforced.

![:space 5]

`$$Y_i = \beta_0 + \beta_1T_i + \beta_2 (X_{i} - C) + \epsilon_i$$`

![:space 5]

`$$Grades_i = \beta_0 + \beta_1Legal_i + \beta_2 (Age_{i} - 21) + \epsilon_i$$`

---

![:space 5]

<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

### How is this helpful?

![:space 3]

- Usually `\(X\)` ("Age") is correlated with `\(Y\)` ("Grades"), so we can't simply compare treated and control.
- But students near the cutoff ( `\(X = C\)` ) might be similar

--

![:space 1]

**Key Idea**

> Use discontinuity in `\(E[Y|X]\)` at the cutoff value `\(X = C\)` to estimate the effect of `\(T\)` on `\(Y\)` for units near `\(X = C\)`

--

![:space 1]

As long as the error term (omitted variables) does not jump at the cutoff, we can attribute any bump in the dependent variable at the cutoff to the effect of the treatment.

---

`$$Y_i = \beta_0 + \beta_1T_i + \beta_2 (X_{i} - C) + \epsilon_i$$`

<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

![:space 10]

![:center_img](Figures/different-RD.png)

---

### Assumptions

![:space 5]

- The key assumption for RD to work is that the error term itself does not jump at the point of the discontinuity.

--

- Even if the error term is correlated with the assignment variable, the estimated effect of the treatment is still valid.

--

- Model the error as a function of the assignment variable

`$$\epsilon_i = \rho X_{i} + \nu_i$$`

- _Example_: in the grades and alcohol model, consider something in the error term (such as maturity) as a function of the assignment variable (in this case, age)

---

### Assumptions

If we estimate the model with only the treatment variable, we will have endogeneity, because the treatment (being able to drink legally) is correlated with `\(X\)` (age) and `\(X\)` is correlated with the error.

`$$Y_i = \beta_0 + \beta_1T_i + \epsilon_i$$`

--

<br>

If we include the assignment variable (let's assume `\(C = 0\)` for simplicity) and use the fact that `\(\epsilon_i = \rho X_{i} + \nu_i\)`

<br>

`$$Y_i = \beta_0 + \beta_1T_i + \beta_2X_{i} + \epsilon_i$$`
`$$Y_i = \beta_0 + \beta_1T_i + \beta_2X_{i} + \rho X_{i} + \nu_i$$`
`$$Y_i = \beta_0 + \beta_1T_i + (\beta_2 + \rho)X_{i} + \nu_i$$`

---

### Assumptions

`$$Y_i = \beta_0 + \beta_1T_i + (\beta_2 + \rho)X_{i} + \nu_i$$`

![:space 3]

This means:

- The **_treatment variable is uncorrelated with the error term_** `\(\nu_i\)`, even though the assignment variable is correlated with the original error term `\(\epsilon_i\)`.
- OLS will provide an **_unbiased estimate_** of `\(\beta_1\)` <u> as long as there is no "jump" in the error term at the cutoff.</u> (i.e. no other factor is driving the discontinuity.)
- Because the assignment variable is correlated with the error, its coefficient will be biased: it estimates `\(\beta_2 + \rho\)` rather than `\(\beta_2\)`. This is okay because our main interest lies in correctly estimating `\(\beta_1\)`

---

### Questions

![:space 10]

Many school districts pay for new school buildings with bond issues that must be approved by voters. Supporters of these bond issues typically argue that new buildings improve schools and thereby boost housing values. Cellini, Ferreira, and Rothstein (2010) used RD to test whether passage of school bonds caused housing values to rise.

- **(A)** What is the assignment variable?
- **(B)** Explain how to use a basic RD approach to estimate the effect of school bond passage on housing values.

---

### Questions

![:space 10]

U.S. citizens are eligible for Medicare the day they turn 65 years old. Many believe that people with health insurance are less likely to die prematurely because they will be more likely to seek treatment and doctors will be more willing to conduct tests and procedures for them. Card, Dobkin, and Maestas (2009) used RD to address this question.

- **(A)** What is the assignment variable?
- **(B)** Explain how to use a basic RD approach to estimate the effect of Medicare coverage on the probability of dying prematurely.

---
class: newsection

### More Flexible <br> RD Models

---

![:space 10]

`$$Y_i = \beta_0 + \beta_1T_i + \beta_2 (X_{i} - C) + \epsilon_i$$`

![:space 5]

.pull-left[

<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

]

.pull-right[

<br>

In the basic version of RD, the relationship between `\(X\)` and `\(Y\)` is

- **_linear_**, and
- the **_same on both sides of the cutoff_**

]

---

`$$Y_i = \beta_0 + \beta_1T_i + \beta_2 (X_{i} - C) + \beta_3 (X_{i} - C) T_i+ \epsilon_i$$`

<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

### Varying Slopes Model

`$$Y_i = \beta_0 + \beta_1T_i + \beta_2 (X_{i} - C) + \beta_3 (X_{i} - C) T_i+ \epsilon_i$$`

`$$T_i = 1~\text{if}~ X_{i} \ge C$$`
`$$T_i = 0~\text{if}~X_{i} < C$$`

![:space 3]

- Interaction between `\(T_i\)` and the assignment variable
- `\(\beta_3\)` captures how different the slope is for observations where `\(X_{i}\)` is greater than `\(C\)`
- By using `\(X_{i} - C\)` instead of `\(X_{i}\)` for the assignment variable, we ensure that `\(\hat{\beta_1}\)` indicates the difference between the treated and control when `\(X_{i} - C = 0\)`

---

<br>

<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

### Polynomial Models

![:space .5]

`$$Y_i = \beta_0 + \beta_1T_i + \beta_2 (X_{i} - C) + \beta_3 (X_{i} - C)^2 + \beta_4 (X_{i} - C)^3 +\\ \beta_5 (X_{i} - C)T_i+ \beta_6 (X_{i} - C)^2T_i + \beta_7 (X_{i} - C)^3T_i + \epsilon_i$$`

`$$T_i = 1~\text{if}~ X_{i} \ge C$$`
`$$T_i = 0~\text{if}~X_{i} < C$$`

![:space .5]

- The goal is to find a functional form for the relationship between `\(X_{i} - C\)` and `\(Y\)` that best models the data on either side of the cutoff.
- The aim is to ensure that any bump at the cutoff reflects only the causal effect of the treatment.
- We rarely have a "theory" for these functional specifications and that's okay (again, all we care about is `\(\beta_1\)`)

---

<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---
class: newsection

### Windows and Bins

---

### Window Size

![:space 5]

- The **_window_** (or "bandwidth") is the range of `\(X\)` to which we limit our analysis.
- In theory, we want to compare values right next to the boundary (on each side).
- In practice, we need fairly big windows in order to get a sufficient sample size (i.e. concerns over power)
- Reducing window size is a great way to deal with non-linearities.
- With smaller windows, linear models should do better and better.

---

<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

### Binned Graphs

Often it can be difficult to observe a discontinuity.

<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

### Binned Graphs

We can bin observations to see the average effect.
<img src="week08-regression-discontinuity-ppol561_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

### Virtues of Bin Graphs

![:space 3]

- **To construct a bin plot**,
  - divide the `\(X\)` variable into multiple regions (or "bins") above and below the cutoff;
  - calculate the average value of `\(Y\)` within each of those regions.

--

- If there is an effect, we will almost always be **_able to “see” the discontinuity_** with a simple graph

--

- **_Diagnostics_**: assess if there are _non-linearities_ or _discontinuities_ at other points.

--

- **![:text_color darkred](Remember:)** always run analysis on _unbinned_ data. The bins are solely for visualization purposes.

---

### In-Practice

```r
dat
```

```
## # A tibble: 1,000 x 3
##          y      x     t
##      <dbl>  <dbl> <dbl>
##  1  3.00    0.586     1
##  2  1.37    0.709     1
##  3  0.122  -0.109     0
##  4  0.131  -0.453     0
##  5  0.937   0.606     1
##  6 -1.26   -1.82      0
##  7  1.24    0.630     1
##  8 -0.0861 -0.276     0
##  9  1.61   -0.284     0
## 10  0.455  -0.919     0
## # … with 990 more rows
```

---

### In-Practice

```r
# cut_interval() (ggplot2) splits x into 25 equal-width bins
dat %>%
  mutate(bin = cut_interval(x, 25))
```

```
## # A tibble: 1,000 x 4
##          y      x     t bin
##      <dbl>  <dbl> <dbl> <fct>
##  1  3.00    0.586     1 (0.398,0.643]
##  2  1.37    0.709     1 (0.643,0.887]
##  3  0.122  -0.109     0 (-0.335,-0.0903]
##  4  0.131  -0.453     0 (-0.579,-0.335]
##  5  0.937   0.606     1 (0.398,0.643]
##  6 -1.26   -1.82      0 (-2.05,-1.8]
##  7  1.24    0.630     1 (0.398,0.643]
##  8 -0.0861 -0.276     0 (-0.335,-0.0903]
##  9  1.61   -0.284     0 (-0.335,-0.0903]
## 10  0.455  -0.919     0 (-1.07,-0.823]
## # … with 990 more rows
```

---

### In-Practice

```r
# Average y and x within each bin (for plotting only)
dat %>%
  mutate(bin = cut_interval(x, 25)) %>%
  group_by(bin) %>%
  summarize(y = mean(y), x = mean(x))
```

```
## # A tibble: 24 x 3
##    bin                  y      x
##    <fct>            <dbl>  <dbl>
##  1 [-2.78,-2.53]    0.331 -2.62
##  2 (-2.53,-2.29]   -0.508 -2.38
##  3 (-2.29,-2.05]    0.480 -2.16
##  4 (-2.05,-1.8]     0.435 -1.89
##  5 (-1.8,-1.56]     0.671 -1.66
##  6 (-1.56,-1.31]    0.405 -1.42
##  7 (-1.31,-1.07]    0.755 -1.19
##  8 (-1.07,-0.823]   0.850 -0.952
##  9 (-0.823,-0.579]  0.721 -0.700
## 10 (-0.579,-0.335]  0.982 -0.462
## # … with 14 more rows
```

---
class: newsection

### Limitations <br> & <br> Diagnostics

---

### Limitations of RD

- Recall, the key assumption of RD is that there is no “bump” (discontinuity) in the error term at the discontinuity
- This can **![:text_color darkred](fail)** when:

--

  - ![:text_color darkred](Individuals have _control over assignment_ variable)

--

    + _Example_: suppose a test score above “100” allows you to get a scholarship and that students are allowed to re-take the test. More motivated students (where motivation is in the error term) may re-take the test until they get a 100.

--

    + _Example_: a financial aid officer may score financial need while knowing the applicant and the score needed to get aid

---

### Limitations of RD

- Recall, the key assumption of RD is that there is no “bump” (discontinuity) in the error term at the discontinuity
- This can **![:text_color darkred](fail)** when:
  - ![:text_color darkred](Individuals have _control over assignment_ variable)
  - ![:text_color darkred](People may anticipate discontinuity)

--

    + _Example_: Suppose people save up visits to the doctor until after they are 65 (for Medicare to kick in)

---

### Limitations of RD

- Recall, the key assumption of RD is that there is no “bump” (discontinuity) in the error term at the discontinuity
- This can **![:text_color darkred](fail)** when:
  - ![:text_color darkred](Individuals have _control over assignment_ variable)
  - ![:text_color darkred](People may anticipate discontinuity)
- **Generalizability**

--

  - Estimating the effect of the treatment on the _subpopulation_ where `\(X_i = C\)`
  - Also known as **_"Local Average Treatment Effect"_** (LATE)

--

  - The treatment effect may not generalize in the same way to the broader population (i.e. for those `\(X_i\)`s further away from `\(C\)`)

---

### Diagnostics: _Clustering_

- Assess whether there is clustering on one side of the cutoff
- Can use a histogram to assess.
- There should be no bump in the distribution of the assignment variable at the cutoff.

--

![:center_img](Figures/diagnostic_clustering.png)

---

### Diagnostics: _Isolated Discontinuity_

- Assess if other variables "act weird" at the discontinuity.
- At the discontinuity, there should be **_no discontinuity in other covariates_**.

--

- For some "other" covariate `\(W\)`:

`$$W_i = \gamma_0 + \gamma_1 T_i + \gamma_2(X_{i} - C) + \epsilon_i$$`

`$$T_i = 1~\text{if}~ X_{i} \ge C$$`
`$$T_i = 0~\text{if}~X_{i} < C$$`

`\(\gamma_1\)` should equal zero.

--

- If some other variable jumps at the discontinuity, we may wonder if people are somehow self-selecting (or being selected) based on unknown additional factors.

---

### Diagnostics: _Placebo Test_

![:space 7]

- **Dependent variable shouldn’t “jump” at other cutoffs**

--

<br>

- Conduct **placebo tests** away from the true cutoff
  - Set different (fake) cutoff points
  - See if there is a discontinuity at any of them

--

<br>

- For example, in the drinking and grades study, see if there are any observed effects when we alter the cutoff age from 21 to 25, 18, etc.

---

### Example: Alcohol and Drinking

![:center_img 70](Figures/drinking-test-scores.png)

---

### Example: Alcohol and Drinking

![:center_img 90](Figures/drinking-hist-diagonistic.png)

---

### Example: Alcohol and Drinking

![:space 20]

![:center_img 100](Figures/drinking-covar-diagonistic.png)

---

### Example: Alcohol and Drinking

![:space 20]

![:center_img 100](Figures/drinking-rd-models.png)

---

### RD Steps

![:space 5]

- **Step 1**: Assess appropriateness of RD
  - Basic RD (assignment variable perfectly predicts treatment)
  - Qualitatively assess whether agents have control over the assignment variable
  - Conduct balancing tests

--

- **Step 2**: Plot data, typically with a binned graph

--

- **Step 3**: Estimate linear model using different window sizes

--

- **Step 4**: Estimate non-linear models using different window sizes
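
---

### RD Steps: In-Practice

One way the steps above might look in code. This is a sketch on simulated data, not a definitive workflow: `dat_sim`, the covariate `w`, the helper `rd_by_window()`, the window sizes, and every numeric value are hypothetical.

```r
library(dplyr)
library(ggplot2)

set.seed(8)

# Hypothetical data: cutoff at 0, simulated jump of 1, mildly curved trend
cutoff <- 0
dat_sim <- tibble(x = runif(2000, -3, 3)) %>%
  mutate(
    t  = as.numeric(x >= cutoff),   # treatment assigned by the rule
    xc = x - cutoff,                # centered assignment variable
    w  = 0.3 * x + rnorm(n()),      # covariate built with no jump at the cutoff
    y  = 1 * t + 0.5 * xc - 0.1 * xc^2 + rnorm(n(), sd = 0.5)
  )

# Step 1: balancing test. The coefficient on t should be near zero.
balance <- lm(w ~ t + xc, data = dat_sim)

# Step 2: binned plot. A bin edge is placed at the cutoff so that
# no bin mixes treated and control units. Bins are for display only.
binned <- dat_sim %>%
  mutate(bin = cut_width(x, width = 0.25, boundary = cutoff, closed = "left")) %>%
  group_by(bin) %>%
  summarize(y = mean(y), x = mean(x))

p <- ggplot(binned, aes(x, y)) +
  geom_point() +
  geom_vline(xintercept = cutoff, linetype = "dashed")

# Steps 3 and 4: varying-slopes linear and quadratic models,
# re-estimated on the unbinned data within each window
rd_by_window <- function(h) {
  d <- filter(dat_sim, abs(xc) <= h)
  linear    <- lm(y ~ t * xc, data = d)
  quadratic <- lm(y ~ t * (xc + I(xc^2)), data = d)
  tibble(
    window    = h,
    n         = nrow(d),
    linear    = unname(coef(linear)["t"]),
    quadratic = unname(coef(quadratic)["t"])
  )
}

results <- bind_rows(lapply(c(0.5, 1, 2, 3), rd_by_window))
results
```

Comparing the `linear` and `quadratic` columns across rows of `results` gives a rough sense of how sensitive the estimated jump is to the window and the functional form. Printing `p` draws the binned graph.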