PPOL561 | Accelerated Statistics for Public Policy II Week 1 Course Introduction & Research Design

# <font class = "title-panel"> PPOL561 | Accelerated Statistics for Public Policy II</font> <font size=6, face="bold"> Week 1 </font> <br> <br> <font size=100, face="bold"> Course Introduction <br> & <br> Research Design </font>
### <font class = "title-footer">  Prof. Eric Dunford  ◆  Georgetown University  ◆  McCourt School of Public Policy  ◆  <a href="mailto:eric.dunford@georgetown.edu" class="email">eric.dunford@georgetown.edu</a></font>

---

<div class="slide-footer"><span> 
PPOL561 | Accelerated Statistics for Public Policy II

&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;

Week 1

&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;

Introduction & Research

</span></div>

---

# Outline for Today

- **Course Overview**

- **Presentations**

- **Research Design**

---

# Course Overview

---

## Course Goals

![:space 3]

1. **Causality**: understand the challenge of isolating causal effects via random assignment, experiments, instrumental variables and other methods;

2. **Modeling**: understand and implement advanced statistical models, such as limited dependent variable models (binary, ordered, and multinomial outcomes), selection models, and panel models;

3. **`R` Programming**: use `R` to manipulate data and conduct advanced statistical analysis.

4. **Communication/Presentation**: effectively communicate statistical analysis and present publication-quality tables and graphics.

---
  
## Course Calendar

<table class="table" style="font-size: 17px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Week </th>
   <th style="text-align:left;"> Topic </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> Course Introduction &amp; Research Design </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> Data Wrangling &amp; Presentation in R </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:left;"> Introduction to Causal Inference </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 4 </td>
   <td style="text-align:left;"> OLS, Confounders, &amp; Simulation </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 5 </td>
   <td style="text-align:left;"> Panel Data &amp; Difference-in-Difference </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> Instrumental Variables </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 7 </td>
   <td style="text-align:left;"> Experiments </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 8 </td>
   <td style="text-align:left;"> Regression Discontinuity </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 9 </td>
   <td style="text-align:left;"> Midterm </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 10 </td>
   <td style="text-align:left;"> Matching </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 11 </td>
   <td style="text-align:left;"> Synthetic Control </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 12 </td>
   <td style="text-align:left;"> Binary Outcomes </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 13 </td>
   <td style="text-align:left;"> Ordered &amp; Multinomial Outcomes </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 14 </td>
   <td style="text-align:left;"> Selection </td>
  </tr>
</tbody>
</table>

---

## Course Policies

![:space 4]

- The wacky world of **Attendance/Participation** in the Covid Era

- **Late Assignments**

- **Proof of Diligent Debugging**

- **No Practice Exams**

- **Instructional Continuity**

- **Use of Class Materials**

---

## Work Load

+ 7 homework assignments (select 4 from 6, all do 7)
+ One 10-minute "group" presentation
+ Two 2-hour "written" exams (midterm and final)

![:space 5]

| **![:text_size 7](Assignment)** | **![:text_size 7](Percentage)** | 
|----------------|----------------|
| ![:text_size 6](Attendance)	    |	![:text_size 6](5%)        |
| ![:text_size 6](Presentation)	  |	![:text_size 6](10%)       |
| ![:text_size 6](Problem sets)	  |	![:text_size 6](25%)       |
| ![:text_size 6](Midterm exam) 	|	![:text_size 6](30%)       |
| ![:text_size 6](Final exam)   	|	![:text_size 6](30%)       |

]

---

## Succeeding in this course

- **Come Prepared**

- **Ask Questions**

- **Collaborate**

- **Start Homeworks Early**

- **Utilize the Teaching Assistant**

---

## Concerns

- **_![:text_color darkred](Different backgrounds)_**
  
  + Some have a lot of previous methods exposure; others less so.
  + We all have different substantive interests

- **_![:text_color darkred](Going too fast)_**

+ Slow me down
  + Ask questions
  + I will re-schedule the course calendar if we need time

- **_![:text_color darkred](Being called on)_**

+ It'll happen. Read, follow along with the lecture, and you'll be fine

+ **_![:text_color darkred](Presenting)_**

---

class:newsection

# Presentations

---

## General

- **_Randomly paired_** with another student; presentation date also randomly assigned.

- **_10 minute, in-class presentation with slides_** on a paper related to the material we are discussing.

- Presentations will be given at the **_start of class_** each week. 
  + Presenters should set up their slides prior to the start of class.
  + First presentation is on February 16th (and every week thereafter, except for the week of the midterm & Spring Break)

- **![:text_color darkred](Email me a copy of the presentation the _day before_ your scheduled presentation time)**

- [**Presentation Rubric**](http://ericdunford.com/ppol561/Assignments/Presentations/presentation_rubric_PPOL561.pdf)

---

## Presentation Schedule

<table class="table" style="font-size: 24px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;"> Group </th>
   <th style="text-align:center;"> Week </th>
   <th style="text-align:center;"> Partner_1 </th>
   <th style="text-align:center;"> Partner_2 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> Ella Zhang </td>
   <td style="text-align:center;"> Sahithi Adari </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> 5 </td>
   <td style="text-align:center;"> Fangzi Wang </td>
   <td style="text-align:center;"> Maryam Khalid Shah </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> 6 </td>
   <td style="text-align:center;"> Tianhui Cao </td>
   <td style="text-align:center;"> Madeline Kinnaird </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> 7 </td>
   <td style="text-align:center;"> Alexander Adams </td>
   <td style="text-align:center;"> Harshini Tammareddy </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 5 </td>
   <td style="text-align:center;"> 8 </td>
   <td style="text-align:center;"> Merykokeb Belay </td>
   <td style="text-align:center;"> Lexi Gu </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 6 </td>
   <td style="text-align:center;"> 10 </td>
   <td style="text-align:center;"> Charlie Zhang </td>
   <td style="text-align:center;"> Gloria Li </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 7 </td>
   <td style="text-align:center;"> 11 </td>
   <td style="text-align:center;"> Mary Kryslette Bunyi </td>
   <td style="text-align:center;"> Matthew Ring </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 8 </td>
   <td style="text-align:center;"> 12 </td>
   <td style="text-align:center;"> Yousuf Abdelfatah </td>
   <td style="text-align:center;"> Zixun Hao </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 9 </td>
   <td style="text-align:center;"> 13 </td>
   <td style="text-align:center;"> Justine Huynh </td>
   <td style="text-align:center;"> Vince Egalla </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 10 </td>
   <td style="text-align:center;"> 14 </td>
   <td style="text-align:center;"> Ruyi Yang </td>
   <td style="text-align:center;"> Chau Nguyen </td>
  </tr>
</tbody>
</table>

---

## Main things to hit

![:space 5]

- What is the issue and why is it **interesting**?

- Why is it **hard to answer** the question?

- What are the **inferential** challenges?

- How did this paper **meet the challenge** (posed by the inferential challenges)? (Many studies don't.)

- What did they **find**?

- What do these findings **tell us** that we didn't already know?

---

![:space 7]

- **Be interesting**

+ Do **_![:text_color darkred](not)_** read notes
  
  + Don't pack too much in

+ Start strong with a “hook” and with interesting context

<br>

- **Know your audience**

+ They have advanced statistical knowledge

+ But they have not read the paper

---

![:space 7]

- **Be Concise**

+ You can't cover everything so don't try
  
  + Do not show huge tables of numbers &rarr; in fact, don't copy and paste any tables into your slide.
  
  + Generate compelling visuals.

+ If something is on a slide, explain it clearly.  (Otherwise cut it!)

- **Be Ready**

+ Practice out-loud at least twice (preferably to someone who hasn’t read the paper)
  
  + 10 minutes – not more, not less
  
  + Prepare for questions

---

## Critiques require context

An effective critique includes an explanation of how your point would change results

<br>

- **![:text_color darkred](Bad)**: 
> There are too many variables!

- **![:text_color darkgreen](Better)**: 
> Multicollinearity of variable `\(X_1\)` and `\(X_2\)` could increase the variance of `\(\beta_1\)` and could explain its statistical insignificance

---

## Critiques require context

An effective critique includes an explanation of how your point would change results

<br>

- **![:text_color darkred](Bad)**:
> They didn’t include a variable for `\(Z\)`

- **![:text_color darkgreen](Better)**:
> `\(Z\)` was omitted and we believe that excluding it matters as `\(Z\)` is likely correlated with variable `\(X\)`, making us wonder if omitted variable bias has distorted the estimate of the effect of `\(X\)` on `\(Y\)`.

---

## Critiques require context

An effective critique includes an explanation of how your point would change results

- **![:text_color darkred](Bad)**:

> The study was only of California.

- **![:text_color darkgreen](Better, but still bad)**:
> The study was only of California, which is a very diverse state.

- **![:text_color darkgreen](Better)**:
> I’m concerned about the generalizability of the findings because I expect the coefficient on X to be different in a diverse state such as California due to …

---

## Critiques require context

![:space 7]
- Better to have 1 or 2 critique well explained than many poorly explained

- It's okay to piggy-back on some of the critiques addressed in subsequent publications; but you need to have some original content.

- Be clear and explicit about why you think your proposed adjustment or issue pointed out might change the outcome of the analysis.

- Be fair

+ Avoid "collect more data"-type critiques.

---

## Tips

- **_Start preparing early_**

- **_Coordinate_** & **_communicate_** with your partner

- Hunt for an **_interesting paper_** (makes the exercise far more enjoyable)

+ Have a paper you already think is really interesting? Look at other papers in that journal. Georgetown subscribes to hundreds of academic journals.
  
  + Check out media/non-academic publications for features on academic work, such as the [Monkey Cage](https://www.washingtonpost.com/news/monkey-cage/). These articles will direct you to political science/public policy related work. 
  
- Keep in mind the paper must be a **_peer-reviewed_**, empirical article; ideally using one or more of the methods covered in this class

---

# Research Design

---

![:space 5]

.pull-left[
<br>
<br>
- **Inference**: a belief based on evidence _and_ rules for processing that evidence

<br>

- **Methodology**: tools for gathering and analyzing data to make valid inferences
]

<div id="htmlwidget-769156fa5a081b1ddaab" style="width:504px;height:650px;" class="DiagrammeR html-widget"></div>
<script type="application/json" data-for="htmlwidget-769156fa5a081b1ddaab">{"x":{"diagram":"\ngraph TB\n  A(Questions)-->B{Data}\n  B-->C(Methodology)\n  C-->D[Beliefs]\n  D-->E[Claims]\n  E-->A\n  D-->A\n"},"evals":[],"jsHooks":[]}</script>
]

---

###Two Types of Inference

![:space 5]

**Descriptive Inference** &rarr; What are the facts?

+ Is the climate changing?
+ Is the United States politically polarized? 
+ Is global terrorism increasing?
+ Is Azerbaijan a democracy?

**Causal Inference** &rarr; Why does something occur?

- _Why_ is the climate changing?
- _Why_ is the United States politically polarized?
- _Why_ is (or is not) global terrorism increasing?
- _Why_ is (or is not) Azerbaijan a democracy?

---

### A Causal Research Question

Typically start with either

- (1) an **outcome** (dependent variable)

- If what, then `\(Y\)`?
  - What _causes_ `\(Y\)`?
  - Associated with a search for causes, e.g. what causes climate change?

- (2) a **cause** (independent variable)

- If `\(X\)`, then what? 
  - What happens to `\(Y\)` if `\(X\)` goes up or down?
  - Associated with “experimentation”; e.g. What happens if we release greenhouse gases into the air?
  
  
---

### Which of these is a causal research question?

![:space 10]

- Will Democrats win the next U.S. presidential election?

- What factors increased the likelihood of Hillary Clinton's defeat in the 2016 election?

- How has electoral performance for the Republican party changed over the last three decades?

- What was the result of the last U.S. presidential election?

- What role did the economy have on the last U.S. presidential election?

---

### Which of these is a causal research question?

![:space 10]

- ![:text_color lightgrey](Will Democrats win the next UK general election?)

- What factors increased the likelihood of Hillary Clinton's defeat in the 2016 election?

- ![:text_color lightgrey](How has electoral performance for the Republican party changed over the last three decades?)

- ![:text_color lightgrey](What was the result of the last U.S. presidential election?)

- What role did the economy have on the last U.S. presidential election?

---

### Falsifiability

One of the tenets behind the scientific method is that any scientific hypothesis and resultant experimental design must be inherently _falsifiable_.

**Falsifiability** is the assertion that for any hypothesis to have credence, it must be inherently <u>disprovable</u> before it can become accepted as a scientific hypothesis or theory.

---

### Falsifiability

One of the tenets behind the scientific method is that any scientific hypothesis and resultant experimental design must be inherently _falsifiable_.

**Falsifiability** is the assertion that for any hypothesis to have credence, it must be inherently <u>disprovable</u> before it can become accepted as a scientific hypothesis or theory.

.center[**![:text_color darkred](Unfalsifiable)**]
> ![:text_color darkred](Social welfare spending by the government sometimes impacts poverty in a country.)

.center[**![:text_color darkgreen](Falsifiable)**]
> ![:text_color darkgreen](_Increased levels_ of social welfare spending by the government _reduces_ the average level of poverty in the country.)

---

# Good research questions

### 1. Start from political problem or puzzle

- Something important

- Not obvious

### 2. Builds on an existing research literature/Knowledge

- "Stand on the shoulders of giants"

- Don't reinvent the wheel

### 3. Falsifiable

---

## Generating Questions

### Puzzle-driven

- Given what we know, we expect A but observe B. How interesting!

### Theory-driven

- Deduction

### Data/Method-driven

- Induction

### "Should"-driven

- normative

---

## Generating Theories

![:space 10]

![:center_img](Figures/theory_analysis.png)

---

## Generating Theories

### Approach 1: Reason _Inductively_

- Induction works by drawing generalities from specific observations

- Sometimes called “bottom-up” theorizing

<br>

### Approach 2: Reason _Deductively_

- Deduction begins from general, assumed principles/axioms to reach more specific observable realities

- e.g. Rational choice theory

---

## Generating Theories

What makes for a **_good theory_**:

- **Truth**: Maps to reality

- **Relevance**: Matters

- **Coherence**: Is clear

- **Falsifiability**: Can be disproven

- **Precision**: Captures the concepts of interest

- **Generality**: Theories that can explain more are preferred over theories that can explain less

- **Parsimony**: Simple theories are preferred over complex theories

---

## Generating Hypotheses

- What would evidence consistent/inconsistent with the theory be?

- Think about **_observable_ implications** derived from the theory

- Use causal ("if-then") logic

- **_Falsifiable_**

- Avoid **_Observational Equivalence_**
  
  - All hypotheses for two (or more) theories are identical
  
  - Different drivers yield the same results
  
  - Get around this by establishing clear **_scope conditions_**; reformulating question/theory/concept

---

### (_Social_) Scientific Method

![:space 2]

**Research question(s)**

**Clarify the core concepts**

**Develop (causal) theory (_with clear scope conditions_)**

**Derive speciﬁc, testable hypotheses**

**Plan data collection**

**Gather data/evidence**

**Analyze data**

**Draw inferences**

]