class: center, middle, inverse, title-slide

# PPOL561 | Accelerated Statistics for Public Policy II <br> Week 12 <br> Generalized Linear Models

### Prof. Eric Dunford ◆ Georgetown University ◆ McCourt School of Public Policy ◆ eric.dunford@georgetown.edu
---
layout: true

<div class="slide-footer"><span> PPOL561 | Accelerated Statistics for Public Policy II           Week 12 <!-- Week of the Footer Here -->              GLMs <!-- Title of the lecture here --> </span></div>

---
class: outline

# Outline for Today

<br>

- **_Linear Probability Models_** and their discontents

<br>

- **_Maximum Likelihood Estimation_** for binary responses

<br>

- It's all about the **_Substantive Effects_** and generating estimates of **_Uncertainty_** around substantive predictions.

---

# Limited dependent variables

![:space 5]

- **Binary** – vote or not; go to war or not

- **Multinomial** – voted for Clinton, Trump, Johnson, or Stein

- **Ordinal** – strongly oppose war, somewhat oppose war, somewhat support war, strongly support war

- **Counts** – number of bills passed; number of conflicts; number of organizations joined; number of puppies hugged

- **Duration** – time to adopt full suffrage; time to bill passage; time to end of regime

---
class: newsection

# Binary Dependent Variables

---

## The problem

When we have a binary outcome (0/1), we can use OLS to estimate the relationship. This is known as a **linear probability model** (LPM).

--

We start with the usual equation:

`$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$`

Here `\(y \in \{0,1\}\)`

- The probability of "success": `\(pr(y_i = 1) = p_i\)`
- The probability of "failure": `\(pr(y_i = 0) = 1 - p_i\)`

--

`\(y\)` follows a **Bernoulli** distribution

$$ E(y_i) = 0 \times (1-p_i) + 1 \times p_i = p_i$$

$$ E(y_i | x_i) = \beta_0 + \beta_1 x_i = p_i $$

---

## The problem

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

--

.center[
<table>
 <thead>
  <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 0.690 </td> <td style="text-align:right;"> 0.034 </td> <td style="text-align:right;"> 20.222 </td> <td style="text-align:right;"> 0 </td> </tr>
  <tr> <td style="text-align:left;"> x </td> <td style="text-align:right;"> 0.328 </td> <td style="text-align:right;"> 0.037 </td> <td style="text-align:right;"> 8.756 </td> <td style="text-align:right;"> 0 </td> </tr>
</tbody>
</table>
]

---

## The problem

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

.center[
<table>
 <thead>
  <tr> <th style="text-align:left;"> term </th> <th style="text-align:left;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:left;"> 69% </td> <td style="text-align:right;"> 0.034 </td> <td style="text-align:right;"> 20.222 </td> <td style="text-align:right;"> 0 </td> </tr>
  <tr> <td style="text-align:left;"> x </td> <td style="text-align:left;"> 32.8% </td> <td style="text-align:right;"> 0.037 </td> <td style="text-align:right;"> 8.756 </td> <td style="text-align:right;"> 0 </td> </tr>
</tbody>
</table>
]

---

## The problem

- OLS may produce **nonsensical predictions** (recall that OLS assumes continuous, interval-level data)

--

  + But we can **recode**

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />
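---

## The problem

A minimal sketch (illustrative; not part of the original example) that simulates binary data, fits an LPM with `lm()`, and checks whether the fitted "probabilities" stay inside the unit interval:

```r
set.seed(42)
n <- 500
x <- rnorm(n)
p <- pnorm(-0.5 + 1.5*x)            # true probability of success
y <- rbinom(n, size = 1, prob = p)  # observed binary outcome

lpm <- lm(y ~ x)                    # linear probability model

range(fitted(lpm))                  # note: can fall outside [0, 1]
mean(fitted(lpm) < 0 | fitted(lpm) > 1)  # share of out-of-range predictions
```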
---

## The problem

- OLS may produce **nonsensical predictions** (recall that OLS assumes continuous, interval-level data)

- Disturbances are <u>_not_</u> normally distributed; they follow the Bernoulli distribution.

--

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

## The problem

- OLS may produce **nonsensical predictions** (recall that OLS assumes continuous, interval-level data)

- Disturbances are <u>_not_</u> normally distributed; they follow the Bernoulli distribution.

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

## The problem

- OLS may produce **nonsensical predictions** (recall that OLS assumes continuous, interval-level data)

- Disturbances are <u>_not_</u> normally distributed; they follow the Bernoulli distribution.

- `\(y_i\)` is distributed with mean `\(p_i\)` and variance `\(p_i(1-p_i)\)`. `\(p_i\)` is a function of `\(x\)`, so we have **heteroscedasticity** with variance that depends on `\(x\)`.

  + But we can use weighted least squares to adjust for this.

--

- The linearity assumption is suspect

- Goodness-of-fit measures are even less helpful

---

## The solution

![:space 5]

**_Use a model that fits the functional form of the outcome variable_**!

![:space 5]

Two ways of thinking about doing this (both point to the same modeling strategy):

![:space 3]

- **_Latent variables_**

- **_Generalized linear models_**

---

### Latent variable way of thinking

- Recall we only observe `\(y = 1\)` (success) and `\(y = 0\)` (failure).

--

- But assume there is some underlying process with a continuous distribution `\(y^*\)`, such that

`$$y_i^* = \beta_0 + \beta_1 x_i + \epsilon_i$$`

--

- where `\(y_i = 1\)` (success) if `\(y_i^* > \tau\)` (some threshold) and `\(y_i = 0\)` otherwise (failure)

`$$y_i = \begin{cases}1~~~\text{if}~y^* > \tau \\ 0~~~\text{if}~y^* \le \tau \end{cases}$$`

- The threshold ( `\(\tau\)` ) is some point on the latent distribution. For simplicity, we can set `\(\tau = 0\)`.

---

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

- `\(y^*\)` is continuous, so we avoid the problems encountered using the LPM. However, `\(y^*\)` is unobserved, so we cannot estimate it using OLS.

- Instead we use Maximum Likelihood Estimation, which requires assumptions about the **distribution of the errors**.

--

- We can write the probability of success `\(p_i\)` as (recall `\(\tau = 0\)`)

`$$p_i = pr(\beta_0 + \beta_1 x_i + \epsilon_i > 0)$$`

`$$p_i = pr(\epsilon_i > -\beta_0 - \beta_1 x_i)$$`

`$$p_i = pr(\epsilon_i < \beta_0 + \beta_1 x_i)~~~\text{(by symmetry of the error distribution)}$$`

`$$p_i = F(\beta_0 + \beta_1 x_i)$$`

--

- Where `\(F(.)\)` is the cumulative distribution function (CDF) of `\(\epsilon\)`. We can evaluate the CDF of `\(\epsilon_i\)` at `\(\beta_0 + \beta_1 x_i\)` to get the probability of success. But we need to pick a form for `\(F(.)\)`.

---

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

The **logistic (logit)** and **Normal (probit)** distributions are used frequently.

<br><br><br><br>

.center[

|  | CDF | PDF |
|---|-----|-----|
| **Normal** | `\(\Phi(\epsilon) = \int^\epsilon_{-\infty} \frac{1}{\sqrt{2 \pi}} e^{-\frac{t^2}{2}} dt~~\)` | `\(\phi(\epsilon) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{\epsilon^2}{2}}\)` |
| **Logistic** | `\(\Lambda(\epsilon) = \frac{e^\epsilon}{1 + e^\epsilon}\)` | `\(\lambda(\epsilon) = \frac{e^\epsilon}{(1 + e^\epsilon)^2}\)` |

]
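---

These forms are exactly R's built-in distribution functions. A quick check (illustrative; not part of the original slides) that the closed-form expressions in the table match `pnorm()`/`dnorm()` and `plogis()`/`dlogis()`:

```r
eps <- 1.5   # an arbitrary evaluation point

# Logistic CDF and PDF vs. the closed-form expressions
c(plogis(eps), exp(eps) / (1 + exp(eps)))
c(dlogis(eps), exp(eps) / (1 + exp(eps))^2)

# Normal CDF and PDF
c(pnorm(eps), integrate(dnorm, -Inf, eps)$value)
c(dnorm(eps), exp(-eps^2 / 2) / sqrt(2 * pi))
```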
---

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

### Generalized Linear Model way of thinking

- Rather than thinking in terms of a latent variable, let's directly model the probability of an event by transforming `\(pr(y_i = 1 | x)\)` into a quantity that ranges from `\(-\infty\)` to `\(\infty\)`.

--

- First, we transform the outcome into the **odds**

`$$\frac{pr(y_i = 1 | x)}{pr(y_i = 0 | x)} = \frac{pr(y_i = 1 | x)}{1 - pr(y_i = 1 | x)}$$`

- **Odds** indicate how often something happens relative to how often it doesn't happen.

--

- Then we take the **log of the odds**, which ranges from `\(-\infty\)` to `\(\infty\)`.

`$$ln\left[\frac{pr(y_i = 1 | x)}{1 - pr(y_i = 1 | x)}\right] = X \beta$$`

---

### Generalized Linear Model way of thinking

- This is equivalent to the logit model we've already discussed.

`$$pr(y_i = 1|X) = \frac{e^{X\beta}}{1+e^{X\beta}} = \frac{1}{1+e^{-X\beta}}$$`

--

- Other probability models can be constructed by choosing functions of `\(X\beta\)` that range from 0 to 1. As we've seen, CDFs have this property.

- The CDF of the standard normal distribution results in the probit model.

`$$pr(y_i = 1|X) = \int^{X\beta}_{-\infty} \frac{1}{\sqrt{2 \pi}} e^{-\frac{t^2}{2}} dt = \Phi(X\beta)$$`

---

In essence, our function `\(F(\cdot)\)` (the inverse of what is known as the **link function**) maps our linear combination of independent variables ( `\(X\beta\)` ) onto a probability space (ranging from 0 to 1).

$$ F(X\beta) \mapsto [0,1]$$

.center[<img src="Figures/lin-to-pr-space.gif" width=400 >]

---

Recall that the outcome `\(y_i\)` follows a **Bernoulli distribution**.

`$$y_i \sim bernoulli(p_i)$$`

`$$y_i \sim p_i^{y_i}(1-p_i)^{1-y_i}$$`

--

<br>

We can plug our linear combination of predictors into this probability distribution. All we need to ensure is that the linear predictor maps to a probability space (this is our "probability model").

$$ \pi_i =\Phi(\beta_0 + \beta_1 x_i) $$

<br>

`$$y_i \sim bernoulli(\pi_i)$$`

`$$y_i \sim \pi_i^{y_i}(1-\pi_i)^{1-y_i}$$`
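---

To make the mapping concrete, here is a small sketch (illustrative; not from the original code) showing how `pnorm()` and `plogis()` squash any value of the linear predictor `\(X\beta\)` into the unit interval:

```r
xb <- seq(-4, 4, by = 2)  # some arbitrary values of the linear predictor

pnorm(xb)   # probit: Phi(xb), always between 0 and 1
plogis(xb)  # logit: exp(xb)/(1 + exp(xb)), always between 0 and 1
```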
---

```r
set.seed(123)
N = 100
x <- rnorm(N) # random variable
y_star <- -1 + .5*x # latent variable as a linear combination

# convert to a probability space
*pr <- pnorm(y_star)

# Drop the probability into a bernoulli dist. (0 or 1)
*y <- rbinom(N, size=1, prob = pr)

# Gather as dataset
dat1 <- tibble(x,y_star,pr,y)
head(dat1)
```

```
## # A tibble: 6 x 4
##         x y_star    pr     y
##     <dbl>  <dbl> <dbl> <int>
## 1 -0.560  -1.28  0.100     0
## 2 -0.230  -1.12  0.132     1
## 3  1.56   -0.221 0.413     1
## 4  0.0705 -0.965 0.167     0
## 5  0.129  -0.935 0.175     0
## 6  1.72   -0.142 0.443     1
```

---

```r
pairs(dat1,col="steelblue")
```

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

--

```r
mod = glm(y~x,data=dat1,family=binomial(link = "probit"))
broom::tidy(mod)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -1.02      0.166     -6.16 7.32e-10
## 2 x              0.532     0.180      2.96 3.05e- 3
```

---
class:newsection

## Maximum Likelihood Estimation

---

### The Steps

- (1) Construct a probability model

$$ \pi = \Phi(X\beta)$$

--

- (2) Find the probability of the data given the parameters by constructing the likelihood function.

`$$pr(y|\beta) = \mathcal{L}(\beta|y) = \prod^n_{i=1}\pi_i^{y_i}(1-\pi_i)^{1-y_i}$$`

--

- (3) Take the log of the likelihood and simplify

`$$\log \mathcal{L}(\beta | y) = \sum_{i = 1}^n y_i \log \pi_i + \sum_{i = 1}^n (1 - y_i)\log(1 - \pi_i)$$`

- (4) Maximize the log likelihood function

---

### Optimize

`$$\log \mathcal{L}(\beta | y) = \sum_{i = 1}^n y_i \log [\Phi(X_i\beta)] + \sum_{i = 1}^n (1 - y_i)\log\left[1 - \Phi(X_i\beta)\right]$$`

Let's construct a log likelihood function for a probit model.

```r
ll.probit <- function(beta, y, X) {

  # Convert to a probability space
  pr <- pnorm(X%*%beta)

  # Calculate the log-likelihood summed over all observations
  loglik <- sum(y*log(pr)) + sum((1 - y)*log(1 - pr))

  # Return the log-likelihood
  return(loglik)
}
```

--

How might we change this to a logit function?
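---

### Optimize

One answer to that question, as a sketch (this `ll.logit()` helper is illustrative and not part of the original deck): swap the Normal CDF `pnorm()` for the logistic CDF `plogis()`.

```r
ll.logit <- function(beta, y, X) {

  # Convert to a probability space using the logistic CDF
  pr <- plogis(X%*%beta)

  # Calculate the log-likelihood summed over all observations
  loglik <- sum(y*log(pr)) + sum((1 - y)*log(1 - pr))

  # Return the log-likelihood
  return(loglik)
}
```

Everything else in the estimation machinery stays the same.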
---

### Optimize

<br>

- Many ways to **optimize an objective function**:

  + Nelder and Mead (default for `optim()`)
  + Newton-Raphson
  + BFGS (quasi-Newton-Raphson)
  + Gradient Descent
  + SANN (simulated annealing)
  + And many more...

<br>

- `R`'s `optim()` function has many common optimization algorithms contained within it.

---

```r
# optimization function to calculate maximum likelihood
probit <- function(y, X) {

  # Starting values
  init.par <- rep(0, ncol(X))

  # optimizer
  est <- optim(par = init.par,  # starting points
               fn = ll.probit,  # function to optimize
               y = y, X = X,    # Data values
               control = list(fnscale = -1), # -1 == maximize
               hessian = TRUE)  # return the hessian

  beta.hat <- est$par         # Estimates for beta
  cov <- solve(-est$hessian)  # inverse of the negative hessian == vcov matrix

  # Return as list
  res <- list(beta.hat = beta.hat, cov = cov)
  return(res)
}
```

---

```r
# Recall our simulated model
mod <- glm(y~x,data=dat1,family=binomial('probit'))
broom::tidy(mod)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -1.02      0.166     -6.16 7.32e-10
## 2 x              0.532     0.180      2.96 3.05e- 3
```

--

```r
y <- dat1$y
X <- cbind(1,dat1$x) # 1s for the constant
*mod2 <- probit(y,X) # Our homemade MLE optimizer

# Estimates
mod2$beta.hat
```

```
## [1] -1.021831  0.531962
```

```r
# Standard Errors
sqrt(diag(mod2$cov))
```

```
## [1] 0.1652268 0.1777183
```

---
class:newsection

### Substantive Effects

---

### What is the effect of `x` on `y`?

- In OLS, calculating the marginal effect is easy.

`$$y = \beta_0 + \beta_1 x$$`

`$$\frac{dy}{dx} = \beta_1$$`

--

- In MLE, less so...

`$$pr(y=1|x) = \Phi(\beta_0 + \beta_1 x)$$`

`$$\frac{d}{dx} pr(y=1|x) = \phi(\beta_0 + \beta_1 x)\beta_1$$`

where `\(\phi(\cdot)\)` (the Normal PDF) denotes the first derivative of the Normal CDF.

--

> Recall the **chain rule**: `\(F(x) = f(g(x)) \to F'(x) = f'(g(x))g'(x)\)`

---

### What is the effect of `x` on `y`?

- The effect depends on `\(\Phi(\cdot)\)`, the coefficient on `\(x\)` (i.e. `\(\beta\)`), and the values of <u>**_all the other variables and their respective coefficients_**</u>.

--

- In other words, the **_model is inherently interactive in all of the variables_**. The effect of a unit change in `\(x\)` is not constant. It depends on where we are on the curve.

--

- The output from the model merely tells us:

  + The **sign** of the effect of `\(x\)` on the probability of success
  + Overall **statistical significance** of the effect

- We need to **transform** the estimates in order to determine **_substantive significance_**.

---

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---

### Marginal Effects

We can easily calculate and plot the first derivative to get a sense of the marginal effects.

```r
b = coefficients(mod)
X = model.matrix(mod)
head(X)
```

```
##   (Intercept)           x
## 1           1 -0.56047565
## 2           1 -0.23017749
## 3           1  1.55870831
## 4           1  0.07050839
## 5           1  0.12928774
## 6           1  1.71506499
```

```r
*me = dnorm(X%*%b)*b[2]
```

---

```r
dat1 %>%
  mutate(me=me) %>%
  ggplot(aes(x,me)) +
  geom_line() +
  labs(y="dy/dx")
```

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

--

But marginal effects don't always offer an intuitive interpretation of the results.

---

### Predicted probabilities

- Interest is in the **_substantive significance of the effect_**

  + Coefficients reveal little
  + Probabilities are easier to understand than log-odds
  + The effects depend on where you start on the curve and on the values of all of the other independent variables in the model

--

- Approaches for substantive interpretation using predicted probabilities

  + **_Average Case Approach_**: set all other values to means (modes for dummies)
  + **_Observed Value Approach_**: set all other values to their observed values

---

### Predicted probabilities

- Interest is in the **_substantive significance of the effect_**

  + Coefficients reveal little
  + Probabilities are easier to understand than log-odds
  + The effects depend on where you start on the curve and on the values of all of the other independent variables in the model

<br>

- The **Observed Value Approach** provides a closer connection between the results and the theory and research design (_Hanmer and Kalkan 2013_)

- The goal is to estimate the **_average effect_**, _NOT_ the **_effect for the average case._**
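---

### Predicted probabilities

The distinction in code — a minimal sketch (illustrative; not part of the original deck) using the simulated probit model `mod` from earlier:

```r
b <- coefficients(mod)

# Average Case Approach: plug in the mean of x, then predict
pnorm(b[1] + b[2]*mean(dat1$x))

# Observed Value Approach: predict at every observed x, then average
mean(pnorm(b[1] + b[2]*dat1$x))
```

With real data and more covariates, the two quantities can diverge substantially.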
---

### Example Data

- Post-election survey taken after the 2000 presidential election (roughly 2,188 responses in these data; non-responses on the voting outcome are dropped).

- Question: does how far in advance of the election voter registration closes affect an individual's propensity to vote?

```r
head(elect)
```

```
## # A tibble: 6 x 7
##    vote close homeown edu7cat   age marriage  male
##   <dbl> <dbl>   <dbl>   <dbl> <dbl>    <dbl> <dbl>
## 1     0    14       1       1    44        0     1
## 2     1    25       0       1    64        0     0
## 3     1    29       1       1    68        1     0
## 4     1    30       1       1    65        1     0
## 5     0    30       1       1    46        1     1
## 6     1    29       0       1    59        1     1
```

---

### Example Data

- Post-election survey taken after the 2000 presidential election (roughly 2,188 responses in these data; non-responses on the voting outcome are dropped).

- Question: does how far in advance of the election voter registration closes affect an individual's propensity to vote?

```r
mod2 <- glm(vote~close+male+age,
            data=elect,family=binomial('probit'))
broom::tidy(mod2) %>%
  mutate_if(is.numeric,function(x) round(x,3))
```

```
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)   -0.288     0.106    -2.71    0.007
## 2 close         -0.006     0.003    -2.20    0.028
## 3 male          -0.004     0.057    -0.073   0.942
## 4 age            0.019     0.002    11.2     0
```

---

Who are we trying to predict?

```r
b <- coefficients(mod2)
z <- b[1] +    # intercept
  b[2]*15 +    # close
  b[3]*1 +     # male
  b[4]*35      # age
pnorm(z) # probability of voting
```

```
## (Intercept)
##   0.6103261
```

--

```r
z <- b[1] +    # intercept
  b[2]*0 +     # close
  b[3]*0 +     # male
  b[4]*76      # age
pnorm(z) # probability of voting
```

```
## (Intercept)
##   0.8775209
```

---

### Predicted probabilities

<br>
<br>

**The Recipe**:

1. Isolate a single (discrete or continuous) variable.
2. Manipulate only that value.
3. Calculate the prediction and average across the observations.
4. Compare to some other condition.

---

```r
X = model.matrix(mod2)
head(X)
```

```
##   (Intercept) close male age
## 1           1    14    1  44
## 2           1    25    0  64
## 3           1    29    0  68
## 4           1    30    0  65
## 5           1    30    1  46
## 6           1    29    1  59
```

```r
# Alter a condition (Register at the poll)
X[,2] = 0
```

```r
# Predict
prob_sameday = pnorm(X%*%b)
```

```r
# average
mean(prob_sameday)
```

```
## [1] 0.7102054
```

---

```r
X = model.matrix(mod2)
head(X)
```

```
##   (Intercept) close male age
## 1           1    14    1  44
## 2           1    25    0  64
## 3           1    29    0  68
## 4           1    30    0  65
## 5           1    30    1  46
## 6           1    29    1  59
```

```r
# Alter a condition (Registration closes 30 days out)
X[,2] = 30
```

```r
# Predict
prob_30days = pnorm(X%*%b)
```

```r
# average
mean(prob_30days)
```

```
## [1] 0.6451213
```

---

## Discrete Difference

We can compare two predictions (where only one data value is altered) by differencing the predictions.

--

```r
mean(prob_30days) - mean(prob_sameday)
```

```
## [1] -0.06508412
```

--

<br>

When registration closes 30 days prior to election day, the predicted probability of voting decreases by about 6.5 percentage points!

--

<br>

What's one issue with this conclusion?

--

![:space 5]

.center[**_No estimate of uncertainty!_**]
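---

As a sanity check on the hand-rolled `pnorm(X%*%b)` calculations, base R's `predict()` produces the same observed-value predictions (a sketch; not part of the original code — it should match the averages computed above):

```r
# Same-day registration: set close to 0 for everyone, then average
mean(predict(mod2, newdata = transform(elect, close = 0), type = "response"))

# Registration closes 30 days out
mean(predict(mod2, newdata = transform(elect, close = 30), type = "response"))
```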
---

## Monte Carlo Simulation

- We need to calculate uncertainty around our predictions. _But_ we can't directly calculate our confidence for any one prediction, so we need to simulate it.

--

- The aim is to:

  + **simulate** many plausible configurations of the **coefficients**,
  + **compute** the resulting **predictions** for each simulated beta, and then
  + **calculate** the 95% **confidence interval** using the resulting distribution.

--

- To do this, we can:

  + take repeated random draws from a simulated distribution of the model coefficients
  + use information from the model regarding the bounds of this distribution (central tendency, covariance)

---

```r
# Simulate the coefficients
betas = coefficients(mod2) # coefficient estimates
sigma = vcov(mod2) # variance-covariance matrix
sigma
```

```
##               (Intercept)         close          male           age
## (Intercept)  0.0112984032 -1.831202e-04 -1.801604e-03 -1.262345e-04
## close       -0.0001831202  8.383407e-06  3.785560e-06 -1.576920e-07
## male        -0.0018016042  3.785560e-06  3.213478e-03  4.546818e-06
## age         -0.0001262345 -1.576920e-07  4.546818e-06  2.909608e-06
```

--

Simulate the beta coefficients using a multivariate normal distribution. We can use the `mvrnorm()` function from the `MASS` package to take random draws from this distribution.

```r
set.seed(1234) # Set a seed for reproducibility
*sim_betas = MASS::mvrnorm(n=1000,mu=betas,Sigma = sigma)
```

```r
dim(sim_betas)
```

```
## [1] 1000    4
```

---

```r
betas
```

```
##  (Intercept)        close         male          age
## -0.288148097 -0.006378627 -0.004149490  0.019089884
```

```r
as_tibble(sim_betas) %>%
  gather(param,betas) %>%
  ggplot(aes(betas)) +
  geom_histogram(color="white",alpha=.5,fill="darkred") +
  facet_wrap(~param,ncol=4,scales="free")
```

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-41-1.png" style="display: block; margin: auto;" />

---

```r
X = model.matrix(mod2)
head(X,3)
```

```
##   (Intercept) close male age
## 1           1    14    1  44
## 2           1    25    0  64
## 3           1    29    0  68
```

--

```r
X[,2] = 0 # same-day registration
```

--

Calculate the average predicted value for every draw of the simulated coefficients

```r
n_sims = nrow(sim_betas)
probs_sameday <- rep(0,n_sims)
for ( i in 1:n_sims){
  probs_sameday[i] <- mean(pnorm(X%*%sim_betas[i,]))
}
*quantile(probs_sameday,probs = c(.025,.975))
```

```
##      2.5%     97.5%
## 0.6645399 0.7521849
```

---

```r
X = model.matrix(mod2)
head(X,3)
```

```
##   (Intercept) close male age
## 1           1    14    1  44
## 2           1    25    0  64
## 3           1    29    0  68
```

```r
X[,2] = 30 # closes 30 days out
```

Calculate the average predicted value for every draw of the simulated coefficients

```r
n_sims = nrow(sim_betas)
probs_30days <- rep(0,n_sims)
for ( i in 1:n_sims){
  probs_30days[i] <- mean(pnorm(X%*%sim_betas[i,]))
}
*quantile(probs_30days,probs = c(.025,.975))
```

```
##      2.5%     97.5%
## 0.6205824 0.6694931
```

---

```r
bind_rows(
  tibble(pred = probs_30days, cond = "30 Days"),
  tibble(pred = probs_sameday, cond = "Same Day")
) %>%
  ggplot(aes(pred,fill=cond)) +
  geom_density(alpha=.6,color="white")
```

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-48-1.png" style="display: block; margin: auto;" />

---

```r
difference = probs_30days-probs_sameday
```

Is the difference between the two predictions statistically significant?

```r
bounds = quantile(difference,probs = c(.025,.975))
tibble(
  lower = bounds[1],
  prob = mean(difference),
  higher = bounds[2]
)
```

```
## # A tibble: 1 x 3
##    lower    prob   higher
##    <dbl>   <dbl>    <dbl>
## 1 -0.120 -0.0648 -0.00550
```

--

<br><br>

**Just barely!**

---

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-51-1.png" style="display: block; margin: auto;" />
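---

A side note (not in the original deck): the simulation loops above can be written more compactly with `apply()`, looping over the rows of `sim_betas`:

```r
X <- model.matrix(mod2)
X[, 2] <- 0  # same-day registration scenario

# One average predicted probability per simulated coefficient vector
probs_sameday <- apply(sim_betas, 1, function(b) mean(pnorm(X %*% b)))

quantile(probs_sameday, probs = c(.025, .975))
```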
---

### `obsval`

That's pretty involved! Is there a package?

--

Of course...

```r
# Install package from GitHub (makes calculating sim values easy)
devtools::install_github("chrismeserole/obsval")
require(obsval)
```

--

```r
mod4 <- obsval::obsval(vote~close+male+age,
                       data=elect,
                       reg.model = "probit",
                       effect.var = "close",
                       effect.vals = c(0,10,15,30,60,90))
apply(mod4$preds,2,mean)
```

```
##     var_0    var_10    var_15    var_30    var_60    var_90
## 0.7101687 0.6891826 0.6783802 0.6448847 0.5743454 0.5021520
```

---

```r
as_tibble(mod4$preds) %>%
  gather(var,val) %>%
  ggplot(aes(var,val)) +
  geom_boxplot(size=1) +
  labs(y="Probability of Voting",
       x = "Close of Voter Registration") +
  theme(axis.text=element_text(size=16),
        axis.title=element_text(size=16))
```

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-54-1.png" style="display: block; margin: auto;" />

---

```r
elect %>%
  ggplot(aes(factor(close))) +
  geom_bar() +
  labs(x = "Close of Voter Registration",y="Count")
```

<img src="week-12_ppol561_files/figure-html/unnamed-chunk-55-1.png" style="display: block; margin: auto;" />

---

## Note on the `obsval` output

```r
# stored as a list
class(mod4)
```

```
## [1] "list"
```

```r
# many different features of the prediction
names(mod4)
```

```
##  [1] "model"           "sim.coefs"       "preds"           "means"
##  [5] "low.ci"          "high.ci"         "control.preds"   "control.mean"
##  [9] "control.low.ci"  "control.high.ci" "effect.preds"    "effect.mean"
## [13] "effect.low.ci"   "effect.high.ci"  "effect_sum"      "effect.var"
## [17] "reg.model"
```

```r
# The N simulated probabilities for each manipulation
dim(mod4$preds)
```

```
## [1] 1000    6
```

---

## Note on the `obsval` output

```r
# Holds all the relevant summary values
tibble(cond = colnames(mod4$means),
       est = as.numeric(mod4$means),
       low = as.numeric(mod4$low.ci),
       high = as.numeric(mod4$high.ci))
```

```
## # A tibble: 6 x 4
##   cond     est   low  high
##   <chr>  <dbl> <dbl> <dbl>
## 1 var_0  0.710 0.662 0.755
## 2 var_10 0.689 0.658 0.720
## 3 var_15 0.678 0.654 0.703
## 4 var_30 0.645 0.622 0.669
## 5 var_60 0.574 0.495 0.656
## 6 var_90 0.502 0.364 0.650
```
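---

## Note on the `obsval` output

One optional last step (a sketch; not from the original deck): the summary tibble on the previous slide can be turned into a plot of the predicted probabilities and their 95% intervals.

```r
tibble(cond = colnames(mod4$means),
       est  = as.numeric(mod4$means),
       low  = as.numeric(mod4$low.ci),
       high = as.numeric(mod4$high.ci)) %>%
  ggplot(aes(x = cond, y = est, ymin = low, ymax = high)) +
  geom_pointrange() +  # point estimate with interval whiskers
  labs(x = "Close of Voter Registration",
       y = "Predicted Probability of Voting")
```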