class: center, middle, inverse, title-slide

# PPOL564 | Data Science 1 | Foundations
## Week 11
## Classification
### Prof. Eric Dunford ◆ Georgetown University ◆ McCourt School of Public Policy ◆ eric.dunford@georgetown.edu
---
layout: true

<div class="slide-footer"><span> PPOL564 | Data Science 1           Week 11 <!-- Week of the Footer Here -->              Classification <!-- Title of the lecture here --> </span></div>

---
class: newsection

# Classification Problems

---

### How is the outcome `\(y\)` distributed?

_Outcomes_ come in many forms. How the outcome is distributed will determine the methods we use.

--

- **Quantitative** outcome
  + a continuous/interval-based outcome: e.g. housing prices, number of bills passed, stock market prices, etc.

--

- **Qualitative** outcome
  + a discrete outcome
  + _Binary_: War/No War; Sick/Not Sick
  + _Ordered_: Don't Support, Neutral, Support
  + _Categorical_: Cat, Dog, Bus, ...

---

![:space 10]

<img src="classification_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---

<br>

.center[<img src = "Figures/seperability.gif", width = 500>]

---

### Decision Boundary

.center[<img src = "Figures/decision-boundary.png", width = 600>]

---
class: newsection

# Performance Metrics

---

### How did we do?

- Central to any machine learning task is how we choose to define "good" performance.

--

- When dealing with quantitative outcomes (intervals), we can use metrics like MSE to assess performance.

`$$MSE = \frac{\sum^N_{i=1} (y_i - \hat{f}(X_i))^2}{N}$$`

---

### How did we do?

- Central to any machine learning task is how we choose to define "good" performance.

- When dealing with quantitative outcomes (intervals), we can use metrics like MSE to assess performance.

- When dealing with qualitative outcomes (categories), we need to rely on different metrics to assess performance.

<br>

`$$\text{Accuracy} = \frac{\text{Correctly Classified}}{\text{Total Possible}}$$`

`$$\text{Error} = 1 - \text{Accuracy}$$`

---

### The Weather Today

Suppose we are testing the accuracy of two weather persons. Below are their forecasts for the weather in a given week alongside the observed weather pattern. (For now, let's just focus on two types of weather: a sunny day or a rainy day.)

.center[

|Weather Person | M | Tu | W | Th | F | St | Su |
|---------------|---|----|---|----|---|----|----|
| `\(WP_1\)` Prediction | Rain | Sun | Rain | Sun | Sun | Rain | Rain |
| `\(WP_2\)` Prediction | Sun | Sun | Sun | Sun | Sun | Sun | Sun |
| Actual | Sun | Sun | Rain | Sun | Sun | Sun | Sun |

]

--

.center[

|Weather Person | Correct | Total | Accuracy | Error |
|---------------|---------|-------|----------|-------|
| `\(WP_1\)` | 4 | 7 | 57.1% | 42.9% |
| `\(WP_2\)` | 6 | 7 | 85.7% | 14.3% |

]

If we calculate the accuracy for each, it looks as if Weather Person 2 is the more accurate. Does that make sense?

---

### The Weather Today

Suppose we are testing the accuracy of two weather persons. Below are their forecasts for the weather in a given week alongside the observed weather pattern. (For now, let's just focus on two types of weather: a sunny day or a rainy day.)

.center[

|Weather Person | M | Tu | W | Th | F | St | Su |
|---------------|---|----|---|----|---|----|----|
| `\(WP_1\)` Prediction | Rain | Sun | Rain | Sun | Sun | Rain | Rain |
| `\(WP_2\)` Prediction | Sun | Sun | Sun | Sun | Sun | Sun | Sun |
| Actual | Sun | Sun | Rain | Sun | Sun | Sun | Sun |

]

.center[

|Weather Person | Correct | Total | Accuracy | Error |
|---------------|---------|-------|----------|-------|
| `\(WP_1\)` | 4 | 7 | 57.1% | 42.9% |
| `\(WP_2\)` | 6 | 7 | 85.7% | 14.3% |

]

Rain is **rare**. We can always have high accuracy if we just guess sun every day. This generates a problem if what people care about is knowing when to pack an umbrella!
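---

### The Weather Today

A minimal sketch recomputing each weather person's accuracy and error rate from the forecasts and observed weather in the tables above:


```python
# Forecasts and observed weather, copied from the tables above
wp1    = ["Rain", "Sun", "Rain", "Sun", "Sun", "Rain", "Rain"]
wp2    = ["Sun",  "Sun", "Sun",  "Sun", "Sun", "Sun",  "Sun"]
actual = ["Sun",  "Sun", "Rain", "Sun", "Sun", "Sun",  "Sun"]

def accuracy(preds, truth):
    """Share of days where the forecast matches the observed weather."""
    correct = sum(p == t for p, t in zip(preds, truth))
    return correct / len(truth)

for name, preds in [("WP1", wp1), ("WP2", wp2)]:
    acc = accuracy(preds, actual)
    print(f"{name}: accuracy = {acc:.1%}, error = {1 - acc:.1%}")
```

```
## WP1: accuracy = 57.1%, error = 42.9%
## WP2: accuracy = 85.7%, error = 14.3%
```

Accuracy alone hides the fact that `\(WP_2\)` never correctly calls the one rainy day, which is why we need the additional metrics introduced next.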
---

### Confusion Matrix

<br>

.center[

| | `\(Positive_{~~\text{Actual}}\)` | `\(Negative_{~~\text{Actual}}\)` |
|-----------------------|----------|----------|
| `\(Positive_{~~\text{Predicted}}\)` | True Positive (TP) | False Positive (FP) |
| `\(Negative_{~~\text{Predicted}}\)` | False Negative (FN) | True Negative (TN) |

]

--

<br>

| Metric | Calculation | Description |
|---|-----|-----|
| Accuracy | `\(\frac{TP + TN}{TP+FP+TN+FN}\)` | In total, how accurate is the model |
| Precision | `\(\frac{TP}{TP+FP}\)` | Of the cases predicted positive, how many are actually positive |
| Specificity | `\(\frac{ TN }{ TN + FP }\)` | Of the actual negatives, how many were correctly classified |
| Recall/Sensitivity | `\(\frac{TP}{ TP + FN}\)` | Of the actual positives, how many were correctly classified |

---

### Weather Person 1

<br>

.center[

| | `\(Positive_{~~\text{Actual}}\)` | `\(Negative_{~~\text{Actual}}\)` |
|-----------------------|----------|----------|
| `\(Positive_{~~\text{Predicted}}\)` | 3 | 0 |
| `\(Negative_{~~\text{Predicted}}\)` | 3 | 1 |

]

<br>

- Accuracy = 57.1%
- Precision = 100%
- Specificity = 100%
- Recall = 50%

(These numbers are reproduced in a short code sketch at the end of the deck.)

---

### Weather Person 2

<br>

.center[

| | `\(Positive_{~~\text{Actual}}\)` | `\(Negative_{~~\text{Actual}}\)` |
|-----------------------|----------|----------|
| `\(Positive_{~~\text{Predicted}}\)` | 6 | 1 |
| `\(Negative_{~~\text{Predicted}}\)` | 0 | 0 |

]

<br>

- Accuracy = 85.7%
- Precision = 85.7%
- Specificity = 0%
- Recall = 100%

---

### ROC Curves

Consider the following:

- We want to predict whether each day will be rainy (1) or sunny (0).

- Our model outputs the probability of a rainy day, where 0 means no chance of rain and 1 means it is absolutely going to rain.

<br>


```python
# Our estimated probabilities
est_probs = [.4,.7,.3,.5,.9,.1,.7]
est_probs
```

```
## [0.4, 0.7, 0.3, 0.5, 0.9, 0.1, 0.7]
```

---

### ROC Curves

Consider the following:

- We need to convert these probabilities into predictions. We can do this by setting a **threshold**.

<br><br><br>


```python
threshold = .5
our_preds = [1 if e >= threshold else 0 for e in est_probs]
our_preds
```

```
## [0, 1, 0, 1, 1, 0, 1]
```

---

### ROC Curves

Consider the following:

- We can now compare these predictions to the actual values.

.center[

| | `\(Positive_{~~\text{Actual}}\)` | `\(Negative_{~~\text{Actual}}\)` |
|-----------------------|----------|----------|
| `\(Positive_{~~\text{Predicted}}\)` | 2 | 1 |
| `\(Negative_{~~\text{Predicted}}\)` | 1 | 3 |

]

- Thresholds reflect how sensitive we are to true or false positives.
  + The higher the threshold, the fewer false positives.
  + The lower the threshold, the more false positives but also more true positives.
  + **It's another tradeoff!**

---

### ROC Curves

The receiver operating characteristic (ROC) curve offers a visual representation of model performance across different potential thresholds.

.center[<img src = "Figures/roc-plot.png", width=400>]

---

### Area Under the Curve (AUC)

We can calculate the area under the ROC curve to quickly and easily compare model performance.

.center[<img src = "Figures/roc-mueller-rauh.png">]
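---

### Weather Person 1: Metrics in Code

A minimal sketch reproducing Weather Person 1's confusion-matrix metrics by hand. Sun is treated as the positive class so that the counts match the matrix shown earlier:


```python
# Forecasts and observed weather for Weather Person 1 (from the earlier tables)
wp1    = ["Rain", "Sun", "Rain", "Sun", "Sun", "Rain", "Rain"]
actual = ["Sun",  "Sun", "Rain", "Sun", "Sun", "Sun",  "Sun"]

# Cell counts of the confusion matrix, treating "Sun" as the positive class
tp = sum(p == "Sun"  and a == "Sun"  for p, a in zip(wp1, actual))  # 3
fp = sum(p == "Sun"  and a == "Rain" for p, a in zip(wp1, actual))  # 0
fn = sum(p == "Rain" and a == "Sun"  for p, a in zip(wp1, actual))  # 3
tn = sum(p == "Rain" and a == "Rain" for p, a in zip(wp1, actual))  # 1

precision   = tp / (tp + fp)  # of the days predicted sunny, how many were sunny
recall      = tp / (tp + fn)  # of the actual sunny days, how many were caught
specificity = tn / (tn + fp)  # of the actual rainy days, how many were caught
precision, recall, specificity
```

```
## (1.0, 0.5, 1.0)
```

---

### ROC and AUC in Code

A rough sketch of how the ROC curve and AUC could be computed with scikit-learn (assuming it is installed). The observed outcomes below are hypothetical, used only to illustrate the function calls, since the earlier slides list only the estimated probabilities:


```python
# Rough sketch assuming scikit-learn is installed; the `actual` labels are
# hypothetical and only illustrate the calls.
from sklearn.metrics import roc_curve, roc_auc_score

est_probs = [.4, .7, .3, .5, .9, .1, .7]  # estimated P(rain) from before
actual    = [0, 1, 0, 1, 1, 0, 0]         # hypothetical observed outcomes (1 = rain)

# False positive rate and true positive rate at every candidate threshold;
# plotting tpr against fpr traces out the ROC curve
fpr, tpr, thresholds = roc_curve(actual, est_probs)

# Area under that curve: a single-number summary for comparing models
print(roc_auc_score(actual, est_probs))
```

```
## 0.875
```

Plotting `fpr` against `tpr` reproduces a curve like the ROC plot a few slides back, and the AUC value is the single number used to compare models.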