# PPOL561 | Accelerated Statistics for Public Policy II Week 2 Data Wrangling & Presentation 
###  Prof. Eric Dunford  ◆  Georgetown University  ◆  McCourt School of Public Policy  ◆  <a href="mailto:eric.dunford@georgetown.edu" class="email">eric.dunford@georgetown.edu</a>

---


# Outline for Today

Today we cover

- **_reading in_**

- **_manipulating_** 
  
  - **_piping_**
  
  - **_joining_**  
  
  - **_reshaping_** 
  
  - **_visualizing_**

...data in `R`.

Plus, a brief discussion on **_generating tables_** for model output.

---

# Data Wrangling

---

# What is data wrangling?

![:space 5]

- **raw &rarr; processed**: the process of transforming data from one format to another.

- **converting the structure** to facilitate some analysis
  + **_altering the unit of analysis_**: e.g. going from individuals in a state in a given year to the state-year
  
  + changing from a **_"wide"_** (many columns, few rows) **_to a "long" structure_** (few columns, many rows)
  
  + **_summarizing data_** across specific subgroups 
 
---

## `tidyverse` approach

![:space 5]

We are going to cover the basics of data manipulation and visualization in `R` by focusing on a suite of packages known as the "[tidyverse](https://www.tidyverse.org)".

These packages were designed to ease the process of data manipulation and management so that it is more intuitive, efficient, and interpretable.

Specifically, `tidyverse` is an umbrella package that installs and loads the following packages (among others):
- [readr](http://readr.tidyverse.org/) - for reading data in
- [tibble](https://tibble.tidyverse.org/) - for "tidy" data structures
- [dplyr](http://dplyr.tidyverse.org/) - for data manipulation
- [ggplot2](http://ggplot2.tidyverse.org/) - for data visualization
- [tidyr](http://tidyr.tidyverse.org/) - for cleaning
- [purrr](http://purrr.tidyverse.org/) - functional programming toolkit

---

## `tidyverse` advantage

![:space 15]

Almost all of the data manipulation we do will require just **one** package.

```r
# Install tidyverse package
install.packages('tidyverse')

# Load the package
require(tidyverse)
```
---

# Reading & Writing Data
  
---

# Data Packages

![:space 5]

We can import a large variety of data file types into the `R` programming environment.

We are going to focus on **three packages** to import different (but common) data types:

- `readr` --- an expansive array of functions to read rectangular text data (e.g. `.csv`, `.tsv`)
- `readxl` --- for Excel spreadsheets
- `haven` --- for SPSS, SAS, and Stata (`.dta`) files

```r
require(readr) # Imported with tidyverse
require(readxl)
require(haven)
```

---

# Importing/Exporting

![:space 15]
.center[

| ![:text_size 6](File type) | ![:text_size 6](package) | ![:text_size 6](read) | ![:text_size 6](write) |
|-----------|---------|--------------|---------------|
| `.csv`    | `readr` | `read_csv()` | `write_csv()` |
| `.dta`    | `haven` | `read_dta()` | `write_dta()` |
| `.xlsx`   | `readxl` / `writexl` | `read_excel()` | `write_xlsx()` |
| `.Rdata`   | Base `R` | `load()` | `save()` |
| `.rds`   | `readr` | `read_rds()` | `write_rds()` |
| `.tab`    | `readr` | `read_tsv()` | `write_tsv()` |
| generic   | `readr` | `read_delim()` | `write_delim()` |

]
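
As a rough illustration of the read/write pairs in the table, the sketch below writes a small data frame to disk and reads it back. The data frame `scores` and the temporary file paths are made up for this example; only the `readr` functions come from the table.

```r
library(readr)

# Hypothetical example data; any data frame would do
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# Write to a temporary .csv file and read it back
csv_path <- tempfile(fileext = ".csv")
write_csv(scores, csv_path)
scores_csv <- read_csv(csv_path, col_types = cols())  # cols() = guess types quietly

# .rds files store the R object itself, so column types survive the round trip
rds_path <- tempfile(fileext = ".rds")
write_rds(scores, rds_path)
scores_rds <- read_rds(rds_path)
```

Note that a `.csv` round trip re-guesses the column types on import, whereas an `.rds` round trip returns the object as it was saved.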

---

## `tibble()` data frames

![:space 10]

Differences between `tibbles` and `data.frames`

1. Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen. This makes it much easier to work with large data.

2. Stricter behavior: tibbles never partially match column names, and they complain when you try to access a column that does not exist. These explicit warnings and errors mean we catch mistakes early.
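
One way to see the stricter behavior is to ask for a column by an abbreviated name. This is a minimal sketch; the column `population` and the objects `df` and `tbl` are invented for the example.

```r
library(tibble)

df  <- data.frame(population = c(10, 20, 30))
tbl <- as_tibble(df)

# A data.frame silently completes the partial name `pop` to `population`
df_pop <- df$pop

# A tibble does not: it returns NULL and issues a warning
# (suppressed here so the script runs quietly)
tbl_pop <- suppressWarnings(tbl$pop)
is.null(tbl_pop)
```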

---

# Manipulation

---

# Manipulation with `dplyr`

![:space 10]

The `dplyr` package (part of the `tidyverse`) offers an intuitive **verb based** approach to data management in `R`.

![:space 5]

The goal of the `dplyr` logic is to provide an easy, intuitive naming convention for ubiquitous data management tasks.

---

## 6 main `dplyr` verbs

![:space 3]
- **`select()`**: Pick variables by their names.

- **`filter()`**: Pick observations by their values.

- **`arrange()`**: Reorder the rows.

- **`mutate()`**: Create new variables with functions of existing variables.

- **`summarise()`**: Collapse many values down to a single summary.

- **`group_by()`**: Change the scope of each function from operating on the entire dataset to operating on it group by group.
---

class:biglist

## How they work...

**All verbs work similarly:**

1. The first argument is a data frame.

2. The subsequent arguments describe what to do with the data frame, using the variable names (without quotes).

3. The result is a new data frame.
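
The three points can be sketched with a toy example. The data frame `toy` and its columns are hypothetical; `filter()` is one of the verbs listed earlier.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(id = 1:4, value = c(5, 12, 8, 20))

# 1. the data frame comes first
# 2. the condition uses the column name `value` without quotes
# 3. the result is a new data frame
big <- filter(toy, value > 10)

nrow(big)  # rows that satisfy the condition
nrow(toy)  # the original data frame is left untouched
```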

---

To walk through how the main `dplyr` verbs behave, we'll use a built-in dataset called `presidential` (it ships with `ggplot2`, which `tidyverse` loads).

```r
?presidential # for information on the data
```

```r
dat <- presidential
head(dat)
```

```
## # A tibble: 6 x 4
##   name       start      end        party
##   <chr>      <date>     <date>     <chr>
## 1 Eisenhower 1953-01-20 1961-01-20 Republican
## 2 Kennedy    1961-01-20 1963-11-22 Democratic
## 3 Johnson    1963-11-22 1969-01-20 Democratic
## 4 Nixon      1969-01-20 1974-08-09 Republican
## 5 Ford       1974-08-09 1977-01-20 Republican
## 6 Carter     1977-01-20 1981-01-20 Democratic
```

---

# `select()`

---

```r
select(dat,name,party)
```

```
## # A tibble: 11 x 2
##    name       party
##    <chr>      <chr>
##  1 Eisenhower Republican
##  2 Kennedy    Democratic
##  3 Johnson    Democratic
##  4 Nixon      Republican
##  5 Ford       Republican
##  6 Carter     Democratic
##  7 Reagan     Republican
##  8 Bush       Republican
##  9 Clinton    Democratic
## 10 Bush       Republican
## 11 Obama      Democratic
```

---

We can also select variable ranges using `:`

The following returns every variable from `name` through `end`, endpoints included.

```r
select(dat,name:end)
```

```
## # A tibble: 11 x 3
##    name       start      end
##    <chr>      <date>     <date>
##  1 Eisenhower 1953-01-20 1961-01-20
##  2 Kennedy    1961-01-20 1963-11-22
##  3 Johnson    1963-11-22 1969-01-20
##  4 Nixon      1969-01-20 1974-08-09
##  5 Ford       1974-08-09 1977-01-20
##  6 Carter     1977-01-20 1981-01-20
##  7 Reagan     1981-01-20 1989-01-20
##  8 Bush       1989-01-20 1993-01-20
##  9 Clinton    1993-01-20 2001-01-20
## 10 Bush       2001-01-20 2009-01-20
## 11 Obama      2009-01-20 2017-01-20
```

---

The **order** in which variables are selected will translate to the output. Thus, one can easily **reorder columns** with `select()`.

```r
select(dat,name,end,start)
```

```
## # A tibble: 11 x 3
##    name       end        start
##    <chr>      <date>     <date>
##  1 Eisenhower 1961-01-20 1953-01-20
##  2 Kennedy    1963-11-22 1961-01-20
##  3 Johnson    1969-01-20 1963-11-22
##  4 Nixon      1974-08-09 1969-01-20
##  5 Ford       1977-01-20 1974-08-09
##  6 Carter     1981-01-20 1977-01-20
##  7 Reagan     1989-01-20 1981-01-20
##  8 Bush       1993-01-20 1989-01-20
##  9 Clinton    2001-01-20 1993-01-20
## 10 Bush       2009-01-20 2001-01-20
## 11 Obama      2017-01-20 2009-01-20
```

---

We can also easily **rename** variables by simply providing a new name within the function.

```r
select(dat,president=name,
       startdate=start,
       enddate=end)
```

```
## # A tibble: 11 x 3
##    president  startdate  enddate
##    <chr>      <date>     <date>
##  1 Eisenhower 1953-01-20 1961-01-20
##  2 Kennedy    1961-01-20 1963-11-22
##  3 Johnson    1963-11-22 1969-01-20
##  4 Nixon      1969-01-20 1974-08-09
##  5 Ford       1974-08-09 1977-01-20
##  6 Carter     1977-01-20 1981-01-20
##  7 Reagan     1981-01-20 1989-01-20
##  8 Bush       1989-01-20 1993-01-20
##  9 Clinton    1993-01-20 2001-01-20
## 10 Bush       2001-01-20 2009-01-20
## 11 Obama      2009-01-20 2017-01-20
```

---

Lastly, `select()` offers a convenient way to drop variables. It uses the same logic as negative indexing in base `R`, where we put a **negative sign** in front of an index. The only difference is that here the sign goes in front of a variable name.

Here we **drop** the `start` date variable.

```r
select(dat,-start)
```

```
## # A tibble: 11 x 3
##    name       end        party
##    <chr>      <date>     <chr>
##  1 Eisenhower 1961-01-20 Republican
##  2 Kennedy    1963-11-22 Democratic
##  3 Johnson    1969-01-20 Democratic
##  4 Nixon      1974-08-09 Republican
##  5 Ford       1977-01-20 Republican
##  6 Carter     1981-01-20 Democratic
##  7 Reagan     1989-01-20 Republican
##  8 Bush       1993-01-20 Republican
##  9 Clinton    2001-01-20 Democratic
## 10 Bush       2009-01-20 Republican
## 11 Obama      2017-01-20 Democratic
```

---

## `filter()`

![:space 3]

---

```r
filter(dat,party == "Republican")
```

```
## # A tibble: 6 x 4
##   name       start      end        party
##   <chr>      <date>     <date>     <chr>
## 1 Eisenhower 1953-01-20 1961-01-20 Republican
## 2 Nixon      1969-01-20 1974-08-09 Republican
## 3 Ford       1974-08-09 1977-01-20 Republican
## 4 Reagan     1981-01-20 1989-01-20 Republican
## 5 Bush       1989-01-20 1993-01-20 Republican
## 6 Bush       2001-01-20 2009-01-20 Republican
```
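
`filter()` also accepts more than one condition. The sketch below is an illustration rather than part of the original walk-through: the cutoff date and the object names `rep_post_1980` and `kennedy_or_ford` are chosen arbitrarily, and the number of rows returned depends on the version of `presidential` installed.

```r
library(dplyr)
library(ggplot2)  # provides the `presidential` data

# Comma-separated conditions are combined with AND
rep_post_1980 <- filter(presidential,
                        party == "Republican",
                        start >= as.Date("1980-01-01"))

# `|` expresses OR
kennedy_or_ford <- filter(presidential, name == "Kennedy" | name == "Ford")
```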

---

## `arrange()`

![:space 3]

```r
arrange(dat,party)
```

```
## # A tibble: 11 x 4
##    name       start      end        party
##    <chr>      <date>     <date>     <chr>
##  1 Kennedy    1961-01-20 1963-11-22 Democratic
##  2 Johnson    1963-11-22 1969-01-20 Democratic
##  3 Carter     1977-01-20 1981-01-20 Democratic
##  4 Clinton    1993-01-20 2001-01-20 Democratic
##  5 Obama      2009-01-20 2017-01-20 Democratic
##  6 Eisenhower 1953-01-20 1961-01-20 Republican
##  7 Nixon      1969-01-20 1974-08-09 Republican
##  8 Ford       1974-08-09 1977-01-20 Republican
##  9 Reagan     1981-01-20 1989-01-20 Republican
## 10 Bush       1989-01-20 1993-01-20 Republican
## 11 Bush       2001-01-20 2009-01-20 Republican
```

---

`arrange()` with the helper function `desc()` switches to a **descending** ordering.

```r
arrange(dat,desc(start))
```

```
## # A tibble: 11 x 4
##    name       start      end        party
##    <chr>      <date>     <date>     <chr>
##  1 Obama      2009-01-20 2017-01-20 Democratic
##  2 Bush       2001-01-20 2009-01-20 Republican
##  3 Clinton    1993-01-20 2001-01-20 Democratic
##  4 Bush       1989-01-20 1993-01-20 Republican
##  5 Reagan     1981-01-20 1989-01-20 Republican
##  6 Carter     1977-01-20 1981-01-20 Democratic
##  7 Ford       1974-08-09 1977-01-20 Republican
##  8 Nixon      1969-01-20 1974-08-09 Republican
##  9 Johnson    1963-11-22 1969-01-20 Democratic
## 10 Kennedy    1961-01-20 1963-11-22 Democratic
## 11 Eisenhower 1953-01-20 1961-01-20 Republican
```

---

## `mutate()`

![:space 3]

---

```r
mutate(dat,
 # in office during cold war
 CW = start <= '1990-03-11')
```

```
## # A tibble: 11 x 5
##    name       start      end        party      CW
##    <chr>      <date>     <date>     <chr>      <lgl>
##  1 Eisenhower 1953-01-20 1961-01-20 Republican TRUE
##  2 Kennedy    1961-01-20 1963-11-22 Democratic TRUE
##  3 Johnson    1963-11-22 1969-01-20 Democratic TRUE
##  4 Nixon      1969-01-20 1974-08-09 Republican TRUE
##  5 Ford       1974-08-09 1977-01-20 Republican TRUE
##  6 Carter     1977-01-20 1981-01-20 Democratic TRUE
##  7 Reagan     1981-01-20 1989-01-20 Republican TRUE
##  8 Bush       1989-01-20 1993-01-20 Republican TRUE
##  9 Clinton    1993-01-20 2001-01-20 Democratic FALSE
## 10 Bush       2001-01-20 2009-01-20 Republican FALSE
## 11 Obama      2009-01-20 2017-01-20 Democratic FALSE
```

---

`mutate()` also allows us to **_instantly_** use variables we just created.

```r
mutate(dat,
 CW = start <= '1990-03-11',
 CW = as.numeric(CW))
```

```
## # A tibble: 11 x 5
##    name       start      end        party         CW
##    <chr>      <date>     <date>     <chr>      <dbl>
##  1 Eisenhower 1953-01-20 1961-01-20 Republican     1
##  2 Kennedy    1961-01-20 1963-11-22 Democratic     1
##  3 Johnson    1963-11-22 1969-01-20 Democratic     1
##  4 Nixon      1969-01-20 1974-08-09 Republican     1
##  5 Ford       1974-08-09 1977-01-20 Republican     1
##  6 Carter     1977-01-20 1981-01-20 Democratic     1
##  7 Reagan     1981-01-20 1989-01-20 Republican     1
##  8 Bush       1989-01-20 1993-01-20 Republican     1
##  9 Clinton    1993-01-20 2001-01-20 Democratic     0
## 10 Bush       2001-01-20 2009-01-20 Republican     0
## 11 Obama      2009-01-20 2017-01-20 Democratic     0
```

---

Like `mutate()`, `transmute()` provides a method for generating a new variable, but unlike the former, it **returns only the newly created variable**.

```r
transmute(dat,CW = start <= '1990-03-11')
```

```
## # A tibble: 11 x 1
##    CW
##    <lgl>
##  1 TRUE
##  2 TRUE
##  3 TRUE
##  4 TRUE
##  5 TRUE
##  6 TRUE
##  7 TRUE
##  8 TRUE
##  9 FALSE
## 10 FALSE
## 11 FALSE
```

---

## `summarize()`

```r
summarize(dat,
          days_in_office = mean(end-start),
          max = max(end-start),
          min = min(end-start))
```

```
## # A tibble: 1 x 3
##   days_in_office max       min
##   <drtn>         <drtn>    <drtn>
## 1 2125.091 days  2922 days 895 days
```

---

There are a number of helper functions that can be used inside `mutate()`, `transmute()`, and `summarize()`.

- `n()` - counts the number of observations
- `n_distinct()` - counts the number of distinct entries

```r
summarize(dat,N=n(),N_party=n_distinct(party))
```

```
## # A tibble: 1 x 2
##       N N_party
##   <int>   <int>
## 1    11       2
```

---

## `group_by()`

![:space 3]

---

When used in conjunction with some of the other functions, `group_by()` becomes powerful: each operation is then computed **separately within each group**.

```r
# group by party
x <- group_by(dat,party)
summarize(x,min_in_office = min(end-start))
```

```
## # A tibble: 2 x 2
##   party      min_in_office
##   <chr>      <drtn>
## 1 Democratic 1036 days
## 2 Republican  895 days
```
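
Several summaries can be computed per group in a single call. This is a small sketch along the lines of the example above; the column names `n_rows`, `mean_days`, and `max_days` are made up here.

```r
library(dplyr)
library(ggplot2)  # provides the `presidential` data

by_party <- group_by(presidential, party)

# One output row per party, one column per summary
party_summary <- summarize(by_party,
                           n_rows    = n(),
                           mean_days = mean(end - start),
                           max_days  = max(end - start))
party_summary
```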

---

### Other functions in the tidyverse...

`tally()` and `count()` offer quick counts of a variable, which can be quite useful when used alongside some of the other functions.

Here we are seeing how many observations we have _by group_.

```r
x <- group_by(dat,party)
count(x)
```

```
## # A tibble: 2 x 2
## # Groups: party [2]
##   party          n
##   <chr>      <int>
## 1 Democratic     5
## 2 Republican     6
```

---

### Other functions in the tidyverse...

`recode()` allows you to quickly **recode a variable** (though conditional statements prove to be the most efficient way to do this)

```r
mutate(dat,party = recode(party,'Republican'=1,'Democratic'=0))
```

```
## # A tibble: 11 x 4
##    name       start      end        party
##    <chr>      <date>     <date>     <dbl>
##  1 Eisenhower 1953-01-20 1961-01-20     1
##  2 Kennedy    1961-01-20 1963-11-22     0
##  3 Johnson    1963-11-22 1969-01-20     0
##  4 Nixon      1969-01-20 1974-08-09     1
##  5 Ford       1974-08-09 1977-01-20     1
##  6 Carter     1977-01-20 1981-01-20     0
##  7 Reagan     1981-01-20 1989-01-20     1
##  8 Bush       1989-01-20 1993-01-20     1
##  9 Clinton    1993-01-20 2001-01-20     0
## 10 Bush       2001-01-20 2009-01-20     1
## 11 Obama      2009-01-20 2017-01-20     0
```
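
The conditional-statement route mentioned above could look like the following sketch, which uses `dplyr`'s `if_else()` and `case_when()`. The new column names `republican` and `party_abbr` are invented for the example.

```r
library(dplyr)
library(ggplot2)  # provides the `presidential` data

recoded <- mutate(presidential,
                  # two outcomes: if_else(condition, value if TRUE, value if FALSE)
                  republican = if_else(party == "Republican", 1, 0),
                  # several outcomes: the first matching condition wins
                  party_abbr = case_when(party == "Republican" ~ "R",
                                         party == "Democratic" ~ "D",
                                         TRUE ~ NA_character_))
```

The final `TRUE ~ NA_character_` line is a catch-all, so any value that matches none of the earlier conditions becomes missing rather than being silently mislabeled.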

---

### And much more...

We've only covered a few functions here. Here are some more...
- `sample_n()` - Grab a random sample of N rows from your data
- `sample_frac()` - Grab a random sample that is some fraction of your total data
- `top_n()` - get the top N number of entries from a data frame
- `slice()` - grab specific row ranges
- `glimpse()` - quickly preview the data

See [here](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf) to download a cheatsheet containing an entire list of the tidyverse `dplyr` verbs.
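
A quick tour of these helpers, sketched on the `presidential` data. The seed value and sample sizes are arbitrary choices for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the `presidential` data

set.seed(123)  # make the random draws reproducible

three_rows <- sample_n(presidential, 3)       # 3 random rows
some_rows  <- sample_frac(presidential, 0.5)  # roughly half of the rows
latest_two <- top_n(presidential, 2, end)     # the 2 rows with the largest `end`
rows_2_4   <- slice(presidential, 2:4)        # rows 2 through 4

glimpse(presidential)                         # compact, column-wise preview
```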

---

class:newsection

# Piping

---

## Combining `dplyr` functions

When we need to do a series of manipulations, we can **perform each manipulation individually and save each result to an object** that we overwrite at every step.

```r
x <- filter(presidential,party=='Republican')
x <- group_by(x,name)
x <- transmute(x,t_in_office = end-start)
x <- arrange(x,t_in_office)
x
```

```
## # A tibble: 6 x 2
## # Groups: name [5]
##   name       t_in_office
##   <chr>      <drtn>
## 1 Ford        895 days
## 2 Bush       1461 days
## 3 Nixon      2027 days
## 4 Eisenhower 2922 days
## 5 Reagan     2922 days
## 6 Bush       2922 days
```

---

Or we can **nest** functions _within_ each other.

```r
arrange(
  transmute(
    group_by(
      filter(presidential,party=='Republican'),name),
    t_in_office = end-start),
  t_in_office)
```

> The issue with **nesting functions** is that it is difficult to (a) read and (b) detect a mistake!

---

### Piping Functions

The **pipe** is a useful tool that allows us to **pass** output from one function to the next.

To pipe we write **`%>%`** _in-between_ each function.

We pass the output to a specific location in the following function using the placeholder `.`

Piping offers a clean way of manipulating data that is **intuitive and easy to read**.

---

Here we **pass** our `data` to `filter()` then to `group_by()` then to `transmute()` and then finally to `arrange()` which returns our output!

```r
presidential %>%
  filter(party=='Republican') %>%
  group_by(name) %>%
  transmute(t_in_office = end-start) %>%
  arrange(t_in_office)
```
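
To check that the piped version does the same work as the nested one, we can build both and compare them. This is only a sanity-check sketch; the object names `nested` and `piped` are made up.

```r
library(dplyr)
library(ggplot2)  # provides the `presidential` data

# Nested: read from the inside out
nested <- arrange(
  transmute(
    group_by(filter(presidential, party == "Republican"), name),
    t_in_office = end - start),
  t_in_office)

# Piped: read from top to bottom
piped <- presidential %>%
  filter(party == "Republican") %>%
  group_by(name) %>%
  transmute(t_in_office = end - start) %>%
  arrange(t_in_office)

all.equal(nested, piped)
```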

---

### Two things to keep in mind when piping...

1. Functions **_must_** be linked with `%>%`

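2. By default, `%>%` passes the data as the **first argument** of the next function. When the data should go somewhere else, point to where with a period (`.`)

```r
# schematic only: `some_function` stands in for any function
data %>% some_function(arg1 = ., arg2 = TRUE)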
```
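
A concrete case where the placeholder matters is `lm()`, whose first argument is a formula rather than the data. The sketch below uses the built-in `mtcars` data purely for illustration; it is not part of the original example.

```r
library(dplyr)  # makes the %>% pipe available

# lm() takes `formula` first and `data` second,
# so we point the piped data at `data =` with `.`
fit <- mtcars %>% lm(mpg ~ wt, data = .)
coef(fit)
```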

---

class:newsection

# Joining

---

## `_join` functions (dplyr)

![:space 5]

`dplyr` offers a range of joining/merging functions that are more intuitive to use than base `R`'s `merge()`. These functions follow a **SQL framework** that is easier to read and more efficient.
- `left_join()`
- `right_join()`
- `inner_join()`
- `full_join()`
- `anti_join()`

When joining data, you must have a **unique** identifier on the dimension you're matching on.
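
One way to check for duplicate identifiers before joining is to count each key and keep those that appear more than once. The data frame `lookup` below is hypothetical and contains a deliberate duplicate.

```r
library(dplyr)

# Hypothetical data with an accidental duplicate key
lookup <- data.frame(country = c("Nigeria", "England", "Nigeria"),
                     Var1    = c(4, 3, 5),
                     stringsAsFactors = FALSE)

# Keys that appear more than once; zero rows would mean the key is unique
dupes <- lookup %>%
  count(country) %>%
  filter(n > 1)
dupes
```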

---

Consider the following two example datasets...

```r
data_A
```

```
##    country Var1
## 1  Nigeria    4
## 2  England    3
## 3 Botswana    6
```

```r
data_B
```

```
##         country   Var2
## 1       Nigeria    Low
## 2 United States   High
## 3      Botswana Medium
```
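
The slides do not show how these two objects were built. A minimal construction, assuming they are plain data frames with character and numeric columns, might be:

```r
library(dplyr)

data_A <- data.frame(country = c("Nigeria", "England", "Botswana"),
                     Var1    = c(4, 3, 6),
                     stringsAsFactors = FALSE)

data_B <- data.frame(country = c("Nigeria", "United States", "Botswana"),
                     Var2    = c("Low", "High", "Medium"),
                     stringsAsFactors = FALSE)
```

With these in place, each of the join calls on the following slides can be run as written.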

---


## `left_join()`

```r
left_join(data_A,data_B,by="country")
```

```
##    country Var1   Var2
## 1  Nigeria    4    Low
## 2  England    3   <NA>
## 3 Botswana    6 Medium
```

---


## `right_join()`

```r
right_join(data_A,data_B,by="country")
```

```
##         country Var1   Var2
## 1       Nigeria    4    Low
## 2      Botswana    6 Medium
## 3 United States   NA   High
```

---


## `inner_join()`

```r
inner_join(data_A,data_B,by="country")
```

```
##    country Var1   Var2
## 1  Nigeria    4    Low
## 2 Botswana    6 Medium
```

---

### `full_join()` 
 
.center[<img src="Figures/full_join.png" height="100px">]

---

### `full_join()`

```r
full_join(data_A,data_B,by="country")
```

```
##         country Var1   Var2
## 1       Nigeria    4    Low
## 2       England    3   <NA>
## 3      Botswana    6 Medium
## 4 United States   NA   High
```

---

### `anti_join()` 
 
.center[<img src="Figures/anti_join_left.png" height="100px">]

---

### `anti_join()`

```r
anti_join(data_A,data_B,by="country")
```

```
##   country Var1
## 1 England    3
```

---

## `bind_rows()`

```r
bind_rows(data_A,data_B)
```

```
##         country Var1   Var2
## 1       Nigeria    4   <NA>
## 2       England    3   <NA>
## 3      Botswana    6   <NA>
## 4       Nigeria   NA    Low
## 5 United States   NA   High
## 6      Botswana   NA Medium
```

---

## `bind_cols()`

```r
bind_cols(data_A,data_B)
```

```
##   country...1 Var1   country...3   Var2
## 1     Nigeria    4       Nigeria    Low
## 2     England    3 United States   High
## 3    Botswana    6      Botswana Medium
```

---

## Disparate column names

Sometimes the naming conventions of two datasets don't perfectly align. When this happens, we can state explicitly which columns match one another using the `by=` argument.

Moreover, we can merge on **_more_ than one dimension** by specifying all relevant column names.

---

Once again, consider the following example data...

```r
data_A
```

```
##    country year Var1
## 1  Nigeria 1999    4
## 2  England 2001    3
## 3 Botswana 2000    6
```

```r
data_B
```

```
##    country_name year   Var2
## 1       Nigeria 1999    Low
## 2 United States 2004   High
## 3      Botswana 2003 Medium
```

---

```r
full_join(data_A,data_B,
          by=c('country'='country_name',
               'year'))
```

```
##         country year Var1   Var2
## 1       Nigeria 1999    4    Low
## 2       England 2001    3   <NA>
## 3      Botswana 2000    6   <NA>
## 4 United States 2004   NA   High
## 5      Botswana 2003   NA Medium
```
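
To see why Botswana appears twice in the output above, it can help to compare a join on both keys with a join on country alone. The sketch below rebuilds the two example data frames (assuming plain data frames, as their printed form suggests); the object names `both_keys` and `country_only` are made up.

```r
library(dplyr)

data_A <- data.frame(country = c("Nigeria", "England", "Botswana"),
                     year    = c(1999, 2001, 2000),
                     Var1    = c(4, 3, 6),
                     stringsAsFactors = FALSE)

data_B <- data.frame(country_name = c("Nigeria", "United States", "Botswana"),
                     year         = c(1999, 2004, 2003),
                     Var2         = c("Low", "High", "Medium"),
                     stringsAsFactors = FALSE)

# Both keys must agree: Botswana's years differ, so only Nigeria matches
both_keys <- inner_join(data_A, data_B,
                        by = c("country" = "country_name", "year" = "year"))

# Matching on country alone ignores year, so Botswana matches as well
country_only <- inner_join(data_A, data_B, by = c("country" = "country_name"))
```

In the second join, `year` exists in both inputs without being a key, so `dplyr` keeps both copies and disambiguates them with the suffixes `.x` and `.y`.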

---

## Merging as a subsetting strategy

Using set operations to subset and join data...

---

## Merging as a subsetting strategy

Using set operations to subset and join data...
 
.center[<img src="Figures/union.gif" height="400px">]

---

## Merging as a subsetting strategy

Using set operations to subset and join data...
 
.center[<img src="Figures/union-all.gif" height="400px">]
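
The set operations pictured on these slides are available as `dplyr` functions for data frames that share the same columns. A small sketch with two hypothetical data frames, `x` and `y`:

```r
library(dplyr)

# Hypothetical data frames with identical columns
x <- data.frame(country = c("Nigeria", "England"), stringsAsFactors = FALSE)
y <- data.frame(country = c("England", "Botswana"), stringsAsFactors = FALSE)

in_either      <- union(x, y)      # rows in x or y, duplicates removed
in_either_dups <- union_all(x, y)  # rows in x or y, duplicates kept
in_both        <- intersect(x, y)  # rows that appear in both
only_in_x      <- setdiff(x, y)    # rows in x that are not in y
```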

---

class:newsection

# Reshaping Data

---

![:space 10]

Often, we need to alter the structure of a `data.frame` from a **wide format**...

![:space 10]

.center[
<table>
 <thead>
 <tr>
 <th style="text-align:center;"> country </th>
 <th style="text-align:center;"> 1992 </th>
 <th style="text-align:center;"> 1993 </th>
 <th style="text-align:center;"> 1994 </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:center;"> Nigeria </td>
 <td style="text-align:center;"> 9.72 </td>
 <td style="text-align:center;"> 10.06 </td>
 <td style="text-align:center;"> 9.66 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Iran </td>
 <td style="text-align:center;"> 9.88 </td>
 <td style="text-align:center;"> 10.86 </td>
 <td style="text-align:center;"> 9.78 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Cambodia </td>
 <td style="text-align:center;"> 10.78 </td>
 <td style="text-align:center;"> 10.23 </td>
 <td style="text-align:center;"> 10.61 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Australia </td>
 <td style="text-align:center;"> 10.04 </td>
 <td style="text-align:center;"> 9.37 </td>
 <td style="text-align:center;"> 10.18 </td>
 </tr>
</tbody>
</table>
]

---

...into a **long format**

.center[
<table>
 <thead>
 <tr>
 <th style="text-align:center;"> country </th>
 <th style="text-align:center;"> year </th>
 <th style="text-align:center;"> var </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:center;"> Nigeria </td>
 <td style="text-align:center;"> 1992 </td>
 <td style="text-align:center;"> 9.72 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Nigeria </td>
 <td style="text-align:center;"> 1993 </td>
 <td style="text-align:center;"> 10.06 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Nigeria </td>
 <td style="text-align:center;"> 1994 </td>
 <td style="text-align:center;"> 9.66 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Iran </td>
 <td style="text-align:center;"> 1992 </td>
 <td style="text-align:center;"> 9.88 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Iran </td>
 <td style="text-align:center;"> 1993 </td>
 <td style="text-align:center;"> 10.86 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Iran </td>
 <td style="text-align:center;"> 1994 </td>
 <td style="text-align:center;"> 9.78 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Cambodia </td>
 <td style="text-align:center;"> 1992 </td>
 <td style="text-align:center;"> 10.78 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Cambodia </td>
 <td style="text-align:center;"> 1993 </td>
 <td style="text-align:center;"> 10.23 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Cambodia </td>
 <td style="text-align:center;"> 1994 </td>
 <td style="text-align:center;"> 10.61 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Australia </td>
 <td style="text-align:center;"> 1992 </td>
 <td style="text-align:center;"> 10.04 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Australia </td>
 <td style="text-align:center;"> 1993 </td>
 <td style="text-align:center;"> 9.37 </td>
 </tr>
 <tr>
 <td style="text-align:center;"> Australia </td>
 <td style="text-align:center;"> 1994 </td>
 <td style="text-align:center;"> 10.18 </td>
 </tr>
</tbody>
</table>
]
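
As a sketch of how this reshaping might be done in code, the example below rebuilds the wide table and converts it with `pivot_longer()` from the `tidyr` package. The object names `wide`, `long`, and `wide_again` are made up; the values are those shown in the tables.

```r
library(tidyr)

# check.names = FALSE keeps the year column names as "1992", "1993", "1994"
wide <- data.frame(
  country = c("Nigeria", "Iran", "Cambodia", "Australia"),
  `1992`  = c(9.72, 9.88, 10.78, 10.04),
  `1993`  = c(10.06, 10.86, 10.23, 9.37),
  `1994`  = c(9.66, 9.78, 10.61, 10.18),
  check.names = FALSE,
  stringsAsFactors = FALSE)

# Wide to long: every column except `country` becomes a (year, var) pair
long <- pivot_longer(wide,
                     cols      = -country,
                     names_to  = "year",
                     values_to = "var")

# Long back to wide
wide_again <- pivot_wider(long, names_from = year, values_from = var)
```

Note that `year` comes out as a character column because it is built from column names; convert it with `as.numeric()` if it is needed as a number.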

---

![:space 20]

`tidyr` is a tidyverse package built to help reshape data. The package contains an array of functions that are useful for cleaning a data structure.

`tidyr` eases tasks such as:

- dropping missing values
- filling missing values
- separating a column into two variables or uniting two columns into one

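
Each of these tasks has a corresponding `tidyr` function. The sketch below runs them on a small, made-up data frame called `messy`; the column names and the `_` separator are assumptions for the example.

```r
library(tidyr)

# Hypothetical data with gaps and a combined country_year column
messy <- data.frame(
  unit  = c("Nigeria_1992", "Nigeria_1993", "Iran_1992", "Iran_1993"),
  group = c("A", NA, "B", NA),
  value = c(9.72, NA, 9.88, 10.86),
  stringsAsFactors = FALSE)

# Drop rows where `value` is missing
complete <- drop_na(messy, value)

# Fill missing `group` entries with the last non-missing value above them
filled <- fill(messy, group)

# Separate one column into two, then unite them back into one
split_up <- separate(filled, unit, into = c("country", "year"), sep = "_")
rejoined <- unite(split_up, "unit", country, year, sep = "_")
```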

---