class: center, middle, inverse, title-slide

# PPOL670 | Introduction to Data Science for Public Policy<br><br>Week 5<br><br>Data Visualization

### Prof. Eric Dunford ◆ Georgetown University ◆ McCourt School of Public Policy ◆ eric.dunford@georgetown.edu
---
layout: true

<div class="slide-footer"><span> PPOL670 | Introduction to Data Science for Public Policy           Week 5 <!-- Week of the Footer Here -->              Data Visualization <!-- Title of the lecture here --> </span></div>

---
class: newsection

# The Components of Data Visualization

---

### What do you see?

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---

### Mapping data to space

.center[<img src="Figures/cartesian-coord-1.png", width=600>]

---

### Aesthetics

<br>

.center[<img src="Figures/common-aesthetics-1.png", width=500>]

<br><br>

.center[<img src="Figures/basic-scales-example-1.png", width=500>]

---

### Color as a tool to distinguish

.center[<img src="Figures/breakdown-by-year_25km-1day-window.png", width=600>]

---

### Color as a tool to represent values

.center[<img src="Figures/violent-events-2016_by-adm-unit_by-dataset.png", width=100%>]

---

### Color as a tool to highlight

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

### Presentation as information

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

--

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

### Presentation as distortion

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

--

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

### The data type drives the visualization decisions

Think carefully about what you're trying to convey and what information you're using to make your point.
<br>

.center[
| Data Type   | Example                          | Scale      |
|-------------|----------------------------------|------------|
| Numerical   | `1.3`, `800`, `10e3`             | Continuous |
| Integer     | `1`, `2`, `3`                    | Discrete   |
| Categorical | `"dog"`, `"Nigeria"`, `"A"`      | Discrete   |
| Ordered     | `"Small"`, `"Medium"`, `"Large"` | Discrete   |
| Dates/Time  | `2009-01-02`, `5:32:33`          | Continuous |
]

---

### The data type drives the visualization decisions

.center[
.center[<img src="Figures/amounts-1.png", width=500>]
.center[<img src="Figures/proportions-1.png", width=500>]
.center[<img src="Figures/single-distributions-1.png", width=500>]
.center[<img src="Figures/basic-scatter-1.png", width=500>]
]

---
class: newsection

# Grammar of Graphics

---

.pull-left[<br><br><br><br><img src = "Figures/ggplot-hex.png">]

.pull-right[
`ggplot2` (part of the `tidyverse`) is a powerful graphics package that offers a flexible and intuitive graphics language capable of building sophisticated graphics.

<br><br>

`ggplot` has a **special syntax** that we'll have to get used to, _but_ once we understand the basics, we'll be able to produce advanced and sophisticated graphics with ease!
]

---

.pull-left[<br><br><br><br><img src = "Figures/ggplot-hex.png">]

.pull-right[
`ggplot2` is based on a **grammar of graphics**. In essence, you can build every graph from the same components, which follow the same intuitive naming conventions. Every graph is composed of

1. a **dataset**
2. a **coordinate system**
3. **mappings** → the variables we're aiming to visualize
4. **geom**etric expressions of how the data should be projected onto a space
]

---

### (1) data

Let's use the `diamonds` data, an example dataset provided by `ggplot2` that contains the prices and other attributes of almost 54,000 diamonds.
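Both `diamonds` and `glimpse()` require packages to be loaded first. Here is a minimal setup sketch, assuming `ggplot2` (which ships the data) and `dplyr` (which provides `glimpse()`) are installed. The `class()` calls are an illustrative addition that shows how the column types correspond to the data-type table:

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides glimpse()

# Rows (one per diamond) and columns (one per attribute)
dim(diamonds)

# Column types mirror the data-type table: price is an integer,
# carat is numeric (double), and cut is an ordered categorical variable
class(diamonds$price)
class(diamonds$carat)
class(diamonds$cut)
```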
```r
glimpse(diamonds)
```

```
## Observations: 53,940
## Variables: 10
## $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.…
## $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Goo…
## $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J,…
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1,…
## $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59…
## $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, …
## $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 3…
## $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.…
## $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.…
## $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.…
```

---

### (2) coordinate system

Use the `ggplot()` function to establish the coordinate system.

```r
ggplot(data = diamonds)
```

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

### (3) mappings

What variables from the data do we want to map to the projected space?

- What variable makes up the y-axis?
- What variable makes up the x-axis?
- Are there any variables to group by? (More on this later)

--

<br><br>

We need a special function, `aes()` (short for "aesthetics"), to map variables from the data onto the geometric space. Whenever we want a feature of the plot to vary with a variable, we **_must_** wrap that mapping in `aes()`.

---

### (3) mappings

What variables from the data do we want to map to the projected space?

```r
ggplot(data = diamonds, aes(x = price, y = carat))
```

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

### (4) geom → projection

How should your mappings be projected onto the coordinate space?
```r
ggplot(data = diamonds, aes(x = price, y = carat)) +
*  geom_point()
```

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

### (4) geom → projection

How should your mappings be projected onto the coordinate space?

.pull-left[
- `geom_` functions are aesthetic **layers** that are mapped onto the plot.
- We "add" layers and design preferences with `+`.
- We can add as many layers as we want. Layers are placed on top of one another in the order in which they are specified.
- Plots can be assigned as objects and rendered later.
]

.pull-right[
```r
ggplot(data = diamonds,
       aes(x = price, y = carat)) +
  geom_point()
```

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
]

---

.center[
<font color = "green">`ggplot`</font>(data = `<DATA>`) `+` <font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
]

--

.center[
`+` <font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))

`+` <font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))

`+` <font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))

`$$\vdots$$`
]

---

.center[
<font color = "green">`ggplot`</font>(data = `<DATA>`) `+` <font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
]

.center[
`+` <font color = "red">`<SCALE_FUNCTION>`</font>(`...`)

`+` <font color = "blue">`<THEME_FUNCTION>`</font>(`...`)

`+` <font color = "orange">`<FACET_FUNCTION>`</font>(`...`)

`$$\vdots$$`
]

---

### One variable?
.center[
| Expression | Function | |
|----|----|-----|
| Area | `geom_area()` | <img src = "Figures/geom_area.png"> |
| Density | `geom_density()` | <img src = "Figures/geom_density.png"> |
| Dots | `geom_dotplot()` | <img src = "Figures/geom_dotplot.png"> |
| Frequencies | `geom_freqpoly()` | <img src = "Figures/geom_freqplot.png"> |
| Histogram | `geom_histogram()` | <img src = "Figures/geom_histogram.png"> |
]

---

### Two variables?

.center[
| Expression | Function | |
|----|----|-----|
| Continuous Points | `geom_point()` | <img src = "Figures/geom_point.png"> |
| Continuous Lines | `geom_line()` | <img src = "Figures/geom_line.png"> |
| Discrete Counts | `geom_count()` | <img src = "Figures/geom_count.png"> |
| Continuous and Discrete Distributions | `geom_boxplot()` | <img src = "Figures/geom_boxplot.png"> |
| Densities | `geom_hex()` | <img src = "Figures/geom_hex.png"> |
]

---

### Three variables?

.center[
| Expression | Function | |
|----|----|-----|
| Densities | `geom_contour()` | <img src = "Figures/geom_contour.png"> |
| Intensities | `geom_tile()` | <img src = "Figures/geom_tile.png"> |
| Intensities | `geom_raster()` | <img src = "Figures/geom_raster.png"> |
| Spatial | `geom_map()` | <img src = "Figures/geom_map.png"> |
]

--

This is just a taste: there is a wide array of ways to express data in a geometric space. See the reading and the [data visualization cheatsheet](https://github.com/tidyverse/ggplot2) for guidance.

---

### Function Types in `ggplot2`

| Type | Function Header | Description |
|------|-----------------|-------------|
| Generate layers from data | `geom_` | Use a geom function to represent data points and the geom's aesthetic properties to represent variables. Each function returns a layer. |
| Construct statistics layers | `stat_` | A stat builds new variables to plot (e.g., count, prop). |
| Change mapping characteristics | `scale_` | Scales map data values to the visual values of an aesthetic. To change a mapping, add a new scale. |
| Generate subplots | `facet_` | Facets divide a plot into subplots based on the values of one or more discrete variables. |
| Alter the plot's theme | `theme_` | Change the non-data appearance of the plot (e.g., background, axes, text, grid lines). |

---

### Exporting Plots

Note that `ggplot` plots can be assigned to an object.

```r
my_plot <- ggplot(cars, aes(speed, dist)) + geom_point()
my_plot
```

<img src="lecture-week-05_data-visualization_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

--

We can build on these plot objects (e.g., by adding more layers with `+`) and export them using `ggsave()`.

```r
ggsave(plot = my_plot, filename = "my_plot.pdf", device = "pdf", width = 5, height = 5)
ggsave(plot = my_plot, filename = "my_plot.png", device = "png", dpi = 300)
```

> Supports "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only).
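---

### Putting the pieces together

This sketch ties the function families together. It stacks one function of each type from the Function Types table onto the `diamonds` scatterplot and then saves the result with `ggsave()`. The log scale, the facet variable (`cut`), the theme, and the file name `diamonds_by_cut.pdf` are illustrative choices, not part of the original slides. The output goes to `tempdir()`, so nothing is written to your working directory.

```r
library(ggplot2)

# Data + mappings first, then one function from each family
p <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +                      # geom_: one point per diamond
  stat_smooth(method = "lm", formula = y ~ x) +  # stat_: computes a fitted line from the data
  scale_y_log10() +                              # scale_: price shown on a log10 axis
  facet_wrap(~ cut) +                            # facet_: one panel per level of cut
  theme_minimal()                                # theme_: non-data look (background, grid lines)

# Write to a temporary directory; the .pdf extension selects the PDF device
out_file <- file.path(tempdir(), "diamonds_by_cut.pdf")
ggsave(filename = out_file, plot = p, width = 8, height = 5)
```

Because `p` is an ordinary object, you can keep adding to it (for example, `p + labs(title = "Price by carat and cut")`) before saving.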