class: center, middle, inverse, title-slide #
PPOL564 | Data Science 1 | Foundations
Week 7
Data Visualization
###
Prof. Eric Dunford ◆ Georgetown University ◆ McCourt School of Public Policy ◆
eric.dunford@georgetown.edu
---
layout: true

<div class="slide-footer"><span> PPOL564 | Data Science 1 | Foundations           Week 7 <!-- Week of the Footer Here -->              Data Visualization <!-- Title of the lecture here --> </span></div>

---
class: newsection

# The Components of Data Visualization

---

### What do you see?

<img src="data-visualization_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---

### Mapping data to space

.center[<img src="Figures/cartesian-coord-1.png", width=600>]

---

### Aesthetics

<br>

.center[<img src="Figures/common-aesthetics-1.png", width=500>]

<br><br>

.center[<img src="Figures/basic-scales-example-1.png", width=500>]

---

### Color as a tool to distinguish

.center[<img src="Figures/breakdown-by-year_25km-1day-window.png", width=600>]

---

### Color as a tool to represent values

.center[<img src="Figures/violent-events-2016_by-adm-unit_by-dataset.png", width=100%>]

---

### Color as a tool to highlight

<img src="data-visualization_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

### Presentation as information

<img src="data-visualization_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

--

<img src="data-visualization_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

### Presentation as distortion

<img src="data-visualization_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

--

<img src="data-visualization_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

### The data type drives the visualization decisions

Think carefully about what you're trying to convey and what information you're using to make your point.
<br>

.center[

| Data Type | Example | Scale |
|-------------|---------|---------|
| Numerical | `1.3`, `800`, `10e3` | Continuous |
| Integer | `1`, `2`, `3` | Discrete |
| Categorical | `"dog"`, `"Nigeria"`, `"A"` | Discrete |
| Ordered | `"Small"`, `"Medium"`, `"Large"` | Discrete |
| Dates/Time | `2009-01-02`, `5:32:33` | Continuous |

]

---

### The data type drives the visualization decisions

.center[
.center[<img src="Figures/amounts-1.png", width=500>]
.center[<img src="Figures/proportions-1.png", width=500>]
.center[<img src="Figures/single-distributions-1.png", width=500>]
.center[<img src="Figures/basic-scatter-1.png", width=500>]
]

---
class:newsection

# Grammar of Graphics

---

.pull-left[.center[<img src = "Figures/ggplot-hex.png"><br><br>≈<br><br><img src = "Figures/plotnine_logo.png">]]

.pull-right[

`plotnine` is a Python implementation of the grammar of graphics, based on the powerful `ggplot2` graphics package in `R`. `ggplot2` offers a flexible and intuitive graphics language capable of building sophisticated graphics.

<br><br>

`plotnine`/`ggplot2` has a **special syntax** that we'll have to get used to, _but_ once we understand the basics, we'll be able to produce advanced and sophisticated graphics with ease!

]

---

.pull-left[.center[<img src = "Figures/ggplot-hex.png"><br><br>≈<br><br><img src = "Figures/plotnine_logo.png">]]

.pull-right[

`plotnine`/`ggplot2` is based on a **grammar of graphics**. In essence, you can build every graph from the same components, which follow the same intuitive naming conventions.

Every graph is composed of

1. a **dataset**
2. a **coordinate system**
3. **mappings** → the variables we're aiming to visualize
4. **geom**etric expressions of how the data should be projected onto a space

]

---

.center[
<font color = "green">`ggplot`</font>(data = `<DATA>`) `+`
<font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
]

--

.center[
`+` <font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
`+` <font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
`+` <font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
`$$\vdots$$`
]

---

.center[
<font color = "green">`ggplot`</font>(data = `<DATA>`) `+`
<font color = "green">`<GEOM_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
]

.center[
`+` <font color = "red">`<SCALE_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
`+` <font color = "blue">`<THEME_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
`+` <font color = "orange">`<FACET_FUNCTION>`</font>(mapping = <font color = "green">aes</font>(`<MAPPINGS>`))
`$$\vdots$$`
]

---

### One variable?

.center[

| Expression | Function | |
|----|----|-----|
| Area | `geom_area()` | <img src = "Figures/geom_area.png"> |
| Density | `geom_density()` | <img src = "Figures/geom_density.png"> |
| Dots | `geom_dotplot()` | <img src = "Figures/geom_dotplot.png"> |
| Frequencies | `geom_freqpoly()` | <img src = "Figures/geom_freqplot.png"> |
| Histogram | `geom_histogram()` | <img src = "Figures/geom_histogram.png"> |

]

---

### Two variables?
.center[

| Expression | Function | |
|----|----|-----|
| Continuous Points | `geom_point()` | <img src = "Figures/geom_point.png"> |
| Continuous Lines | `geom_line()` | <img src = "Figures/geom_line.png"> |
| Discrete Counts | `geom_count()` | <img src = "Figures/geom_count.png"> |
| Continuous and Discrete Distributions | `geom_boxplot()` | <img src = "Figures/geom_boxplot.png"> |
| Densities | `geom_hex()` | <img src = "Figures/geom_hex.png"> |

]

---

### Three variables?

.center[

| Expression | Function | |
|----|----|-----|
| Densities | `geom_contour()` | <img src = "Figures/geom_contour.png"> |
| Intensities | `geom_tile()` | <img src = "Figures/geom_tile.png"> |
| Intensities | `geom_raster()` | <img src = "Figures/geom_raster.png"> |
| Spatial | `geom_map()` | <img src = "Figures/geom_map.png"> |

]

--

See [`plotnine`'s documentation website](https://plotnine.readthedocs.io/en/stable/index.html) for additional guidance and tips on using the API.

---

### Function Types in `ggplot2`/`plotnine`

| Type | Function Prefix | Description |
|------|-----------------|-------------|
| Generate layers from data | `geom_` | Use a geom function to represent data points; use the geom’s aesthetic properties to represent variables. Each function returns a layer. |
| Construct statistics layers | `stat_` | A stat builds new variables to plot (e.g., count, prop). |
| Change mapping characteristics | `scale_` | Scales map data values to the visual values of an aesthetic. To change a mapping, add a new scale. |
| Generate subplots | `facet_` | Facets divide a plot into subplots based on the values of one or more discrete variables. |
| Alter the plot's theme | `theme_` | Change the aesthetics of the plot background and features (e.g., axes, text, grid lines). |