|1||January 26||Work Flow and Reproducibility|
|2||February 2||Introduction to Programming in R|
|3||February 9||Reproducibility in Practice|
|4||February 16||Data Wrangling in R||Problem Set 1 Assigned|
|5||February 23||Data Visualization||Problem Set 1 Due|
|6||March 2||Web Scraping||Problem Set 2 Assigned|
|7||March 9||Geospatial Data||Problem Set 2 Due|
|8||March 16||Text as Data||Problem Set 3 Assigned|
|9||March 23||Introduction to Statistical Learning||Problem Set 3 Due|
||March 30||Spring Break; No class|
|10||April 6||Applications in Supervised Learning (Regression)||Problem Set 4 Assigned|
|11||April 13||Applications in Supervised Learning (Classification)||Project Proposal Due; Problem Set 4 Due|
|12||April 20||Interpretable Machine Learning||Problem Set 5 Assigned|
|13||April 27||Applications in Unsupervised Learning||Problem Set 5 Due|
|14||May 4||Project Presentations|
|Final||May 18||Final Project Due (9:00 PM)|
The recurring Zoom link can also be found on Canvas. Please contact the professor or TA through Slack if any of the Zoom links break.
Data science is an applied field, so it is important that you understand how to conduct a complete analysis, from collecting data, to cleaning and analyzing it, to presenting your findings. Toward this end, students are required to complete an independent data science project that applies concepts learned throughout the course. The project has three parts: a project proposal, an in-class presentation, and a project report.
More information regarding the final project will be circulated during class in Week 8.
Examples of Successful Projects
The following are installation instructions for R and RStudio:
R from CRAN via the following:
RStudio, download from the following (scroll to the bottom):
Asynchronous lecture materials will go live approximately one week prior to the scheduled synchronous meeting date.
Third-Party Visualization Packages that build off of ggplot2
ggthemes: for great theme and color schemes.
scico: for beautiful continuous color schemes.
gghighlight: for highlighting values or subgroups in a ggplot2 plot.
ggrepel: for adding text labels to ggplot2 plots.
GGally: for generating standard (but more complicated to construct) ggplot2 plots, such as correlation heatmaps and pairs plots.
ggmap is an R package that makes it easy to retrieve raster map tiles from popular online mapping services like Google Maps and Stamen Maps and plot them using the ggplot2 framework.
tidycensus is an R package that allows users to interface with the US Census Bureau’s decennial Census and five-year American Community Survey APIs and return tidyverse-ready data frames, optionally with simple feature geometry included.