Week | Date | Topic | Assignment |
---|---|---|---|
1 | January 26 | Work Flow and Reproducibility | |
2 | February 2 | Introduction to Programming in R | |
3 | February 9 | Reproducibility in Practice | |
4 | February 16 | Data Wrangling in R | Problem Set 1 Assigned |
5 | February 23 | Data Visualization | Problem Set 1 Due |
6 | March 2 | Web Scraping | Problem Set 2 Assigned |
7 | March 9 | Geospatial Data | Problem Set 2 Due |
8 | March 16 | Text as Data | Problem Set 3 Assigned |
9 | March 23 | Introduction to Statistical Learning | Problem Set 3 Due |
| | March 30 | Spring Break; No class | |
10 | April 6 | Applications in Supervised Learning (Regression) | Problem Set 4 Assigned |
11 | April 13 | Applications in Supervised Learning (Classification) | Project Proposal Due; Problem Set 4 Due |
12 | April 20 | Interpretable Machine Learning | Problem Set 5 Assigned |
13 | April 27 | Applications in Unsupervised Learning | Problem Set 5 Due |
14 | May 4 | Project Presentations | |
Final | May 18 | Final Project Due (9:00 PM) | |
The recurring Zoom link can also be found on Canvas. Please contact the professor or TA through Slack if any of the Zoom links break.
Data science is an applied field, so it is important that you understand how to conduct a complete analysis: collecting data, cleaning and analyzing it, and presenting your findings. To this end, students are required to complete an independent data science project that applies concepts learned throughout the course. The project has three parts: a project proposal, an in-class presentation, and a project report.
More information regarding the final project will be circulated during class in Week 8.
Overview
Examples of Successful Projects
The following are installation instructions for `R` and `RStudio`.
`R` Software
To install `R`, download `R` from CRAN via the following:
To install `RStudio`, download from the following (scroll to the bottom):
Video walkthroughs:
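Once `R` and `RStudio` are installed, a quick check from the console can confirm that the setup works. The following is a minimal sketch, not an official course script: the package vector simply collects the package names mentioned elsewhere on this page, and the variable names are illustrative.

```r
# Packages named on this page; adjust the vector as the course progresses
pkgs <- c("ggplot2", "ggthemes", "scico", "gghighlight", "ggrepel", "GGally",
          "ggmap", "leaflet", "tidycensus", "recipes", "caret")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]

# Report the R version and any packages that still need installing
cat(R.version.string, "\n")
if (length(not_installed) > 0) {
  message("Not yet installed: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
} else {
  message("All listed packages are installed.")
}
```

The `install.packages()` call is left commented out so that the check itself never changes your library or needs a network connection.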
Asynchronous lecture materials will go live approximately one week prior to the scheduled synchronous meeting date.
`R` Practice
Practice: (`.Rmd`)
Practice:
Practice: (`.rmd` as a `.zip`)
Third-party visualization packages that build on `ggplot2`:
- `ggthemes`: for great themes and color schemes.
- `scico`: for beautiful continuous color schemes.
- `gghighlight`: for highlighting values or subgroups in a `ggplot2` plot.
- `ggrepel`: for adding text labels to `ggplot2` plots.
- `GGally`: for generating standard (but more complicated to construct) `ggplot2` plots, such as correlation heatmaps and pairs plots.
Discussion:
Practice:
Practice:
Discussion:
`R`:
- `ggmap`: `ggmap` is an R package that makes it easy to retrieve raster map tiles from popular online mapping services like Google Maps and Stamen Maps and plot them using the `ggplot2` framework.
- `leaflet`: `leaflet` is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB.
- `tidycensus`: `tidycensus` is an R package that allows users to interface with the US Census Bureau’s decennial Census and five-year American Community Survey APIs and return tidyverse-ready data frames, optionally with simple feature geometry included.
Practice:
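As a minimal sketch of the `leaflet` package described above, the following builds an interactive map with a single marker. It assumes the `leaflet` R package is installed; the coordinates (Washington, DC) and the popup text are illustrative placeholders, not part of the course materials.

```r
library(leaflet)  # also re-exports the %>% pipe

# Illustrative coordinates only (Washington, DC)
lng <- -77.0369
lat <- 38.9072

# Build the map widget: base tiles, initial view, and one marker
m <- leaflet() %>%
  addTiles() %>%                                # default OpenStreetMap tiles
  setView(lng = lng, lat = lat, zoom = 12) %>%
  addMarkers(lng = lng, lat = lat, popup = "Washington, DC")

# Print m in an interactive session (e.g. RStudio) to render the map
```

Building the widget does not download anything; tiles are fetched only when the map is rendered in a viewer or browser.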
Practice:
`recipes`
Practice:
`caret`
Practice:
`caret`
Practice:
Practice: