Data Science I: Foundations
PPOL564-01
Fall 2021
Georgetown University
Week | Date | Topic | Assignment | Coding Discussion
---|---|---|---|---
1 | 1-Sep | Introductions, Installations, and IDEs | |
2 | 8-Sep | Version Control, Workflow, and Reproducibility | X |
3 | 15-Sep | Object-Oriented Programming in Python | X |
4 | 22-Sep | Introduction to Algorithms | Assignment 1 Assigned |
5 | 29-Sep | From Nested Lists to Data Frames | Assignment 1 Due |
6 | 6-Oct | Approaches to Data Manipulation in Python | X |
7 | 13-Oct | Data Visualization and Exploration | Assignment 2 Assigned |
8 | 20-Oct | Drawing from (Un-)Structured Data Sources | Assignment 2 Due; Assignment 3 Assigned |
9 | 27-Oct | Introduction to Statistical Learning | Assignment 3 Due |
10 | 3-Nov | Continuous Outcomes and Linear Regression | Project Proposals Due | X
11 | 10-Nov | Probability, Bayes' Theorem, and Classification | X |
12 | 17-Nov | Algorithmic Approaches to Supervised Learning | X |
13 | 24-Nov | Interpretable Machine Learning | Assignment 4 Assigned |
14 | 1-Dec | Project Presentations | Assignment 4 Due |
The recurring Zoom link can also be found on Canvas. If the link breaks or does not work properly, please check the #general channel on Slack for information about a new link. If there is no message about a new link, please contact the professor and/or TA via Slack. All synchronous lecture material will be recorded.
Throughout the semester, the instructor will use the commandline and several different IDEs when coding in Python or using Git. The following lists this software and provides guidance on installation. If you run into issues, please reach out to the Teaching Assistant.
At times, we’ll use a Unix-based commandline. The commandline will feature in our discussions of using git and running Python programs. If you use a Mac or Linux operating system, a functioning commandline comes with your operating system. On Apple machines, this is the Terminal.
For Windows (specifically Windows 10), you can enable the Linux Bash shell. The following tutorial explains how to do this. If you’re using a version of Windows that predates Windows 10, Git Bash provides a program that lets you run git commands on your Windows machine.
Finally, you’ll notice that my terminal looks slightly different from the one on your machine. This is because I’m using “Oh My Zsh,” open-source software that lets me customize my commandline. The above link offers everything you need to install Oh My Zsh on your machine.
We’ll use Python3 throughout this course. Below are instructions for downloading Python3 using a commandline package manager (Homebrew for Mac, Chocolatey for Windows).
An alternative way to install Python3 is to download an Anaconda distribution. The instructor will use pip rather than conda when demonstrating how to download Python modules. These are simply two ways of downloading and managing open-source software packages. Choose whichever works best for you.
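Whichever route you choose, it helps to confirm which interpreter you are actually running. The following is a minimal sketch using only the standard library: it reports the Python version and guesses whether the interpreter belongs to a conda environment. The conda check assumes that conda environments contain a `conda-meta` folder under `sys.prefix`; this is a common heuristic, not an official API.

```python
import os
import sys


def describe_interpreter():
    """Summarize the Python interpreter that is running this code."""
    # Conda environments keep a "conda-meta" folder under sys.prefix;
    # checking for it is a common heuristic, not an official API.
    looks_like_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "is_python3": sys.version_info >= (3, 0),
        "looks_like_conda": looks_like_conda,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If you save this as, say, `check_python.py` (a hypothetical filename) and run `python3 check_python.py`, a `True` for `looks_like_conda` suggests the `python3` on your commandline came from Anaconda rather than Homebrew/Chocolatey.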
Once you have Python3 on your computer, you can install Jupyter Notebook. If you downloaded Python3 using Anaconda, Jupyter Notebook comes with the distribution and requires no further installation on your part. If you installed Python3 using Homebrew/Chocolatey, you can install Jupyter Notebook by running the following command from your commandline:
```bash
pip install notebook
```
You can then launch a Jupyter Notebook from the commandline by typing:
```bash
jupyter notebook
```
If you’ve installed Python using Anaconda, the distribution provides a clickable icon to start a Jupyter Notebook. The advantage of using the commandline, however, is that you can first navigate to your project folder, which sets the working directory before the notebook starts. This makes it easier to work within a specific project folder.
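To see why the launch location matters, the sketch below prints the working directory and resolves a project-relative path against it. The `data/survey.csv` file is a hypothetical example, and the helper name `project_path` is made up for illustration. In a notebook, the kernel typically starts in the folder that contains the `.ipynb` file, so keeping notebooks inside the project folder you launched from keeps relative paths predictable.

```python
from pathlib import Path


def project_path(*parts):
    """Build an absolute path from parts relative to the working directory.

    In a Jupyter notebook, the kernel's working directory is typically the
    folder that contains the .ipynb file, so relative paths resolve there.
    """
    return Path.cwd().joinpath(*parts)


if __name__ == "__main__":
    print("Working directory:", Path.cwd())
    # "data/survey.csv" is a hypothetical file used only for illustration.
    target = project_path("data", "survey.csv")
    print(target, "exists:", target.exists())
```

In a notebook cell you would just call `project_path(...)` directly; the `__main__` guard only matters when running the file as a script.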
hydrogen
Atom is a hackable text editor built by GitHub. The following are instructions on how to install Atom on your machine.
Atom allows you to install open-source packages that provide additional functionality. The following packages will help you as you use Atom to program in Python. Of these, Hydrogen is the most important: it lets you use a Jupyter kernel from within Atom to evaluate code.
Hydrogen@2.16.3
Zen@0.18.0
advanced-open-file@0.16.8
atom-beautify@0.33.4
atom-clock@0.1.18
atom-html-preview@0.2.6
atom-language-r@1.4.8
atom-material-syntax@1.0.8
atom-material-syntax-light@0.4.6
atom-material-ui@2.1.3
atom-path-intellisense@1.2.2
atom-python-virtualenv@1.0.4
atom-todoist@2.0.0
auto-update-packages@1.0.1
autocomplete-R@0.6.0
autocomplete-latex-cite@0.3.5
autocomplete-modules@2.3.0
autocomplete-python@1.17.0
autocomplete-sql@0.5.0
browser-plus@0.0.98
color-picker@2.3.0
data-explorer@0.7.0
docblock-python@0.19.1
file-icons@2.1.47
fix-indent-on-paste@0.1.1
fold-comments@0.6.0
git-log@0.4.1
hey-pane@1.2.0
hydrogen-cell-separator@0.4.1
indent-guide-improved@1.4.13
jupyter-notebook@0.0.10
kite@0.206.0
language-latex@1.2.0
language-weave@0.7.2
latex@0.50.2
latex-tree@0.5.0
latexer@0.3.0
minimap@4.39.14
oceanic-next@1.0.0
pdf-view@0.73.0
platformio-ide-terminal@2.10.1
project-manager@3.3.8
python-indent@1.2.6
quick-query-sqlite@0.4.1
reindent@1.5.0
scroll-through-time@0.3.1
simple-drag-drop-text@0.5.0
symbols-tree-view@0.14.0
todo-show@2.3.2
typewriter@0.8.0
wordcount@3.2.0
To install any one of these packages from the commandline, type:
```bash
# apm == "Atom package manager"
apm install <package-name>
# For example
apm install Hydrogen@2.16.3
```
There is also a dedicated package manager built into Atom that you can use to download and install new packages. Open Atom, go to Settings > Install, and type the package name.
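If you want to reproduce a pinned setup like the list above, a small script can loop over `name@version` specs and call `apm install` once per package, exactly as in the command shown. This is a hypothetical helper, not part of Atom; it assumes `apm` is on your PATH and defaults to a dry run that only returns the commands so you can review them first.

```python
import subprocess


def parse_spec(spec):
    """Split an Atom package spec such as "Hydrogen@2.16.3" into (name, version).

    The version is None when the spec has no "@version" suffix.
    """
    name, sep, version = spec.rpartition("@")
    return (name, version) if sep else (spec, None)


def install_packages(specs, dry_run=True):
    """Run `apm install <spec>` once per spec.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned.
    """
    commands = [["apm", "install", spec] for spec in specs]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if apm reports a failure.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    wanted = ["Hydrogen@2.16.3", "hydrogen-cell-separator@0.4.1"]
    for cmd in install_packages(wanted):
        print(" ".join(cmd))
```

Pass `dry_run=False` only once the printed commands look right.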
Several students have had trouble getting Hydrogen to run properly on their machines. In particular, after following the installation instructions for Atom and Hydrogen, many people find that when they try to run Python code, they either (1) receive an error message stating "no kernel for language Python found" (or something similar), or (2) are able to connect to a Python kernel, but nothing happens when they run code (they may or may not receive associated error messages).
If you encounter this issue, we suggest trying the following solutions in order until one of them works. If you have tried all three solutions and are still not able to run Python code in Hydrogen/Atom, please contact the teaching assistants (by Slack, by email, or by setting up a Calendly appointment).
Solution 1
Open the command line and run the following two commands:
```bash
python3 -m pip install ipykernel
python3 -m ipykernel install --user
```
Then restart Atom and try running Python code.
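Solution 1 installs ipykernel into whichever interpreter `python3` points to, so it can help to check that interpreter directly. The sketch below (a hypothetical helper, run as `python3 kernel_check.py`) uses `importlib.util.find_spec`, which looks a module up without importing it, to report whether ipykernel is available to the interpreter running the script.

```python
import importlib.util
import sys


def kernel_check():
    """Report whether ipykernel is importable by the current interpreter."""
    # find_spec returns None when the module cannot be found; it does not
    # actually import the module.
    has_ipykernel = importlib.util.find_spec("ipykernel") is not None
    return {"python": sys.executable, "ipykernel_installed": has_ipykernel}


if __name__ == "__main__":
    result = kernel_check()
    print("Interpreter:", result["python"])
    if result["ipykernel_installed"]:
        print("ipykernel found; next run: python3 -m ipykernel install --user")
    else:
        print("ipykernel missing; run: python3 -m pip install ipykernel")
```

If the interpreter path printed here differs from the one you expected, you likely have several Pythons installed, which is also the situation the reticulate note later in this page warns about.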
Solution 2
Uninstall Hydrogen by opening Atom, clicking "Install a Package", and searching for Hydrogen in the search bar. Click "Uninstall". Once Hydrogen has finished uninstalling, search for Hydrogen again and click "Install". Once Hydrogen has finished reinstalling, restart Atom and try running Python code.
Solution 3
Add the following directories to your PATH environment variable using the command line. Note that the exact file paths will need to be adjusted depending on your machine and operating system.
```
C:\Anaconda3
C:\Anaconda3\Scripts
C:\Anaconda3\Library\bin
```
Once these have been added to your PATH, restart Atom and try running Python code.
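To confirm the change took effect, you can check which of those directories a freshly started process actually sees on its PATH. The sketch below is a hypothetical helper (`missing_from_path` is not a library function); it splits PATH on `os.pathsep` and normalizes entries so that the comparison ignores case and trailing separators on Windows.

```python
import os


def missing_from_path(directories, path_value=None):
    """Return the entries of `directories` that are absent from a PATH string.

    `path_value` defaults to this process's PATH. Both sides are normalized
    with normpath/normcase, so the check is case-insensitive on Windows.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = set()
    for entry in path_value.split(os.pathsep):
        if entry:  # skip empty segments such as a trailing separator
            entries.add(os.path.normcase(os.path.normpath(entry)))
    return [
        d for d in directories
        if os.path.normcase(os.path.normpath(d)) not in entries
    ]


if __name__ == "__main__":
    # Example Anaconda locations from above; adjust them for your machine.
    wanted = [r"C:\Anaconda3", r"C:\Anaconda3\Scripts", r"C:\Anaconda3\Library\bin"]
    missing = missing_from_path(wanted)
    for directory in missing:
        print("Not on PATH:", directory)
    if not missing:
        print("All directories are on PATH.")
```

A process inherits PATH when it starts, so run this from a newly opened terminal; the same inheritance is why Atom has to be restarted after the change.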
reticulate
In your classes that focus on R, RStudio will be your main IDE. However, RStudio isn’t just for R; it can handle a number of different languages. We can use Python in RStudio via the reticulate package. We’ll talk about some of the advantages of doing this in class, but for now, let’s cover installation.
To install RStudio, download it from the following link (make sure to scroll all the way to the bottom).
reticulate is an R package that lets you run a Python REPL in the R console. It also lets you read in and use Python code, and pass data between R and Python. The following provides instructions on installing reticulate.
Note: If you have multiple versions of Python on your computer, reticulate can get confused about which version it should use. The following article covers these issues. The best way to resolve this is to create a .Rprofile file that specifies the version of Python you wish to use:
“By setting the value of the RETICULATE_PYTHON environment variable to a Python binary. Note that if you set this environment variable, then the specified version of Python will always be used (i.e. this is prescriptive rather than advisory). To set the value of RETICULATE_PYTHON, insert Sys.setenv(RETICULATE_PYTHON = PATH) into your project’s .Rprofile, where PATH is your preferred Python binary.”
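To find the PATH value for that line, you can ask the interpreter you want reticulate to use for its own location. The sketch below is a hypothetical helper: run it with the Python you intend to use, then paste the printed line into your project’s .Rprofile. It assumes forward slashes are acceptable in the path, which they are for R on Windows, and uses them because R treats backslashes in strings as escape characters.

```python
import sys


def rprofile_line(python_path=None):
    """Return a Sys.setenv() line for .Rprofile pointing reticulate at Python.

    Defaults to the interpreter running this script (sys.executable).
    """
    path = python_path or sys.executable
    # R string literals treat backslashes as escapes, so switch to
    # forward slashes (R on Windows accepts them in paths).
    path = path.replace("\\", "/")
    return f'Sys.setenv(RETICULATE_PYTHON = "{path}")'


if __name__ == "__main__":
    print(rprofile_line())
```

Running it as `python3 rprofile_line.py` prints the line for whatever `python3` resolves to, which makes the choice explicit rather than leaving reticulate to guess.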
Here is an overview of other text editors that are popular for programming in Python but won’t be featured in this course. Note that I’m agnostic about what you use to learn Python, and some people find that different setups work better for them. If one of these setups works better for you, I encourage you to use it (and tell me how it went)!
Data science is an applied field; therefore, it is important that you understand how to conduct a complete analysis, from collecting data, to cleaning and analyzing it, to presenting your findings. To this end, students are required to complete an independent data science project, applying concepts learned throughout the course. The project is composed of three parts: a 2-page project proposal, an in-class presentation, and a 12-page project report.
More information regarding the final project will be circulated during class in Week 8.
(See the Virtual Classroom tab for the link.)

The best way to reach the TA/Professor is via the class Slack channel (PPOL-564-Fall-2021). Please click on the Class Slack Channel Invite to join the class workspace.
Asynchronous lecture materials will go live approximately one week prior to the scheduled synchronous meeting date.