PPOL564 - Data Science I: Foundations

Using Jupyter Notebooks, Magic Commands, & Extensions

What is a Notebook and why use it?

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain

  • live code,
  • equations,
  • visualizations and
  • narrative text.

Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Pros:

  • Notebooks are ubiquitous,
  • Reproducible: transmitting and conveying results
  • We can build code interactively (like we do in R). This makes Jupyter notebooks particularly friendly when you're first learning Python.
    • This also makes Atom + Hydrogen and Spyder equally useful.
  • stable

Cons:

  • Non-linear: cells can be run in any order, so code can fall out of sequence. E.g. we might define a dependency in a cell that sits after the first cell that needs it.
  • There is a process to spinning a Notebook up.

.ipynb is really a JSON file

At its core, a Jupyter notebook is a JSON (JavaScript Object Notation) file. Let's see what the notebook that we are currently using looks like:
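As a rough sketch of what "a notebook is JSON" means in practice, the snippet below writes a tiny hand-built notebook to disk and reads it back with nothing but the standard json module. The notebook contents and the file name example.ipynb are made up for illustration; a real notebook carries more metadata and stored outputs.

```python
import json
import tempfile
from pathlib import Path

# A hypothetical, minimal notebook built by hand (version 4 layout).
minimal_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["x = 1 + 1"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.ipynb"  # hypothetical file name
    path.write_text(json.dumps(minimal_notebook, indent=1))
    # Reading the file back is an ordinary JSON load.
    loaded = json.loads(path.read_text())

# Every chunk is one entry in the "cells" list.
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(list(loaded.keys()))
print(cell_types)
```

Pointing the same json.loads call at any real .ipynb file should expose the same top-level keys, with one entry in "cells" per chunk.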

Initializing a Notebook

There are two primary methods for initializing a notebook.

  1. Via the command line
    • Go into the working directory containing your .ipynb notebook.
      • e.g. cd /Users/me/Desktop/
    • type jupyter notebook
    • the web application will open up in your default browser.
    • from there, click on the notebook and "spin it up". The notebook will then be "running".
    • We can close the notebook by clicking on the Quit and Logout buttons on the page.
      • Quit == close the local server (i.e. the web application connection)
      • Logout == shut down the home page of the web application (but keep the server running)
    • We can also close the server connection in the console using the combo of Control-C in the console.
    • We can also find the server again (say, if we accidentally close the Notebook tab) by using the local URL provided when the notebook server first starts.
  2. Via the Anaconda Navigator (requires installing an Anaconda distribution)

    • Click on the Anaconda Icon
    • Click "Launch" on the jupyter notebook icon.
    • The web application will immediately fire up (also yielding a console panel much like what we saw via the command line approach).
    • One issue is that your working directory (i.e. where the notebook thinks you are on your computer) will be wherever Anaconda is stored (for me, it's at the very top of my file directory). Housing your projects here can be suboptimal for a whole range of reasons, so we'll need to change the working directory to the actual location that we want.
      • One benefit of spinning up a Jupyter notebook via the command line is that your working directory will always be where you initialized the notebook.
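One way to check and change the working directory from inside a notebook is Python's os module. The sketch below uses a temporary directory as a stand-in for a real project folder (an assumption for illustration); in practice you would pass your own project path to os.chdir.

```python
import os
import tempfile

start_dir = os.getcwd()  # where the kernel currently "is"

# A temporary directory stands in for a real project folder (assumption).
with tempfile.TemporaryDirectory() as project_dir:
    os.chdir(project_dir)  # move the working directory
    inside_project = os.path.samefile(os.getcwd(), project_dir)
    os.chdir(start_dir)  # move back before the temporary folder is removed

back_at_start = os.path.samefile(os.getcwd(), start_dir)
print(inside_project, back_at_start)
```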

Kernels

A kernel is a computational engine that executes the code contained in a notebook document. A cell (or "Chunk") is a container for text to be displayed in the notebook or code to be executed by the notebook's kernel.

Though we can only have one kernel running for any given notebook (we can't change between kernels in the middle of a notebook), we can use Jupyter with more than just a Python kernel. Here is a list of all the kernels that you can use with a Jupyter notebook. For example, we can easily employ an R kernel in a Jupyter notebook. This was always the notebook's original intent: "Jupyter" is a loose acronym for Julia, Python, and R.


Usage

Code Chunks

Code chunks are what we use to execute Python code (or code for whatever kernel we have running). In addition, we can write prose in a chunk by changing its cell type from Code to Markdown, which changes how the contents are treated when the chunk is run.

There are two states of a code chunk:

  • Edit Mode: Edit mode is indicated by a green cell border and a prompt showing in the editor area. When a cell is in edit mode, you can type into the cell, like a normal text editor. Enter edit mode by pressing Enter or using the mouse to click on a cell's editor area.
  • Command Mode: Command mode is indicated by a grey cell border with a blue left margin. When you are in command mode, you are able to edit the notebook as a whole, but not type into individual cells. Most importantly, in command mode, the keyboard is mapped to a set of shortcuts that let you perform notebook and cell actions efficiently. For example, if you are in command mode and you press c, you will copy the current cell - no modifier is needed. Don't try to type into a cell in command mode. Enter command mode by pressing Esc or using the mouse to click outside a cell's editor area.

We can switch between Markdown and Code chunks either

  • By using the drop down menu in the tool bar (in either mode)
  • By using the shortcut:
    • Press y when on the cell in Command Mode to switch to a code chunk.
    • Press m when on the cell in Command Mode to switch to a markdown chunk

Executing Code

A code chunk will always reflect the behavior of the kernel that you're using (e.g. a Python code chunk will follow Python coding Syntax).

Best Practices

  • Break code chunks up!
  • Every code chunk should render some output (the aim is to be able to read what we were doing without needing to fire the notebook back up)
  • Use spaces. Keep the chunk readable. Less is more.

Using Markdown

Markdown chunks are written in Markdown and also support mathematical equations written in LaTeX.
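As a small illustration (the formula is chosen here only as an example), typing the following into a Markdown chunk and running it should render the sample mean as a displayed equation:

```latex
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$
```

Wrapping an expression in single dollar signs instead, e.g. `$\bar{x}$`, keeps it inline with the surrounding text.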

Shortcuts

As with most user interfaces, Jupyter Notebooks have developed their own way of doing things. Thus there are a number of useful shortcuts that you can employ to help perform useful tasks.

We can access a full (searchable) list of commands and their keyboard shortcuts by pressing p when in Command Mode, or by clicking the keyboard icon in the toolbar.

Important ones while in Command Mode:

  • a: create a new code chunk above the current one.
  • b: create a new code chunk below the current one.
  • ii (press i twice): interrupt the kernel (really useful when some code is running too long or you've accidentally initiated an infinite loop!)
  • y: code mode
  • m: markdown mode
  • shift + m: merge cells (when more than one cell is highlighted)

Important ones while in Edit Mode:

  • shift + ctrl + minus: split cell

Magic Commands

Magic commands are special commands prefixed by the % character. They are designed to succinctly solve various common problems in standard data analysis.

Magic commands come in two flavors:

  • line magics, which are denoted by a single % prefix and operate on a single line of input,
  • cell magics, which are denoted by a double %% prefix and operate on multiple lines of input.

List all the available magic commands:

In [7]:
%lsmagic
Out[7]:
Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

Or consult the quick reference sheet of all available magics:

In [3]:
%quickref

Useful Magic

Here are some useful magic commands that come in handy as you're working with code.

Bookmarking

"Come back here later"

In [7]:
%bookmark Home

See below

Changing working directories

In [4]:
%cd ~/Desktop
/Users/edunford/Desktop
In [5]:
%pwd
Out[5]:
'/Users/edunford/Desktop'

Using the bookmark to return to where we were...

In [6]:
%cd -b Home
UsageError: Bookmark 'Home' not found.  Use '%bookmark -l' to see your bookmarks.
In [13]:
%pwd
Out[13]:
'/Users/ericdunford/Dropbox/Georgetown/Courses/PPOL564-Foundations/lectures/lecture_03'

Writing code to files

Extremely useful when we develop some functionality that we'd like to utilize later on.

In [8]:
%%writefile my_fib_func.py
def fib(n):
    '''Fibonacci Sequence'''
    x = [0]*n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i-2] + x[i-1]
    return x
Writing my_fib_func.py
In [15]:
%ls # list files ( see our function)
lecture_03-using-jupyter-notebooks.ipynb
my_fib_func.py

Reading in files

In [ ]:
# %load my_fib_func.py
def fib(n):
    '''Fibonacci Sequence'''
    x = [0]*n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i-2] + x[i-1]
    return x

Run an external file as a program

In [10]:
%run my_fib_func.py
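Outside of a notebook, the write-then-run pattern above can be approximated in plain Python. The sketch below is only a rough stand-in for %%writefile and %run, not how IPython implements them: it writes the same function to a temporary file, executes the file top to bottom, and then uses the names the file defined.

```python
import tempfile
from pathlib import Path

# Same function body as my_fib_func.py above.
FIB_SOURCE = '''
def fib(n):
    """Fibonacci Sequence"""
    x = [0] * n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i - 2] + x[i - 1]
    return x
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    script = Path(tmp_dir) / "my_fib_func.py"
    script.write_text(FIB_SOURCE)  # roughly what %%writefile does
    # Execute the file as a script; its names land in `namespace`.
    namespace = {"__name__": "__main__"}
    exec(compile(script.read_text(), str(script), "exec"), namespace)

fib = namespace["fib"]  # the function defined by the script
first_ten = fib(10)
print(first_ten)
```

The key point carried over from %run is that everything the script defines becomes available afterwards, which is why fib can be called in later cells.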

Timing Code

How fast does what we wrote run?

In [11]:
%time fib(10)
CPU times: user 12 µs, sys: 0 ns, total: 12 µs
Wall time: 16.9 µs
Out[11]:
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

How long do many runs take (statistical sample)?

In [12]:
%%timeit
fib(10)
2.66 µs ± 32.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
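For comparison, a similar measurement can be sketched with the standard library's timeit module, which is handy when the code lives in a script rather than a notebook. The repeat and loop counts below are arbitrary choices for this sketch, and the timings will differ from machine to machine.

```python
import statistics
import timeit


def fib(n):
    """Fibonacci Sequence"""
    x = [0] * n
    for i in range(n):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
        else:
            x[i] = x[i - 2] + x[i - 1]
    return x


LOOPS = 10_000  # calls per run (arbitrary choice)
RUNS = 7        # number of runs (arbitrary choice)

# Each entry in `totals` is the total time in seconds for LOOPS calls.
totals = timeit.repeat(lambda: fib(10), repeat=RUNS, number=LOOPS)
per_call = [total / LOOPS for total in totals]

mean_us = statistics.mean(per_call) * 1e6
std_us = statistics.stdev(per_call) * 1e6
print(f"{mean_us:.2f} µs ± {std_us:.2f} µs per loop ({RUNS} runs, {LOOPS} loops each)")
```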

Look up object names in the namespace

In [13]:
main_dat = [1,2,3,4]
main_key = ["a","b"]
x = 5
y = 6
In [14]:
%psearch main*
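The wildcard lookup can be approximated in plain Python with the fnmatch module. This is a simplified sketch rather than how %psearch is implemented, and the helper name search_names is made up for illustration.

```python
import fnmatch


def search_names(namespace, pattern):
    """Return the names in `namespace` matching a shell-style wildcard."""
    return sorted(
        name for name in namespace if fnmatch.fnmatchcase(name, pattern)
    )


# A small dictionary stands in for the notebook's namespace.
example_namespace = {
    "main_dat": [1, 2, 3, 4],
    "main_key": ["a", "b"],
    "x": 5,
    "y": 6,
}

matches = search_names(example_namespace, "main*")
print(matches)
```

In a live notebook, passing globals() as the namespace searches the names that are currently defined.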

Debugging

Whenever you encounter an error or exception, open a new notebook cell, type %debug, and run the cell. This opens an interactive debugger prompt where you can test your code and inspect all variables right up to the line that threw the error. Type n and hit Enter to run the next line of code (the -> arrow shows you the current position). Use c to continue until the next breakpoint. q quits the debugger and code execution.

Asking for help

In [15]:
%%timeit?

Notebook Extensions

We can expand the functionality of Jupyter notebooks through extensions. Extensions allow us to create and use new features that better customize the notebook's user experience. For example, there are extensions for spell checking, adding a table of contents to ease navigation, running code in parallel, and viewing differences between notebooks when using version control.

Download python module to install notebook extensions: https://github.com/ipython-contrib/jupyter_contrib_nbextensions

Using pip (installs packages from PyPI):

pip install jupyter_nbextensions_configurator jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user

Using Conda (Anaconda module manager):

conda install -c conda-forge jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user

Extensions can be activated most easily on the home screen when you first activate your Jupyter notebook.

Useful Extensions

  • Collapsible headings: allows you to collapse some parts of the notebooks.
  • Notify: sends a notification when the notebook becomes idle (for long running tasks)
  • Code folding: folds function, loops, and indented code chunks (makes things tidy)
  • nbdime: provides tools for diffing and merging Jupyter Notebooks under git.
    • Requires installation: pip install nbdime