PPOL564 - Data Science I: Foundations

Lecture 6

Comprehensions, Generators and Itertools

Plan for Today

  • Talk about comprehensions and how to effectively use them.
  • Generators
  • Itertools
  • File management (see other notebook)

Comprehensions

Comprehensions provide a readable and efficient way of applying an expression to each item of an iterable.

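The general form of a list comprehension is [expr(item) for item in iterable]; set, dictionary, and generator comprehensions swap the square brackets for {} or ().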

See here for more details.

List Comprehensions

Using the list literal syntax [] (square brackets), we write the for loop inside the brackets.

In [18]:
words = "This is a such a long course".split()
words
Out[18]:
['This', 'is', 'a', 'such', 'a', 'long', 'course']
In [19]:
[len(word) for word in words]
Out[19]:
[4, 2, 1, 4, 1, 4, 6]

List comprehensions are a tool for transforming one list (or any other iterable in Python 3) into another list. They offer a more readable alternative to the long-standing filter() and map() functions in Python.
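
To make the comparison with filter() and map() concrete, here is a minimal sketch that builds the same list both ways. The names lengths_map and lengths_comp are made up for this example, and the trailing if clause it uses is covered in more detail further down.

```python
words = "This is a such a long course".split()

# map() applies a function to every item; filter() keeps items that pass a test
lengths_map = list(map(lambda w: len(w), filter(lambda w: len(w) > 1, words)))

# The same transformation and filter written as a list comprehension
lengths_comp = [len(w) for w in words if len(w) > 1]

print(lengths_map)
print(lengths_comp)
```

Both versions give the same result; the comprehension simply reads closer to the sentence "the length of each word, for words longer than one character."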

Set Comprehensions

(Introduced in Python 3.0 and also available in Python 2.7)

Using the set literal syntax {} (curly braces), we write the for loop inside the braces.

Recall that the difference between a set and a dictionary is whether there is a key:value pair within the curly braces. When there is a key:value pair within the braces, it's a dictionary. When there are only values, it's a set. An empty pair of braces {} is a dictionary, not a set. Use type() if you are unsure.
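
As a quick check of the rule above, the sketch below asks type() about a few curly-brace literals. The variable names are made up for this example.

```python
empty_braces = {}          # no items at all
empty_set = set()          # an empty set needs the constructor
values_only = {1, 2, 3}    # values only
key_value = {'a': 1}       # key: value pair

print(type(empty_braces))
print(type(empty_set))
print(type(values_only))
print(type(key_value))
```

Note that an empty {} is treated as a dictionary, which is why set() is used to create an empty set.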

In [20]:
{len(word) for word in words}
Out[20]:
{1, 2, 4, 6}

Dictionary Comprehensions

(Introduced in Python 3.0 and also available in Python 2.7)

Using curly braces {} with a key:value pair as the expression, {key : value}, we write the for loop inside the braces.

In [21]:
# Create two lists: one full of values and another of equal length full of keys.
list_of_values = [1,2,3,4,5]
list_of_keys = ['a','b','c','d','e']
length_of_the_lists = len(list_of_values)

{list_of_keys[i]:list_of_values[i] for i in range(length_of_the_lists)}
Out[21]:
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
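
The index-based version above works, but we can also pair the keys and values directly. This is a sketch of one alternative using zip() (discussed later in this lecture); the names paired and paired_again are made up for this example.

```python
list_of_values = [1, 2, 3, 4, 5]
list_of_keys = ['a', 'b', 'c', 'd', 'e']

# Pair each key with its value instead of indexing by position
paired = {key: value for key, value in zip(list_of_keys, list_of_values)}

# dict() also accepts the zipped pairs directly
paired_again = dict(zip(list_of_keys, list_of_values))

print(paired)
print(paired == paired_again)
```

The comprehension form is the more flexible of the two, since the key or the value can be transformed on the way in (for example, key.upper()).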

if statements in comprehensions

In [22]:
# Quickly produce a series of numbers
[i for i in range(10)]
Out[22]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [23]:
[i for i in range(10) if i > 5 ]
Out[23]:
[6, 7, 8, 9]

An else clause isn't valid in the filter position (after the for clause) of a comprehension. The trailing if can only decide whether an item is kept, not what replaces it.

In [24]:
[i  for i in range(10) if i > 5 else "hello"]
  File "<ipython-input-24-27f752950cb6>", line 1
    [i  for i in range(10) if i > 5 else "hello"]
                                       ^
SyntaxError: invalid syntax

Conditional Expressions

Concise if-else expressions that evaluate to a value:

<this_thing> if <this_is_true> else <this_other_thing>
In [41]:
x = 4
"Yes" if x > 5 else "No"
Out[41]:
'No'
In [42]:
x = 6
"Yes" if x > 5 else "No"
Out[42]:
'Yes'
In [43]:
["Yes" if x > 5 else "No" for x in range(10)]
Out[43]:
['No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes']
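
The two uses of if can appear in the same comprehension. In this small sketch (the labels are arbitrary), the conditional expression to the left of for chooses the value, while the trailing if decides which items are kept at all.

```python
# Keep only the even numbers, then label each one
labels = ["big" if i > 5 else "small" for i in range(10) if i % 2 == 0]
print(labels)
```

Only 0, 2, 4, 6 and 8 pass the filter, so the result has five labels rather than ten.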

Nested comprehensions

In [25]:
[i for i in range(5)]
Out[25]:
[0, 1, 2, 3, 4]
In [26]:
[j for j in range(-5,0)]
Out[26]:
[-5, -4, -3, -2, -1]
In [27]:
[[i,j] for i in range(5) for j in range(-5,0)]
Out[27]:
[[0, -5],
 [0, -4],
 [0, -3],
 [0, -2],
 [0, -1],
 [1, -5],
 [1, -4],
 [1, -3],
 [1, -2],
 [1, -1],
 [2, -5],
 [2, -4],
 [2, -3],
 [2, -2],
 [2, -1],
 [3, -5],
 [3, -4],
 [3, -3],
 [3, -2],
 [3, -1],
 [4, -5],
 [4, -4],
 [4, -3],
 [4, -2],
 [4, -1]]
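
It can help to read a nested comprehension as a set of nested for loops written left to right. The sketch below spells out the loops for the example above and checks that both versions agree; pairs_loop and pairs_comp are names made up for this example.

```python
pairs_loop = []
for i in range(5):            # outer loop: the first `for` in the comprehension
    for j in range(-5, 0):    # inner loop: the second `for`
        pairs_loop.append([i, j])

pairs_comp = [[i, j] for i in range(5) for j in range(-5, 0)]

print(pairs_loop == pairs_comp)
print(len(pairs_comp))
```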

Speed Boost

Comprehensions not only make our code more concise, they can also run faster than the equivalent for loop that appends to a list.

In [28]:
%%timeit
container = []
for i in range(1000):
    container.append(i)
89.3 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [29]:
%%timeit
container = [i for i in range(1000)]
38.9 µs ± 382 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In this run, the comprehension expression takes roughly half the time!
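
The %%timeit magic only works inside a notebook. Outside of one, a similar comparison can be sketched with the standard library's timeit module. The function names and the choice of number=1000 are assumptions made for this example, and the exact timings (and the size of the gap) will depend on the machine and the Python version.

```python
import timeit

def with_loop():
    container = []
    for i in range(1000):
        container.append(i)
    return container

def with_comprehension():
    return [i for i in range(1000)]

# Total time, in seconds, to call each function 1000 times
loop_time = timeit.timeit(with_loop, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)

print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```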

Internal Scope

In [30]:
# Say we create a string object letter
letter = 'z'

# Then we use letter as a placeholder
letters = ['a','b','c','d']
for letter in letters:
    print(letter)
a
b
c
d
In [31]:
letter # letter got overwritten!
Out[31]:
'd'

Now let's do the same thing with a comprehension

In [32]:
letter = 'z'
[letter for letter in letters]
Out[32]:
['a', 'b', 'c', 'd']
In [33]:
letter
Out[33]:
'z'

What this means is that comprehensions keep their loop variable to themselves: in Python 3 the placeholder name is local to the comprehension, so it cannot accidentally overwrite a variable of the same name elsewhere in our code, as a for loop can.
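
Another way to see the same scoping behaviour is to use placeholder names that do not exist beforehand. In this sketch (loop_letter and comp_letter are made-up names, and Python 3 is assumed), the for loop leaves its placeholder behind, while the comprehension's placeholder is gone once the list is built.

```python
letters = ['a', 'b', 'c', 'd']

# The loop variable of a for loop stays defined after the loop ends
for loop_letter in letters:
    pass
print(loop_letter)

# The loop variable of a comprehension does not leak out
upper = [comp_letter.upper() for comp_letter in letters]
try:
    comp_letter
    leaked = True
except NameError:
    leaked = False
print(leaked)
```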


Generators

  • Specify an iterable sequence that you evaluate lazily (compute on demand).
  • All generators are iterators
  • Can model infinite sequences (such as data streams with no definite end)

Generators are written like functions; however, rather than using the return keyword, we use the yield keyword. If the yield keyword appears anywhere in a function body, then that function is a generator function: calling it returns a generator object instead of running the body.

In [1]:
def gen123():
    yield 1
    yield 2
    yield 3
In [2]:
gen123
Out[2]:
<function __main__.gen123()>
In [9]:
g = gen123() # calling the function creates a generator object
g
Out[9]:
<generator object gen123 at 0x1038e6e58>

A generator behaves just like any other iterator; however, each call to next() doesn't look up a stored item, it resumes the function body and runs it up to the next yield.

In [10]:
next(g)
Out[10]:
1
In [11]:
next(g)
Out[11]:
2
In [12]:
next(g)
Out[12]:
3
In [13]:
next(g)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-13-e734f8aca5ac> in <module>()
----> 1 next(g)

StopIteration: 

Note that each call to a generator function returns a new generator object.

In [16]:
h = gen123()
i = gen123()
h is i
Out[16]:
False
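
Because values are only computed on demand, a generator can describe a sequence with no end, as mentioned in the list at the top of this section. The sketch below is one hypothetical example (even_numbers is a made-up name); it borrows itertools.islice(), covered later in this lecture, to take only the first few values.

```python
from itertools import islice

def even_numbers():
    """Yield 0, 2, 4, ... without ever stopping."""
    n = 0
    while True:
        yield n
        n += 2

evens = even_numbers()
first_five = list(islice(evens, 5))   # take only the first five values
resumed = next(evens)                 # the generator picks up where it left off

print(first_five)
print(resumed)
```

Calling list(even_numbers()) directly would never finish, so an infinite generator always needs something on the consuming side that decides when to stop.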

Generator Comprehensions

  • Similar syntax to list comprehensions
  • Creates a generator object
  • Concise
  • Lazy evaluation
    (expr(item) for item in iterable)
In [1]:
(i for i in range(10))
Out[1]:
<generator object <genexpr> at 0x1060efb88>
In [2]:
list((i for i in range(10)))
Out[2]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]:
sum(i for i in range(10))
Out[3]:
45

This is really useful if we want to calculate values on demand rather than loading an entire series into memory.

In [4]:
# from 1 to 100,000, how many values are divisible by 13?
sum(1 for i in range(1, 100001) if i % 13 == 0)
Out[4]:
7692
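
To get a rough feel for the memory saving, we can compare the size of a list with the size of the equivalent generator object. This is only a sketch: sys.getsizeof() reports the size of the container object itself (not the integers it refers to), and the exact numbers vary across Python versions.

```python
import sys

as_list = [i for i in range(100000)]
as_generator = (i for i in range(100000))

list_size = sys.getsizeof(as_list)            # grows with the number of items
generator_size = sys.getsizeof(as_generator)  # small, regardless of the range

print(list_size)
print(generator_size)

# Both produce the same values when consumed
print(sum(as_list) == sum(as_generator))
```

The trade-off is that a generator can only be consumed once, whereas the list can be reused.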

Itertools

itertools is part of the Python standard library and deals with Python's iterator objects. It provides robust functionality for working with iterable sequences: functions in itertools operate on iterators to produce more complex iterators.

Python's built-in functions also include a number of iteration tools, which need no import. We saw two last time when discussing lambda functions: filter() and map().

Some iteration tools produce scalar values, others produce other iterable objects.

any()

In [6]:
any(name == name.title() for name in ["London","New York","Russia",'cat','bus'])
Out[6]:
True
In [11]:
any(len(name) > 10 for name in ["London","New York","Russia",""])
Out[11]:
False

all()

In [12]:
all(name == name.title() for name in ["London","New York","Russia",'cat','bus'])
Out[12]:
False
In [14]:
all(len(name) > 0 for name in ["London","New York","Russia",""])
Out[14]:
False
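
Both any() and all() stop as soon as the answer is known, which pairs well with lazy generator expressions. The sketch below uses a small helper generator (checked, a made-up name) that records which items were actually examined.

```python
def checked(names, seen):
    """Yield each name, recording it in `seen` as it is examined."""
    for name in names:
        seen.append(name)
        yield name

names = ["London", "New York", "Russia", "cat", "bus"]

seen_any = []
result_any = any(name == name.title() for name in checked(names, seen_any))
print(result_any, seen_any)   # any() stops at the first True

seen_all = []
result_all = all(name == name.title() for name in checked(names, seen_all))
print(result_all, seen_all)   # all() stops at the first False
```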

sum()

In [20]:
sum(i for i in range(100))
Out[20]:
4950

min()

In [22]:
min(i for i in range(100,1000) if i % 17 == 0)
Out[22]:
102

max()

In [23]:
max(i for i in range(100,1000) if i % 17 == 0)
Out[23]:
986

zip()

Pairs up the items of two (or more) iterables, position by position, into tuples.

In [15]:
a = list(range(10))
b = list(range(-10,0))
zip(a,b) # Its own object type
Out[15]:
<zip at 0x106152748>
In [16]:
dir(zip(a,b))
Out[16]:
['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']
In [19]:
[item for item in zip(a,b)]
Out[19]:
[(0, -10),
 (1, -9),
 (2, -8),
 (3, -7),
 (4, -6),
 (5, -5),
 (6, -4),
 (7, -3),
 (8, -2),
 (9, -1)]
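
Two details of zip() are worth a quick sketch: it stops at the shortest input, and it can be reversed with the * unpacking operator. itertools.zip_longest() is the standard library's padded alternative; the fill value '-' is an arbitrary choice for this example.

```python
from itertools import zip_longest

a = [0, 1, 2, 3]
b = ['w', 'x']

short = list(zip(a, b))                            # stops at the shorter input
padded = list(zip_longest(a, b, fillvalue='-'))    # pads the shorter input

# zip(*pairs) reverses the pairing
numbers, letters = zip(*short)

print(short)
print(padded)
print(numbers, letters)
```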

enumerate()

Generates (index, value) tuples from an iterable.

In [29]:
my_list = 'Iterator tools are useful to move across iterable objects in complex ways.'.split()
enumerate(my_list)
Out[29]:
<enumerate at 0x1061ad3f0>
In [30]:
[i for i in enumerate(my_list)]
Out[30]:
[(0, 'Iterator'),
 (1, 'tools'),
 (2, 'are'),
 (3, 'useful'),
 (4, 'to'),
 (5, 'move'),
 (6, 'across'),
 (7, 'iterable'),
 (8, 'objects'),
 (9, 'in'),
 (10, 'complex'),
 (11, 'ways.')]
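
In practice, the (index, value) tuples are usually unpacked right in the for clause. A short sketch, using the optional start argument so the numbering begins at 1 (the shortened sentence and the name numbered are made up for this example):

```python
my_list = 'Iterator tools are useful'.split()

# Unpack each (index, value) tuple directly in the for clause;
# start=1 makes the numbering begin at 1 instead of 0.
numbered = [f"{position}: {word}" for position, word in enumerate(my_list, start=1)]
print(numbered)
```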

itertools module

In [3]:
import itertools

.combinations()

All possible selections of r items from the iterable (here r = 2), ignoring order.

In [48]:
x = ['a','b','c','d']
[i for i in itertools.combinations(x,2)]
Out[48]:
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
In [53]:
# Note that we can also consume an iterator by passing it to a constructor such as list()
list(itertools.combinations(x,2))
Out[53]:
[('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
In [55]:
# combinations with replacement
list(itertools.combinations_with_replacement(x,4))
Out[55]:
[('a', 'a', 'a', 'a'),
 ('a', 'a', 'a', 'b'),
 ('a', 'a', 'a', 'c'),
 ('a', 'a', 'a', 'd'),
 ('a', 'a', 'b', 'b'),
 ('a', 'a', 'b', 'c'),
 ('a', 'a', 'b', 'd'),
 ('a', 'a', 'c', 'c'),
 ('a', 'a', 'c', 'd'),
 ('a', 'a', 'd', 'd'),
 ('a', 'b', 'b', 'b'),
 ('a', 'b', 'b', 'c'),
 ('a', 'b', 'b', 'd'),
 ('a', 'b', 'c', 'c'),
 ('a', 'b', 'c', 'd'),
 ('a', 'b', 'd', 'd'),
 ('a', 'c', 'c', 'c'),
 ('a', 'c', 'c', 'd'),
 ('a', 'c', 'd', 'd'),
 ('a', 'd', 'd', 'd'),
 ('b', 'b', 'b', 'b'),
 ('b', 'b', 'b', 'c'),
 ('b', 'b', 'b', 'd'),
 ('b', 'b', 'c', 'c'),
 ('b', 'b', 'c', 'd'),
 ('b', 'b', 'd', 'd'),
 ('b', 'c', 'c', 'c'),
 ('b', 'c', 'c', 'd'),
 ('b', 'c', 'd', 'd'),
 ('b', 'd', 'd', 'd'),
 ('c', 'c', 'c', 'c'),
 ('c', 'c', 'c', 'd'),
 ('c', 'c', 'd', 'd'),
 ('c', 'd', 'd', 'd'),
 ('d', 'd', 'd', 'd')]

.permutations()

In [56]:
list(itertools.permutations(x))
Out[56]:
[('a', 'b', 'c', 'd'),
 ('a', 'b', 'd', 'c'),
 ('a', 'c', 'b', 'd'),
 ('a', 'c', 'd', 'b'),
 ('a', 'd', 'b', 'c'),
 ('a', 'd', 'c', 'b'),
 ('b', 'a', 'c', 'd'),
 ('b', 'a', 'd', 'c'),
 ('b', 'c', 'a', 'd'),
 ('b', 'c', 'd', 'a'),
 ('b', 'd', 'a', 'c'),
 ('b', 'd', 'c', 'a'),
 ('c', 'a', 'b', 'd'),
 ('c', 'a', 'd', 'b'),
 ('c', 'b', 'a', 'd'),
 ('c', 'b', 'd', 'a'),
 ('c', 'd', 'a', 'b'),
 ('c', 'd', 'b', 'a'),
 ('d', 'a', 'b', 'c'),
 ('d', 'a', 'c', 'b'),
 ('d', 'b', 'a', 'c'),
 ('d', 'b', 'c', 'a'),
 ('d', 'c', 'a', 'b'),
 ('d', 'c', 'b', 'a')]
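
The difference between combinations and permutations comes down to whether order matters. The sketch below compares the two for pairs of items and checks the counts against math.comb() and math.perm(), which assumes Python 3.8 or later.

```python
import itertools
import math

x = ['a', 'b', 'c', 'd']

combos = list(itertools.combinations(x, 2))   # order ignored: only ('a', 'b')
perms = list(itertools.permutations(x, 2))    # order matters: ('a', 'b') and ('b', 'a')

print(len(combos), math.comb(4, 2))
print(len(perms), math.perm(4, 2))
print(('b', 'a') in combos, ('b', 'a') in perms)
```

Like combinations(), permutations() accepts an optional length r; when it is left out, as in the cell above, full-length orderings are produced.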

.count()

Creates an iterator that counts upward from start in increments of step, without end.

In [5]:
counter = itertools.count(start=0,step=.3)
In [6]:
next(counter)
Out[6]:
0
In [7]:
next(counter)
Out[7]:
0.3
In [8]:
next(counter)
Out[8]:
0.6
In [10]:
list(zip(itertools.count(step=5),"Georgetown"))
Out[10]:
[(0, 'G'),
 (5, 'e'),
 (10, 'o'),
 (15, 'r'),
 (20, 'g'),
 (25, 'e'),
 (30, 't'),
 (35, 'o'),
 (40, 'w'),
 (45, 'n')]
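
In the zip() example above, the counter stops only because the string runs out. On its own, count() never ends, so something else has to set the limit. One option, sketched here, is itertools.takewhile(), which keeps taking values while a condition holds (the cutoff of 30 is arbitrary).

```python
import itertools

# Without a stopping rule, list(itertools.count()) would never finish.
multiples_of_five = itertools.count(start=0, step=5)
below_thirty = list(itertools.takewhile(lambda n: n < 30, multiples_of_five))
print(below_thirty)
```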

.repeat()

In [15]:
list(itertools.repeat("a",10))
Out[15]:
['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']

.chain()

Lazily concatenates iterables, one after the other, without the memory overhead of building a combined copy.

In [17]:
list(itertools.chain('ABC', 'DEF'))
Out[17]:
['A', 'B', 'C', 'D', 'E', 'F']
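
A common use of chaining is flattening a list of lists. This sketch uses chain.from_iterable(), which takes a single iterable of iterables, and then shows that nothing is produced until an item is requested. The names nested, flat, lazy and first are made up for this example.

```python
import itertools

nested = [[1, 2], [3, 4], [5]]

# chain.from_iterable(nested) walks the inner lists in order
flat = list(itertools.chain.from_iterable(nested))
print(flat)

# chain returns an iterator, so items are only produced when requested
lazy = itertools.chain('ABC', 'DEF')
first = next(lazy)
print(first)
```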

.islice()

Slices an iterable the way we would normally slice a list, but returns an iterator instead of a new list.

In [23]:
[i for i in itertools.islice('abcd',2)]
Out[23]:
['a', 'b']
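
islice() is most useful on objects that cannot be sliced with square brackets, such as generators. The sketch below takes a start, stop and step, mirroring the list slice [2:10:3]; the names squares and picked are made up for this example.

```python
import itertools

squares = (n * n for n in range(20))

# Generators do not support squares[2:10:3]; islice does the same job lazily
picked = list(itertools.islice(squares, 2, 10, 3))
print(picked)
```

Unlike list slicing, islice() does not accept negative values for start, stop or step.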

These are but a few! Check out all that itertools has to offer in the official Python documentation for the module.