PPOL564 - Data Science I: Foundations

Lecture 4

Manipulating Data Structures

Plan for Today¶

Manipulating Mutable Data Structures.
See other notebook for discussion on data types.
See supplement notebook for a more detailed look at the functionality of strings and dates.

Manipulating Mutable Objects¶

Lists¶

country_list = ["Russia","Latvia","United States","Nigeria","Mexico","India","Costa Rica"]
country_list

['Russia',
 'Latvia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica']

`len()`¶

len() returns the number of elements in a list. Applied to a single element that is a string, it returns the number of characters in that string.

print(len(country_list))
print(len(country_list[1]))

7
6

`.index()`¶

.index() returns the index location of the first occurrence of a specific value.

country_list.index('Nigeria')

3

country_list[country_list.index('Nigeria')]

'Nigeria'

We can test membership in a list using the in operator.

'Russia' in country_list

True
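Because .index() raises a ValueError when the value is absent, a membership check with in is a common guard. A minimal sketch follows; `find_position` is a hypothetical helper name used only for illustration, and the list is a shortened copy of the one above.

```python
def find_position(items, value):
    """Return the index of the first occurrence of value, or None if absent."""
    if value in items:              # guard: .index() would raise ValueError otherwise
        return items.index(value)
    return None

countries = ["Russia", "Latvia", "United States", "Nigeria"]
pos_found = find_position(countries, "Nigeria")
pos_missing = find_position(countries, "Canada")
print(pos_found)     # 3
print(pos_missing)   # None
```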

Appending and altering values¶

To add values to a collection, we have seen methods such as __add__, .append(), .extend(), and .update(), depending on the collection type.

Recall that not all methods actually update the object.

print(id(country_list)) # print object id
print(country_list + ['Canada']) # add canada to the list

140195653389960
['Russia', 'Latvia', 'United States', 'Nigeria', 'Mexico', 'India', 'Costa Rica', 'Canada']

print(id(country_list)) # object id remains consistent
print(country_list) # list wasn't updated

140195653389960
['Russia', 'Latvia', 'United States', 'Nigeria', 'Mexico', 'India', 'Costa Rica']

To change the list itself, we need the in-place addition offered by the __iadd__ method, which is invoked by the += operator.

country_list += ['Canada']
country_list

['Russia',
 'Latvia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica',
 'Canada']

There is also an in-place repetition operation (__imul__), written *=.

country_list *= 3
country_list

['Russia',
 'Latvia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica',
 'Canada',
 'Russia',
 'Latvia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica',
 'Canada',
 'Russia',
 'Latvia',
 'United States',
 'Nigeria',
 'Mexico',
 'India',
 'Costa Rica',
 'Canada']

In-place operations make for more efficient code because no new list has to be built. Concatenating with + creates a new object with its own id, whereas an in-place extension with += retains the original object id.

x = [1,2,3]
print(id(x))

140193773473096

x1 = x + [4]
print(id(x1))

140193773472904

x += [4]
print(id(x))

140193773473096

print(x1)
print(x)

[1, 2, 3, 4]
[1, 2, 3, 4]
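The difference matters most when two names refer to the same list. The sketch below uses illustrative variable names (`a`, `alias`) that are not part of the lecture: an in-place extension is visible through every name bound to the object, while + builds a new list and rebinds only one name.

```python
a = [1, 2, 3]
alias = a                        # second name bound to the same list object

a += [4]                         # in-place: the shared object grows
same_after_iadd = alias is a
print(alias)                     # [1, 2, 3, 4]
print(same_after_iadd)           # True

a = a + [5]                      # builds a new list and rebinds the name a
same_after_add = alias is a
print(alias)                     # [1, 2, 3, 4]
print(a)                         # [1, 2, 3, 4, 5]
print(same_after_add)            # False
```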

Slicing¶

Often we want a range of values from a container. We can accomplish this by slicing.

Rule of thumb:

<start here>:<to the value before here>

x = [1, 2, 3, 4, 5, 6]
x[1:4]

is

 0  1  2  3  4  5
[1, 2, 3, 4, 5, 6]
    ^  ^  ^

country_list = ["Russia","Latvia","United States","Nigeria","Mexico","India","Costa Rica"]
country_list[1:5]

['Latvia', 'United States', 'Nigeria', 'Mexico']

When we leave the start or the stop value open, we are saying "take me all the way to the beginning" or "take me all the way to the end," respectively.

country_list[:4]

['Russia', 'Latvia', 'United States', 'Nigeria']

country_list[5:]

['India', 'Costa Rica']

The slicing operator by itself makes a (shallow) copy of the object.

cc = country_list[:]
cc is country_list

False

And every slice creates a new object with its own id.

print(id(country_list))
print(id(country_list[:3]))
print(id(country_list[3:]))

4336964808
4336503368
4337294856
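The copy made by a slice is shallow: the outer list is new, but any mutable elements inside it are still shared with the original. A small sketch with made-up nested data shows the consequence.

```python
nested = [[1, 2], [3, 4]]
shallow = nested[:]              # new outer list, same inner lists

shallow.append([5, 6])           # changes only the copy
shallow[0].append(99)            # changes an inner list shared by both

print(nested)                    # [[1, 2, 99], [3, 4]]
print(shallow)                   # [[1, 2, 99], [3, 4], [5, 6]]
```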

Deleting Values¶

del keyword: delete the value at an index position
.remove() method: delete the first occurrence of a value

del country_list[1]
country_list

['Russia', 'United States', 'Nigeria', 'Mexico', 'India', 'Costa Rica']

country_list.remove("Nigeria")
country_list

['Russia', 'United States', 'Mexico', 'India', 'Costa Rica']
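Two details are worth keeping in mind: .remove() deletes only the first matching value and raises a ValueError if there is none, and del also accepts a slice. The sketch below uses a throwaway list of letters rather than the country list.

```python
letters = ["a", "b", "a", "c"]
letters.remove("a")              # removes only the first "a"
print(letters)                   # ['b', 'a', 'c']

try:
    letters.remove("z")          # "z" is not in the list
    removed = True
except ValueError:
    removed = False
print(removed)                   # False

del letters[0:2]                 # del also works on a slice
print(letters)                   # ['c']
```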

Popping elements out of a container¶

With .pop(), an element can be removed from a collection and returned for use in a single step. This is useful when you want to work through a list of items, performing the same operation on each one.

country_list.pop()

'Costa Rica'

country_list

['Russia', 'United States', 'Mexico', 'India']

We can also pop an item out by giving its index location.

country_list.pop(2)

'Mexico'

country_list

['Russia', 'United States', 'India']
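As a sketch of the "work through a list" pattern, the loop below pops items until the list is empty. The names `to_visit` and `visited` are hypothetical, and upper-casing stands in for whatever operation you actually need.

```python
to_visit = ["Russia", "United States", "India"]
visited = []

while to_visit:                      # an empty list evaluates as False
    country = to_visit.pop()         # removes and returns the last element
    visited.append(country.upper())  # placeholder for the real work

print(visited)                       # ['INDIA', 'UNITED STATES', 'RUSSIA']
print(to_visit)                      # []
```

Note that the items come out in reverse order because .pop() takes from the end by default.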

Counting Values¶

country_list = ["Russia","Latvia","United States","Russia","Mexico",
                "India","Papua New Guinea","Latvia","Russia"]
print(country_list.count("Russia"))
print(country_list.count("Latvia"))

3
2
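When you need the count of every value rather than one value at a time, the standard library's collections.Counter is one option. It is not covered in this lecture, so treat the following as an optional sketch; the list is a copy of the one above under the hypothetical name `names`.

```python
from collections import Counter

names = ["Russia", "Latvia", "United States", "Russia", "Mexico",
         "India", "Papua New Guinea", "Latvia", "Russia"]

tally = Counter(names)               # maps each value to its number of occurrences
print(tally["Russia"])               # 3
print(tally.most_common(2))          # [('Russia', 3), ('Latvia', 2)]
```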

Sorting Values¶

country_list.sort()
country_list

['India',
 'Latvia',
 'Latvia',
 'Mexico',
 'Papua New Guinea',
 'Russia',
 'Russia',
 'Russia',
 'United States']

country_list.reverse()
country_list

['United States',
 'Russia',
 'Russia',
 'Russia',
 'Papua New Guinea',
 'Mexico',
 'Latvia',
 'Latvia',
 'India']

There is also a built-in sorted() function, which returns a new sorted list and leaves the original unchanged.

sorted(country_list)

['India',
 'Latvia',
 'Latvia',
 'Mexico',
 'Papua New Guinea',
 'Russia',
 'Russia',
 'Russia',
 'United States']

# Can sort by some defined function
sorted(country_list,key=len,reverse=True)

['Papua New Guinea',
 'United States',
 'Russia',
 'Russia',
 'Russia',
 'Mexico',
 'Latvia',
 'Latvia',
 'India']

# Can sort by a function that we define (more on lambda functions next time)
sorted(country_list,key=lambda x: x[0] == "R" or x[0] == "L",reverse=True)

['Russia',
 'Russia',
 'Russia',
 'Latvia',
 'Latvia',
 'United States',
 'Papua New Guinea',
 'Mexico',
 'India']
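A common pitfall follows from the difference between the two: .sort() changes the list in place and returns None, whereas sorted() returns a new list. A brief sketch with illustrative numbers:

```python
values = [3, 1, 2]
result = values.sort()           # sorts in place; the return value is None
print(result)                    # None
print(values)                    # [1, 2, 3]

original = [3, 1, 2]
ordered = sorted(original)       # returns a new list; original is untouched
print(original)                  # [3, 1, 2]
print(ordered)                   # [1, 2, 3]
```

Writing `values = values.sort()` would therefore replace the list with None.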

Accessing a method's documentation with `help()`¶

help([].sort)

Help on built-in function sort:

sort(*, key=None, reverse=False) method of builtins.list instance
    Stable sort *IN PLACE*.

Recall, also, that Jupyter notebooks offer a shortcut for requesting a function's or method's documentation: put a question mark in front of the object's name (without parentheses).

?list

`list` Methods to Keep in Mind¶

Methods in object type `list`

Method	Description
`.append()`	L.append(object) -> None -- append object to end
`.clear()`	L.clear() -> None -- remove all items from L
`.copy()`	L.copy() -> list -- a shallow copy of L
`.count()`	L.count(value) -> integer -- return number of occurrences of value
`.extend()`	L.extend(iterable) -> None -- extend list by appending elements from the iterable
`.index()`	L.index(value, [start, [stop]]) -> integer -- return first index of value. Raises ValueError if the value is not present.
`.insert()`	L.insert(index, object) -- insert object before index
`.pop()`	L.pop([index]) -> item -- remove and return item at index (default last). Raises IndexError if list is empty or index is out of range.
`.remove()`	L.remove(value) -> None -- remove first occurrence of value. Raises ValueError if the value is not present.
`.reverse()`	L.reverse() -- reverse IN PLACE
`.sort()`	L.sort(key=None, reverse=False) -> None -- stable sort IN PLACE

Dictionaries¶

Recall that dictionaries are associative arrays of key-value pairs, indexed by their keys. Keys must be immutable (hashable) and cannot be altered once created, but the values stored under them can change. In Python 3.7 and later, dictionaries also preserve the order in which keys were inserted. Dictionary keys provide an efficient way to look up information contained within the data structure.

We can combine a dictionary with other data types (such as lists) to make an efficient and effective data structure.

grades = {"John": [90,88,95,86],"Susan":[87,91,92,89],"Chad":[56,None,72,77]}

We can use the keys for efficient look up.

grades["John"]

[90, 88, 95, 86]

We can also use the .get() method to get the values that correspond to a specific key.

grades.get("Susan")

[87, 91, 92, 89]
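The two lookup styles differ when the key is missing: square brackets raise a KeyError, while .get() returns None or a default you supply. A short sketch, using a smaller hypothetical dictionary named `scores`:

```python
scores = {"John": [90, 88], "Susan": [87, 91]}

missing = scores.get("Wendy")        # key absent: returns None, no error
fallback = scores.get("Wendy", [])   # key absent: returns the supplied default

try:
    scores["Wendy"]                  # bracket lookup of an absent key
    raised = False
except KeyError:
    raised = True

print(missing)    # None
print(fallback)   # []
print(raised)     # True
```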

Accessing key-value pairs¶

To list all available keys, use the .keys() method.

grades.keys()

dict_keys(['John', 'Susan', 'Chad'])

Likewise, we can print all values using the .values() method.

grades.values()

dict_values([[90, 88, 95, 86], [87, 91, 92, 89], [56, None, 72, 77]])

Finally, we can collect all key-value pairs (each as a tuple) using the .items() method.

grades.items()

dict_items([('John', [90, 88, 95, 86]), ('Susan', [87, 91, 92, 89]), ('Chad', [56, None, 72, 77])])
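The .items() view is most often used in a loop that unpacks each key-value tuple. As a sketch, the code below computes each student's average; skipping None entries is an assumption about how missing grades should be treated, and `grade_book` and `averages` are illustrative names.

```python
grade_book = {"John": [90, 88, 95, 86],
              "Susan": [87, 91, 92, 89],
              "Chad": [56, None, 72, 77]}

averages = {}
for name, marks in grade_book.items():               # unpack each (key, value) tuple
    recorded = [m for m in marks if m is not None]   # assumption: ignore missing grades
    averages[name] = sum(recorded) / len(recorded)

print(averages["John"])    # 89.75
print(averages["Susan"])   # 89.75
```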

Updating dictionaries¶

We can add new dictionary data entries using the .update() method.

new_entry = {"Wendy":[99,98,97,94]} # Another student dictionary entry with grades
grades.update(new_entry) # Update the current dictionary
grades

{'John': [90, 88, 95, 86],
 'Susan': [87, 91, 92, 89],
 'Chad': [56, None, 72, 77],
 'Wendy': [99, 98, 97, 94]}

In a similar fashion, we can update the dictionary directly by assigning data to a new key.

grades["Seth"] = [66,72,79,81]
grades

{'John': [90, 88, 95, 86],
 'Susan': [87, 91, 92, 89],
 'Chad': [56, None, 72, 77],
 'Wendy': [99, 98, 97, 94],
 'Seth': [66, 72, 79, 81]}

Remember: values are mutable, keys are not

Dropping Keys¶

(1) You can `.pop()` a dictionary value out.¶

grades.pop("Seth")

[66, 72, 79, 81]

grades

{'John': [90, 88, 95, 86],
 'Susan': [87, 91, 92, 89],
 'Chad': [56, None, 72, 77],
 'Wendy': [99, 98, 97, 94]}

(2) You can `del`ete the key.¶

del grades['Wendy']

grades

{'John': [90, 88, 95, 86],
 'Susan': [87, 91, 92, 89],
 'Chad': [56, None, 72, 77]}
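Both del and .pop() with a single argument raise a KeyError when the key is not in the dictionary. A sketch of two ways to drop a key that may or may not exist, using a hypothetical dictionary named `roster`:

```python
roster = {"John": [90, 88], "Susan": [87, 91]}

dropped = roster.pop("Wendy", None)  # default supplied: no KeyError, returns None
print(dropped)                       # None

if "Wendy" in roster:                # guard the del with a membership test
    del roster["Wendy"]

print(roster)                        # {'John': [90, 88], 'Susan': [87, 91]}
```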

Dropping Values¶

To drop values, either

overwrite the original data
drop the key
clear the dictionary

grades['John'] = 7

grades

{'John': 7, 'Susan': [87, 91, 92, 89], 'Chad': [56, None, 72, 77]}

Clear the contents of the dictionary.

grades.clear()
grades

{}

Values don't have to be of the same type¶

Note the below:

for key "a", we stored an integer.
for key "b", we stored another dictionary whose two keys "i" and "ii" store a string and a float, respectively.
for key "c", we stored a tuple.

new_dict = {"a":6,"b":{"i":"hello","ii":2.3},"c":(4,5,6,7)}
new_dict

{'a': 6, 'b': {'i': 'hello', 'ii': 2.3}, 'c': (4, 5, 6, 7)}
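Nested values are reached by chaining lookups, one level at a time. The sketch below rebuilds the same structure under the hypothetical name `record` so that it runs on its own.

```python
record = {"a": 6, "b": {"i": "hello", "ii": 2.3}, "c": (4, 5, 6, 7)}

greeting = record["b"]["i"]      # key "b" gives the inner dict, then key "i"
third = record["c"][2]           # key "c" gives the tuple, then index 2
record["b"]["ii"] = 4.6          # the inner dict is mutable and can be updated

print(greeting)                  # hello
print(third)                     # 6
print(record["b"])               # {'i': 'hello', 'ii': 4.6}
```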

`dict` methods to keep in mind¶

Methods in object type `dict`

Method	Description
`.clear()`	D.clear() -> None. Remove all items from D.
`.copy()`	D.copy() -> a shallow copy of D
`.fromkeys()`	Returns a new dict with keys from iterable and values equal to value.
`.get()`	D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
`.items()`	D.items() -> a set-like object providing a view on D's items
`.keys()`	D.keys() -> a set-like object providing a view on D's keys
`.pop()`	D.pop(k[,d]) -> v, remove specified key and return the corresponding value. If key is not found, d is returned if given, otherwise KeyError is raised
`.popitem()`	D.popitem() -> (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
`.setdefault()`	D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
`.update()`	D.update([E, ]**F) -> None. Update D from dict/iterable E and F. If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
`.values()`	D.values() -> an object providing a view on D's values

Sets¶

Sets differ from lists and dictionaries in that we can perform set operations. In addition, no duplicate values are retained in the set, so it provides an efficient way to isolate unique values in a list of inputs.

my_set = {1,2,3,8,4,4,6}
my_set

{1, 2, 3, 4, 6, 8}
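To isolate the unique values in an existing list, pass the list to set(). A minimal sketch with an illustrative list named `visits`; sorted() is used only to print the result in a predictable order.

```python
visits = ["Russia", "Latvia", "Russia", "Mexico", "Latvia"]

unique_visits = set(visits)          # duplicates are dropped
print(len(unique_visits))            # 3
print(sorted(unique_visits))         # ['Latvia', 'Mexico', 'Russia']
```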

Note that values in a set cannot be accessed using an index.

my_set[0]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-93-158c424478a1> in <module>()
----> 1 my_set[0]

TypeError: 'set' object does not support indexing

Instead, we can .pop() values out of the set (but we cannot provide an index location; the element removed is arbitrary).

my_set.pop()

1

my_set

{2, 3, 4, 6, 8}

Or we can .remove() specific values from the set.

my_set.remove(3)
my_set

{2, 4, 6, 8}
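Note that .remove() raises a KeyError if the value is not in the set, whereas .discard() silently does nothing in that case. A brief sketch with a hypothetical set named `evens`:

```python
evens = {2, 4, 6, 8}

evens.discard(5)                 # 5 is absent: nothing happens, no error
evens.discard(8)                 # 8 is present: it is removed

try:
    evens.remove(5)              # .remove() requires the value to be a member
    remove_failed = False
except KeyError:
    remove_failed = True

print(sorted(evens))             # [2, 4, 6]
print(remove_failed)             # True
```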

Finally, note that sets can contain heterogeneous scalar types, but they cannot contain other mutable container data types.

set_a = {.5,6,"a",None}
set_a

{0.5, 6, None, 'a'}

Can't hold a mutable list.

set_b = {.5,6,"a",None,[8,5,6]}

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-98-7feb046d8695> in <module>()
----> 1 set_b = {.5,6,"a",None,[8,5,6]}

TypeError: unhashable type: 'list'

Can hold an immutable tuple.

set_c = {.5,6,"a",None,(1,2,3)}
set_c

{(1, 2, 3), 0.5, 6, None, 'a'}

Finally, note that the order changed. Sets do not retain any intrinsic ordering, so never rely on the order in which their elements are displayed.

`set` methods to keep in mind¶

Methods in object type `set`

Method	Description
`.add()`	Add an element to a set.
`.clear()`	Remove all elements from this set.
`.copy()`	Return a shallow copy of a set.
`.difference()`	Return the difference of two or more sets as a new set.
`.difference_update()`	Remove all elements of another set from this set.
`.discard()`	Remove an element from a set if it is a member.
`.intersection()`	Return the intersection of two sets as a new set.
`.intersection_update()`	Update a set with the intersection of itself and another.
`.isdisjoint()`	Return True if two sets have a null intersection.
`.issubset()`	Report whether another set contains this set.
`.issuperset()`	Report whether this set contains another set.
`.pop()`	Remove and return an arbitrary set element. Raises KeyError if the set is empty.
`.remove()`	Remove an element from a set; it must be a member.
`.symmetric_difference()`	Return the symmetric difference of two sets as a new set.
`.symmetric_difference_update()`	Update a set with the symmetric difference of itself and another.
`.union()`	Return the union of sets as a new set.
`.update()`	Update a set with the union of itself and others.
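A few of these methods in action, as a sketch with two small illustrative sets. The operators |, &, -, and ^ are equivalent to .union(), .intersection(), .difference(), and .symmetric_difference() when both operands are sets; sorted() is used only to print results in a predictable order.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a.union(b)                       # same as a | b
common = a.intersection(b)               # same as a & b
only_a = a.difference(b)                 # same as a - b
either = a.symmetric_difference(b)       # same as a ^ b

print(sorted(union))                     # [1, 2, 3, 4, 5]
print(sorted(common))                    # [3, 4]
print(sorted(only_a))                    # [1, 2]
print(sorted(either))                    # [1, 2, 5]
print(common.issubset(a))                # True
print(a.isdisjoint({9, 10}))             # True
```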