In the Asynchronous Lecture
In the Synchronous Lecture
If you have any questions while watching the pre-recorded material, be sure to write them down and to bring them up during the synchronous portion of the lecture.
The following tabs contain pre-recorded lecture materials for class this week. Please review these materials prior to the synchronous lecture.
Total time: Approx. 1 hour
Python is an object-oriented programming (OOP) language, meaning the object plays a fundamental role in how we structure a program. Specifically, OOP allows one to bundle properties and behavior into individual objects. In Python, objects can hold both the data and the methods used to manipulate that data.
`=` is the assignment operator in Python. When using it, a name is assigned as a reference to an object (e.g., below, `x` references the object `4` in the statement `x = 4`). There can be multiple references to the same object (more on this later).
x = 4
An object's type is determined at runtime: Python is a dynamically typed language, which differs from languages where types must be declared explicitly (e.g., C++, Java). Relatedly, Python code often relies on “duck typing”: what matters is whether an object supports the operations we use on it, not what its declared type is. An object's type cannot be changed once the object is created (coercing an object into a different type actually creates a new object).
type(x)
## <class 'int'>
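A minimal sketch of dynamic typing and duck typing in practice: the same name can be rebound to objects of different types, and a function can work with any object that supports the operation it uses. The helper `describe()` is hypothetical, written only for this illustration.

```python
# A name can be rebound to objects of different types at runtime
x = 4
print(type(x))  # <class 'int'>

x = "four"
print(type(x))  # <class 'str'>

# Duck typing: describe() (a hypothetical helper) never checks types;
# it works for any object that supports len()
def describe(obj):
    return f"{type(obj).__name__} with {len(obj)} items"

print(describe("four"))            # str with 4 items
print(describe([1, 2, 3]))         # list with 3 items
print(describe({"a": 1, "b": 2}))  # dict with 2 items
```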
Every object is assigned a unique id when it is created in Python (unique among the objects that exist at the same time).
id(x)
## 4570331536
An object's class provides a blueprint for the object's behavior and functionality. We use the dot (`.`) to access an object's methods, for example:
x.__add__()  # method dictating behavior of the `+` operator
x.__mul__()  # method dictating behavior of the `*` operator
x.__mod__()  # method dictating behavior of the `%` operator
x.__eq__()   # method dictating behavior of the `==` operator
...
An object is an instance of a class, and it is created when we call a constructor or write a literal. For example, below I instantiate a set (a collection object) containing 4 integer values. `x` is now an object of class `set`, and sets have different properties and methods than other classes, such as dictionaries (`dict`), tuples (`tuple`), and lists (`list`).
x = set([1,2,3,4])
Here we can list all of the object's attributes and methods using the `dir()` function, which returns the names defined for the object and its class. As we can see, there is a lot going on inside this single `set` object!
dir(x)
## ['__and__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__iand__', '__init__', '__init_subclass__', '__ior__', '__isub__', '__iter__', '__ixor__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__rand__', '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__rsub__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__xor__', 'add', 'clear', 'copy', 'difference', 'difference_update', 'discard', 'intersection', 'intersection_update', 'isdisjoint', 'issubset', 'issuperset', 'pop', 'remove', 'symmetric_difference', 'symmetric_difference_update', 'union', 'update']
There are two ways of instantiating a built-in data type in Python: with a literal, such as `[]`, or with a constructor, such as `list()`.
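As a quick check (a sketch, not part of the original notes), literals and constructors produce equivalent objects, and constructors can also build an object from an existing iterable:

```python
# Literals and constructors produce equivalent objects
empty_a = []        # literal
empty_b = list()    # constructor
print(empty_a == empty_b)              # True
print(type(empty_a) is type(empty_b))  # True

# A constructor can also build an object from an existing iterable
letters = list("abc")
print(letters)  # ['a', 'b', 'c']

# The same pattern holds for other built-in collections
print(() == tuple(), {} == dict())  # True True
```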
Python comes with a number of built-in data types. When talking about data types, it’s useful to differentiate between scalar types (data types that hold one piece of information, like a single number) and collection types (data types that hold multiple pieces of information). These built-in data types are the building blocks for more complex data types, like a pandas DataFrame (which we’ll cover later).
Type | Description | Example | Literal | Constructor |
---|---|---|---|---|
int | integer types | 4 | x = 4 | int(4) |
float | 64-bit floating point numbers | 4.567 | x = 4.567 | float(4) |
bool | boolean logical values | True | x = True | bool(0) |
None | null object (serves as a valuable placeholder) | None | x = None | |
Note two things from the above table:
Here we assign an integer (`3`) to the name `x`.
x = 3
x
## 3
type(x)
## <class 'int'>
Now let’s coerce the integer to a float using the constructor `float()`.
float(x)
## 3.0
Note that the result of coercion depends on both the initial class and the output class. Below we can see that when coercing `3` into a boolean, the value becomes `True`. This is because all non-zero numbers are treated as `True` in a boolean context. The output depends on the behavior of the `bool()` class.
bool(3)
## True
Not every object can be coerced into every type. In fact, each class contains instructions regarding which constructors it can work with. Note that when we look at all the methods in the `int` object instantiated in `x`, we see a `__bool__` method. This method provides instructions on how to convert an `int` into a `bool`.
x = 3
dir(x)
## ['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'as_integer_ratio', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']
x.__bool__()
## True
NOTE: the double underscores (`__`) are known as “dunder” (short for “double underscore”) in Python. Dunder methods describe how an object interfaces with Python's built-in functions and syntax, such as constructors or operators like addition.
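To see how dunder methods connect to operators, the sketch below calls `int`'s `__add__` directly and then defines a small class whose `__add__` and `__bool__` methods tell Python how `+` and `bool()` should behave. The `Money` class is hypothetical, invented purely for illustration.

```python
# The + operator is backed by the __add__ dunder method
print(3 + 4)           # 7
print((3).__add__(4))  # 7: the same operation, called explicitly

# A hypothetical class that defines its own behavior for + and bool()
class Money:
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        # Called when Python evaluates `self + other`
        return Money(self.amount + other.amount)

    def __bool__(self):
        # Called by bool(); zero money is treated as False
        return self.amount != 0

    def __repr__(self):
        # Controls how the object is displayed
        return f"Money({self.amount})"

total = Money(5) + Money(10)
print(total)           # Money(15)
print(bool(Money(0)))  # False
```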
Finally, all scalar data types are immutable, meaning they can’t be changed after they are created. When we make changes to a scalar, say by coercing it to another type as we do above, we’re actually creating a new object. We can see this by looking at the object id.
`id()` tells us the “identity” of an object. The number itself isn't meaningful to us; just know that when two object ids are the same, they reference the same data in the computer's memory. We’ll explore the implications of this when we look at copying.
x = 4
id(x)
## 4570331536
Here we coerce `x` to be a `float` and then look up the `id()` of the result. As we can see, there is a new number associated with it. This means coercion produced a different object; `x` itself still references the original integer.
id(float(x))
## 4709619056
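To see immutability directly, the sketch below keeps a second name (`original`, introduced only for illustration) pointing at the first integer object. “Changing” `x` rebinds it to a new object, while the old object stays as it was:

```python
x = 4
original = x          # a second reference to the same int object

x = x + 1             # "changing" x actually binds it to a new int object
print(x)              # 5
print(original)       # 4: the original object is untouched
print(x is original)  # False: x now refers to a different object

# Coercion also returns a new object and leaves x alone
y = float(x)
print(y, type(y))     # 5.0 <class 'float'>
print(type(x))        # <class 'int'>
```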
Type | Description | Example | Mutable | Literal | Constructor |
---|---|---|---|---|---|
list | heterogeneous sequences of objects | [1,"2",True] | ✓ | x = ["c","a","t"] | x = list("cat") |
str | sequences of characters | "A word" | ✘ | x = "12345" | x = str(12345) |
tuple | heterogeneous sequence of objects | (1,2) | ✘ | x = (1,2) | x = tuple([1,2]) |
set | unordered collection of distinct objects | {1,2} | ✓ | x = {1,2} | x = set([1,2]) |
dict | associative array of key/value mappings | {"a": 1} | keys ✘ values ✓ | x = {'a':1} | x = dict(a = 1) |
Each built-in collection data type in Python is distinct in important ways. Recall that an object’s class defines how the object behaves with operators and its methods. I’ll explore some of the differences in behavior for each class type so we can see what this means in practice.
Note the column referring to mutable and immutable collection types. Simply put, mutable objects can be changed after they are created; immutable objects cannot. All the scalar data types are immutable. Even when we coerced objects into a different class, we weren't changing the existing object; we were creating a new one.
Some collection types, however, allow us to edit the data values contained within without needing to create a new object. This lets us use the computer’s memory efficiently. It can also create some problems down the line if we aren’t careful (see the tab on copies).
In practice, mutability means we can alter values in the collection on the fly.
my_list = ["sarah","susan","ralph","eddie"]
id(my_list)
## 4711019712
my_list[1] = "josh"
my_list
## ['sarah', 'josh', 'ralph', 'eddie']
id(my_list) # Still the same object, even though we changed something in it
## 4711019712
Immutability, on the other hand, means that we cannot alter values after the object is created. Python will throw an error at us if we try.
my_tuple = ("sarah","susan","ralph","eddie")
my_tuple[1] = "josh"
TypeError: 'tuple' object does not support item assignment
list
Lists allow for heterogeneous membership: a single list can hold many different data types (even other collection types!). In a list, one can change the items contained within the object after creating the instance.
x = [1, 2.2, "str", True, None]
x
## [1, 2.2, 'str', True, None]
A list constructor takes an iterable object as input. (We’ll delve more into what makes an object iterable when covering loops, but the key is that the object typically has an `.__iter__()` method.)
list("This")
## ['T', 'h', 'i', 's']
At its core, a list is a bucket for collecting different types of information. This makes it useful for collecting data items when one needs to store them. For example, we can store multiple container types in a list.
a = (1,2,3,4) # Tuple
b = {"a":1,"b":2} # Dictionary
c = [1,2,3,4] # List
together = [a,b,c] # Combine these different container objects into a single list
together
## [(1, 2, 3, 4), {'a': 1, 'b': 2}, [1, 2, 3, 4]]
A `list` has a range of specific methods geared toward querying, counting, sorting, and adding/removing elements in the container. For a list of all the `list` methods, see here.
Let’s explore some of the common methods used.
country_list = ["Russia","Latvia","United States","Nigeria","Mexico","India","Costa Rica"]
Inserting values
Option 1: use the `.append()` method.
country_list.append("Germany")
country_list
## ['Russia', 'Latvia', 'United States', 'Nigeria', 'Mexico', 'India', 'Costa Rica', 'Germany']
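Option 2: use the `+` (add) operator. Note that `+` combines two lists into a new list, so we wrap the new value in brackets and assign the result back to `country_list`.
country_list = country_list + ['Canada']
country_list
## ['Russia', 'Latvia', 'United States', 'Nigeria', 'Mexico', 'India', 'Costa Rica', 'Germany', 'Canada']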
Addition means “append”? Recall that an object's class dictates how it behaves with different operators. A `list` object has an `.__add__()` method built into it that provides instructions for what the object should do when it encounters the `+` operator. The same goes for the `*` multiplication operator, and so on. This is why it’s so important to know the class that you’re using. Different object classes == different behavior.
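The sketch below contrasts `.append()`, which modifies the list in place, with `+`, which builds a new list; it also shows that `*` repeats a list. The names `a`, `a_id`, and `b` are just for illustration.

```python
a = ["Russia", "Latvia"]
a_id = id(a)

# .append() changes the existing list in place (same object afterwards)
a.append("Germany")
print(id(a) == a_id)   # True

# + builds a brand-new list from two lists; the result must be assigned
b = a + ["Canada"]
print(b)               # ['Russia', 'Latvia', 'Germany', 'Canada']
print(b is a)          # False

# + expects another list, so a single value is wrapped in brackets
# (a + "Canada" would raise a TypeError)

# * repeats the list
print(["x", "y"] * 2)  # ['x', 'y', 'x', 'y']
```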
Deleting values
Option 1: use the `del` statement with an index.
# Drop Latvia
del country_list[1]
country_list
## ['Russia', 'United States', 'Nigeria', 'Mexico', 'India', 'Costa Rica', 'Germany', 'Canada']
Option 2: use the `.remove()` method, which deletes the first matching value.
country_list.remove("Nigeria")
country_list
## ['Russia', 'United States', 'Mexico', 'India', 'Costa Rica', 'Germany', 'Canada']
Sorting values
country_list.sort()
country_list
## ['Canada', 'Costa Rica', 'Germany', 'India', 'Mexico', 'Russia', 'United States']
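A related point worth a quick sketch: `.sort()` sorts the list in place and returns `None`, while the built-in `sorted()` returns a new sorted list and leaves the original alone. The short `countries` list here is made up for the example.

```python
countries = ["Russia", "Canada", "India"]

# sorted() returns a NEW sorted list and leaves the original alone
alphabetical = sorted(countries)
print(alphabetical)  # ['Canada', 'India', 'Russia']
print(countries)     # ['Russia', 'Canada', 'India']

# .sort() sorts in place and returns None, so don't assign its result
result = countries.sort()
print(result)        # None
print(countries)     # ['Canada', 'India', 'Russia']

# Both accept reverse=True for descending order
print(sorted(countries, reverse=True))  # ['Russia', 'India', 'Canada']
```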
str
Strings are containers too. String elements can be accessed using an index, much like objects in a list (see the tab on indices and keys).
s = "This is a string"
s[:4]
## 'This'
The literal for a string is a pair of quotation marks: `''` or `""`. When nesting quotations, use the quotation type that differs from the one used to create the string object (or escape the inner quotes with a backslash, e.g., `\"`).
s = 'This is a "string"'
print(s)
## This is a "string"
s = "This is a 'string'"
print(s)
## This is a 'string'
A multiline string can be created using triple quotes (`'''` or `"""`). This is useful when writing documentation for a function (a “docstring”).
s2 = '''
This is a long string!

With many lines

Many. Lines.
'''
print(s2)
##
## This is a long string!
##
## With many lines
##
## Many. Lines.
Strings are quite versatile in Python! In fact, many of the manipulations that we like to perform on strings, such as splitting text up (also known as “tokenizing”), cleaning out punctuation and characters we don’t care about, and changing the case (to name a few), are built into the string class as methods.
For example, say we wanted to convert a string to upper case.
str1 = "the professor is here!"
str1.upper()
## 'THE PROFESSOR IS HERE!'
Or replace words.
str1.replace("professor","student")
## 'the student is here!'
This is just a taste. The best way to learn what we can do with a string is to use it. We’ll deal with strings all the time when dealing with public policy data, so keep in mind that the `str` data type is a powerful tool in Python. For a list of all the `str` methods, see here.
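The paragraphs above mention tokenizing, removing punctuation, and changing case. A minimal sketch of those three steps, using only built-in `str` methods (`.lower()`, `.translate()` with `str.maketrans()`, and `.split()`) plus the standard library's `string.punctuation` constant, might look like this; the sample sentence is made up.

```python
import string

text = "The Professor is here! Is the student here, too?"

# Change the case so "The" and "the" are treated the same
lowered = text.lower()

# Remove punctuation: maketrans's third argument lists characters to delete
cleaned = lowered.translate(str.maketrans("", "", string.punctuation))

# Tokenize: split on whitespace into a list of words
tokens = cleaned.split()
print(tokens)
# ['the', 'professor', 'is', 'here', 'is', 'the', 'student', 'here', 'too']
```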
tuple
Like a `list`, a `tuple` allows for heterogeneous membership among the various data types. However, unlike a `list`, a `tuple` is immutable, meaning you cannot change the object after creating it.
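The literal for a `tuple` is parentheses, `()`. (Strictly speaking, it is the commas that create a tuple, so a one-element tuple needs a trailing comma: `(1,)`.)
my_tuple = (1,"a",1.2,True)
my_tuple
## (1, 'a', 1.2, True)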
The constructor is `tuple()`. Like the `list` constructor, `tuple()` takes an iterable object (like a `list`) as an input.
my_tuple = tuple([1,"a",1.2,True])
my_tuple
## (1, 'a', 1.2, True)
Tuples are valuable if you want a data value to be fixed, such as an index on a data frame denoting a unit of analysis, or a key in a dictionary. Tuples pop up all the time in the wild when dealing with more complex data modules, like pandas, so we’ll see them again and again.
One nice thing that tuples allow for is unpacking. Unpacking allows one to deconstruct the `tuple` object into named references (i.e., assign the values in the `tuple` to their own objects). This allows for flexibility regarding which objects we want when performing sequential operations, like iterating.
my_tuple = ("A","B","C")
# Here we're unpacking the three values into their own objects
obj1, obj2, obj3 = my_tuple
# Now let's print each object
print(obj1)
## A
print(obj2)
## B
print(obj3)
## C
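Unpacking is especially handy when iterating, as the paragraph above notes. A short sketch: each element of the (made-up) `scores` list is a tuple that gets unpacked in the loop header; the starred form and the swap idiom are shown as further illustrations.

```python
# Each element of this list is a (name, score) tuple (made-up data)
scores = [("John", 90), ("Susan", 87), ("Chad", 56)]

# Unpack each tuple into two names as we iterate
for name, score in scores:
    print(name, score)

# A starred name collects "the rest" of the values into a list
first, *rest = ("A", "B", "C")
print(first)  # A
print(rest)   # ['B', 'C']

# Unpacking also makes swapping values a one-liner
a, b = 1, 2
a, b = b, a
print(a, b)   # 2 1
```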
Also, like a `list`, a `tuple` can store different collection data types as well as the scalar types. For example, we can store multiple container types in a `tuple`.
a = (1,2,3,4) # Tuple
b = {"a":1,"b":2} # Dictionary
c = [1,2,3,4] # List
together = (a,b,c) # Combine these different container objects into a single tuple
together
## ((1, 2, 3, 4), {'a': 1, 'b': 2}, [1, 2, 3, 4])
As we’ve seen, the way tuples (and other collection data types) behave with operators such as addition and multiplication differs from the classic numerical operations that we’re used to. This is because the collection types have special `.__add__()` and `.__mul__()` methods that outline how the data type should behave when these operations are in play.
Let’s see what this looks like in practice. Adding two or more tuples combines them.
(1,2,3) + ("A","B")
## (1, 2, 3, 'A', 'B')
Multiplying a `tuple` by an integer repeats the tuple.
(1,2,3) * 3
## (1, 2, 3, 1, 2, 3, 1, 2, 3)
A `tuple` has a more limited range of methods (two, in fact: `.count()` and `.index()`) geared toward counting and locating elements in the container. The reason a `tuple` has so few methods compared to a `list` is that we can’t edit values in a tuple (i.e., it’s immutable), so all the methods built toward that end don’t apply here. For a list of all the `tuple` methods, see here.
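A quick sketch of the two `tuple` methods, using a made-up tuple `t`:

```python
t = (1, "a", 1, 2, 1)

print(t.count(1))    # 3: how many times 1 appears
print(t.index("a"))  # 1: position of the first "a"

# Methods that would modify the tuple simply don't exist
print(hasattr(t, "append"))  # False
```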
set
A `set` is an unordered collection of unique elements (this just means there can be no duplicates). A `set` is a mutable data type (elements can be added and removed). Moreover, the `set` methods allow for set algebra. This will come in handy if we want to know something about unique values and membership.
The literal for a `set` is curly braces, `{}`, with the elements listed inside.[1] Note that an empty `{}` creates an empty dictionary, not a set; use `set()` to create an empty set.
my_set = {1,2,3,3,3,4,4,4,5,1}
my_set
## {1, 2, 3, 4, 5}
The constructor is `set()`. As before, it takes an iterable object as an input.
new_set1 = set([1,2,4,4,5])
new_set1
## {1, 2, 4, 5}
new_set2 = set("Georgetown")
new_set2
## {'g', 'n', 'o', 'w', 'G', 't', 'r', 'e'}
In the above, we can see that a `set` does not preserve the order of its elements.
We can add elements to a `set` using the `.add()` method (one element) or the `.update()` method (elements from an iterable).
my_set.add(6)
my_set
## {1, 2, 3, 4, 5, 6}
my_set.update({8})
my_set
## {1, 2, 3, 4, 5, 6, 8}
Where a `set` really shines is with the set operations. Say we had a set of country names.
countries = {"nigeria","russia","united states","canada"}
And we wanted to see which countries from our set were in another set (say, another data set). Not a problem for a `set`!
other_data = {"nigeria","netherlands","united kingdom","canada"}
Which countries are in both sets?
countries.intersection(other_data)
## {'nigeria', 'canada'}
Which countries are in our data but not in the other data?
countries.difference(other_data)
## {'russia', 'united states'}
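Sets also support operator shorthands for these methods, plus union and symmetric difference. A sketch using the same two sets (`sorted()` is used only to get a predictable display order):

```python
countries = {"nigeria", "russia", "united states", "canada"}
other_data = {"nigeria", "netherlands", "united kingdom", "canada"}

# Operator shorthand for the methods above
both = countries & other_data       # same as countries.intersection(other_data)
only_ours = countries - other_data  # same as countries.difference(other_data)

# Other common set operations
everything = countries | other_data   # union: in either set
in_one_only = countries ^ other_data  # symmetric difference: in exactly one

# Membership tests are a natural fit for sets
print("canada" in countries)  # True
print(sorted(both))           # ['canada', 'nigeria']
print(sorted(in_one_only))
# ['netherlands', 'russia', 'united kingdom', 'united states']
```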
Note that values in a set cannot be accessed using an index.
my_set[1]
TypeError: 'set' object does not support indexing
Detailed traceback:
File "<string>", line 1, in <module>
Instead, we can `.pop()` values out of the set (`.pop()` removes and returns an arbitrary element).
my_set.pop()
## 1
Or we can `.remove()` specific values from the set.
my_set.remove(3)
my_set
## {2, 4, 5, 6, 8}
Finally, note that sets can contain heterogeneous scalar types, but they cannot contain mutable container data types (such as lists, dictionaries, or other sets), because set elements must be hashable.
set_a = {.5,6,"a",None}
set_a
## {0.5, None, 'a', 6}
In `set_b`, the `list` object is mutable (and therefore unhashable), so Python raises an error.
set_b = {.5,6,"a",None,[8,5,6]}
TypeError: unhashable type: 'list'
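A common workaround, sketched below: tuples are immutable, so a tuple of hashable values can be a set element even though a list cannot. Converting the list to a tuple makes the `set_b` example work.

```python
# Tuples of hashable values can be set elements (duplicates still collapse)
coords = {(0, 0), (1, 2), (0, 0)}
print(len(coords))  # 2: the duplicate (0, 0) was dropped

# A list can't be a set element, but converting it to a tuple works
set_b = {.5, 6, "a", None, tuple([8, 5, 6])}
print((8, 5, 6) in set_b)  # True

# hash() shows the difference directly
try:
    hash([8, 5, 6])
except TypeError as err:
    print(err)  # e.g. unhashable type: 'list'
```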
All this is barely scratching the surface of what we can do with sets. For a list of all the `set` methods, see here.
dict
A dictionary is the true star of the Python data types. A `dict` is an associative array of key-value pairs. That means we have some data (value) that we can quickly reference by calling its name (key). As we’ll see next week, this allows for a very efficient way to look up data values, especially when the dictionary is quite large.
We look values up by key rather than by position (although, as of Python 3.7, dictionaries do remember the order in which keys were inserted). Keys can’t be changed once created (that is, the keys must be immutable), but the values can be changed, either by assigning a new value to a key or, when the value is mutable like a `list`, by modifying it in place. Finally, keys cannot be duplicated. Recall we’re going to use the keys to look up data values, so if two keys were the same, it would defeat the purpose!
The literal for a `dict` is `{:}`, as in `{<key>:<value>}`.
my_dict = {'a': 4, 'b': 7, 'c': 9.2}
my_dict
## {'a': 4, 'b': 7, 'c': 9.2}
The constructor is `dict()`. Note the special way we can designate the key-value pairing when using the constructor (this keyword form only works when the keys are strings that are valid Python names).
my_dict = dict(a = 4.23, b = 10, c = 6.6)
my_dict
## {'a': 4.23, 'b': 10, 'c': 6.6}
The `dict` class has a number of methods geared toward listing the information contained within. To access the `dict`’s keys, use the `.keys()` method.
my_dict.keys()
## dict_keys(['a', 'b', 'c'])
Just want the values? Use .values()
my_dict.values()
## dict_values([4.23, 10, 6.6])
Want both? Use `.items()`. Note how the data comes back to us: as (key, value) `tuple`s inside a list-like `dict_items` view! This just goes to show you how intertwined the different data types are in Python.
my_dict.items()
## dict_items([('a', 4.23), ('b', 10), ('c', 6.6)])
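Because each item is a (key, value) tuple, `.items()` pairs naturally with the tuple unpacking covered earlier. A short sketch, reusing the same `my_dict` values:

```python
my_dict = dict(a=4.23, b=10, c=6.6)

# Each item is a (key, value) tuple, so we can unpack it in the loop
for key, value in my_dict.items():
    print(key, value)

# The views can be converted into ordinary lists when needed
print(list(my_dict.items()))  # [('a', 4.23), ('b', 10), ('c', 6.6)]
print(list(my_dict.keys()))   # ['a', 'b', 'c']
```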
We can combine dictionaries with other data types (such as a list) to make an efficient and effective data structure.
grades = {"John": [90,88,95,86],"Susan":[87,91,92,89],"Chad":[56,None,72,77]}
We can use the keys for efficient lookup.
grades["John"]
## [90, 88, 95, 86]
We can also use the `.get()` method to get the value that corresponds to a specific key.
grades.get("Susan")
## [87, 91, 92, 89]
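The practical difference between square brackets and `.get()` shows up when a key is missing. A sketch, using a trimmed copy of `grades` and a key (`"Wendy"`) that is not in it:

```python
grades = {"John": [90, 88, 95, 86], "Susan": [87, 91, 92, 89]}

# Square brackets raise a KeyError when the key is missing
try:
    grades["Wendy"]
except KeyError as err:
    print("Missing key:", err)  # Missing key: 'Wendy'

# .get() returns None (or a default we supply) instead of raising
print(grades.get("Wendy"))      # None
print(grades.get("Wendy", []))  # []

# The `in` operator checks whether a key exists
print("Susan" in grades)        # True
```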
Updating Dictionaries
We can add new dictionary data entries using the `.update()` method.
new_entry = {"Wendy":[99,98,97,94]} # Another student dictionary entry with grades
grades.update(new_entry) # Update the current dictionary
grades
## {'John': [90, 88, 95, 86], 'Susan': [87, 91, 92, 89], 'Chad': [56, None, 72, 77], 'Wendy': [99, 98, 97, 94]}
In a similar fashion, we can update the dictionary directly by providing a new key entry and storing the data.
grades["Seth"] = [66,72,79,81]
grades
## {'John': [90, 88, 95, 86], 'Susan': [87, 91, 92, 89], 'Chad': [56, None, 72, 77], 'Wendy': [99, 98, 97, 94], 'Seth': [66, 72, 79, 81]}
One can also drop keys by `.pop()`ing the key-value pair out of the collection…
grades.pop("Seth")
## [66, 72, 79, 81]
…or by deleting the key using the `del` statement.
del grades['Wendy']
grades
## {'John': [90, 88, 95, 86], 'Susan': [87, 91, 92, 89], 'Chad': [56, None, 72, 77]}
Likewise, one can drop all of the key-value pairs at once using `.clear()`.
# Example of using .clear()
grades.clear()
grades
## {}
This is barely scratching the surface. For a list of all the `dict` methods and all the things you can do with a dictionary, see here.
0-based index
standard index: 0, 1 , 2 , 3 , 4
[1, 2.2, "str", True, None]
reverse index: -5, -4 , -3 , -2 , -1
# %% Indices -----------------------------------------
# Define a list
x = [1, 2.2, "str", True, None]
x
# We can see how many values are in our container with len()
len(x)
# We can look up individual data values by referencing their location
x[3]
# Python throws an error if we reference an index location that doesn't exist
x[7] # IndexError: list index out of range
# We use a negative index to count BACKWARDS in our collection data type.
x[-3]
# %% Slicing -----------------------------------------
# We use the : operator to slice (i.e. select ranges of values)
# Slicing in a nutshell <start-here>:<go-until-right-before-here>
# To pull out values in position 1 and 2
x[1:3]
# When we leave the left or right side blank, Python implicitly goes to the beginning or end
x[:3]
x[2:]
# %% Keys -----------------------------------------
# Define a dictionary
grades = {"John":[90,88,95,86],"Susan":[87,91,92,89],"Chad":[56,None,72,77]}
# Unlike lists and tuples (which use positions), we use a key to look up a value in a dictionary
grades["John"]
# We can then index into the data structure housed in that key's value position
# as is appropriate for that data object
grades["John"][1]
# Copies with mutable objects -----------------------
# Create a list object
x = ["a","b","c","d"]
# Assigning one name to another: both names now reference the same data.
y = x
print(id(x))
print(id(y))
# If we make a change in one
y[1] = "goat"
# That change is reflected in the other
print(x)
# Because x and y aren't independent objects: they are the same object
# We can get around this issue by making **copies**
y = x.copy() # Here y is a copy of x.
# This duplicates the data in memory, so that y and x are independent.
# Three ways to make a copy:
# (1) Use copy method
y = x.copy()
# (2) Use constructor
y = list(x)
# (3) Slice it
y = x[:]
# Copies with nested objects -----------------------
nested_list = [[1,2,3],[4,7,88],[69,21,9.1]]
# Create a shallow copy
new_list = nested_list.copy()
# This copy only works for the "first layer" in the nested data structure:
# the inner lists are shared, so this change also shows up in nested_list.
new_list[0][1] = 1000
print(nested_list)
# Creating a deep copy
import copy
new_list = copy.deepcopy(nested_list)
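To confirm the difference between the two kinds of copies, the sketch below uses `is` to check whether the inner lists are shared and then modifies each copy. The names `shallow` and `deep` are introduced only for this illustration.

```python
import copy

nested_list = [[1, 2, 3], [4, 7, 88], [69, 21, 9.1]]

shallow = nested_list.copy()       # new outer list, SAME inner lists
deep = copy.deepcopy(nested_list)  # new outer list AND new inner lists

print(shallow[0] is nested_list[0])  # True: inner lists are shared
print(deep[0] is nested_list[0])     # False: inner lists were duplicated

# Changing the deep copy leaves the original untouched
deep[0][1] = 1000
print(nested_list[0])  # [1, 2, 3]

# Changing an inner list through the shallow copy shows up in the original
shallow[0][1] = 1000
print(nested_list[0])  # [1, 1000, 3]
```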
The following survey asks you quick questions regarding the usefulness of the asynchronous lecture materials. Feedback will be used to modify aspects of the asynchronous materials moving forward.
These exercises are designed to help you reinforce your grasp of the concepts covered in the asynchronous lecture material.
Let’s look at the `list` collection data type. If you run `dir()` on a list object, you’ll note the following methods: `.append()`, `.clear()`, `.copy()`, `.count()`, `.extend()`, `.index()`, `.insert()`, `.pop()`, `.remove()`, `.reverse()`, and `.sort()`.
Let’s explore these methods using the following `list` object by answering the questions below.
friends = ["Benny","Juice","Hewy","Samantha"]
friends = ["Benny","Juice","Hewy","Samantha"]
# (1) Add "Ralph" to the list.
# There are a few ways we could do this.
friends.append("Ralph") # Append
friends = friends + ["Ralph"] # Addition == Append (creates a new list that we assign back)
friends.extend(["Ralph"]) # Extend with another list
# Notice that when using the methods, the operation occurs in place (i.e. we
# don't need to write over the object). But now I have two more Ralphs than I intended. Let's drop those with remove and pop
# Pop out the last item in the list
friends.pop()
friends.remove("Ralph") # Removes the first "Ralph"
# Note that when we pop, the popped value can be assigned to an object.
# (2) Look up the index position for "Hewy".
friends.index("Hewy")
# (3) Pop and/or remove "Juice" from the list.
friends.remove("Juice")
# Or
# loc_of_juice = friends.index("Juice")
# friends.pop(loc_of_juice)
# (4) Sort the list in alphabetical order.
friends.sort()
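Use the following dictionary containing student grades to answer the questions below.
grades = {"John": [90,88,95,86],"Susan":[87,91,92,89],"Chad":[56,None,72,77]}
(1) Look up the keys in the dictionary, and store them in a `list` object called `keys`.
(2) Your students just finished another assignment. John received an 83, Susan a 92, and Chad an 81. Please add these grades to the dictionary.
(3) Chad is transferring schools. Please remove Chad from the `grades` dictionary and store his data in an object called `transfer_file`.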
grades = {"John": [90,88,95,86],"Susan":[87,91,92,89],"Chad":[56,None,72,77]}
# (1) Look up the keys in the dictionary, and store them in a `list` object
# called `keys`
keys = list(grades.keys())
# (2) Your students just finished another assignment. John received an 83,
# Susan a 92, and Chad an 81. Please add these grades to the dictionary.
grades["John"].append(83)
grades["Susan"].append(92)
grades["Chad"].append(81)
# The key to the above is to remember that once we reference the values in the dictionary, we are then dealing with a list (since that is what is stored in the dictionary value position). So we can use the list append method to add values to this list.
# (3) Chad is transferring schools. Please remove Chad from the grades
# dictionary and store his data in an object called `transfer_file`
transfer_file = grades.pop("Chad")
# The .pop() method "pops out" a key's value from the dictionary and returns it.
# After we do so, neither the key nor the value remains in `grades`.
Make a copy of the `grades` object from Q2 and store it in a new object called `new`. Now change John’s grade on the second assignment from an 88 to a 90 in the `new` object. Make sure that the `grades` object wasn’t also changed when you changed `new`.
# We'll need to make a deep copy, because the dictionary's values are lists (i.e., the data are nested)
import copy
new = copy.deepcopy(grades)
new["John"][1] = 90 # Change the grade
# Compare the object values to make sure they differ.
print(new)
print(grades)
[1] Note that this is very similar to the literal for a dictionary, but in that data structure we define key/value pairs (see the `dict` tab).
The following materials were generated for students enrolled in PPOL564. Please do not distribute without permission.
ed769@georgetown.edu | www.ericdunford.com