PPOL564 | DS1 | Foundations

Writing Classes

Classes

A class defines the structure and behavior of an object at the time the object is created. An object's class controls how it is initialized and which attributes are available through it. Classes can make complex problems tractable, but they can also make simple solutions overly complex, so we need to strike a balance when using them.

We define a class using the class keyword. The convention in Python is to name classes in CamelCase (also called CapWords). class is a statement that binds the class-level code to the class name.

In [1]:
class DataWrangler:
    pass 

We create an instance of the class by calling its constructor, i.e. the name of the class we just defined followed by parentheses.

In [2]:
DW = DataWrangler()
type(DW) # It is of type 'DataWrangler'
Out[2]:
__main__.DataWrangler

What is going on here?

Recall that when we write a function using the def keyword, we bind the code in that function's body to the specified name.

def this_name(x):
    return x**2

binds the code x**2 to the name this_name.
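To make the idea of binding concrete, here is a minimal sketch. The second name, square, is hypothetical and introduced only for illustration.

```python
def this_name(x):
    return x**2

# The name `this_name` now refers to a function object.
print(type(this_name).__name__)

# Binding a second (hypothetical) name to the same object does not copy the code;
# both names point at the same function.
square = this_name
print(square is this_name)
print(square(3))
```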

Likewise, we can do this with larger, more complex chunks of code using classes.

class MyClass:

    def func_1(self):
        pass

    def func_2(self):
        pass

    def func_n(self):
        pass

Classes offer a way of attaching a whole system of code to an object. In essence, they let us create our own object types with their own methods (functions defined inside the class) and attributes (data stored on the object). Put simply, a class is a logical grouping of data and functions.

In [3]:
class DataWrangler:
    
    def __init__(self):
        pass  # nothing to set up yet

    def say_hello(self):
        print('Hello!')
In [4]:
DW = DataWrangler()
DW.say_hello()
Hello!

A class is a "blueprint" for what we'd like our object to look like, but we don't "create" the object when Python reads in the class code. Rather, we do so when we create an instance of the class, i.e. call our class constructor DataWrangler() and assign the result to some name, DW.
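The distinction between the blueprint and the objects built from it can be sketched as follows. The names dw_a and dw_b are hypothetical and used only for illustration.

```python
class DataWrangler:

    def say_hello(self):
        print('Hello!')

# Defining the class above creates no DataWrangler objects.
# Each call to the constructor builds a new, separate instance.
dw_a = DataWrangler()
dw_b = DataWrangler()

print(isinstance(dw_a, DataWrangler))  # both are built from the same blueprint
print(dw_a is dw_b)                    # but they are two distinct objects
```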

For a nice introduction to classes, see Jeff Knupp's post on the topic.

class features

  • self → the instance of the class.
  • __init__ → allows us to bind data to the instance when initializing the object.
  • instance method → a function defined within a class that takes self as its first argument, so it is called on an instance.
  • method → as the term is used in these notes, a function defined within a class that does not take self as an argument, so it is called on the class itself rather than on an instance.

self

When we call the class constructor, we create an instance of the class. self offers us a way of referencing that instance from inside the class.

In [5]:
class DataWrangler:

    def say_hello(self,word):
        print(word)
        
DW = DataWrangler()
DW.say_hello(word="Cat")
Cat

This is equivalent to calling the function on the class and passing the instance in explicitly:

In [6]:
DataWrangler.say_hello(DW,word="Cat")
Cat

__init__

__init__ lets us store data on the instance at the moment it is created. That is, when we first create the object, we can store initial information that is then shared internally with the other methods in our class. The __init__ method is known as the "initializer".

In [7]:
class DataWrangler:

    def __init__(self,word=''):
        self.word = word
    
    def say_hello(self):
        print(self.word)
        
DW = DataWrangler(word="Cat")
DW.say_hello()
Cat

We can access the data (variables) we assign to the instances and overwrite them just as we would any other object.

In [8]:
DW.word
Out[8]:
'Cat'
In [9]:
DW.word = "Dog"
DW.say_hello()
Dog
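Data stored through self belongs to one specific instance. As a small sketch (the names dw_cat and dw_dog are hypothetical), overwriting an attribute on one instance leaves other instances of the same class untouched.

```python
class DataWrangler:

    def __init__(self, word=''):
        self.word = word

    def say_hello(self):
        print(self.word)

dw_cat = DataWrangler(word="Cat")
dw_dog = DataWrangler(word="Dog")

# Overwriting the attribute on one instance does not affect the other.
dw_cat.word = "Lion"
dw_cat.say_hello()
dw_dog.say_hello()
```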

instance method

An instance method is a function that requires an instance of the class in order to run. Put differently, it requires self to be an argument in the function. This gives the function access to all the data and values contained within a specific instance, which can be a convenient and powerful way to pass around information.

The say_hello() function is an example of such a method.
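Because every instance method receives self, one method can also call another and reuse the data stored on the instance. The sketch below assumes a hypothetical helper, shout(), that is not part of the class used elsewhere in these notes.

```python
class DataWrangler:

    def __init__(self, word=''):
        self.word = word

    def shout(self):
        # hypothetical helper: reads the data stored on the instance
        return self.word.upper() + "!"

    def say_hello(self):
        # one instance method can call another through self
        print(self.shout())

DW = DataWrangler(word="Cat")
DW.say_hello()
```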

method

A method, in the sense used here, is a function that does not require an instance to run. For example, see the function add() below: we call it directly on the class, without first creating an object.

In [10]:
class DataWrangler:

    def __init__(self,word=''):
        self.word = word
    
    def say_hello(self):
        print(self.word)
        
    def add(x,y):
        return x + y
    
DataWrangler.add(2,3)
Out[10]:
5
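One consequence worth seeing: a function written without self works when called on the class, but not when called on an instance, because Python passes the instance along as an extra first argument. A minimal sketch (the called_ok flag is hypothetical and only records what happened):

```python
class DataWrangler:

    def __init__(self, word=''):
        self.word = word

    def add(x, y):
        return x + y

# Called through the class, add() receives exactly the two arguments given.
print(DataWrangler.add(2, 3))

# Called through an instance, Python also passes the instance as the first
# argument, so add() receives three arguments and raises a TypeError.
DW = DataWrangler()
try:
    DW.add(2, 3)
    called_ok = True
except TypeError as err:
    called_ok = False
    print("TypeError:", err)
```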

class objects

In [11]:
class DataWrangler:
    
    new_word = "!"

    def __init__(self,word=''):
        self.word = word
    
    def say_hello(self):
        print(self.word + DataWrangler.new_word)
        
    def add(x,y):
        return x + y
    
DW = DataWrangler(word="hello")
DW.say_hello()
hello!
In [12]:
class DataWrangler:
    
    container = []

    def __init__(self,word=''):
        self.word = word
    
    def say_hello(self):
        print(self.word)
        
    def load1(self,x):
        return DataWrangler.container.append(x)
    
    def load2(self,x):
        return DataWrangler.container.append(x)
    
DW = DataWrangler(word="hello")
DW.container
DW.load1(1)
DW.container
DW.load2(2)
DW.container
Out[12]:
[1, 2]
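In both cells above, new_word and container are assigned in the class body rather than on self, so they belong to the class and are shared by every instance. A short sketch of that sharing, assuming a single hypothetical load() method in place of load1() and load2():

```python
class DataWrangler:

    container = []  # defined on the class, so shared by every instance

    def __init__(self, word=''):
        self.word = word  # defined on the instance, so separate for each one

    def load(self, x):
        # hypothetical method: appends to the class-level list
        DataWrangler.container.append(x)

dw_a = DataWrangler(word="a")
dw_b = DataWrangler(word="b")

dw_a.load(1)
dw_b.load(2)

# Both instances (and the class itself) see the same list.
print(dw_a.container)
print(dw_b.container is DataWrangler.container)
```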

attributes

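We can define how our class should respond to Python's other functionality, such as the == operator or a for loop, by defining special ("dunder") methods like __eq__ and __iter__.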

In [13]:
class DataWrangler:

    def __init__(self,word=''):
        self.word = word
    
    def say_hello(self):
        print(self.word)
        
    def __eq__(self, other):
        print(f"Is {self.word} == {other}?")
        is_equal = self.word == other
        print("Yes" if is_equal else "No :(")
        return is_equal  # __eq__ should return the result of the comparison

DW = DataWrangler(word="hello")
DW == 1
DW == "hello"
Is hello == 1?
No :(
Is hello == hello?
Yes
Out[13]:
True
In [14]:
class DataWrangler:

    def __init__(self,word=''):
        self.word = word
    
    def say_hello(self):
        print(self.word)
        
    def __iter__(self):
        return iter(self.word)

DW = DataWrangler(word="hello")
for i in DW:
    print(i)
h
e
l
l
o
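The same pattern extends to other built-in behavior. As a sketch, the two special methods below (chosen here for illustration; they are not used elsewhere in these notes) control what len() returns and how the object is displayed.

```python
class DataWrangler:

    def __init__(self, word=''):
        self.word = word

    def __len__(self):
        # used by the built-in len()
        return len(self.word)

    def __repr__(self):
        # used when the object is displayed; print() also falls back to
        # __repr__ when no __str__ is defined
        return f"DataWrangler(word={self.word!r})"

DW = DataWrangler(word="hello")
print(len(DW))
print(DW)
```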

Things to keep in mind

  • classes should have docstrings, just as functions do, explaining their functionality.
  • classes are central to object-oriented programming.
  • all types in python have a class (we've talked about this extensively)
  • there are other (more advanced) class features that we won't spend any time on here. Specifically, static methods, attributes, decorators, and class inheritance.
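On the first point, a docstring is simply a string placed directly under the class or def line. A minimal sketch of where it ends up (the say_hello() docstring text is made up for illustration):

```python
class DataWrangler:
    '''
    Class that wrangles data
    '''

    def say_hello(self):
        '''
        Print a greeting
        '''
        print('Hello!')

# The docstring is stored on the object itself and is what help() displays.
print(DataWrangler.__doc__.strip())
print(DataWrangler.say_hello.__doc__.strip())
```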

Class Example

In [15]:
# Here is some example data that we'll read into our class method.
from pandas import DataFrame
(DataFrame(dict(var1 = [1,2,3,4],
               var2 = [.4,.55,6.6,1.7],
               var3 = ["a","b","c","d"],
               var4 = [4,55,100,-3]))
 .to_csv("example_data.csv"))

Here let's write out the entire infrastructure of our class using pass as a placeholder for the code we'll eventually write.

In [16]:
import csv

class DataWrangler:
    '''
    Class that wrangles data
    '''

    def __init__(self,data_path=''):
        pass
    
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        pass
            
    def columns(self):
        '''
        Print off all available columns
        '''
        pass
    
    def select(self,variable):
        '''
        Select a variable
        '''
        pass
    
    def display(self):
        '''
        Display the data frame
        '''
        pass

Let's fill in each function piece by piece.

In [17]:
import csv


class DataWrangler:
    '''
    Class that wrangles data
    '''

    def __init__(self,data_path=''):
        self.data_path = data_path
       
        
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        with open(self.data_path) as file:
            dat = list(csv.reader(file))
            # convert data types
            for row in dat:
                for ind,val in enumerate(row):
                    if '.' in val:
                        row[ind] = float(val)
                    elif val.isdigit():
                        row[ind] = int(val)
                    # anything else (text, or '-3', which fails isdigit()) stays a string
            self.data=dat
  
                
    def columns(self):
        '''
        Print off all available columns
        '''
        pass
    
    def select(self,variable):
        '''
        Select a variable
        '''
        pass
    
    def display(self):
        '''
        Display the data frame
        '''
        pass 
    
    
DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
DW.data
Out[17]:
[['', 'var1', 'var2', 'var3', 'var4'],
 [0, 1, 0.4, 'a', 4],
 [1, 2, 0.55, 'b', 55],
 [2, 3, 6.6, 'c', 100],
 [3, 4, 1.7, 'd', '-3']]
In [18]:
import csv

class DataWrangler:
    '''
    Class that wrangles data
    '''

    def __init__(self,data_path=''):
        self.data_path = data_path
       
        
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        try:
            with open(self.data_path) as file:
                dat = list(csv.reader(file))
                # convert data types
                for row in dat:
                    for ind,val in enumerate(row):
                        if '.' in val:
                            row[ind] = float(val)
                        elif val.isdigit():
                            row[ind] = int(val)
                        # anything else is left as a string
                self.data=dat
        except FileNotFoundError:
            # only report a missing file when opening it actually fails
            print('File does not exist.')

    def columns(self):
        '''
        Print off all available columns
        '''
        return self.data[0]


    def select(self,variable):
        '''
        Select a variable
        '''
        pass


    def display(self):
        '''
        Display the data frame
        '''
        pass


DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
DW.columns()
Out[18]:
['', 'var1', 'var2', 'var3', 'var4']
In [19]:
import csv

class DataWrangler:
    '''
    Class that wrangles data
    '''

    def __init__(self,data_path=''):
        self.data_path = data_path
       
        
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        try:
            with open(self.data_path) as file:
                dat = list(csv.reader(file))
                # convert data types
                for row in dat:
                    for ind,val in enumerate(row):
                        if '.' in val:
                            row[ind] = float(val)
                        elif val.isdigit():
                            row[ind] = int(val)
                        # anything else is left as a string
                self.data=dat
        except FileNotFoundError:
            # only report a missing file when opening it actually fails
            print('File does not exist.')

    def columns(self):
        '''
        Print off all available columns
        '''
        return self.data[0]


    def select(self,variable):
        '''
        Select a variable
        '''

        columns = self.columns()
        output = []
        if variable in columns:
            # find the column's position in the header row, then collect
            # that position from every data row
            position = columns.index(variable)
            for row in self.data[1:]:
                output.append(row[position])
        else:
            print(f'{variable} is not in the data. Please choose another variable')
        return output


    def display(self):
        '''
        Display the data frame
        '''
        pass


DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
DW.select(variable="var3")
Out[19]:
['a', 'b', 'c', 'd']
In [20]:
import csv

class DataWrangler:
    '''
    Class that wrangles data
    '''

    def __init__(self,data_path=''):
        self.data_path = data_path
       
        
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        try:
            with open(self.data_path) as file:
                dat = list(csv.reader(file))
                # convert data types
                for row in dat:
                    for ind,val in enumerate(row):
                        if '.' in val:
                            row[ind] = float(val)
                        elif val.isdigit():
                            row[ind] = int(val)
                        # anything else is left as a string
                self.data=dat
        except FileNotFoundError:
            # only report a missing file when opening it actually fails
            print('File does not exist.')

    def columns(self):
        '''
        Print off all available columns
        '''
        return self.data[0]


    def select(self,variable):
        '''
        Select a variable
        '''

        columns = self.columns()
        output = []
        if variable in columns:
            # find the column's position in the header row, then collect
            # that position from every data row
            position = columns.index(variable)
            for row in self.data[1:]:
                output.append(row[position])
        else:
            print(f'{variable} is not in the data. Please choose another variable')
        return output

    def display(self):
        '''
        Display the data frame
        '''
        print(self.data)


DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
DW.display()
[['', 'var1', 'var2', 'var3', 'var4'], [0, 1, 0.4, 'a', 4], [1, 2, 0.55, 'b', 55], [2, 3, 6.6, 'c', 100], [3, 4, 1.7, 'd', '-3']]

We now have a customized object that takes the path to a .csv file as an argument and allows us to manipulate the data in a contained setting.

In [21]:
DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
print(DW.columns())
print(DW.select(variable='var4'))
['', 'var1', 'var2', 'var3', 'var4']
[4, 55, 100, '-3']
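Note that '-3' is still a string in the output: str.isdigit() returns False for anything containing a minus sign, so the value never reaches int(). One possible way to make the conversion more forgiving is sketched below. The helper convert_value() is hypothetical and not part of the class above; it simply tries int, then float, and otherwise keeps the original string.

```python
def convert_value(val):
    '''
    Hypothetical helper: try int, then float, else keep the original string.
    '''
    try:
        return int(val)
    except ValueError:
        pass
    try:
        return float(val)
    except ValueError:
        return val  # not numeric, so leave it as text

row = ['3', '4', '1.7', 'd', '-3']
print([convert_value(val) for val in row])
```

One trade-off of this approach is that float() also accepts strings such as 'nan' and 'inf', so text columns containing those values would be converted as well.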