Classes define the structure and behavior of an object at the time we create it. An object's class controls how it is initialized and which attributes are available through it. Classes can make complex problems tractable, but they can also make simple solutions needlessly complex, so we need to strike a balance when using them.
We define a class using the class keyword, a Python statement that binds the class-level code to the class name. The convention is to use "CamelCase" (also called CapWords) when naming classes in Python.
class DataWrangler:
    pass
We create an instance of the class by calling its constructor (the class we just defined).
DW = DataWrangler()
type(DW) # It is of type 'DataWrangler'
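As a quick check of what the class statement and the constructor call produce, here is a minimal sketch that reuses the empty DataWrangler class from above. The names dw_a and dw_b are only illustrative. It suggests that the class is itself an object and that each call to it builds a new, distinct instance.

```python
class DataWrangler:
    pass

dw_a = DataWrangler()
dw_b = DataWrangler()

print(type(DataWrangler))               # the class itself is an object of type 'type'
print(isinstance(dw_a, DataWrangler))   # True
print(dw_a is dw_b)                     # False: two separate instances
```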
Recall that when we write a function using the def keyword, we bind the code contained in that function's body to the specified name.
def this_name(x):
    return x**2
This binds the code x**2 to the name this_name.
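To make the idea of name binding concrete, the following sketch assigns the same function object to a second name. The name alias is hypothetical and used only for illustration.

```python
def this_name(x):
    return x**2

# Bind the same function object to a second name
alias = this_name

print(alias(3))              # 9
print(alias is this_name)    # True: both names refer to one function object
print(this_name.__name__)    # this_name
```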
Likewise, we can do this with larger, more complex chunks of code using classes.
class MyClass:
    def func_1(self):
        pass
    def func_2(self):
        pass
    def func_n(self):
        pass
Classes offer a way of housing whole systems of code within an object. In essence, a class lets us create our own object types with their own methods (internal functions) and attributes (data stored on the object). Put simply, a class is a logical grouping of data and functions.
class DataWrangler:
    def __init__(self):
        pass  # nothing to set up yet
    def say_hello(self):
        print('Hello!')
DW = DataWrangler()
DW.say_hello()
A class is a "blueprint" for what we'd like our object to look like, but we don't "create" the object when Python reads in the class code. Rather, we do so when we create an instance of the class, i.e. when we call our class constructor DataWrangler() and assign the result to some name, DW.
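The difference between defining the blueprint and building an object from it can be sketched as follows. The class name Blueprint and the events list are hypothetical names used only to record what runs when.

```python
events = []

class Blueprint:
    # The class body runs once, when the class statement is executed
    events.append("class defined")

    def __init__(self):
        # __init__ runs each time an instance is created
        events.append("instance created")

print(events)   # ['class defined']

first = Blueprint()
second = Blueprint()
print(events)   # ['class defined', 'instance created', 'instance created']
```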
For a nice post on classes, see Jeff Knupp's post on the topic.
class features
self
When we initialize our class object, we create an instance of it. self offers us a way of referencing that instance.
class DataWrangler:
    def say_hello(self,word):
        print(word)
DW = DataWrangler()
DW.say_hello(word="Cat")
This is equivalent to passing the following...
DataWrangler.say_hello(DW,word="Cat")
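The equivalence between the two call styles can be checked directly. In this sketch say_hello returns the word instead of printing it (a change made here only so the two results can be compared), and the name bound is hypothetical.

```python
class DataWrangler:
    def say_hello(self, word):
        return word

DW = DataWrangler()

# Accessing the method through the instance yields a bound method
# that remembers which instance it belongs to.
bound = DW.say_hello
print(bound.__self__ is DW)   # True

# Both call styles pass DW as `self` and give the same result
print(DW.say_hello(word="Cat") == DataWrangler.say_hello(DW, word="Cat"))   # True
```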
__init__
The __init__ method is known as the "initializer". It runs when we first create the object, so we can use it to store initial data on the instance and then share that data internally with the other methods in our class.
class DataWrangler:
    def __init__(self,word=''):
        self.word = word
    def say_hello(self):
        print(self.word)
DW = DataWrangler(word="Cat")
DW.say_hello()
We can access the data (variables) we assign to the instances and overwrite them just as we would any other object.
DW.word
DW.word = "Dog"
DW.say_hello()
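Because __init__ stores the data on self, each instance keeps its own copy. A small sketch of this, where the names first and second are only illustrative:

```python
class DataWrangler:
    def __init__(self, word=''):
        self.word = word

first = DataWrangler(word="Cat")
second = DataWrangler(word="Bird")

# Overwriting the attribute on one instance leaves the other untouched
first.word = "Dog"
print(first.word, second.word)   # Dog Bird

# vars() shows the data stored on a single instance
print(vars(first))               # {'word': 'Dog'}
```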
An instance method is a function that requires an instance of the class in order to run. Put differently, it requires self to be an argument in the function. This gives the function access to all the data and values contained within a specific instance, which can be a convenient and powerful way to pass around information.
The say_hello() function is an example of such a method.
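A class can also hold a function that does not require an instance to run. For example, see the function add() below: we call it directly on the class, without first initializing an object. Note that because add() has no self argument, calling it through an instance (e.g. DW.add(2,3)) would raise a TypeError, since Python would pass the instance as an extra first argument.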
class DataWrangler:
    def __init__(self,word=''):
        self.word = word
    def say_hello(self):
        print(self.word)
    def add(x,y):
        return x + y
DataWrangler.add(2,3)
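If we want such a function to be callable both on the class and on an instance, one common option is the built-in @staticmethod decorator. The sketch below is a variation on the class above, not the original definition.

```python
class DataWrangler:
    def __init__(self, word=''):
        self.word = word

    @staticmethod
    def add(x, y):
        # No `self`: the function does not use any instance data
        return x + y

print(DataWrangler.add(2, 3))   # 5

DW = DataWrangler()
print(DW.add(2, 3))             # 5: also works through an instance
```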
class DataWrangler:
    new_word = "!"
    def __init__(self,word=''):
        self.word = word
    def say_hello(self):
        print(self.word + DataWrangler.new_word)
    def add(x,y):
        return x + y
DW = DataWrangler(word="hello")
DW.say_hello()
class DataWrangler:
    container = []
    def __init__(self,word=''):
        self.word = word
    def say_hello(self):
        print(self.word)
    def load1(self,x):
        return DataWrangler.container.append(x)
    def load2(self,x):
        return DataWrangler.container.append(x)
DW = DataWrangler(word="hello")
DW.container
DW.load1(1)
DW.container
DW.load2(2)
DW.container
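One consequence of storing a mutable object such as a list on the class is that every instance shares it. The sketch below contrasts that with a list created inside __init__. The class names SharedWrangler and SeparateWrangler are hypothetical and used only for this comparison.

```python
class SharedWrangler:
    container = []            # class attribute: one list shared by all instances

class SeparateWrangler:
    def __init__(self):
        self.container = []   # instance attribute: a fresh list per instance

a, b = SharedWrangler(), SharedWrangler()
a.container.append(1)
print(b.container)   # [1]: b sees the value appended through a

c, d = SeparateWrangler(), SeparateWrangler()
c.container.append(1)
print(d.container)   # []: d has its own list
```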
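We can define how our class should behave with Python's built-in operators and statements by implementing special ("dunder") methods, such as __eq__ for == and __iter__ for looping.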
class DataWrangler:
    def __init__(self,word=''):
        self.word = word
    def say_hello(self):
        print(self.word)
    def __eq__(self, other):
        print(f"Is {self.word} == {other}?")
        result = self.word == other
        if result:
            print("Yes")
        else:
            print("No :(")
        # return the result so that == evaluates to True or False
        return result
DW = DataWrangler(word="hello")
DW == 1
DW == "hello"
class DataWrangler:
    def __init__(self,word=''):
        self.word = word
    def say_hello(self):
        print(self.word)
    def __iter__(self):
        return iter(self.word)
DW = DataWrangler(word="hello")
for i in DW:
    print(i)
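The same pattern extends to other built-ins. As a further sketch that is not part of the original examples, __len__ hooks into len() and __repr__ controls how the object is displayed.

```python
class DataWrangler:
    def __init__(self, word=''):
        self.word = word

    def __len__(self):
        # Called by len(DW)
        return len(self.word)

    def __repr__(self):
        # Called by repr(DW) and when the object is echoed in a notebook
        return f"DataWrangler(word={self.word!r})"

DW = DataWrangler(word="hello")
print(len(DW))    # 5
print(repr(DW))   # DataWrangler(word='hello')
```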
# Here is some example data that we'll read into our class method.
from pandas import DataFrame
(DataFrame(dict(var1 = [1,2,3,4],
var2 = [.4,.55,6.6,1.7],
var3 = ["a","b","c","d"],
var4 = [4,55,100,-3]))
.to_csv("example_data.csv"))
Here, let's write out the entire infrastructure of our class, using pass as a placeholder for the code we'll eventually write.
import csv
class DataWrangler:
    '''
    Class that wrangles data
    '''
    def __init__(self,data_path=''):
        pass
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        pass
    def columns(self):
        '''
        Print off all available columns
        '''
        pass
    def select(self,variable):
        '''
        Select a variable
        '''
        pass
    def display(self):
        '''
        Display the data frame
        '''
        pass
Let's fill in each function piece by piece.
import csv
class DataWrangler:
    '''
    Class that wrangles data
    '''
    def __init__(self,data_path=''):
        self.data_path = data_path
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        with open(self.data_path) as file:
            dat = list(csv.reader(file))
        # convert data types
        for row in dat:
            for ind,val in enumerate(row):
                if '.' in val:
                    row[ind] = float(val)
                elif val.isdigit():
                    row[ind] = int(val)
                # otherwise leave the value as a string
        self.data = dat
    def columns(self):
        '''
        Print off all available columns
        '''
        pass
    def select(self,variable):
        '''
        Select a variable
        '''
        pass
    def display(self):
        '''
        Display the data frame
        '''
        pass
DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
DW.data
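One thing to watch in the type conversion above: str.isdigit() returns False for strings that carry a sign, so a value such as "-3" in var4 stays a string. A possible alternative is sketched below with a hypothetical helper, convert_value, that is not part of the original class. It tries int and then float, and falls back to the raw string.

```python
def convert_value(val):
    """Try to convert a string to int, then float; otherwise return it unchanged."""
    for cast in (int, float):
        try:
            return cast(val)
        except ValueError:
            continue
    return val

print([convert_value(v) for v in ["4", "-3", "6.6", "a", ""]])
# [4, -3, 6.6, 'a', '']
```

Inside load_data, the body of the inner loop could then become row[ind] = convert_value(val), assuming the helper is defined alongside the class.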
import csv
class DataWrangler:
    def __init__(self,data_path=''):
        self.data_path = data_path
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        try:
            with open(self.data_path) as file:
                dat = list(csv.reader(file))
        except FileNotFoundError:
            print('File does not exist.')
            return
        # convert data types
        for row in dat:
            for ind,val in enumerate(row):
                if '.' in val:
                    row[ind] = float(val)
                elif val.isdigit():
                    row[ind] = int(val)
                # otherwise leave the value as a string
        self.data = dat
    def columns(self):
        '''
        Print off all available columns
        '''
        return self.data[0]
    def select(self,variable):
        '''
        Select a variable
        '''
        pass
    def display(self):
        '''
        Display the data frame
        '''
        pass
DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
DW.columns()
import csv
class DataWrangler:
    '''
    Class that wrangles data
    '''
    def __init__(self,data_path=''):
        self.data_path = data_path
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        try:
            with open(self.data_path) as file:
                dat = list(csv.reader(file))
        except FileNotFoundError:
            print('File does not exist.')
            return
        # convert data types
        for row in dat:
            for ind,val in enumerate(row):
                if '.' in val:
                    row[ind] = float(val)
                elif val.isdigit():
                    row[ind] = int(val)
                # otherwise leave the value as a string
        self.data = dat
    def columns(self):
        '''
        Print off all available columns
        '''
        return self.data[0]
    def select(self,variable):
        '''
        Select a variable
        '''
        columns = self.columns()
        output = []
        if variable in columns:
            position = columns.index(variable)
            for row in self.data[1:]:
                output.append(row[position])
        else:
            print(f'{variable} is not in the data. Please choose another variable')
        return output
    def display(self):
        '''
        Display the data frame
        '''
        pass
DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
DW.select(variable="var3")
import csv
class DataWrangler:
    '''
    Class that wrangles data
    '''
    def __init__(self,data_path=''):
        self.data_path = data_path
    def load_data(self):
        '''
        Read in data given some provided file path
        '''
        try:
            with open(self.data_path) as file:
                dat = list(csv.reader(file))
        except FileNotFoundError:
            print('File does not exist.')
            return
        # convert data types
        for row in dat:
            for ind,val in enumerate(row):
                if '.' in val:
                    row[ind] = float(val)
                elif val.isdigit():
                    row[ind] = int(val)
                # otherwise leave the value as a string
        self.data = dat
    def columns(self):
        '''
        Print off all available columns
        '''
        return self.data[0]
    def select(self,variable):
        '''
        Select a variable
        '''
        columns = self.columns()
        output = []
        if variable in columns:
            position = columns.index(variable)
            for row in self.data[1:]:
                output.append(row[position])
        else:
            print(f'{variable} is not in the data. Please choose another variable')
        return output
    def display(self):
        '''
        Display the data frame
        '''
        print(self.data)
DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
DW.display()
We now have a customized object that takes the path to a .csv file as an argument and allows us to manipulate the data in a contained setting.
DW = DataWrangler(data_path='example_data.csv')
DW.load_data()
print(DW.columns())
print(DW.select(variable='var4'))