PPOL564 - Data Science I: Foundations

Lecture 5

Functions

Contents

  • Building functions
  • Scoping
  • Lambda Functions (advanced)
    • Note that we'll run into more immediate applications for lambda functions when using them in the Pandas library.

Function Basics

  • def: keyword for generating a function
    • def + some_name + () + : to set up a function header
  • Arguments: things we want to feed into the function and do something to.
  • return: keyword for returning a specific value from a function
In [1]:
def square(x):
    y = x*x
    return y
In [2]:
square(10)
Out[2]:
100

Docstrings

Docstrings are strings that occur as the first statement within a named function block.

def function_name(input):
    '''
    Your docstring goes here.
    '''
    |
    |
    | Function block
    |
    |
    return something

The goal of the docstring is to tell us what the function does. We can request a function's docstring using the help() function.

In [3]:
def paste(string_one,string_two):
    '''
    This is a useless function that pastes two strings together 
    '''
    return string_one + " "  + string_two

paste("public","policy")
Out[3]:
'public policy'
In [4]:
help(paste)
Help on function paste in module __main__:

paste(string_one, string_two)
    This is a useless function that pastes two strings together

Conventions of writing docstrings

PEP-257 says that "The docstring for a function or method should summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions." Google offers a more useful style guide on how to set up a docstring.

Generally speaking, it should look something like this:

def function_name(x,y,z):
    '''Quick description of what the function does.

    A more detailed description, if need be. 

    Arguments:
        list of all the arguments and what they need to be 
        or what their default values are.
        x: 
        y: 
        z: 

    Returns:
        Short description regarding what the function returns

    Raises:
        All the types of errors that the function raises

        TypeError: 
        ValueError:
    '''
    |
    |
    | Function block
    |
    |
    return something
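
As an illustration, here is the template filled in for a small function. The function scale() and its arguments are hypothetical names made up for this sketch; the layout is what matters.

```python
def scale(values, factor=2):
    '''Multiply every number in a list by a constant.

    Arguments:
        values: list of numbers to rescale.
        factor: number to multiply each element by (default 2).

    Returns:
        A new list holding each element of values times factor.

    Raises:
        TypeError: if values is not a list.
    '''
    if not isinstance(values, list):
        raise TypeError("values must be a list")
    return [v * factor for v in values]

print(scale([1, 2, 3]))                 # [2, 4, 6]
print(scale.__doc__.splitlines()[0])    # first line of the docstring
```

Running help(scale) would print the whole docstring, just as with paste() above.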

Arguments

Arguments are all the input values that lie inside the parentheses.

def fun(argument_1,argument_2):

We can supply default values to one or all arguments; in doing so, we've specified a default argument.

def fun(argument_1 = "default 1",argument_2 = "default 2"):

def fun(a,b=""):
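  • argument a has no default value, so it is required. We usually supply it as a positional argument: we provide a value by matching its position in the sequence.
  • argument b has a default value, so it is optional. We often supply it as a keyword argument (b="..."), although either argument can be passed by position or by name. (The *args and **kwargs syntax is something different: it collects any extra positional or keyword arguments.)

Arguments with default values must come after arguments without default values, or python will throw a SyntaxError.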

In [5]:
def my_func(a,b=''):
    return a + b
my_func("cat","dog")
Out[5]:
'catdog'
In [6]:
def my_func(a='',b):
    return a + b
my_func("cat","dog")
  File "<ipython-input-6-9ad0eee116c7>", line 1
    def my_func(a='',b):
               ^
SyntaxError: non-default argument follows default argument
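
A minimal sketch of the different ways to call such a function. The names label() and collect() are hypothetical and used only for illustration; the second function shows what the *args and **kwargs syntax does.

```python
def label(a, b=""):
    return a + b

print(label("cat"))               # 'cat'    (b falls back to its default)
print(label("cat", "dog"))        # 'catdog' (both matched by position)
print(label("cat", b="dog"))      # 'catdog' (b matched by name)
print(label(b="dog", a="cat"))    # 'catdog' (order is free when using names)

def collect(*args, **kwargs):
    # args is a tuple of the extra positional values,
    # kwargs is a dictionary of the extra keyword values.
    return args, kwargs

print(collect(1, 2, x=3))         # ((1, 2), {'x': 3})
```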

Returning Multiple Values

In [7]:
def added_list(a,b,c,d):
    return [a,a+b,a+b+c,a+b+c+d]

added_list(1,2,3,4)
Out[7]:
[1, 3, 6, 10]
In [8]:
def added_tuple(a,b,c,d):
    return (a,a+b,a+b+c,a+b+c+d)

added_tuple(1,2,3,4)
Out[8]:
(1, 3, 6, 10)
In [9]:
def added_dict(a,b,c,d):
    return {"position 1": a,"position 2": a+b,
            "position 3": a+b+c, "position 4":a+b+c+d}

added_dict(1,2,3,4)
Out[9]:
{'position 1': 1, 'position 2': 3, 'position 3': 6, 'position 4': 10}
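
When a function returns a tuple, the caller can unpack it straight into separate names. A short sketch, using a hypothetical function min_and_max():

```python
def min_and_max(values):
    # The comma builds a tuple, so this returns one tuple holding two values.
    return min(values), max(values)

low, high = min_and_max([3, 1, 4, 1, 5])
print(low, high)   # 1 5
```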

Never use mutable values as defaults

Let's visualize why this is the case.

In [10]:
def my_func(a = []):
    a.append('x')
    return a

my_func()
my_func()
my_func()
Out[10]:
['x', 'x', 'x']

To get around this, we only use immutable values (such as None) as placeholders and build the mutable object inside the function.

In [11]:
def my_func(a = None):
    if a is None:
        a = []
    a.append('x')
    return a

my_func()
my_func()
my_func()
Out[11]:
['x']
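
One way to see what is going on: python evaluates a default value once, when the def statement runs, and stores it on the function object in the __defaults__ attribute. The sketch below (append_x() is a hypothetical name) shows that every call mutates that same stored list.

```python
def append_x(a=[]):
    a.append('x')
    return a

print(append_x.__defaults__)   # ([],)
append_x()
append_x()
print(append_x.__defaults__)   # (['x', 'x'],)
```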

When to use a function?

Whenever you repeat a chunk of code or some process more than once, you should wrap it in a function. When writing functions we should think about two things:

  1. Am I repeating the same chunks of code in multiple locations?
  2. Can the function generalize to other types of data or problems?
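
A small sketch of both questions at once. The data and the function name center() are made up for illustration: the same centering step is first written out twice, then wrapped in a function that works for any list of numbers.

```python
heights = [150, 160, 170]
weights = [50, 60, 70]

# Repeated code: the same logic is copied for each variable.
heights_centered = [h - sum(heights) / len(heights) for h in heights]
weights_centered = [w - sum(weights) / len(weights) for w in weights]

# The same logic as a function that generalizes to any list of numbers.
def center(values):
    '''Subtract the mean of values from each element.'''
    mean = sum(values) / len(values)
    return [v - mean for v in values]

print(center(heights))   # [-10.0, 0.0, 10.0]
print(center(weights))   # [-10.0, 0.0, 10.0]
```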

Scope

Scopes are contexts in which named references can be looked up. Scopes are arranged in a hierarchy, and python searches that hierarchy from the inside out to resolve a name.

There are 4 scopes in total (from narrowest to broadest):

  1. Local: name is defined inside the current function
  2. Enclosing: Any and all enclosing functions
  3. Global: Any and all names defined at the top level of a module.
  4. Built-in: names "built into" python through the builtins module

Note that for loops and code blocks do not introduce new nested scopes. We can alter the rules slightly when needed by using the global and nonlocal keywords.
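
A minimal sketch of these lookup rules, with hypothetical names throughout. It shows the local, enclosing, and global levels, a loop variable that outlives its loop, and python's nonlocal keyword, which lets an inner function rebind a name in the enclosing function.

```python
x = "global"

def outer():
    x = "enclosing"
    def inner():
        x = "local"
        return x
    # inner() sees its own x; outer() still sees the enclosing x.
    return inner(), x

print(outer())   # ('local', 'enclosing')
print(x)         # global

# A for loop does not create a new scope: i is still defined afterwards.
for i in range(3):
    pass
print(i)         # 2

def counter():
    total = 0
    def add_one():
        nonlocal total   # rebind total in the enclosing function
        total += 1
        return total
    return add_one

tick = counter()
print(tick(), tick())   # 1 2
```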

In [12]:
def tt():
    print(f'val = {count}')
count = 0
tt()
count = 5
tt()
val = 0
val = 5

What happened here?

  1. When referencing the count variable, python looked to see if the name was defined in the current function.
  2. Since it wasn't, python then looked outside this function to see if count existed. There it did find a count object and used it.

We can be explicit with regard to which variables we reference using the global keyword.

In [13]:
def tt():
    global count # call to the global scope to get the object reference
    print(f'val = {count}')    
tt()
val = 5
In [14]:
# This is a global variable
a = 0

if a == 0:
    # This is still a global variable
    b = 1

def my_function(c):
    # this is a local variable
    d = 3
    print(c)
    print(d)

# Now we call the function, passing the value 7 as the first and only parameter
my_function(7)
7
3

We may not refer to both a global variable and a local variable by the same name inside the same function.

In [15]:
a = 0
def my_function():
    print(a)
    a = 3
    print(a)

my_function()
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-15-ee9156875ccf> in <module>()
      5     print(a)
      6 
----> 7 my_function()

<ipython-input-15-ee9156875ccf> in my_function()
      1 a = 0
      2 def my_function():
----> 3     print(a)
      4     a = 3
      5     print(a)

UnboundLocalError: local variable 'a' referenced before assignment
In [16]:
def my_function():
    print(a)
    print(a)

my_function()
0
0

"Because we haven’t declared a to be global, the assignment in the second line of the function will create a local variable a. This means that we can’t refer to the global variable a elsewhere in the function, even before this line! The first print statement now refers to the local variable a – but this variable doesn’t have a value in the first line, because we haven’t assigned it yet!

Note that it is usually very bad practice to access global variables from inside functions, and even worse practice to modify them. This makes it difficult to arrange our program into logically encapsulated parts which do not affect each other in unexpected ways.

If a function needs to access some external value, we should pass the value into the function as a parameter. If the function is a method of an object, it is sometimes appropriate to make the value an attribute of the same object."

Refer to the python documentation for a more detailed outline of scoping conditions in python. Quote drawn from site.
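
To make the advice in the quote concrete, here is a sketch contrasting a function that reads a global with one that takes the value as a parameter. The names threshold, above_global(), and above() are hypothetical.

```python
threshold = 10

def above_global(values):
    # Depends on whatever threshold happens to be at call time.
    return [v for v in values if v > threshold]

def above(values, threshold):
    # Everything the function needs is passed in explicitly.
    return [v for v in values if v > threshold]

print(above_global([5, 12, 20]))   # [12, 20]
print(above([5, 12, 20], 10))      # [12, 20]
print(above([5, 12, 20], 15))      # [20]
```

The second version gives the same answer for the same inputs no matter what else the program has done to its global variables.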

Again, this hits on why we want to avoid using mutable objects for default values and want to be conscious of how we are referencing variables. The two examples below seek to get at this. In the first, my_list references a global object. Thus, we are altering that object as we use another_function(). However, when we define my_list locally, we can get around this issue. Moreover, notice in the final code chunk how the global my_list still stays the same.

In [17]:
my_list = []

def another_function(x):
    my_list.append(x)
    return my_list

print(another_function(4))
print(another_function(4))
print(another_function(4))
[4]
[4, 4]
[4, 4, 4]
In [18]:
def another_function(x):
    my_list = []
    my_list.append(x)
    return my_list

print(another_function(4))
print(another_function(4))
print(another_function(4))
[4]
[4]
[4]
In [19]:
my_list = ['a','b']
def another_function(x):
    my_list = []
    my_list.append(x)
    return my_list

print(another_function(4))
print(another_function(4))
print(another_function(4))
print(my_list)
[4]
[4]
[4]
['a', 'b']

Anonymous Functions (lambda)

Sometimes we need to perform a simple computation, but would rather not generate a function to do so.

Example:

  • Say we wanted to sort this list of US Presidents by the longest to shortest last name.
  • The sorted() function is a built-in function that can do this for us. If we were to just run this on the list, it would sort in alphabetical order. sorted() has two keyword arguments: key= and reverse=. The key argument takes a function that is applied to each element; sorted() then orders the elements by the values that function returns.
In [20]:
presidents = ['Barak Obama','Donald Trump','George Bush','Jimmy Carter','Bill Clinton']
sorted(presidents)
Out[20]:
['Barak Obama', 'Bill Clinton', 'Donald Trump', 'George Bush', 'Jimmy Carter']
In [21]:
def sort_by_longest_last_name(x):
    return len(x.split()[-1])

# Sort by longest to shortest last name.
sorted(presidents,key=sort_by_longest_last_name,reverse=True)
Out[21]:
['Bill Clinton', 'Jimmy Carter', 'Barak Obama', 'Donald Trump', 'George Bush']

The lambda keyword allows us to build this function on the fly without needing to def a named function.

In [22]:
sorted(presidents,key=lambda x: len(x.split()[-1]),reverse=True)
Out[22]:
['Bill Clinton', 'Jimmy Carter', 'Barak Obama', 'Donald Trump', 'George Bush']

Setting up the lambda function

lambda <function arguments>: <expression>
In [23]:
square = lambda x: x**2
square(5)
Out[23]:
25

Using a lambda function

Lambda functions are most useful when used in concert with higher-order functions, which are functions that take other functions as input. (We'll see a lot of these when we use Pandas methods).
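
A short sketch using three built-in higher-order functions, map(), filter(), and sorted(). The data are made up for illustration.

```python
numbers = [1, 2, 3, 4, 5]

# map() applies the function to every element.
squares = list(map(lambda n: n**2, numbers))
print(squares)      # [1, 4, 9, 16, 25]

# filter() keeps the elements for which the function returns True.
evens = list(filter(lambda n: n % 2 == 0, numbers))
print(evens)        # [2, 4]

# sorted() orders elements by the value the key function returns.
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_second = sorted(pairs, key=lambda p: p[1])
print(by_second)    # [('c', 1), ('b', 2), ('a', 3)]
```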

Differences between def and lambda

| def | lambda |
|---|---|
| function body is a code chunk that is then bound to a name | function body is a single expression |
| must have a name | anonymous (no name) |
| arguments delimited in the parentheses | arguments delimited before the colon |
| return keyword required to return a value | return value is given by the expression, i.e. whatever the expression evaluates to (no return keyword used) |
| allows for a docstring | does not allow for a docstring |

The biggest difference lies in our ability to test the function. Since a def function allows us to tie a code statement to a named object, we can easily probe that object and run unit tests on it (i.e. tests to make sure the function is doing what we need it to). This isn't as easy for a lambda function, so a useful rule of thumb is to keep your lambda functions simple so that they are easy to inspect. If a lambda expression gets too complicated, then it's a better idea to write an actual function for it.
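
A sketch of what such a test can look like, using plain assert statements. The function name last_name_length() and the test cases are assumptions for illustration; the logic is the same as the key function used for sorting above.

```python
def last_name_length(full_name):
    '''Return the number of characters in the last word of full_name.'''
    return len(full_name.split()[-1])

# Each assert passes silently when the function behaves as expected
# and raises an AssertionError when it does not.
assert last_name_length("Jimmy Carter") == 6
assert last_name_length("Bill Clinton") == 7
assert last_name_length("Cher") == 4      # a single name still works

names = ["George Bush", "Bill Clinton", "Jimmy Carter"]
print(sorted(names, key=last_name_length, reverse=True))
# ['Bill Clinton', 'Jimmy Carter', 'George Bush']
```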