PPOL564 | Data Science 1: Foundations

Lecture 19

Optimizing in One Dimension

Concepts Covered:

  • Extrema and Critical points
  • Local vs. Global Extrema
  • Higher order derivatives
  • Convex, Concave, and inflection points
  • Optimizing Univariate functions
  • Using sympy to walk through the optimization steps
In [1]:
import numpy as np
from sympy import * 
init_printing(use_unicode=True)

# Bokeh for interactive plots
from bokeh.plotting import figure, output_notebook, show
from bokeh.layouts import row, column
from bokeh.models import Span, PointDrawTool
output_notebook()

def plot(w=700,h=500,title='',x='X',y='f(x)'):
    '''Wrapper function to ease starting a new plot.'''
    p = figure(plot_width=w,plot_height=h,title=title,
           toolbar_location="above",
           tooltips=[("out", "$y"),("x", "$x")],
           tools='pan,hover,wheel_zoom,reset')
    
    # Define a continuous line for the x-axis
    hline = Span(location=0, dimension='width', line_color='grey', line_width=1)
    p.renderers.extend([hline])
    
    # Labels
    p.xaxis.axis_label = x
    p.yaxis.axis_label = y
    return p

def draw(p):
    '''Wrapper function for generating a drawing tool on a graph'''
    renderer = p.scatter(x=[], y=[], color='black', size=10)
    draw_tool = PointDrawTool(renderers=[renderer], empty_value='black')
    p.add_tools(draw_tool)
    p.toolbar.active_tap = draw_tool
    return p
Loading BokehJS ...

Extrema

Consider the following two functions:

$$ f(x) = -x^2 $$ $$ g(x) = x^2 $$

In [52]:
def f(x):
    return -x**2

def g(x):
    return x**2

Let's plot them. For what value of $x$ in the interval $[-5,5]$ does each function reach a maximum or minimum value?

In [53]:
x = np.arange(-5,5.1,.1) # Define the Function Domain 

# Plot 1
p = plot(w=700,h=300)
p.line(x,f(x),line_width=3,alpha=.3)
# p.scatter([0],[0],size=6)
p = draw(p)

# Plot 2
p2 = plot(w=700,h=300)
p2.line(x,g(x),line_width=3,alpha=.3)
# p2.scatter([0],[0],size=6)
p2 = draw(p2)

show(column(p,p2)) # Render plot

What about the following function?

$$h(x) = -2x \cos{x} $$

In [4]:
def h(x):
    return -2*x*np.cos(x)

Let's plot them again. This time let's try to "eyeball" where the minima and maxima of the function for $x \in [-5,5]$

In [6]:
x = np.arange(-5,5.1,.1) # Define the Function Domain 

# Plot Maxima
p = plot(w=700,h=400,title="Where are the maxima and minima? Draw them.")
p.line(x,h(x),line_width=3,alpha=.5,color="orange")

# Generate a drawing tool to locate the extrema
p = draw(p)

show(p)# Render plot

Why do we care about these extreme points on a function?

Critical Points

Something interesting emerges when we look at the first derivative of the function $h(x)$.

$$h(x) = -2x \cos{x} $$

$$h'(x) = -2 \cos(x) + 2x\sin(x) $$

In [7]:
def dh(x):
    return -2*np.cos(x) + 2*x*np.sin(x)
In [9]:
p = plot(w=700,h=400,title="h(x) and h'(x) for x in [-5,5]")

# Plot function
p.line(x,h(x),line_width=3,alpha=.5,color="orange",legend="h(x)")
p.line(x,dh(x),line_width=3,alpha=.4,color="steelblue",legend="h'(x)")
p = draw(p)
    
show(p) # render function

A critical point is any $x^* \in [a,b]$ such that either $f'(x^*) = 0$ or $f'(x^*)$ is undefined. These are the points on the function where "things happen": the function either blows up, jumps (discontinuity), or is stationary.

Stationary points occur when $f'(x^*) = 0$: at these points in the function, there is no change in the function. We are interested in critical points because local extrema occur at these points.

Extrema of a function correspond with its maximum and minimum values within some specified domain. This can be thought of as "the peaks and the valleys" of the function. The first derivative of a function tells use the slope of the tangent line. As we see above, when function reaches a maximum/minimum, slope of the tangent line goes to zero. The first derivative carries information about the potential location of the maxima and minima within a specified domain: the extrema will always be a point where the first derivative crosses the x-axis.

In [12]:
x = np.arange(-5,5.1,.1) # Define the Function Domain 

# Define the first derivatives of f and g
df = lambda x: -2*x
dg = lambda x: 2*x

# Plot Maxima
p = plot(w=700,h=400,title="Maxima of f(x)")
p.line(x,f(x),line_width=3,alpha=.3,legend="f(x)")
p.line(x,df(x),line_width=3,alpha=.3,color="green",legend="f'(x)")
p.scatter([0],[0],size=6)

# Plot Minima
p2 = plot(w=700,h=400,title="Minima of g(x)")
p2.line(x,g(x),line_width=3,alpha=.3,legend="g(x)")
p2.line(x,dg(x),line_width=3,alpha=.3,color="green",legend="g'(x)")
p2.scatter([0],[0],size=6)

show(column(p,p2)) # Render plot

Graphically, we can clearly locate these points visually (at least for simple functions like $f(x)$ and $g(x)$), but how would we determine if a critical point is a maxima or minima without visualizing it.

Look at the above graph and see if we can't gather any intuition for how this might work?

Consider the function $w(x)$



$$ w(x) = 2x^2 + x + 1$$



What are the critical values? That is, when is the derivative (a) equal to one or (b) undefined.



$$ w'(x) = 4x + 1 $$



$$ w'(x^*) = 4x + 1 = 0 $$



$$ x^* = -.25$$



Does this point correspond with a maxima or minima? How would we figure this out?



What is the functions behavior around that point? From the above figure we might surmise from the above graphic is that what we're most interested in is the behavior of the function both just prior and just after the critical point. Specifically, the value of the slope.



  • A maxima appears to correspond with $+ \to -$ sign flip.
  • A minima appears to correspond with $- \to +$ sign flip.



With our hypotheses in hand, let's plug in values around the critical point and see their sign.

position evaluation sign
Left $ w'(-.5) = 4(-.5) + 1 = -1 $ $-$ decreasing
Critical value $ w'(-.25) = 4(-.25) + 1 = 0 $ $~$ stationary
Right $ w'(0) = 4(0) + 1 = 1 $ $+$ increasing

$x^* = -.25$ appears to correspond with a minima.

Let's inspect this result visually

In [13]:
w = lambda x: 2*x**2 + x + 1
p = plot(h=300,w=700)
p.line(x,w(x),line_width=3,alpha=.4)
p.scatter([-.25],w(-.25),size=7,color="black")
show(p)

As we'll see below, we can get more precise about this definition by using higher order derivatives.

Local vs. Global Extremum

A local extremum is the largest/smallest value of a function over some interval of values in the domain of the function (i.e. over some interval on the x-axis). A global extremem is the largest/smallest point on all values of the function

Consider again $h(x)$, which is plotted above. Of it's extrema, which are global and which are local?

In [14]:
p = plot(w=700,h=300,title="h(x) for x in [-5,5]")
p.line(x,h(x),line_width=3,alpha=.5,color="orange",legend="h(x)")
p = draw(p)
show(p)

What about for function $t(x)$ and $v(x)$?

$$ t(x) = (x-1)(x-3)(x-4) $$

In [15]:
def t(x):
    return (x-1)*(x-3)*(x-4)

def v(x):
    return x
In [17]:
p = plot(w=700,h=400)
x = np.arange(1,4.1,.1) # domain
p.line(x,t(x),line_width=3,alpha=1,color="black",legend="t(x) for x in [1,4]")
x = np.arange(0,6.1,.1) # domain
p.line(x,t(x),line_width=3,alpha=.2,color="green",legend="t(x) for x in [0,6]")

p2 = plot(w=700,h=400)
x = np.arange(0,1.1,.1) # domain
p2.line(x,v(x),line_width=3,alpha=.5,color="steelblue",legend="v(x) for x in [0,1]")

p = draw(p);p2 = draw(p2)
p.legend.location = "top_left";p2.legend.location = "top_left"
show(column(p,p2))

Takeaways?

  • Distinguishing global from local extrema requires that we consider the bounds placed on the maximization/minimization problem.
  • Extrema at the boundary of the domain need not correspond to points at which the first derivative is zero.
  • Extrema that occur within the domain are called interior extrema.
  • supremum is the least upper bound of any set. infinimum is the greatest lower bound of any set.

Higher Order Derivatives

Recall the use of the sympy module (see lecture 19). Let's leverage the module to find the higher order derivatives of function $f(y)$

In [18]:
y = symbols('y')
f = y*(2*y**2 + .3*y**3)

print("\nFunction f(y)")
display(f)

print("\nFirst Derivative of f(y)")
display(f.diff())

print("\nSecond Derivative of f(y)")
display(f.diff().diff())

print("\nThird Derivative of f(y)")
display(f.diff().diff().diff())

print("\nFourth Derivative of f(y)")
display(f.diff().diff().diff().diff())

print("\nFifth Derivative of f(y)")
display(f.diff().diff().diff().diff().diff())
Function f(y)
$$y \left(0.3 y^{3} + 2 y^{2}\right)$$
First Derivative of f(y)
$$0.3 y^{3} + 2 y^{2} + y \left(0.9 y^{2} + 4 y\right)$$
Second Derivative of f(y)
$$1.8 y^{2} + y \left(1.8 y + 4\right) + 8 y$$
Third Derivative of f(y)
$$7.2 y + 12$$
Fourth Derivative of f(y)
$$7.2$$
Fifth Derivative of f(y)
$$0$$

Derivatives of any order tell us something about the shape of a function.

  • The first derivative informs which direction the function is trending. Is it increasing? Is it decreasing?
  • The second derivative tells us the most basic curvature of the function. Is the rate at which it is increasing or decreasing getting faster or slower?
  • Higher order derivatives provide more nuanced information about the shape of the function.
  • For locating extrema, we care most about the information provided by the first and second derivatives

Intuition of higher order derivatives

What do higher-order derivatives represent conceptually? Think of driving a car. The function $f(\cdot)$ describes how far we drove as a function of time. $$~\\Distance = f(time)\\~$$ The derivative tells us our speed as a function of time (i.e. how fast were we going). $$ ~\\Speed = f'(time)\\~$$ The second derivative tells us our rate of acceleration as a function of time (i.e. how hard we were pressing on the petal at any given point in the trip). $$ ~\\Acceleration = f''(time)\\~$$ The third derivative would tell us how much the car "jerked" at any moment in time (i.e. how quickly we pressed or let up on the petal). $$ ~\\Jerk = f'''(time)\\~$$ And so on...

In [23]:
# Functions
f = lambda time: -10*np.cos(time) + 10
df = lambda time: 10*np.sin(time)
ddf = lambda time: 10*np.cos(time)

# Domain
time = np.arange(0,(np.pi),.01)

# Plot
dim = 300,700
p = plot(h=dim[0],w=dim[1],y="Distance",x="Time")
p1 = plot(h=dim[0],w=dim[1],y="Speed",x="Time")
p2 = plot(h=dim[0],w=dim[1],y="Acceleration",x="Time")
p.line(time,f(time),line_width=2.5)
p1.line(time,df(time),line_width=2.5,color="darkred")
p2.line(time,ddf(time),line_width=2.5,color="gold")
show(column(p,p1,p2))

Notation

For any given function $f(x)$ that is continuous and differentiable within the domain $x \in [a,b]$



$$ \text{first-order derivative}: \frac{df(x)}{dx} = \lim_{h\to0}\frac{f(x-h)-f(x)}{h} = f'(x) = D_x$$



$$ \text{second-order derivative}: \frac{d^2f(x)}{dx^2} = \lim_{h\to0}\frac{f'(x-h)-f'(x)}{h} = f''(x) = D_x^2$$



$$ \vdots $$



$$ \text{nth-order derivative}: \frac{d^{n}f(x)}{dx^n} = \lim_{h\to0}\frac{f^{(n)}(x-h)-f^{(n)}(x)}{h} = f^{(n)}(x) = D_x^n$$



Visualization

In [24]:
def graph(func,x,start=-5,end=5):
    '''
    Generate an interactive graph of a sympy defined function along with 
    it's first and second derivatives
    
    Arguments:
    
        - func: univariate function defined with a sympy instanitiated symbol.
        - start: start of the numpy numerical array
        - end: end of the numpy numerical array
        
    Return:
    
        - Renders an interactive graph.
    '''
    
    # Establish numerical range
    num_range = np.arange(start, end + .1,.1)
    
    # convert sympy and higher order derivatives into numpy output
    f = lambdify(x, func, "numpy") 
    df = lambdify(x, func.diff(), "numpy")
    ddf = lambdify(x, func.diff().diff(), "numpy")

    
    # Establish Plot
    p = figure(width=700,height=400,
               toolbar_location="above",
               tooltips=[("f_", "$y"),("x", "$x")],
               tools='pan,hover,wheel_zoom,reset')
    
    # Set Axis spans
    vline = Span(location=0, dimension='height', line_color='black', line_width=.5)
    hline = Span(location=0, dimension='width', line_color='black', line_width=.5)
    p.renderers.extend([hline,vline])
    
    # Plot function and first/second derivatives
    p.line(num_range,f(num_range),
           color="black",muted_color='black',muted_alpha=0.1,
           line_width=4,alpha=.6,
           legend=f"f({x.name}) = {func}")
    p.line(num_range,df(num_range),
           color="steelblue",muted_color='steelblue',muted_alpha=0.1,
           line_width=4,alpha=.6,
           legend=f"f'({x.name}) = {func.diff()}")
    p.line(num_range,ddf(num_range),
           color="orange",muted_color='orange',muted_alpha=0.1,
           line_width=4,alpha=.6,
           legend=f"f''({x.name}) = {func.diff().diff()}")
    
    # Add draw feature
    p = draw(p)
    
    # Set plot conditions
    p.legend.click_policy="mute"
    p.legend.location = "top_left"
    p.xaxis.axis_label = "X"
    p.yaxis.axis_label = ""
    show(p) # Output plot

Play with different functional forms and get a feel for their higher order derivatives.

In [25]:
z = symbols('z') # define symbol z
In [26]:
graph(-1+z*cos(z),z)
In [27]:
graph(z**3,z)

Concave, Convex, and Inflection Points

In [28]:
x = symbols('x') # Define sympy symbol

Concavity (maxima)

$$ f'(x^*) = 0 $$ $$ f''(x^*) < 0 $$

In [29]:
graph(-x**2,x)

Convexity (minima)

$$ f'(x^*) = 0 $$ $$ f''(x^*) > 0 $$

In [30]:
graph(x**2,x)

Inflection Points

$$ f'(x^*) = 0 $$ $$ f''(x^*) = 0 $$

In [31]:
graph(x**3,x)

Inflection points are locations where the graph turns from being concave to convex or vice versa. Such points are also known as saddle points and are not extrema. In general, we don't care about inflection points: we care about extrema!

<font color = 'red'> However, we have to make sure that inflection points are actually inflection points </font>. That is, the function goes from concave to convex or vice versa. It could actually be a minima/maxima. We can uncover if this is the case by taking the higher order derivatives.

$$ f'''(x^*) > 0~~\text{or}~~f'''(x^*) < 0~~\text{then}~~x^* =~\text{inflection point} $$

For function $f(x) = x^3$



$$ f'(x^*) = 0~\text{for}~x^* = 0$$



$$ f''(0) = 0$$



$$ f'''(x) = 6$$



In [32]:
(x**3).diff().diff().diff() 
Out[32]:
$$6$$

So $x^* =0$ for $f(x) = x^3$ is a inflection point.



However, consider the following function...

In [33]:
func = x**4
graph(func,x)



$$ f'(x^*) = 4x^3 = 0 $$ $$ x^3 = 0 $$ $$ x^* = 0 $$



$$ f''(0) = 12(0)^2 = 0 $$



$$ f'''(0) = 24(0) = 0 $$



$$ f^{(4)}(0) = 24 $$



$x^* = 0$ is actually a minima of $f(x)$.



General Rule



  • if $f''(x^*) = 0$, don't trust it right away:
    • Compute the higher order derivative at the stationary point $x^*$ until you arrive at a non-zero value.
      • If this derivative is of odd order ($f'''(x^*)$,$f^{(5)}(x^*)$,$f^{(7)}(x^*)$,$\dots$), then the stationary point is an inflection point.
      • If this derivative is of even order ($f^{(4)}(x^*)$,$f^{(6)}(x^*)$,$\dots$), then the stationary point is an extremum.
        • if this value is positive then the stationary point is a local minimum.
        • if this value is negative then the stationary point is a local maximum.



Comparing Boundaries and Extrema

Once you've located the critical points of a function, found the stationary points and determined if they are local minima or maxima, you need to make a determination if they are global minima or maxima. That is, across the entire specified domain of the function, is this in fact the largest (or smallest) point.

We can do this by actually plugging the boundary values into $f(x)$ and seeing if their value is larger/smaller than the interior extrema we located.

Again with function $f(x) = x^4$, we located a local minima at $x^* = 0$, is this in fact a local minima?

From the graph above, we can see the domain $[-5,5]$. Let's plug those values into $f(x)$.



$$ f(-5) = (-5)^4 = 625 $$



$$ f(5) = (5)^4 = 625 $$



Neither value is less than our minima

$$ f(0) = 0 $$

So $x^*$ is in fact a global minima for $f(x)$

Steps for Minimizing/Maximizing a Univariate Function

Putting all the above together, here are the steps for minimizing or maximizing a univariate function (i.e. finding the global minima/maxima). These are the steps outlined in the reading (see pg. 168):

  1. Find $f'(x)$
  2. Set $f'(x) = 0 $ and solve for all $x^*$. These are stationary points of the function.
  3. Find $f''(x)$
  4. For each stationary point $x^*$, substitute $x^*$ into $f''(x)$
    • if $f''(x^*) < 0$, $f(x)$ has a local maximum at $x^*$.
    • if $f''(x^*) > 0$, $f(x)$ has a local minimum at $x^*$.
    • if $f''(x^*) = 0$, $x^*$ may be an inflection point. But we need to check to make sure:
      • Compute the higher order derivative at the stationary point $x^*$ until you arrive at a non-zero value.
        • If this derivative is of odd order ($f'''(x^*)$,$f^{(5)}(x^*)$,$f^{(7)}(x^*)$,$\dots$), then the stationary point is an inflection point.
        • If this derivative is of even order ($f^{(4)}(x^*)$,$f^{(6)}(x^*)$,$\dots$), then the stationary point is an extremum.
          • if this value is positive then the stationary point is a local minimum.
          • if this value is negative then the stationary point is a local maximum.
  5. Substitue each local extremum into $f(x)$ to find the functions's value at that point.
  6. Substitute the lower and upper bounds of the domain over which you are attempting to find the extrema into $f(x)$ to find the function's values at those points.
  7. Of those values:
    • the smallest $x^*$ or boundary point is the global minimum.
    • the largest $x^*$ or boundary point is the global maximum.

And with that, choose the value you care about the most (i.e. are you maximizing or minimizing?)

Shortcuts with sympy

$$ f(x) = 0.3 x^{3} - x^{2} + 0.4 x + 2 $$

$$ \arg max_x f(x) $$

where $x \in [-4,4] $

In [36]:
# Our function!
x = symbols('x')
f = .3*x**3 -x**2 + .4*x +2
f
Out[36]:
$$0.3 x^{3} - x^{2} + 0.4 x + 2$$

Step 1

In [37]:
# Find the first derivative
df = f.diff()
df
Out[37]:
$$0.9 x^{2} - 2 x + 0.4$$

Step 2

In [38]:
# Solve for f'(x) = 0
Eq(df,0)
Out[38]:
$$0.9 x^{2} - 2 x + 0.4 = 0$$
In [39]:
crit_points = solveset(Eq(df,0),domain=Interval(-4,4))
crit_points = list(crit_points)
crit_points
Out[39]:
$$\left [ 0.222222222222222, \quad 2.0\right ]$$

Step 3

In [40]:
# Find the second derivative
ddf = df.diff()
ddf
Out[40]:
$$1.8 x - 2$$

Step 4

In [41]:
# For x* in crit_points, plug each x* in to f''(x)
ddf.evalf(subs={x: 0}) # We can plut in values for x like this
Out[41]:
$$-2.0$$

Let's now plug in our critical points.

In [43]:
store = []
for x_ in crit_points:
    deriv2_eval = ddf.evalf(subs={x: x_})
    
    # Determine type
    cp_type = None
    if deriv2_eval > 0:
        cp_type ="local minima"
    elif deriv2_eval < 0:
        cp_type ="local maxima"
    else:
        cp_type ="pot. inflection point"
    
    # Print each type
    print(f"f''({round(x_,2)}) = {round(deriv2_eval,2)} (x* = {round(x_,2)} is a {cp_type})")
    
    # save output
    store.append([x_,deriv2_eval])
    
store = np.array(store) # convert to numpy matrix
f''(0.22) = -1.6 (x* = 0.22 is a local maxima)
f''(2.0) = 1.6 (x* = 2.0 is a local minima)

Step 5

In [44]:
# Find the values for each interior extremum
print("Interior Extrema:")
for row in store:
    crit_val = row[0]
    ev = f.evalf(subs={x: crit_val}) # plug back into x
    print(f'f({round(crit_val,2)}) = {round(ev,2)}')
Interior Extrema:
f(0.22) = 2.04
f(2.0) = 1.2

Step 6

In [45]:
# Find the values of the boundaries
boundaries = [-4,4]
print('Boundary Values:')
for b in boundaries:
    ev = f.evalf(subs={x: b})
    print(f'f({round(b,2)}) = {round(ev,2)}')
Boundary Values:
f(-4) = -34.8
f(4) = 6.8

Are either of our stationary points global minima or maxima?

Inspect Visually

In [46]:
graph(.3*x**3 -x**2 + .4*x +2 ,x,start=-4,end=4)

You Try it!

Find all extrema for the following functions along the specified domain. State whether each extremum is a minimum or maximum. Of the minima and maxima, state which are local and which are global.

  1. $$ f(x) = 6x - x^2 + 12, x\in [0,10]$$
  1. $$ f(x) = -4 + 3x^3, x\in [0,10]$$
  1. $$ f(x) = ln(1+x) - x^2, x \in [-2,2]$$

Scipy's optimize

Python has a range of optimization algorithms, some of which we will touch in the forthcoming lectures. For now, I just want to put scipy on your radar.

In [47]:
from scipy import optimize
In [48]:
# Set our objective function
def f(x):
    return .3*x**3 -x**2 + .4*x +2
In [49]:
result = optimize.minimize_scalar(f,bounds=(-4,4),method='bounded')
In [50]:
result
Out[50]:
     fun: -34.79991872352222
 message: 'Solution found.'
    nfev: 30
  status: 0
 success: True
       x: -3.999996435239639