PPOL564 | DS1: Foundations

Lecture 22

Constrained Optimization with Equality Constraints
(Supplement)

The Lagrange Multiplier: What is it?

Note that the gradient vectors at the point of tangency are parallel: they point along the same line.

$$ \nabla f(x,y) = \lambda \nabla g(x,y) $$

The Lagrange multiplier is the constant of proportionality between the two gradients. Put differently, at the point of tangency the gradient of $f$ equals the gradient of $g$ scaled by some factor $\lambda$, which is the Lagrange multiplier.
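A quick numerical check can make the parallel-gradient condition concrete. The sketch below assumes a hypothetical problem that is not from the lecture: maximize $f(x,y) = xy$ subject to $g(x,y) = x + y = 10$, whose constrained maximum sits at $(5, 5)$. The helper names (`grad_f`, `grad_g`, `cross_2d`) are illustrative only.

```python
# Hypothetical example (not from the lecture): maximize f(x, y) = x * y
# subject to g(x, y) = x + y = 10. The constrained maximum is at (5, 5).

def grad_f(x, y):
    """Gradient of f(x, y) = x * y."""
    return (y, x)

def grad_g(x, y):
    """Gradient of g(x, y) = x + y."""
    return (1.0, 1.0)

def cross_2d(u, v):
    """z-component of the 2D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

x_star, y_star = 5.0, 5.0
gf = grad_f(x_star, y_star)
gg = grad_g(x_star, y_star)

# At the tangency point the gradients are parallel, so the cross product vanishes.
parallel_at_optimum = cross_2d(gf, gg)

# lambda is the scaling factor that maps grad g onto grad f.
lam = gf[0] / gg[0]

# At a feasible but non-optimal point such as (2, 8) the gradients are not parallel.
not_parallel = cross_2d(grad_f(2.0, 8.0), grad_g(2.0, 8.0))

print(parallel_at_optimum, lam, not_parallel)
```

In this sketch the cross product is zero only at the constrained optimum, and `lam` is the factor by which `grad_g` must be scaled to match `grad_f`.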

Let's unpack this further.

Recall $f(x,y)$ is our objective function and $g(x,y) = c$ is our equality constraint, where $c$ is some constant. We combine them into the Lagrangian and look for the points where its gradient is zero:

$$\mathcal{L}(x,y,\lambda) = f(x,y) - \lambda (g(x,y) - c)$$

$$ \nabla \mathcal{L}(x^*,y^*,\lambda^*) = \textbf{0} $$

Note that when the constraint is met, $g(x^*,y^*) = c$, so $\lambda^* (g(x^*,y^*) - c) = 0$ and the Lagrangian reduces to the objective:

$$\mathcal{L}(x^*,y^*,\lambda^*) = f(x^*,y^*) - 0 $$

$$ P^* = f(x^*,y^*)$$

where $P^*$ is the value of the objective at a stationary point that also satisfies the constraint.
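As a concrete illustration, take the hypothetical problem of maximizing $f(x,y) = xy$ subject to $x + y = c$ (an assumed example, not one from the lecture). Writing out $\nabla \mathcal{L} = \textbf{0}$ gives one equation per variable:

$$\mathcal{L}(x,y,\lambda) = xy - \lambda (x + y - c)$$

$$\frac{\partial \mathcal{L}}{\partial x} = y - \lambda = 0, \qquad \frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - c) = 0$$

$$x^* = y^* = \lambda^* = \frac{c}{2}, \qquad P^* = f(x^*,y^*) = \frac{c^2}{4}$$

The third equation is just the constraint itself, which is why the $\lambda^*(g - c)$ term drops out at the solution.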

Let's now think of $c$ as a variable. When we do, the stationary points that satisfy the first order condition will change. Thus, our stationary points become a function of whatever we set $c$ to be.

$$ P^*(c) = f(x^*(c),y^*(c)) $$

The Lagrange multiplier is the derivative of the optimal value $P^*$ with respect to the constraint constant $c$!

$$ \frac{dP^*}{dc} = \lambda^* $$
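This claim can be checked numerically without any closed-form solution. The sketch below assumes a hypothetical problem (maximize $f(x,y) = xy$ subject to $x + y = c$), solves it with a simple golden-section search along the constraint, and compares a finite-difference estimate of $dP^*/dc$ against $\lambda^*$ read off from the gradient condition. The function names and the choice of search method are illustrative assumptions.

```python
# Hypothetical example (not from the lecture): maximize f(x, y) = x * y
# subject to x + y = c, then check numerically that dP*/dc equals lambda*.

def f(x, y):
    return x * y

def solve(c, iters=200):
    """Return (x*, y*, P*) by maximizing f(x, c - x) with golden-section search."""
    ratio = (5 ** 0.5 - 1) / 2
    lo, hi = 0.0, c
    for _ in range(iters):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if f(a, c - a) < f(b, c - b):
            lo = a  # the maximum lies to the right of a
        else:
            hi = b  # the maximum lies to the left of b
    x = (lo + hi) / 2
    return x, c - x, f(x, c - x)

c, h = 10.0, 1e-3
x_star, y_star, p_star = solve(c)

# grad f = lambda * grad g with df/dx = y and dg/dx = 1, so lambda* = y*.
lam_star = y_star / 1.0

# Central finite difference of the optimal value with respect to c.
dP_dc = (solve(c + h)[2] - solve(c - h)[2]) / (2 * h)

print(lam_star, dP_dc)
```

The two printed numbers should agree up to numerical error, which is what $\frac{dP^*}{dc} = \lambda^*$ predicts for this example.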

Wait, what?

$$\mathcal{L}(x,y,\lambda) = f(x,y) - \lambda (g(x,y) - c)$$

$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial c} = \lambda$$

But recall, the points that matter most (the stationary points, where our extrema live) are all functions of $c$:

$$\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c) = f(x^*(c),y^*(c)) - \lambda^*(c) (g(x^*(c),y^*(c)) - c)$$

The Lagrangian is now a composition: a function of several quantities that each depend on $c$. To calculate its derivative with respect to $c$, we need to use the multivariate chain rule.

#### Sidenote: Multivariate Chain Rule

Suppose $s$ is a function of two inputs, $f_1$ and $f_2$, that each depend on $z$:

$$ s(\textbf{f}(z)) = s(f_1(z),f_2(z)), \qquad \textbf{f}(z) = \begin{pmatrix} f_1(z) \\ f_2(z) \end{pmatrix}$$

What is the resulting change in $s( \cdot )$ when we nudge $z$ slightly?

$$ \frac{d s(f_1(z),f_2(z))}{dz} $$

$$ = \lim_{h\to0}\frac{s(f_1(z+h),f_2(z)) - s(f_1(z),f_2(z))}{h} + \lim_{h\to0}\frac{s(f_1(z),f_2(z+h)) - s(f_1(z),f_2(z))}{h} $$

$$ = \frac{\partial s}{\partial f_1 } \frac{d f_1}{d z} + \frac{\partial s}{\partial f_2 } \frac{d f_2}{d z} $$

Much like the univariate chain rule, we are "nudging through functions". What we care about is how a nudge in the input ultimately impacts the resulting output. This logic holds in multiple dimensions: we nudge along each dimension independently (i.e. along each unit vector) and then add up the effects of all our nudges.
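The chain rule above can be sanity-checked numerically. The sketch below assumes a scalar-valued $s$ and picks hypothetical functions `f1`, `f2`, and `s` purely for illustration; it compares a finite-difference nudge of the composite function against the sum of the two chain-rule terms.

```python
import math

# Hypothetical functions chosen only for illustration.
def f1(z):
    return z ** 2

def f2(z):
    return math.sin(z)

def s(u, v):
    """s as a function of its two inputs u = f1(z) and v = f2(z)."""
    return u * v + u ** 2

z, h = 1.3, 1e-6

# Left side: nudge z and watch the composite output change (central difference).
numeric = (s(f1(z + h), f2(z + h)) - s(f1(z - h), f2(z - h))) / (2 * h)

# Right side: ds/df1 * df1/dz + ds/df2 * df2/dz, using analytic derivatives.
u, v = f1(z), f2(z)
ds_df1 = v + 2 * u      # partial of s with respect to its first input
ds_df2 = u              # partial of s with respect to its second input
df1_dz = 2 * z
df2_dz = math.cos(z)
chain = ds_df1 * df1_dz + ds_df2 * df2_dz

print(numeric, chain)
```

The two values should match to several decimal places; any remaining gap comes from the finite step size `h`.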

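$$\frac{d\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c)}{dc} = \frac{\partial\mathcal{L}}{\partial x}\frac{dx^*}{dc} + \frac{\partial\mathcal{L}}{\partial y}\frac{dy^*}{dc} + \frac{\partial\mathcal{L}}{\partial \lambda}\frac{d\lambda^*}{dc} + \frac{\partial\mathcal{L}}{\partial c}\frac{dc}{dc}$$

Recall that at a stationary point the partial derivatives of $\mathcal{L}$ with respect to $x$, $y$, and $\lambda$ are all zero by definition, so the first three terms vanish:

$$ \frac{d\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c)}{dc} = 0 + 0 + 0 + \frac{\partial\mathcal{L}}{\partial c}\frac{dc}{dc} $$

and $\frac{dc}{dc} = 1$ so

$$ \frac{d\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c)}{dc} =\frac{\partial\mathcal{L}}{\partial c}$$

And we know from above that

$$\frac{\partial\mathcal{L}}{\partial c} = \lambda $$

Evaluated at the stationary point, this is $\lambda^*$. Since $\mathcal{L}(x^*,y^*,\lambda^*) = P^*$ there, we recover $\frac{dP^*}{dc} = \lambda^*$.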

What does this tell us?

The Lagrange multiplier tells us the rate at which the optimal value of whatever we're trying to maximize/minimize changes as the constraint constant $c$ increases or decreases. It is the constraint's marginal cost (often called its shadow price).
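One practical use of this interpretation is predicting how much the optimum moves when the constraint is relaxed slightly, without re-solving the problem. The sketch below assumes a hypothetical problem (maximize $xy$ subject to $x + y = c$) whose closed-form solution is $x^* = y^* = \lambda^* = c/2$; the function name `optimum` is illustrative.

```python
# Hypothetical example: maximize x * y subject to x + y = c.
# Its closed-form solution is x* = y* = lambda* = c / 2, so P*(c) = c**2 / 4.

def optimum(c):
    """Return (P*, lambda*) for the hypothetical problem above."""
    return c ** 2 / 4, c / 2

c, delta = 10.0, 0.1
p_star, lam_star = optimum(c)

# First-order prediction of the new optimum after relaxing the constraint by delta.
predicted = p_star + lam_star * delta

# Re-solve exactly to see how good the prediction is.
actual, _ = optimum(c + delta)
error = actual - predicted

print(predicted, actual, error)
```

The prediction is only a first-order approximation: in this example the gap between `actual` and `predicted` is `delta**2 / 4`, which shrinks quickly as the change in $c$ gets smaller.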
