PPOL564 | DS1: Foundations

Lecture 22

Constrained Optimization with Equality Constraints
(Supplement)

The Lagrange Multiplier: What is it?

Note that the gradient vectors at the point of tangency are oriented along the same direction.



$$ \nabla f(x,y) = \lambda \nabla g(x,y) $$



The Lagrange multiplier is the constant of proportionality between the two gradients. Put differently, the gradients point in the same direction and differ only by a scaling factor $\lambda$, which is the Lagrange multiplier.
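
A minimal SymPy sketch of this idea, using a made-up example problem (maximize $f(x,y) = xy$ subject to $x + y = 10$, whose constrained optimum works out to $x^* = y^* = 5$; we verify that from the first-order condition below): at the optimum, $\nabla f$ is just $\nabla g$ scaled by $\lambda$.

```python
import sympy as sp

# Made-up example: maximize f(x, y) = x*y subject to g(x, y) = x + y = 10.
# Its constrained optimum works out to x* = y* = 5 (verified from the FOC below).
x, y = sp.symbols('x y', real=True)
f = x * y
g = x + y

grad_f = sp.Matrix([sp.diff(f, v) for v in (x, y)])   # [y, x]
grad_g = sp.Matrix([sp.diff(g, v) for v in (x, y)])   # [1, 1]

point = {x: 5, y: 5}
print(grad_f.subs(point))   # Matrix([[5], [5]])
print(grad_g.subs(point))   # Matrix([[1], [1]])  -> grad f = 5 * grad g, so lambda = 5
```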

Let's unpack this further.

Recall $f(x,y)$ is our objective function and $g(x,y) = c$ is our equality constraint, where $c$ is some constant.



$$\mathcal{L}(x,y,\lambda) = f(x,y) - \lambda (g(x,y) - c)$$



$$ \nabla \mathcal{L}(x^*,y^*,\lambda^*) = \textbf{0} $$
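Here is a minimal SymPy sketch of this recipe on the same made-up example from above: build $\mathcal{L}$, take its gradient, and solve $\nabla \mathcal{L} = \textbf{0}$ for $x^*$, $y^*$, and $\lambda^*$.

```python
import sympy as sp

# Same made-up example: maximize f(x, y) = x*y subject to x + y = 10.
x, y, lam = sp.symbols('x y lambda', real=True)

f = x * y    # objective function
g = x + y    # constraint function
c = 10       # constraint constant

# Lagrangian: L = f - lambda * (g - c)
L = f - lam * (g - c)

# First-order condition: every partial derivative of L equals zero
foc = [sp.diff(L, v) for v in (x, y, lam)]
print(sp.solve(foc, (x, y, lam), dict=True))   # expect x = y = lambda = 5
```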



Note that when the constraint is met, $g(x^*,y^*) = c$, so $\lambda^* (g(x^*,y^*) - c) = 0$



$$\mathcal{L}(x^*,y^*,\lambda^*) = f(x^*,y^*) - 0 $$



$$ P^* = f(x^*,y^*)$$



Where $P^*$ is the value of the objective function at a stationary point where the constraint is met.



Let's now think of $c$ as a variable. When we do, the stationary points that satisfy the first order condition will change. Thus, our stationary points become a function of whatever we set $c$ to be.

$$ P^*(c) = f(x^*(c),y^*(c)) $$

The Lagrange multiplier is the derivative of the optimal value with respect to the constraint constant!

$$ \frac{dP^*}{dc} = \lambda^* $$
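
We can already see this claim in action on the made-up example from above if we leave $c$ symbolic (a sketch, not a proof):

```python
import sympy as sp

# Same made-up example, but now the constraint constant c is a symbol.
x, y, lam, c = sp.symbols('x y lambda c', positive=True)

L = x * y - lam * (x + y - c)
sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], (x, y, lam), dict=True)[0]

P_star = (x * y).subs(sol)     # optimal value P*(c) = c**2 / 4
print(sp.diff(P_star, c))      # c/2
print(sol[lam])                # c/2  -> dP*/dc equals lambda*(c)
```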

Wait, what?



$$\mathcal{L}(x,y,\lambda) = f(x,y) - \lambda (g(x,y) - c)$$



$$\frac{d\mathcal{L}(x,y,\lambda)}{dc} = \lambda$$



But recall, the points that matter most (the stationary points... where our extrema live) are all functions of $c$



$$\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c) = f(x^*(c),y^*(c)) - \lambda^*(c) (g(x^*(c),y^*(c)) - c)$$



The Lagrangian is now a composition of functions of $c$: it depends on $c$ directly and through $x^*(c)$, $y^*(c)$, and $\lambda^*(c)$. To calculate the derivative, we need to use the multivariate chain rule.

#### Sidenote: Multivariate Chain Rule

$$ \textbf{s}(f_1(z),f_2(z)) = \begin{pmatrix} f_1(z) \\ f_2(z) \end{pmatrix}$$

What is the resulting change in $\textbf{s}( \cdot )$ when we nudge $z$ slightly?

$$ \frac{d\textbf{s}(f_1(z),f_2(z))}{dz} $$

$$ \lim_{h\to0}\frac{\textbf{s}(f_1(z+h),f_2(z)) - \textbf{s}(f_1(z),f_2(z))}{h} + \lim_{h\to0}\frac{\textbf{s}(f_1(z),f_2(z+h)) - \textbf{s}(f_1(z),f_2(z))}{h} $$

$$ \frac{\partial s}{\partial f_1 } \frac{\partial f_1}{\partial z} + \frac{\partial s}{\partial f_2 } \frac{\partial f_2}{\partial z} $$

Much like the univariate chain rule, we are "nudging through functions". What we care about is how a nudge in the input ultimately impacts the resulting output. This logic holds in multiple dimensions: we are nudging through functions in multiple dimensions. We nudge in each dimension independently (i.e. each unit vector) and then take the combination of all our nudges.
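
A small SymPy sketch of this rule, using made-up functions ($s(f_1,f_2) = f_1 f_2$, $f_1(z) = z^2$, $f_2(z) = \sin z$): the chain-rule route and direct differentiation give the same answer.

```python
import sympy as sp

# Made-up functions to illustrate the rule: s(f1, f2) = f1 * f2,
# with f1(z) = z**2 and f2(z) = sin(z).
z, f1, f2 = sp.symbols('z f1 f2')

s = f1 * f2
f1_z = z**2
f2_z = sp.sin(z)

# Chain rule: (ds/df1)(df1/dz) + (ds/df2)(df2/dz)
chain = (sp.diff(s, f1) * sp.diff(f1_z, z)
         + sp.diff(s, f2) * sp.diff(f2_z, z)).subs({f1: f1_z, f2: f2_z})

# Direct differentiation after substituting f1(z) and f2(z) into s
direct = sp.diff(s.subs({f1: f1_z, f2: f2_z}), z)

print(sp.simplify(chain - direct))   # 0 -> the two routes agree
```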



$$\frac{d\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c)}{dc} = \frac{d\mathcal{L}}{dx^*}\frac{dx^*}{dc} + \frac{d\mathcal{L}}{dy^*}\frac{dy^*}{dc} + \frac{d\mathcal{L}}{d\lambda^*}\frac{d\lambda^*}{dc} + \frac{d\mathcal{L}}{dc}\frac{dc}{dc}$$



Recall that, at the stationary points, the first derivatives of $\mathcal{L}$ are zero by definition, so we can do this...



$$ 0 + 0 + 0 + \frac{d\mathcal{L}}{dc}\frac{dc}{dc} $$



and $\frac{dc}{dc} = 1$ so



$$ \frac{d\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c)}{dc} =\frac{d\mathcal{L}}{dc}$$



And we know from above that



$$\frac{d\mathcal{L}}{dc} = \lambda $$



What does this tell us?

The Lagrange multiplier tells us the rate at which an increase (or decrease) in the constraint constant $c$ increases (or decreases) the optimal value of whatever we're trying to maximize or minimize. It is the constraint's marginal cost, often called its shadow price.
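
To close the loop, here is a tiny numerical check on the made-up example from above (maximize $xy$ subject to $x + y = c$), where the symbolic work gave $P^*(c) = c^2/4$ and $\lambda^*(c) = c/2$: nudging $c$ around $c = 10$ and measuring the change in $P^*$ recovers $\lambda^* = 5$.

```python
# Numerical sanity check on the made-up example (maximize x*y subject to x + y = c),
# whose optimal value is P*(c) = c**2 / 4 and whose multiplier is lambda*(c) = c / 2.
def optimal_value(c):
    return (c / 2) ** 2

c0, h = 10.0, 1e-5
nudge = (optimal_value(c0 + h) - optimal_value(c0 - h)) / (2 * h)   # dP*/dc by central finite difference

print(nudge)      # approximately 5.0
print(c0 / 2)     # lambda* = 5.0 -- they match
```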