
Note that at the point of tangency, the gradient vectors of $f$ and $g$ are parallel: they lie along the same line.
$$ \nabla f(x,y) = \lambda \nabla g(x,y) $$
The Lagrange multiplier is the factor that relates the two gradients. Put differently, the gradients are equal up to some scaling factor $\lambda$ (which may be negative), and that scaling factor is the Lagrange multiplier.
Let's unpack this further.
Recall $f(x,y)$ is our objective function and $g(x,y) = c$ is our equality constraint, where $c$ is some constant.
$$\mathcal{L}(x,y,\lambda) = f(x,y) - \lambda (g(x,y) - c)$$
$$ \nabla \mathcal{L}(x^*,y^*,\lambda^*) = \textbf{0} $$
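Written out component by component, this first-order condition is the system
$$ \frac{\partial \mathcal{L}}{\partial x} = \frac{\partial f}{\partial x} - \lambda \frac{\partial g}{\partial x} = 0, \qquad \frac{\partial \mathcal{L}}{\partial y} = \frac{\partial f}{\partial y} - \lambda \frac{\partial g}{\partial y} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = -(g(x,y) - c) = 0 $$
The first two equations reproduce $\nabla f = \lambda \nabla g$, and the third is just the constraint $g(x,y) = c$.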
Note that when the constraint is met, $g(x^*,y^*) = c$, so $\lambda^* (g(x^*,y^*) - c) = 0$ and the Lagrangian reduces to the objective:
$$\mathcal{L}(x^*,y^*,\lambda^*) = f(x^*,y^*) - 0 $$
$$ P^* = f(x^*,y^*)$$
where $P^*$ is the value of the objective at a stationary point that also satisfies the constraint.
Let's now think of $c$ as a variable. When we do, the stationary points that satisfy the first order condition will change. Thus, our stationary points become a function of whatever we set $c$ to be.
$$ P^*(c) = f(x^*(c),y^*(c)) $$
The claim: the Lagrange multiplier is the derivative of this optimal value with respect to the constraint constant $c$!
$$ \frac{dP^*}{dc} = \lambda^* $$
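As a quick sanity check, consider an illustrative problem (an assumed example, not part of the derivation): maximize $f(x,y) = xy$ subject to $g(x,y) = x + y = c$. The condition $\nabla f = \lambda \nabla g$ gives $(y, x) = \lambda (1, 1)$, and combining this with the constraint yields
$$ x^*(c) = y^*(c) = \frac{c}{2}, \qquad \lambda^*(c) = \frac{c}{2} $$
The optimal value and its derivative are then
$$ P^*(c) = \frac{c^2}{4}, \qquad \frac{dP^*}{dc} = \frac{c}{2} = \lambda^*(c) $$
which agrees with the claim for this particular problem.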
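To see why, start from the Lagrangian and take its partial derivative with respect to $c$, holding $x$, $y$, and $\lambda$ fixed:
$$\mathcal{L}(x,y,\lambda) = f(x,y) - \lambda (g(x,y) - c)$$
$$\frac{\partial\mathcal{L}(x,y,\lambda)}{\partial c} = \lambda$$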
But recall, the points that matter most (the stationary points, where our extrema live) are all functions of $c$:
$$\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c) = f(x^*(c),y^*(c)) - \lambda^*(c) (g(x^*(c),y^*(c)) - c)$$
Evaluated at the stationary point, the Lagrangian is now a composite function of $c$ alone: $c$ enters directly and also through $x^*$, $y^*$, and $\lambda^*$. To calculate its derivative, we need to use the multivariate chain rule.
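$$\frac{d\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c)}{dc} = \frac{\partial\mathcal{L}}{\partial x}\frac{dx^*}{dc} + \frac{\partial\mathcal{L}}{\partial y}\frac{dy^*}{dc} + \frac{\partial\mathcal{L}}{\partial \lambda}\frac{d\lambda^*}{dc} + \frac{\partial\mathcal{L}}{\partial c}\frac{dc}{dc}$$
Recall that a stationary point is, by definition, a point where the first partial derivatives of $\mathcal{L}$ with respect to $x$, $y$, and $\lambda$ are all zero, so the first three terms vanish:
$$ \frac{d\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c)}{dc} = 0 + 0 + 0 + \frac{\partial\mathcal{L}}{\partial c}\frac{dc}{dc} $$
and $\frac{dc}{dc} = 1$ so
$$ \frac{d\mathcal{L}(x^*(c),y^*(c),\lambda^*(c),c)}{dc} =\frac{\partial\mathcal{L}}{\partial c}$$
And we know from above that, evaluated at the stationary point,
$$\frac{\partial\mathcal{L}}{\partial c} = \lambda^* $$
Since the Lagrangian equals $P^*$ at a stationary point that satisfies the constraint, this is exactly $\frac{dP^*}{dc} = \lambda^*$.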
What does this tell us?
The Lagrange multiplier tells us the rate at which the optimal value of whatever we're trying to maximize/minimize changes as the constraint constant $c$ increases or decreases. It is the marginal value of relaxing the constraint, often called its shadow price.
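
The marginal interpretation can also be checked numerically. The sketch below is a minimal illustration under assumed choices that are not part of the derivation: the objective $f(x,y) = xy$, the constraint $x + y = c$, a ternary search over a fixed window, and the helper names `solve_constrained`, `optimal_value`, and `multiplier`. It compares a finite-difference estimate of $\frac{dP^*}{dc}$ with $\lambda^*$ read off from $\nabla f = \lambda \nabla g$.

```python
def f(x, y):
    """Objective function (an illustrative choice)."""
    return x * y


def solve_constrained(c, iters=200):
    """Maximize f on the line x + y = c by ternary search over x.

    Assumes the maximizer lies inside the window [-10, 10] and that
    f restricted to the constraint line is unimodal (true for x * (c - x)).
    """
    lo, hi = -10.0, 10.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1, c - m1) < f(m2, c - m2):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    x = (lo + hi) / 2
    return x, c - x


def optimal_value(c):
    """P*(c): the objective evaluated at the constrained optimum."""
    x, y = solve_constrained(c)
    return f(x, y)


def multiplier(c):
    """lambda* from grad f = lambda * grad g.

    Here grad g = (1, 1) and df/dx = y, so lambda* = y*.
    """
    _, y = solve_constrained(c)
    return y


c, h = 4.0, 1e-3
# Central finite difference of the optimal value with respect to c.
dP_dc = (optimal_value(c + h) - optimal_value(c - h)) / (2 * h)
lam = multiplier(c)
print(f"dP*/dc ~ {dP_dc:.6f}, lambda* ~ {lam:.6f}")
```

For this problem both quantities equal $c/2$ analytically, so the two printed numbers should agree up to numerical error. A different objective or constraint would need its own expression for $\lambda^*$ and possibly a different search window.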