PPOL 564: Foundations of Data Science

Lecture 14

Inverting Matrices

Concepts for today:

  • (delve briefly into) Solving for systems of linear equations
  • Reduced Row Echelon Form as a Matrix Transformation
  • Invertibility
In [14]:
import numpy as np

Solving for a System of Linear Equations



$$ x + y = 7 $$ $$ x + 2y = 11 $$



To solve for a system of equations, we must have as many equations as unknowns. The idea is to leverage the equations to isolate values for the unknowns.

Three potential outcomes:

  1. We'll find many solutions (potentially an infinite number of solutions for different values of $x$ and $y$).
  2. We'll find one solution.
  3. We'll find no solution.

Essentially, we are trying to locate the point (or set of points) where these two lines intersect.

Note: A far deeper discussion of how to solve a system of linear equations is covered in the reading.

Let's solve the above system.

\begin{equation} x + y = 7\\ x + 2y = 11 \end{equation}



\begin{equation} x = 7 - y \\ x + 2y = 11 \end{equation}



\begin{equation} x = 7 - y \\ (7 - y) + 2y = 11 \end{equation}



\begin{equation} x = 7 - y \\ y = 4 \end{equation}



\begin{equation} x = 7 - (4) \\ y = 4 \end{equation}



\begin{equation} x = 3 \\ y = 4 \end{equation}

Let's plug these values back in to see if they work.

\begin{equation} (3) + (4) = 7\\ (3) + 2(4) = 11 \end{equation}

Looks good!
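
We can also verify this with NumPy's built-in solver, np.linalg.solve (a quick sketch using the np alias imported above):

In [ ]:
A = np.array([[1, 1],
              [1, 2]])
y = np.array([7, 11])
np.linalg.solve(A, y)   # array([3., 4.]) -- the values of x and y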

System of Linear Equations in Matrix Form



\begin{equation} 1x + 1y = 7\\ 1x + 2y = 11 \end{equation}



\begin{equation} \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 7 \\ 11 \end{bmatrix} \end{equation}



\begin{equation} \textbf{A} \textbf{b} = \textbf{y} \end{equation}

Where:

  • $\textbf{A}$ can be thought of as our data
  • $\textbf{b}$ can be thought of as our unknown coefficients
  • $\textbf{y}$ can be thought of as the outcomes we are trying to solve for



We can express this matrix as an "augmented matrix" (which will help as we perform row-wise operations)



\begin{equation} \left| \begin{array}{cc|c} 1 & 1 & 7 \\ 1 & 2 & 11 \\ \end{array} \right| \end{equation}



Reduced Row Echelon Form (rref)

The goal of RREF is to use row-wise addition/subtraction and scaling to reduce each column so that one entry equals 1 and the rest equal 0.

In essence, we want $\textbf{A}$ to resemble $\textbf{I}$.



\begin{equation} \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \end{equation}



Why might we want to do this?

Recall, if we think of matrix multiplication as a linear transformation (as we did last time), we'll remember that we're fundamentally changing our unit vectors into a new coordinate system (e.g. $c_1\textbf{A}\hat{i} + c_2\textbf{A}\hat{j}$). The aim here is to reverse that process. That is, what steps do we need to take to get back to our original unit vectors ($\hat{i},\hat{j}$)?



We can do this as we did before (i.e. when writing a function that would do the transformation for us) by performing the row-wise operations on both sides of the augmented matrix. The result will be the solution to our linear equation (if a solution exists).

Again, this is a simplification of rref and solving for systems of equations. Check out the reading for a more involved discussion. The point is to get the intuition of what is going on here.



\begin{equation} \left| \begin{array}{cc|c} 1 & 1 & 7 \\ 1 & 2 & 11 \\ \end{array} \right| \end{equation}



Hold the first row fixed. How do we get position (2,1) in the matrix to be zero? Subtract the first row from the second.



\begin{equation} \left| \begin{array}{cc|c} 1 & 1 & 7 \\ 1 - 1 & 2 - 1 & 11 - 7 \\ \end{array} \right| \end{equation}



\begin{equation} \left| \begin{array}{cc|c} 1 & 1 & 7 \\ 0 & 1 & 4 \\ \end{array} \right| \end{equation}



Hold the second row fixed. How do we get position (1,2) in the matrix to be zero? Subtract the second row from the first.



\begin{equation} \left| \begin{array}{cc|c} 1-0 & 1-1 & 7-4 \\ 0 & 1 & 4 \\ \end{array} \right| \end{equation}



\begin{equation} \left| \begin{array}{cc|c} 1 & 0 & 3 \\ 0 & 1 & 4 \\ \end{array} \right| \end{equation}



We've found our solution!
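
These row operations are easy to reproduce numerically on the augmented matrix (a minimal sketch; the array M is our own construction):

In [ ]:
M = np.array([[1., 1., 7.],
              [1., 2., 11.]])   # augmented matrix [A | y]
M[1] = M[1] - M[0]              # subtract row 1 from row 2
M[0] = M[0] - M[1]              # subtract row 2 from row 1
M                               # array([[1., 0., 3.], [0., 1., 4.]])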

Let's check it.

\begin{equation} \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 1(3) + 1(4)\\ 1(3) + 2(4) \end{bmatrix} = \begin{bmatrix} 7 \\ 11 \end{bmatrix} \end{equation}

In [2]:
A = np.array([[1,1],[1,2]])
x = np.array([3,4])   # the solution we found (b in our notation above)
b = A.dot(x)          # recovers the outcome vector (y in our notation above)
b
Out[2]:
array([ 7, 11])

RREF as a matrix transformation

We can encode these instructions to reduce the data down as a matrix transformation (just as we did when we first introduced matrices). We'll perform our instructions on the identity matrix, $\textbf{I}$, like we did before.



Again, our instructions (i.e. the steps we took above):



  1. Hold the first row constant and subtract the first row from the second row



$$ f_1(x) = \begin{bmatrix} x_1\\x_2 - x_1\end{bmatrix} $$



  2. Hold the second row constant and subtract the second row from the first row



$$ f_2(x) = \begin{bmatrix} x_1 - x_2 \\x_2\end{bmatrix} $$



Let's perform these operations on our identity matrix.



$$\textbf{I} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}$$



\begin{equation} f_1(\textbf{I}) = \begin{bmatrix} 1 & 0\\0-1 & 1-0\end{bmatrix} = \begin{bmatrix} 1 & 0\\-1 & 1\end{bmatrix} \end{equation}



\begin{equation} f_2(\begin{bmatrix} 1 & 0\\-1 & 1\end{bmatrix}) = \begin{bmatrix} 1-(-1) & 0-1 \\-1 & 1\end{bmatrix}= \begin{bmatrix} 2 & -1\\-1 & 1\end{bmatrix} \end{equation}



\begin{equation} \begin{bmatrix} 2 & -1\\-1 & 1\end{bmatrix} \end{equation}

What is this matrix? It's the inverse of matrix $\textbf{A}$!



$$\textbf{A} = \begin{bmatrix}1 & 1\\ 1 & 2\end{bmatrix}$$



$$\textbf{A}^{-1} = \begin{bmatrix} 2 & -1\\-1 & 1\end{bmatrix}$$



We can use this inverse transformation to take our vector back to where we started.



\begin{equation} \textbf{A}^{-1}\textbf{y} = \begin{bmatrix} 2 & -1\\-1 & 1\end{bmatrix} \begin{bmatrix}7\\11\end{bmatrix} = \begin{bmatrix}3\\4\end{bmatrix} = \textbf{b} \end{equation}



\begin{equation} \textbf{A}\textbf{b} = \begin{bmatrix}1 & 1\\ 1 & 2\end{bmatrix} \begin{bmatrix}3\\4\end{bmatrix} = \begin{bmatrix}7\\11\end{bmatrix} = \textbf{y} \end{equation}



We can think of the inverse transformation as numerical instructions to solve for a system of linear equations!

In [3]:
T1 = np.array([[1,0],[-1,1]])   # f1: subtract row 1 from row 2
print("Our first transformation matrix\n")
print(T1)
T2 = np.array([[1,-1],[0,1]])   # f2: subtract row 2 from row 1
print("\nOur second transformation matrix\n")
print(T2)
Our first transformation matrix

[[ 1  0]
 [-1  1]]

Our second transformation matrix

[[ 1 -1]
 [ 0  1]]
In [4]:
# compose the transformations: first T1, then T2
A_inv = T2.dot(T1)
In [5]:
b
Out[5]:
array([ 7, 11])
In [6]:
A_inv.dot(b)
Out[6]:
array([3, 4])

And $\textbf{A}^{-1}$ dotted with $\textbf{A}$ transforms us back to $\textbf{I}$

In [7]:
A_inv.dot(A)
Out[7]:
array([[1, 0],
       [0, 1]])

Invertible Functions

Note (or recall) that we can only solve for a system where there are as many equations as there are unknowns.



We cannot solve this...

\begin{equation} x + y - 3z = -10\\ x - y + 2z = 3 \end{equation}



But we could potentially solve this...

\begin{equation} x + y - 3z = -10\\ x - y + 2z = 3 \\ 2x + y - z = -6 \end{equation}



What does this mean for us in linear algebra land?

$$ \begin{bmatrix} 1 & 1 & -3 \\1 & -1 & 2 \\ 2 & 1 & -1\end{bmatrix} \begin{bmatrix} x\\y\\z\end{bmatrix} = \begin{bmatrix} -10\\3\\-6\end{bmatrix} $$
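
We can confirm the system is solvable with np.linalg.solve (a quick sketch; A3 and y3 are names of our own choosing -- the solution works out to $x = -2$, $y = 1$, $z = 3$):

In [ ]:
A3 = np.array([[1, 1, -3],
               [1, -1, 2],
               [2, 1, -1]])
y3 = np.array([-10, 3, -6])
np.linalg.solve(A3, y3)   # array([-2., 1., 3.])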



The matrix we are inverting must always be a square matrix, and it must be of full rank.

This is just a fancy way of saying that the matrix has as many columns as rows and that none of the column vectors is a linear combination of another column vector, i.e. they are linearly independent. And likewise with the row vectors.



$$ Rank(colspace(\textbf{A})) = Rank(rowspace(\textbf{A})) = N~Cols = N~Rows$$



Another way to think about this: if a matrix transformation reduces a vector or matrix into a lower dimension (dimension reduction), then we can't walk back to where we started. Dimension reduction always results in a loss of information.
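
For example, a projection onto the x-axis collapses $\Re^2$ down to a line. Two different inputs land on the same output, so there is no path back (a small sketch; P is our own example):

In [ ]:
P = np.array([[1., 0.],
              [0., 0.]])             # projects any 2D vector onto the x-axis
print(P.dot(np.array([2., 5.])))     # [2. 0.]
print(P.dot(np.array([2., -3.])))    # [2. 0.] -- the second coordinate is lost
# np.linalg.inv(P) would raise LinAlgError: this transformation can't be undone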



More generally, let's think about what it means to invert a function.



$$ f: x \mapsto y $$



An inverse function takes us back from our codomain $y$ to our original domain $x$



$$ f^{-1}: y \mapsto x $$
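
A scalar example (a toy function of our own): $f(x) = 2x + 1$ maps each $x$ to a unique $y$ and covers all of $\Re$, so it can be inverted with $f^{-1}(y) = (y - 1)/2$:

In [ ]:
f = lambda x: 2 * x + 1          # f: x -> y
f_inv = lambda y: (y - 1) / 2    # f^-1: y -> x
f_inv(f(10))                     # 10.0 -- back where we started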



But we can only do this for a function that is both surjective and injective (i.e. bijective).



  • surjective: every value of set $\textbf{y}$ gets mapped "onto" by at least one value of set $\textbf{x}$. That is, the mapping covers every value in the codomain.
Surjective!

        X    Y
        -    -
        a => z
        b => y
        c => x
        d => x

Not Surjective

        X    Y
        -    -
        a => z
        b => y
        c => z
        d => y
             x   (no value maps onto x)



  • injective: there exists a "one-to-one" mapping of values in set $\textbf{x}$ onto set $\textbf{y}$. That is, no two values of $\textbf{x}$ map onto the same value of $\textbf{y}$.
Injective!

        X    Y
        -    -
        a => z
        b => y
        c => x
        d => w

Not injective
        X    Y
        -    -
        a => z
        b => y
        c => x
        d => x

The idea is that every value of $x$ maps onto a unique value of $y$, and every value of $y$ is covered. If we don't have sufficient information, that is, if our transformation collapses dimensions, then the function won't be bijective and we can't invert it.
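
We can see failed injectivity with a singular matrix (a sketch; S is our own example -- note its second column is just twice its first). Two different vectors map onto the same output, so no inverse can recover the input:

In [ ]:
S = np.array([[1, 2],
              [2, 4]])              # columns are linearly dependent
print(S.dot(np.array([2, 0])))      # [2 4]
print(S.dot(np.array([0, 1])))      # [2 4] -- two inputs, one output: not injective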

What to do if our matrix isn't square?

Recall from the last lecture that we can always generate a square matrix from a non-square matrix by multiplying it by its transpose ($\textbf{B}\textbf{B}^T$ or $\textbf{B}^T\textbf{B}$), projecting it back onto itself.

In [8]:
B = np.random.randn(5,3)
B.shape
Out[8]:
(5, 3)
In [9]:
B.dot(B.T).shape
Out[9]:
(5, 5)
In [10]:
B.T.dot(B).shape
Out[10]:
(3, 3)
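
Note that $\textbf{B}^T\textbf{B}$ is only invertible when the columns of $\textbf{B}$ are linearly independent. For a random Gaussian matrix this is almost surely the case, which we can check with np.linalg.matrix_rank (a quick sketch):

In [ ]:
print(np.linalg.matrix_rank(B))           # 3 -- full column rank (almost surely)
print(np.linalg.matrix_rank(B.T.dot(B)))  # 3 -- square and full rank, so invertible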

How to determine if a matrix is invertible

Recall above that we encoded instructions regarding how to convert a $2 \times 2$ matrix into RREF.

Let's follow those steps again but this time on a more general representation of the matrix.



\begin{equation} \textbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} R_1 \\ R_2 \end{bmatrix} \end{equation}



Let's put it in augmented matrix form, and perform our row-wise manipulations simultaneously on $\textbf{I}$



\begin{equation} \left| \begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \\ \end{array} \right| \end{equation}



Transformation 1:



\begin{equation} \textbf{T}_1 = \begin{bmatrix} R_1 \\ aR_2 - cR_1 \end{bmatrix} \end{equation}



\begin{equation} \left| \begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & ad - bc & -c & a \\ \end{array} \right| \end{equation}



Transformation 2:



\begin{equation} \textbf{T}_2 = \begin{bmatrix} (ad-bc)R_1 - bR_2 \\ R_2 \end{bmatrix} \end{equation}



\begin{equation} \left| \begin{array}{cc|cc} a(ad-bc) & 0 & ad & -ab \\ 0 & ad - bc & -c & a \\ \end{array} \right| \end{equation}



Transformation 3: ensure the diagonals equal 1



\begin{equation} \textbf{T}_3 = \begin{bmatrix} \frac{R_1}{a(ad-bc)} \\ \frac{R_2}{(ad-bc)} \end{bmatrix} \end{equation}



\begin{equation} \left| \begin{array}{cc|cc} 1 & 0 & \frac{d}{(ad-bc)} & \frac{-b}{(ad-bc)} \\ 0 & 1 & \frac{-c}{(ad-bc)} & \frac{a}{(ad-bc)} \\ \end{array} \right| \end{equation}



This yields the formula for the inverse of a $2 \times 2 $ matrix.



\begin{equation} \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \end{equation}
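
We can wrap this formula in a small helper and check it against NumPy (a sketch; inverse_2x2 is a name of our own choosing):

In [ ]:
def inverse_2x2(M):
    # invert a 2x2 matrix via the ad - bc formula derived above
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("Matrix is singular (ad - bc = 0).")
    return np.array([[d, -b], [-c, a]]) / det

print(inverse_2x2(np.array([[1., 1.],
                            [1., 2.]])))    # [[ 2. -1.] [-1.  1.]]
print(np.linalg.inv(np.array([[1., 1.],
                              [1., 2.]])))  # matches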



What is $ad-bc$?



The Determinant



$$ det(\textbf{A}) = |\textbf{A}| = ad-bc $$



The determinant of matrix $\textbf{A}$ tells us if a (square) matrix is invertible. When we examine the above equation, it's clear why: a fraction with a denominator of $0$ is undefined, meaning the inverse doesn't exist.

But what does this actually mean?

It means that the vectors composing the square matrix are not linearly independent.

In [11]:
A = np.array([[1,2],[2,4]])
A
Out[11]:
array([[1, 2],
       [2, 4]])
In [12]:
np.linalg.det(A)
Out[12]:
0.0

To give a better intuition of what is going on, think of the determinant as the area (when in $\Re^2$) of the parallelogram spanned by our two basis vectors (for $\hat{i}$ and $\hat{j}$, the unit square). When we transform these vectors, that area grows and shrinks. When that area goes to zero, it means that we've collapsed to a lower dimension (i.e. down to a line if we were in $\Re^2$). Thus, the determinant tells us if there is sufficient information in the matrix to take us back to where we started (i.e. if the columns and rows of our square matrix are actually linearly independent).
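
A quick numerical illustration (both matrices are our own examples): a matrix that stretches $\hat{i}$ by 2 and $\hat{j}$ by 3 scales the unit square's area by 6, while a matrix with linearly dependent columns collapses that area to 0:

In [ ]:
print(np.linalg.det(np.array([[2., 0.],
                              [0., 3.]])))   # 6.0 -- area scaled by a factor of 6
print(np.linalg.det(np.array([[1., 2.],
                              [2., 4.]])))   # 0.0 -- collapsed onto a line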

Here is a great video that outlines how we can think of the determinant of a matrix in $\Re^2$ as the area of the transformed unit square.

Note that when the determinant of a matrix is 0 we call it singular.

NOTE: finding the determinant of an $n \times n$ matrix is more involved! See the reading for a deeper discussion on this. We'll be relying on our computers to compute these values, but it's useful to have a deeper understanding of the steps.

Which of these matrices are invertible?

In [15]:
X = np.array([[14,-7],[2,-1]])
Y = np.random.randn(4,5)
Z = np.array([[0,0,0,0],
              [1,0,0,1],
              [0,1,0,0]])
print(X,"\n")
print(Y,"\n")
print(Z,"\n")
[[14 -7]
 [ 2 -1]] 

[[-1.31692334 -0.44108299  0.56100078  0.93984351  2.24174342]
 [-0.62457184 -0.16588639  0.14610647 -1.78707646 -0.05134602]
 [ 0.57006418 -0.31545966  0.03206903  1.0371645   0.23118344]
 [-1.00540481  1.65994245 -0.26802636  0.90013057 -0.13122613]] 

[[0 0 0 0]
 [1 0 0 1]
 [0 1 0 0]]
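
One way to check (a sketch applying the two conditions above: a matrix must be square with a nonzero determinant to be invertible; np.isclose guards against floating-point noise):

In [ ]:
for name, M in [("X", X), ("Y", Y), ("Z", Z)]:
    if M.shape[0] != M.shape[1]:
        print(name, M.shape, "-> not square, not invertible")
    elif np.isclose(np.linalg.det(M), 0):
        print(name, M.shape, "-> square but singular, not invertible")
    else:
        print(name, M.shape, "-> invertible")
# X is square but det(X) = 14(-1) - (-7)(2) = 0, so none of the three is invertible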