Why does SVD provide the least squares and least norm solution to $ A x = b $?

First, consider the problem $\Sigma x = b$, where $$ \Sigma = \pmatrix{\sigma_1\\& \ddots\\&&\sigma_r\\ &&&0\\&&&&\ddots\\&&&&&0} $$ Note that $b$ is in the range of $\Sigma$ only if its entries $b_{r+1},\dots,b_n$ are all zero. Furthermore, you should be able to convince yourself (geometrically or otherwise) that the minimum-norm least squares solution must be $$ x = (b_1/\sigma_1,\dots,b_r/\sigma_r,0,\dots,0)^T = \Sigma^+ b $$ (Entries $r+1,\dots,n$ of a least squares solution are arbitrary, since they do not affect $\Sigma x$; setting them to zero gives the shortest such solution.)

From there, write $A = U\Sigma V^T$. Since $U$ is orthogonal, $$ \|Ax - b\| = \|U^T(U\Sigma V^T x - b)\| = \|\Sigma (V^T x) - U^T b\| $$ so minimizing $\|Ax - b\|$ over $x$ is the same as minimizing $\|\Sigma z - U^T b\|$ over $z = V^T x$. By the above argument, the minimum-norm least squares solution for $z$ is $V^T x = \Sigma^+ U^T b$. Noting that $\|V^T x\| = \|x\|$, minimizing $\|z\|$ minimizes $\|x\|$ as well, so $x = (V \Sigma ^+ U^T)b$ must be the minimum-norm least squares solution (for $x$).
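As a quick numerical sanity check, here is a small NumPy sketch (the rank-deficient matrix and the vector $b$ are arbitrary made-up examples) that builds $x = V\Sigma^+U^Tb$ by hand and compares it against `np.linalg.lstsq`, which returns the minimum-norm least squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ np.diag([3.0, 1.0, 0.0])  # rank 2: last column is zero
b = rng.standard_normal(5)

# Build x = V Σ⁺ Uᵀ b directly from the SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_plus = np.array([1.0 / sigma if sigma > 1e-12 else 0.0 for sigma in s])  # invert only nonzero σ_i
x_svd = Vt.T @ (s_plus * (U.T @ b))

# NumPy's lstsq returns the minimum-norm least squares solution.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_svd, x_ls))  # True
```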

I hope you find this explanation sufficient.


The pseudoinverse solution from the SVD is derived in the related question *proving standard least square problem with SVD*.

Given $\mathbf{A}x=b$, where the data vector $b\notin\mathcal{N}\left( \mathbf{A}^{*} \right)$, the least squares solution exists and is given by $$ x_{LS} = \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y}, \quad y\in\mathbb{C}^{n} $$ where blue vectors are in the range space $\color{blue}{\mathcal{R}\left( \mathbf{A}^{*} \right)}$ and red vectors are in the null space $\color{red}{\mathcal{N}\left( \mathbf{A} \right)}.$ The least squares solutions minimize the sum of the squares of the residual errors and form an affine space. That is, $$ \lVert \mathbf{A} x_{LS} (y) - b \rVert_{2}^{2} = r^{2}_{min} $$ for all values of $y$.
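A short NumPy sketch (made-up example data) showing that every member of this affine family leaves the same residual:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1]          # force a nontrivial null space
b = rng.standard_normal(6)

A_dag = np.linalg.pinv(A)            # Moore-Penrose pseudoinverse A†
P_null = np.eye(4) - A_dag @ A       # projector onto N(A)

for _ in range(3):
    y = rng.standard_normal(4)
    x = A_dag @ b + P_null @ y       # x_LS(y)
    print(np.linalg.norm(A @ x - b)) # the same residual for every y
```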

What is the vector in this affine space with the smallest length? Because the blue and red components are orthogonal, the Pythagorean theorem gives the length of the solution vectors as $$ \lVert x_{LS} \rVert_{2}^{2} = \lVert \color{blue}{\mathbf{A}^{\dagger}b} + \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y} \rVert_{2}^{2} = \lVert \color{blue}{\mathbf{A}^{\dagger}b} \rVert_{2}^{2} + \underbrace{\lVert \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{\dagger}\mathbf{A}\right) y} \rVert_{2}^{2}}_{\text{minimized at } y=\mathbf{0}} $$ The solution vector of minimum length is therefore $\color{blue}{\mathbf{A}^{\dagger}b}$, the point in the affine space closest to the origin.
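Again as a sketch with made-up data, the Pythagorean split and the minimality of $\mathbf{A}^{\dagger}b$ can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] - A[:, 2]          # rank-deficient, so N(A) ≠ {0}
b = rng.standard_normal(6)

A_dag = np.linalg.pinv(A)
x_min = A_dag @ b                    # the blue component A†b
P_null = np.eye(4) - A_dag @ A       # projector onto N(A)

y = rng.standard_normal(4)
x_y = x_min + P_null @ y             # a generic member of the affine space

# Pythagoras: ||x_LS(y)||² = ||A†b||² + ||(I - A†A)y||²
lhs = np.linalg.norm(x_y) ** 2
rhs = np.linalg.norm(x_min) ** 2 + np.linalg.norm(P_null @ y) ** 2
print(np.isclose(lhs, rhs))                           # True
print(np.linalg.norm(x_min) <= np.linalg.norm(x_y))   # True: A†b is shortest
```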


The resource linked below really helped me understand this. The transformation $A$ can be interpreted in 2D as mapping the unit circle to an ellipse. This can be done in a three-step process using the SVD:

  1. Rotate the unit circle so it can be stretched along its axes
  2. Stretch each axis to form the ellipse
  3. Rotate again to align the ellipse with the output space of $A$

To solve for $x$, you reverse this process, starting with $b$.
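Here is a minimal sketch of that reversal in NumPy (the $2\times 2$ matrix is a made-up example): undo the output rotation with $U^T$, undo the stretch with $\Sigma^{-1}$, and undo the input rotation with $V$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # full rank: the ellipse has no zero-width axis
b = np.array([1.0, 2.0])

U, s, Vt = np.linalg.svd(A)
x = Vt.T @ ((U.T @ b) / s)           # reverse steps 3, 2, 1
print(np.allclose(A @ x, b))         # True: exact solution when A is invertible
```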

Least squares comes in when step 2 creates an ellipse with a width of zero along some axis. Going through the process in reverse, the un-stretching at step 2 throws away the dimension with zero width. Still, you're left with the closest point to $b$ in the output space of $A$. Continuing through the reversed process gets you to the least squares solution $x$.

In other words, the transformation $A$ maps the unit circle to a line instead of an ellipse, and you've found the $x$ for which $Ax$ is the point on that line closest to $b$.
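A sketch of this degenerate case (made-up rank-1 matrix): the un-stretching step discards the zero-width axis, and the recovered $x$ sends $Ax$ to the projection of $b$ onto the line.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])           # rank 1: second row is twice the first
b = np.array([1.0, 1.0])

U, s, Vt = np.linalg.svd(A)
s_plus = np.array([1.0 / si if si > 1e-12 else 0.0 for si in s])  # drop the zero-width axis
x = Vt.T @ (s_plus * (U.T @ b))

# Ax is the orthogonal projection of b onto the range of A (the line).
print(A @ x)                                          # closest point on the line to b
print(np.allclose(A @ x, U[:, :1] @ U[:, :1].T @ b))  # True
```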

https://www.cs.cornell.edu/courses/cs3220/2010sp/notes/svd.pdf