Difference between least squares and minimum norm solution

Linear system $$ \mathbf{A} x = b $$ where $\mathbf{A}\in\mathbb{C}^{m\times n}_{\rho}$ (a matrix of rank $\rho$), and the data vector $b\in\mathbb{C}^{m}$.

Least squares problem

Provided that $b\notin\color{red}{\mathcal{N}\left( \mathbf{A}^{*}\right)}$ (otherwise $\mathbf{A}^{+}b = \mathbf{0}$ and the problem is degenerate), a least squares solution exists and is defined by $$ x_{LS} = \left\{ x\in\mathbb{C}^{n} \colon \lVert \mathbf{A} x - b \rVert_{2}^{2} \text{ is minimized} \right\} $$

Least squares solution

The minimizers form the affine set $$ x_{LS} = \color{blue}{\mathbf{A}^{+} b} + \color{red}{ \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y}, \quad y \in \mathbb{C}^{n} \tag{1} $$ where $\mathbf{A}^{+}$ is the Moore-Penrose pseudoinverse and vectors are colored according to whether they reside in a $\color{blue}{range}$ space or a $\color{red}{null}$ space.

[Figure: the red dashed line, a hyperplane (the translate of $\color{red}{\mathcal{N}\left(\mathbf{A}\right)}$ by $\color{blue}{\mathbf{A}^{+}b}$), is the set of least squares minimizers.]
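To make $(1)$ concrete, here is a minimal numerical sketch, assuming NumPy; the rank-deficient matrix and data vector are made-up examples, not from the original question. Every choice of $y$ produces the same (minimal) residual:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])       # rank 2 < n = 3, so N(A) is nontrivial
b = np.array([1., 0., 0.])

A_pinv = np.linalg.pinv(A)         # Moore-Penrose pseudoinverse A^+
P_null = np.eye(3) - A_pinv @ A    # orthogonal projector onto N(A)

# Sample several members of the affine set (1): the residual never changes.
for _ in range(3):
    y = rng.standard_normal(3)
    x = A_pinv @ b + P_null @ y
    print(np.linalg.norm(A @ x - b))
```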

Least squares solution of minimum norm

To find the minimizer of minimum norm (the shortest solution vector), compute the lengths of the solution vectors in $(1)$.

$$ \lVert x_{LS} \rVert_{2}^{2} = \Big\lVert \color{blue}{\mathbf{A}^{+} b} + \color{red}{ \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y} \Big\rVert_{2}^{2} = \Big\lVert \color{blue}{\mathbf{A}^{+} b} \Big\rVert_{2}^{2} + \Big\lVert \color{red}{ \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y} \Big\rVert_{2}^{2} $$ The cross terms vanish because the $\color{blue}{range}$ space component is orthogonal to the $\color{red}{null}$ space component. The $\color{blue}{range}$ space component is fixed, but we can control the $\color{red}{null}$ space vector. In fact, choose the vector $y$ that forces this term to $0$ (for example, $y = \mathbf{0}$).
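To spell out why the cross terms vanish (a detail added here for completeness, using the Penrose identities $\left( \mathbf{A}^{+} \mathbf{A} \right)^{*} = \mathbf{A}^{+} \mathbf{A}$ and $\mathbf{A}^{+} \mathbf{A} \mathbf{A}^{+} = \mathbf{A}^{+}$): $$ \left\langle \color{blue}{\mathbf{A}^{+} b},\ \color{red}{\left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y} \right\rangle = b^{*} \left( \mathbf{A}^{+} \right)^{*} \left( \mathbf{I}_{n} - \mathbf{A}^{+} \mathbf{A} \right) y = b^{*} \left( \left( \mathbf{A}^{+} \right)^{*} - \left( \mathbf{A}^{+} \mathbf{A} \mathbf{A}^{+} \right)^{*} \right) y = 0. $$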

Therefore, the least squares solution of minimum norm is $$ \color{blue}{x_{LS}} = \color{blue}{\mathbf{A}^{+} b}. $$ This is the point where the red dashed line punctures the blue plane: the least squares solution of minimum length is the unique minimizer lying in $\color{blue}{\mathcal{R}\left( \mathbf{A}^{*}\right)}$.
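Continuing the numerical sketch (same assumed NumPy setup and made-up $\mathbf{A}$, $b$ as above, repeated so the snippet runs on its own), $\mathbf{A}^{+}b$ is never longer than any other minimizer:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])
b = np.array([1., 0., 0.])

A_pinv = np.linalg.pinv(A)
x_min = A_pinv @ b                 # candidate minimum norm solution
P_null = np.eye(3) - A_pinv @ A    # projector onto N(A)

# Every other least squares solution is at least as long as x_min.
for _ in range(5):
    x = x_min + P_null @ rng.standard_normal(3)
    assert np.linalg.norm(x_min) <= np.linalg.norm(x) + 1e-12
print("shortest minimizer has length", np.linalg.norm(x_min))
```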

Full column rank

You ask about the case of full column rank where $n=\rho$. In this case, $$ \color{red}{\mathcal{N}\left( \mathbf{A} \right)} = \left\{ \mathbf{0} \right\}, $$ the null space is trivial. There is no null space component, and the least squares solution is a point.

In other words, $$ \color{blue}{x_{LS}} = \color{blue}{\mathbf{A}^{+} b} $$ is always the least squares solution of minimum norm. When the matrix has full column rank, there is no other component to the solution. When the matrix is column rank deficient, the set of least squares solutions is an affine subspace of dimension $n-\rho$ (the dashed line in the figure), as the sketch below illustrates.
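The full column rank case can be checked the same way; in this sketch (again assuming NumPy, with a made-up tall matrix) the null space projector vanishes and the minimizer is a single point:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])           # full column rank: rho = n = 2
b = np.array([1., 1., 0.])

A_pinv = np.linalg.pinv(A)
P_null = np.eye(2) - A_pinv @ A    # projector onto N(A)

print(np.allclose(P_null, 0.0))    # True: the null space is trivial
print(A_pinv @ b)                  # the unique least squares solution
```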


First, it's important to understand that there are different norms. Most likely you're interested in the Euclidean norm:

$\| x \|_{2} = \sqrt{\sum_{i=1}^{n} \left| x_{i} \right|^{2}}$

Next, note that minimizing $\| b-Ax \|_{2}^{2}$ is equivalent to minimizing $\| b-Ax \|_{2}$, because squaring the norm is a monotone transform. It really doesn't matter which one you minimize.

If $A$ has full column rank, then there is a unique least squares solution.

However, if $A$ doesn't have full column rank, there may be infinitely many least squares solutions. In this case, we're often interested in the minimum norm least squares solution. That is, among the infinitely many least squares solutions, pick out the least squares solution with the smallest $\| x \|_{2}$. The minimum norm least squares solution is always unique. It can be found using the singular value decomposition and/or the Moore-Penrose pseudoinverse.
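For concreteness, here is a minimal sketch of that computation, assuming NumPy and a made-up rank-deficient matrix: it assembles the minimum norm least squares solution from the SVD and checks it against the pseudoinverse and `np.linalg.lstsq`.

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])       # rank-deficient example (rank 2)
b = np.array([1., 0., 0.])

# Thin SVD: A = U diag(s) V^T; invert only the nonzero singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.divide(1.0, s, out=np.zeros_like(s), where=s > tol)

x_svd = Vt.T @ (s_inv * (U.T @ b))              # minimum norm LS solution
x_pinv = np.linalg.pinv(A) @ b                  # same via the pseudoinverse
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]  # LAPACK returns it as well

print(np.allclose(x_svd, x_pinv), np.allclose(x_svd, x_lstsq))
```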


If a system is overdetermined, it typically has no solution, and thus we may want to find $x$ such that $\|Ax-b\|$ is as small as it can be (since there is no way to make $\|Ax-b\|=0$).

On the other hand, if the system is underdetermined (and consistent), there are infinitely many solutions, and thus one can pick the solution of minimal norm; this is called the minimum norm solution.
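A short sketch contrasting the two situations, assuming NumPy and made-up matrices; `np.linalg.lstsq` handles both, returning the minimum norm least squares solution:

```python
import numpy as np

# Overdetermined: more equations than unknowns, generically inconsistent,
# so lstsq minimizes ||Ax - b|| (the residual stays nonzero).
A_over = np.array([[1., 0.], [0., 1.], [1., 1.]])
b_over = np.array([1., 1., 0.])
x_over = np.linalg.lstsq(A_over, b_over, rcond=None)[0]

# Underdetermined: fewer equations than unknowns, infinitely many exact
# solutions; lstsq picks the one of minimum norm (zero residual).
A_under = np.array([[1., 1., 0.], [0., 1., 1.]])
b_under = np.array([1., 1.])
x_under = np.linalg.lstsq(A_under, b_under, rcond=None)[0]

print(x_over, np.linalg.norm(A_over @ x_over - b_over))
print(x_under, np.linalg.norm(A_under @ x_under - b_under))
```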