What is the step-by-step logic of pinv (pseudoinverse)?

  1. Perform a singular value decomposition $\mathbf A=\mathbf U\mathbf \Sigma\mathbf V^\top$

  2. Check for "tiny" singular values in $\mathbf \Sigma$. A common criterion is that anything smaller than $\|\mathbf A\|\epsilon$ is "tiny", where $\|\mathbf A\| = \sigma_1$ is the largest singular value and $\epsilon$ is machine epsilon. Set those singular values to zero.

  3. Form $\mathbf \Sigma^+=\mathrm{diag}(1/\sigma_1,1/\sigma_2,\dots, 0)$, with the shape of $\mathbf \Sigma^\top$. That is, reciprocate the nonzero singular values and leave the zero ones untouched.

  4. $\mathbf A^+=\mathbf V\mathbf \Sigma^+\mathbf U^\top$
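For concreteness, here is a minimal NumPy sketch of those four steps. The function name `pinv_via_svd` and the exact tolerance are illustrative choices rather than a canonical implementation; `np.linalg.pinv` applies a comparable cutoff through its `rcond` argument.

```python
import numpy as np

def pinv_via_svd(A, eps=None):
    """Sketch of the four-step pinv logic above (illustrative, not canonical)."""
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A)                 # step 1: A = U Sigma V^T
    if eps is None:
        eps = np.finfo(A.dtype).eps
    tol = s[0] * eps                            # step 2: ||A|| eps, since ||A||_2 = sigma_1
    s_plus = np.zeros_like(s)
    keep = s > tol
    s_plus[keep] = 1.0 / s[keep]                # step 3: reciprocate only the nonzero sigmas
    Sigma_plus = np.zeros((A.shape[1], A.shape[0]))   # Sigma^+ has the shape of Sigma^T
    np.fill_diagonal(Sigma_plus, s_plus)
    return Vt.T @ Sigma_plus @ U.T              # step 4: A^+ = V Sigma^+ U^T
```

A quick sanity check is `np.allclose(pinv_via_svd(A), np.linalg.pinv(A))` on a random test matrix.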


As far as I know and from what I have read, there is no direct formula (or algorithm) for the (left) pseudo-inverse of $A$ other than the "famous formula" $$(A^\top A)^{-1}A^\top,$$ which is computationally expensive. I will describe here an algorithm that does indeed calculate it efficiently. As I said, I believe this may be previously unpublished, so cite me appropriately, or if you know of a reference please state it. Here is my algorithm.

Set $B_0 = A$ and, at each iteration step, take a column of $A$ and orthogonalize the columns of $B_i$ against it. Here is the algorithm ($A$ has $n$ independent columns; the step index $i$ advances with $j$):

1. Initialize $B_0=A$.
2. For $j \in 1\ldots n$ do \begin{align} \mathbf{t} &= B_i^\top A_{\star j} \\ R\mathbf{t} &= e_j & \text{find such an $R$, an elementary row-operation matrix}\\ B_{i+1}^\top &= R B_i^\top \end{align} where $A_{\star j}$ denotes the $j$-th column of $A$.

Notice that each step does one vector/matrix multiplication and one elementary matrix row operation, which requires much less computation than the direct evaluation of $(A^\top A)^{-1}A^\top$. Notice also that there is a nice opportunity here for parallel computation. (One more thing to notice: for a square invertible $A$ this calculates the regular inverse, whether one starts with $B_0=A$ or with $B_0=I$.)
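For the record, here is a rough NumPy sketch of the sweep as I read it. The name `left_pinv_sweep` is mine; the elementary matrix $R$ is applied implicitly as a Gauss-Jordan style row operation, and no pivoting is attempted, so a zero pivot $t_j$ would break this simple form.

```python
import numpy as np

def left_pinv_sweep(A):
    """Sketch of the column-sweep algorithm (assumes full column rank, no pivoting)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    Bt = A.T.copy()                      # B_0^T = A^T
    for j in range(n):
        t = Bt @ A[:, j]                 # t = B_i^T A_{*j}
        Bt[j] /= t[j]                    # scale row j so that (R t)_j = 1
        for k in range(n):
            if k != j:
                Bt[k] -= t[k] * Bt[j]    # clear the other entries: R t = e_j
    return Bt                            # B^T, with B^T A = I and col(B) = col(A)
```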

The step that calculates $R$ may be partially orthogonal rather than elementary (orthonormal/unitary, and thus better in the sense of error propagation) if desired, but it should not make use of the previously orthonormalized rows of $B$, since those must remain orthogonalized as the process continues. If this is done, the process becomes a "cousin" of the QR algorithm, whereas before it would be considered a "cousin" of the LU factorization: "cousin" because the matrix operations of those algorithms are self-referential, while this algorithm references the matrix $A$. The following is a hopefully enlightening description.


See the first edits for a wordier description; I think the following is more concise and to the point.

Consider the conformable matrix $B^\top$ such that $B$ has the same dimension as $A$ and \begin{align} B^\top A &= I \tag{1}\\ B &= A B_A \tag{2}\\ \end{align}

Here $B_A$ is arbitrary and exists only to ensure that the matrix $B$ shares the same column space as $A$. (1) states that $B^\top$ is a (not necessarily unique) left inverse for $A$; (2) states that $B$ shares the same column space as $A$.

Claim: $B^\top$ is the Moore-Penrose pseudoinverse for $A$.

Proof:

The Moore-Penrose pseudoinverse for $A$, which here has full column rank, is $$A^\dagger = \left(A^\top A\right)^{-1}A^\top$$ Given (1) and substituting (2) we have \begin{align} \left(AB_A\right)^\top A &= I \\ B_A^\top A^\top A &= I \\ B_A^\top &= \left( A^\top A\right)^{-1} \\ B_A &= \left( A^\top A\right)^{-1} \\ \end{align} where the last step uses the fact that $\left(A^\top A\right)^{-1}$ is symmetric. Now solving for $B$ from (2): \begin{align} B &= A\left(B_A\right) \\ B &= A\left( A^\top A\right)^{-1} \\ \Rightarrow B^\top &=\left( A^\top A\right)^{-1}A^\top \\ \end{align}

Q.E.D.
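A quick numerical spot check of the claim, reusing the hypothetical `left_pinv_sweep` sketch from earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 3))              # full column rank with probability 1
Bt = left_pinv_sweep(A)
print(np.allclose(Bt @ A, np.eye(3)))        # condition (1): B^T A = I
print(np.allclose(Bt, np.linalg.pinv(A)))    # B^T matches the Moore-Penrose pseudoinverse
```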


How do we use the elements of $\mathbf{A}$ to construct the pseudoinverse $\mathbf{A}^{\dagger}$?

Anatomy of the SVD

The target matrix has $m$ rows, $n$ columns, and has rank $\rho$ $$ \mathbf{A} \in \mathbb{C}^{m\times n}_{\rho} $$

Define the singular value decomposition: $$ \begin{align} \mathbf{A} &= \mathbf{U} \, \Sigma \, \mathbf{V}^{*} \\ &= \left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}}} & \color{red}{\mathbf{U}_{\mathcal{N}}} \end{array} \right] \left[ \begin{array}{cccc|ccc} \sigma_{1} & 0 & \dots & & & \dots & 0 \\ 0 & \sigma_{2} \\ \vdots & & \ddots \\ & & & \sigma_{\rho} \\ \hline & & & & 0 & \\ \vdots & & & & & \ddots \\ 0 & & & & & & 0 \\ \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}^{*}} \\ \color{red}{\mathbf{V}_{\mathcal{N}}^{*}} \end{array} \right] \\ &= \left[ \begin{array}{cccccc} \color{blue}{u_{1}} & \dots & \color{blue}{u_{\rho}} & \color{red}{u_{\rho+1}} & \dots & \color{red}{u_{m}} \end{array} \right] \left[ \begin{array}{cc} \mathbf{S}_{\rho\times \rho} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{v_{1}^{*}} \\ \vdots \\ \color{blue}{v_{\rho}^{*}} \\ \color{red}{v_{\rho+1}^{*}} \\ \vdots \\ \color{red}{v_{n}^{*}} \end{array} \right] \end{align} $$ Coloring separates $\color{blue}{range}$ space entities from $\color{red}{null}$ space entities.

Using the thin, or economical, or nullspace-free, variant we have $$ \mathbf{A} = \color{blue}{\mathbf{U}_{\mathcal{R}}} \, \mathbf{S} \, \color{blue}{\mathbf{V}^{*}_{\mathcal{R}}} \qquad \Rightarrow \qquad \mathbf{A}^{\dagger} = \color{blue}{\mathbf{V}_{\mathcal{R}}} \, \mathbf{S}^{-1} \, \color{blue}{\mathbf{U}^{*}_{\mathcal{R}}}. $$
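In NumPy terms, a sketch of this nullspace-free assembly might look as follows; the rank decision (`tol`) and the name `pinv_thin` are illustrative assumptions, not fixed conventions.

```python
import numpy as np

def pinv_thin(A, tol=1e-12):
    """Sketch: A^+ = V_R S^{-1} U_R^* from a rank-truncated economy SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    rho = int(np.sum(s > tol * s[0]))              # numerical rank rho
    UR, SR, VRh = U[:, :rho], s[:rho], Vh[:rho, :]
    return VRh.conj().T @ np.diag(1.0 / SR) @ UR.conj().T
```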

Construction of the SVD

  1. Form the product matrix $\mathbf{A}^{*}\mathbf{A}$.

  2. Solve for the eigenvalues $\lambda \left( \mathbf{A}^{*}\mathbf{A}\right)$. This is a messy affair, because it demands finding the roots of the characteristic equation, which is the numerical challenge of solving for the zeros of an arbitrary polynomial; in practice an iterative eigensolver is used instead.

  3. Order the nonzero eigenvalues to assemble the matrix of singular values $\mathbf{S}=\text{diagonal}\left( \sigma \right)$, where the singular value spectrum consists of the square roots of the nonzero eigenvalues $$ \sigma = \sqrt{\lambda \left( \mathbf{A}^{*}\mathbf{A} \right)} $$ with $$ \sigma_{1} \ge \sigma_{2} \ge \dots \ge \sigma_{\rho} > 0. $$ A problem immediately arises over the value of $\rho$: are small numbers numerical artifacts? The issue is vital because the contribution to the pseudoinverse is weighted by the reciprocal of the singular value. Very small numbers make very large contributions.

  4. Compute the eigenvectors $\hat{v}_{k}$, $k=1,\dots,\rho$, one for each nonzero eigenvalue: $$ \mathbf{A}^{*}\mathbf{A}\, \hat{v}_{k} = \lambda_{k}\,\hat{v}_{k} $$ The eigenvectors of $\mathbf{A}^{*}\mathbf{A}$ are the domain vectors, the columns of $\mathbf{V}$. This is the origin of the sign ambiguity in the SVD.

  5. Normalize these vectors and make them the column vectors of the domain matrix $\color{blue}{\mathbf{V}_{\mathcal{R}}}$: $$ \color{blue}{\mathbf{V}_{\mathcal{R}}} = \left[ \begin{array}{cccc} \color{blue}{\frac{ \hat{v}_{1} } {\lVert \hat{v}_{1} \rVert}} & \color{blue}{\frac{ \hat{v}_{2} } {\lVert \hat{v}_{2} \rVert}} & \dots & \color{blue}{\frac{ \hat{v}_{\rho} } {\lVert \hat{v}_{\rho} \rVert}} \end{array} \right] $$

  6. Optional: Want $\color{red}{\mathbf{V}_{\mathcal{N}}}$? Extend the columns of $\color{blue}{\mathbf{V}_{\mathcal{R}}}$ to an orthonormal basis of $\mathbb{C}^{n}$ with Gram-Schmidt orthogonalization. The orientation is a free choice, and a reason why different programs create different nullspace spans. If you choose this option, you need to pack $\mathbf{S}$ into a sabot matrix $\Sigma$ and compute the companion nullspace $\color{red}{\mathbf{U}_{\mathcal{N}}}$ as well.

  7. With $\mathbf{S}$ and $\color{blue}{\mathbf{V}_{\mathcal{R}}}$ in hand, construct the column vectors of $\color{blue}{\mathbf{U}_{\mathcal{R}}}$ using $$ \left[ \color{blue}{\mathbf{U}_{\mathcal{R}}} \right]_{k} = \sigma_{k}^{-1} \left[ \mathbf{A} \, \color{blue}{\mathbf{V}_{\mathcal{R}}} \right]_{k}, $$ that is, $u_{k} = \sigma_{k}^{-1} \mathbf{A}\, v_{k}$. Notice the reciprocal of the singular value.

  8. Optional: Want $\color{red}{\mathbf{U}_{\mathcal{N}}}$? Extend the columns of $\color{blue}{\mathbf{U}_{\mathcal{R}}}$ to an orthonormal basis of $\mathbb{C}^{m}$ with Gram-Schmidt orthogonalization.

  9. Invert the diagonal matrix $\mathbf{S}$: $$ \mathbf{S} = \left[ \begin{array}{ccc} \sigma_{1} \\ & \sigma_{2} \\ && \ddots \\ &&& \sigma_{\rho} \end{array} \right] % \qquad \Rightarrow \qquad % \mathbf{S}^{-1} = \left[ \begin{array}{ccc} \frac{1}{\sigma_{1}} \\ & \frac{1}{\sigma_{2}} \\ && \ddots \\ &&& \frac{1}{\sigma_{\rho}} \end{array} \right] $$ Again, the reciprocal of the singular values.

  10. Form the Hermitian conjugate $\color{blue}{\mathbf{U}^{*}_{\mathcal{R}}}$ needed for the assembly by taking the transpose of the conjugate, or the conjugate of the transpose: $$ \color{blue}{\mathbf{U}^{*}_{\mathcal{R}}} = \color{blue}{\overline{\mathbf{U}}_{\mathcal{R}}^{\mathrm{T}}} = \color{blue}{\overline{\mathbf{U}^{\mathrm{T}}_{\mathcal{R}}}} $$
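To make the recipe concrete, here is a NumPy sketch of the steps above, with `numpy.linalg.eigh` standing in for the root-finding of step 2; the function name and the rank tolerance are illustrative assumptions.

```python
import numpy as np

def svd_from_normal_equations(A, tol=1e-12):
    """Sketch of the eigendecomposition route to the thin SVD (illustrative)."""
    A = np.asarray(A, dtype=complex)
    AhA = A.conj().T @ A                          # step 1: A* A
    lam, V = np.linalg.eigh(AhA)                  # steps 2 and 4: eigenpairs, ascending
    lam, V = lam[::-1], V[:, ::-1]                # step 3: reorder descending
    rho = int(np.sum(lam > tol * lam[0]))         # rank decision: which values are artifacts?
    sigma = np.sqrt(lam[:rho])                    # singular value spectrum
    VR = V[:, :rho]                               # step 5: domain vectors (eigh normalizes)
    UR = (A @ VR) / sigma                         # step 7: u_k = sigma_k^{-1} A v_k
    return UR, sigma, VR
```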

Construction of the pseudoinverse

  1. Assemble $$ \mathbf{A}^{\dagger} = \color{blue}{\mathbf{V}_{\mathcal{R}}} \, \mathbf{S}^{-1} \, \color{blue}{\mathbf{U}^{*}_{\mathcal{R}}} = \left[ \begin{array}{cc} \color{blue}{\mathbf{V}_{\mathcal{R}}} & \color{red}{\mathbf{V}_{\mathcal{N}}} \end{array} \right] \left[ \begin{array}{cc} \mathbf{S}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}}^{*}} \\ \color{red}{\mathbf{U}_{\mathcal{N}}^{*}} \end{array} \right] $$
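As a closing check, one can verify that the assembled matrix satisfies the four Moore-Penrose conditions, reusing the hypothetical `svd_from_normal_equations` sketch above:

```python
import numpy as np

A = np.array([[1., 0.], [0., 1.], [1., 1.]])
UR, sigma, VR = svd_from_normal_equations(A)
Ap = VR @ np.diag(1.0 / sigma) @ UR.conj().T      # A^+ = V_R S^{-1} U_R^*

checks = [(A @ Ap @ A, A), (Ap @ A @ Ap, Ap),
          ((A @ Ap).conj().T, A @ Ap), ((Ap @ A).conj().T, Ap @ A)]
print(all(np.allclose(lhs, rhs) for lhs, rhs in checks))   # True for a valid pseudoinverse
```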