How to approximate a solution to a matrix equation?
I want to correct something in both Anton's and las3rjock's answers. I hope that I am merely correcting bad notation.
The way to find $x'$ (using the notation in the question) is, as correctly stated in each of the answers above, to derive the associated problem $A^TAx' = A^Tb$ and solve that via Gaussian elimination (also called Gauss-Jordan elimination; the difference is technical and not important here). This is guaranteed to have a solution.
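(For concreteness, here is a minimal numerical sketch of that recipe, assuming NumPy; the small matrix $A$ and vector $b$ are made up for illustration, and the columns of $A$ are taken to be independent so that $A^TA$ is invertible, a caveat discussed further below.)

```python
import numpy as np

# Hypothetical overdetermined system: 3 equations, 2 unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Form the normal equations A^T A x' = A^T b and solve them
# (np.linalg.solve uses Gaussian elimination via an LU factorization).
x_prime = np.linalg.solve(A.T @ A, A.T @ b)
print(x_prime)
```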
The fact that this is the correct solution to the problem relies on the properties of the "closest point" from a point to a subspace. We want the closest point to $b$ on the subspace $\operatorname{Im} A$; call this $b'$. The properties of the "closest point" imply that the difference, $b - b'$, is orthogonal to everything in $\operatorname{Im} A$. It's simple to draw a picture to convince yourself that if $c$ is in $\operatorname{Im} A$ and $b - c$ is not orthogonal to everything in $\operatorname{Im} A$, then it is possible to "nudge" $c$ a little, either away from or towards the origin, to a $c'$ so that $b - c'$ is shorter than $b - c$.
So $b - b'$ is orthogonal to everything in $\operatorname{Im} A$. Since being orthogonal to something is a linear condition, it is sufficient to check that $b - b'$ is orthogonal to a spanning set for $\operatorname{Im} A$, for which we can take the columns of $A$. As we are using the standard inner product, this means that for each column $a$ of $A$ we have $a^T(b - b') = 0$. Putting these together, we obtain the relation $A^T(b - b') = 0$. As $b'$ is in the image of $A$, there is some $x'$ such that $b' = Ax'$, whence we see that $x'$ satisfies $A^Tb - A^TAx' = 0$, and rearranging gives the desired formula. This also guarantees the existence of a solution to this equation.
Using the fact that the closest point is the unique point $b'$ in $\operatorname{Im} A$ such that $b - b'$ is orthogonal to everything in $\operatorname{Im} A$, we can run this argument backwards to see that if $x'$ is a solution of $A^TAx' = A^Tb$ then $Ax'$ is the closest point to $b$ in $\operatorname{Im} A$.
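(A quick numerical check of both directions of the argument; the data are made up, and NumPy's np.linalg.lstsq is used only as an independent reference point.)

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])   # b is not in Im A

# Solve the normal equations A^T A x' = A^T b.
x_prime = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - Ax' is orthogonal to every column of A ...
print(np.allclose(A.T @ (b - A @ x_prime), 0.0))
# ... and Ax' matches the closest point found by a least-squares routine.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(A @ x_prime, A @ x_ls))
```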
Where the above answers go wrong is in then talking about the matrix $(A^TA)^{-1}A^T$. The problem with this is that $A^TA$ may not be invertible (take $A$ to be the $2 \times 3$ zero matrix). There is a matrix which, when $A^TA$ is invertible, equals $(A^TA)^{-1}A^T$; this is called the pseudo-inverse. Essentially, $A^T$ misses the kernel of $A^TA$, meaning that the composition is always well-defined, but it might not be decomposable as the notation suggests.
This notation may be standard (I don't know), but if it is, then it is bad notation because it suggests a property that may not hold. At the very least, it should always carry a rider making clear that the notation is merely suggestive and not to be taken literally.
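(To make the distinction concrete, here is a small sketch, assuming NumPy, with a rank-deficient matrix made up for illustration: $(A^TA)^{-1}$ does not exist, yet the pseudo-inverse, available as np.linalg.pinv, is perfectly well-defined.)

```python
import numpy as np

# The second column is twice the first, so A^T A is singular
# and (A^T A)^{-1} simply does not exist.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 0.0, 1.0])

# np.linalg.inv(A.T @ A)        # would raise LinAlgError: Singular matrix
x_plus = np.linalg.pinv(A) @ b  # the pseudo-inverse is still well-defined
print(x_plus)
print(np.allclose(A.T @ (b - A @ x_plus), 0.0))  # residual still orthogonal to Im A
```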
Edit: Andrew is absolutely correct that using $(A^TA)^{-1}$ is at best sloppy (in my case just a flat-out error) because $A^TA$ may not be invertible. If $A^TA$ is not invertible, it is because there is a linear dependence among the columns of $A$. In this case, you can remove some of the columns without changing the image until the remaining columns are linearly independent. The explanation I provide below works once you've removed these "extra" columns.
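(One way to carry out that column removal numerically; this helper and the example matrix are made up for illustration, and the greedy rank test is just one of several reasonable choices.)

```python
import numpy as np

def independent_columns(A, tol=1e-10):
    """Greedily keep the columns of A that increase the rank,
    dropping linearly dependent "extra" columns; the image is unchanged."""
    kept = []
    for j in range(A.shape[1]):
        if np.linalg.matrix_rank(A[:, kept + [j]], tol=tol) > len(kept):
            kept.append(j)
    return A[:, kept], kept

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 1.0],
              [3.0, 6.0, 0.0]])   # the second column is twice the first
A_reduced, kept = independent_columns(A)
print(kept)                                                           # [0, 2]
print(np.linalg.matrix_rank(A_reduced) == np.linalg.matrix_rank(A))   # True
```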
Anton's and las3rjock's answers are correct: the $x'$ you're looking for is $(A^TA)^{-1}A^Tb$. But I wanted to add a neat explanation I've heard for why this produces the correct answer.
The image of $A$, i.e. the vectors of the form $Ax$, consists of all linear combinations of the columns of $A$ (the entries of $x$ being the coefficients in the linear combination). So if no solution to the equation $Ax = b$ exists, finding the "best approximate solution" amounts to asking for the projection of $b$ onto the image of $A$.
Claim: The operator $A(A^TA)^{-1}A^T$ is orthogonal projection onto the image of $A$.
Proof: (1) This operator is equal to its square, so it is a projection:
$A(A^TA)^{-1}A^T \, A(A^TA)^{-1}A^T = A(A^TA)^{-1}A^T.$
(2) For any vector $Ax$ in the image of $A$, we have $(A(A^TA)^{-1}A^T)Ax = Ax$, so the image of this projection is equal to the image of $A$.
(3) The projection is orthogonal. Let me denote the inner product by $\langle \cdot, \cdot \rangle$. Suppose $y$ is a vector orthogonal to $Ax$ for all $x$, so $\langle y, Ax \rangle = 0$ for all $x$; then we need to show that $A(A^TA)^{-1}A^Ty = 0$. Write $B = (A^TA)^{-1}A^T$. For any vector $z$, we have (by the defining property of the transpose, together with $(AB)^T = B^TA^T = A(A^TA)^{-1}A^T = AB$)
$\langle A(A^TA)^{-1}A^Ty, z \rangle = \langle y, A(Bz) \rangle = 0.$
Since $A(A^TA)^{-1}A^Ty$ has zero inner product with every $z$, it must be zero.
So if $b$ is not in the image of $A$, the closest vector that is in the image of $A$ is $b' = A(A^TA)^{-1}A^Tb$, which is clearly $A$ applied to $x' = (A^TA)^{-1}A^Tb$.
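(A quick numerical check of the three steps of the proof, assuming NumPy and a made-up full-column-rank $A$; step (3) is checked via symmetry of the projection, which for an idempotent operator is equivalent to orthogonality.)

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T       # needs the columns of A independent

print(np.allclose(P @ P, P))               # (1) P^2 = P, so P is a projection
print(np.allclose(P @ A, A))               # (2) P fixes everything in Im A
print(np.allclose(P, P.T))                 # (3) P is symmetric, i.e. an orthogonal projection
print(np.allclose(P @ b, A @ (np.linalg.inv(A.T @ A) @ (A.T @ b))))  # b' = A x'
```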
Andrew has a correct answer in the pseudoinverse $A^+$, which is characterized by the property that $x = A^+b$ is the shortest vector that solves $A^TAx = A^Tb$ (equivalently, $x$ has no component in the nullspace of $A$). Computationally, it is typically found using a singular value decomposition: if
$\displaystyle A = U \Sigma V^T = \left[ \begin{array}{cc} U_{col} & U_{null} \end{array} \right] \left[ \begin{array}{cc} \Sigma_{pos} & 0 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{cc} V_{row} & V_{null} \end{array} \right]^T,$
then the pseudoinverse is $A^+ = V_{row} \Sigma_{pos}^{-1} U_{col}^T$.
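(A sketch of that computation with NumPy's SVD; the rank-deficient matrix is made up, and the tolerance for deciding which singular values count as zero is a common heuristic rather than anything canonical.)

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])   # rank 1, so A^T A is singular

U, s, Vt = np.linalg.svd(A)                         # A = U * diag(s) * V^T
tol = max(A.shape) * np.finfo(float).eps * s.max()  # threshold for "zero" singular values
r = int(np.sum(s > tol))                            # numerical rank

# Keep only the blocks belonging to the nonzero singular values:
# A^+ = V_row * Sigma_pos^{-1} * U_col^T
A_plus = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

print(np.allclose(A_plus, np.linalg.pinv(A)))       # agrees with NumPy's pinv
```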
I'd like to mention that the pseudoinverse is both rather computationally expensive and unstable in the presence of noise. In particular, if $b$ is given by taking real-world data with limited accuracy, and $A$ is ill-conditioned (e.g., singular), the output of least squares can vary wildly with the error. One common approach to rectify this is Tikhonov regularization, which typically means minimizing $\Vert Ax - b \Vert^2 + \alpha \Vert x \Vert^2$ for some small $\alpha > 0$. This yields a nonsingular problem, which can be solved quickly by Gaussian elimination, and as $\alpha$ approaches zero the solution approaches the pseudoinverse solution. It will not in general yield an exact solution, but there are error-minimizing heuristics (e.g., the discrepancy principle) for choosing $\alpha$ based on knowledge about the size of the noise.
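(A minimal sketch of Tikhonov regularization via the regularized normal equations $(A^TA + \alpha I)x = A^Tb$; the matrix, right-hand side, and $\alpha$ values are made up, and no attempt is made to choose $\alpha$ by the discrepancy principle.)

```python
import numpy as np

def tikhonov(A, b, alpha):
    """Minimize ||Ax - b||^2 + alpha * ||x||^2 by solving the
    (nonsingular for alpha > 0) system (A^T A + alpha I) x = A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b)

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])     # ill-conditioned (here actually singular)
b = np.array([1.0, 0.0, 1.0])

for alpha in (1e-1, 1e-4, 1e-8):
    print(alpha, tikhonov(A, b, alpha))
print("pinv:", np.linalg.pinv(A) @ b)   # the alpha -> 0 limit
```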
Reference: Strang, Computational Science and Engineering