Understanding a derivation of the SVD

if $$\max_{\|v\|=1} \|A v\|$$ has $v_1$ as solution, then for every $v \perp v_1$ : $Av \perp A v_1$.

suppose by contradiction that it exists $v_2 \perp v_1$ such that $Av_2 = c A v_1 + u_2$ where $c\ne 0$ and $u_2 \perp A v_1$. then $v_1$ can't be a maximiser of $\max_{\|v\|=1} \|A v\|$ :

let $w(\epsilon) = \sqrt{1-\epsilon^2} v_1 + \epsilon v_2$, hence $\|w(\epsilon)\| = 1$, and $$A w(\epsilon) = \sqrt{1-\epsilon^2} A v_1 + \epsilon A v_2 = (\sqrt{1-\epsilon^2} + \epsilon c) A v_1 + \epsilon u_2$$

i.e. : $$\|A w(\epsilon)\|^2 = \underbrace{|\sqrt{1-\epsilon^2} + \epsilon c|^2}_{>\ 1 \ \text{if } \ \epsilon \ \text{ is small enough }} \|A v_1\|^2+ \epsilon^2 \|u_2\|^2 $$

this is enough to prove the SVD of matrices, since we can repeatedly compute $\max_{\|v\|=1} \|A_k v\|$ on $A_1 = A$ and then on $A_{k+1} = A_{k} - A_{k}v_{k} v_{k}^T$ where $v_{k}$ is the maximiser of the previous maximisation, and hence this is enough to prove the spectral theorem too.

The shortcut that you might be inclined to take would be to try to prove that if $x,y$ are orthogonal then $Ax,Ay$ are orthogonal. But this is not the case, and in fact it is very far from being the case. It is not even the case when $A$ is symmetric positive definite, or even just diagonal with positive entries.

To proceed correctly you have to bring in the adjoint somehow. A way to do this is to note that $\| Ax \|^2_2 = \langle Ax,Ax \rangle = \langle x,A^T A x \rangle$ (the transpose is the adjoint for the Euclidean inner product). So the norm is maximized when you maximize this inner product. Cauchy-Schwarz tells us that it is maximized by the eigenvector of $A^T A$ with largest eigenvalue. This is your $v_1$, and its image is your $u_1$.

Now you take the orthogonal complement of $v_1$ when you go to define $v_2$. I don't see a way to conveniently work with this orthogonal complement without proving that it is invariant under $A^T A$, because otherwise $A^T A$ might not have any eigenvectors in there. (For example, with $B=\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, this construction works to find the eigenvector $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, but then the orthogonal complement of that is not invariant under $B$, and indeed $B$ doesn't have any other eigenvectors.) But once you've proven that, you've almost proven the spectral theorem, so you've gotten pretty far from the simple picture that you started with.

Here's my attempt at an intuitive explanation of the fact that $u_1$ and $u_2$ are guaranteed to be orthogonal, based on the answer given by @user1952009. Throughout this answer, $\| \cdot \|$ will denote the $\ell_2$-norm.

Assume that $v_1$ is a maximizer of $\|Av\|$ subject to the constraint that $\|v\| = 1$. Assume also that $v_2 \perp v_1$.

Claim: Under these assumptions, $A v_2 \perp A v_1$.

Explanation: It's possible to look at this in a way that makes it intuitive or even "obvious". If $Av_2$ were not orthogonal to $A v_1$, then it seems like we could improve $v_1$ by adding $\epsilon v_2$ to it, for a sufficiently tiny value of $\epsilon$. When we perturb $v_1$ a tiny bit in the direction of $v_2$, then the norm of $v_1$ does not change (to first order, at least). [A satellite in circular orbit moves locally in a straight line, and its distance from the center of the Earth is constant.] However, we cannot say the same for the norm of $Av_1$, because $A v_1$ is perturbed in the direction of $A v_2$, and $A v_1$ and $A v_2$ are not orthogonal. The growth in $\| A v_1 \|$ is non-negligible.

Again: when $v_1$ is perturbed in the direction $v_2$, the change in norm is negligible (so the norm is still $1$). But, $A v_1$ is perturbed in the direction $A v_2$, and the change in norm is non-negligible (so the norm can increase).

Suppose you're standing 1 kilometer from the origin and you want to take a step in order to increase the magnitude of your displacement vector from the origin. In which direction should you move? Is it better to move in a direction orthogonal to your displacement vector, or parallel to it? If you step in a direction orthogonal to your displacement vector, then the change in the magnitude of your displacement vector is negligible. However, if you step in a direction parallel to your displacement vector, then the change in magnitude of your displacement vector is significant.

Finally, let's convert this intuition into a rigorous proof. To get a rigorous proof, we have to face the fact that $v_1 + \epsilon v_2$ does not actually have norm $1$ when $\epsilon \neq 0$, even if $\epsilon$ is tiny. We can fix this by taking our perturbed version of $v_1$ to be \begin{equation} \tag{$\spadesuit$} \tilde v(\epsilon) = \sqrt{1 - \epsilon^2} \, v_1 + \epsilon v_2. \end{equation} The vector $\tilde v(\epsilon)$ really does have norm $1$.

We are assuming (for a contradiction) that $A v_2$ is not orthogonal to $A v_1$. This implies that $A v_2 = c A v_1 + w$, for some $c \neq 0$ and $w \perp A v_1$. It follows that \begin{align} \| A \tilde v(\epsilon) \|_2^2 &= \| (\sqrt{1 - \epsilon^2} + c \epsilon) A v_1 + \epsilon w \|^2 \\ &= (\underbrace{\sqrt{1 - \epsilon^2} + c \epsilon}_{>1 \text{ if }\epsilon \text{ is small enough}})^2 \| A v_1 \|^2 + \epsilon^2 \| w \|^2. \end{align} This shows that $v_1$ is not a true maximizer of $\| A v \|$ subject to the constraint $\| v\|_1$. We have arrived at a contradiction.

The point of the intuitive discussion was to explain how we might think of perturbing $v_1$ as in equation ($\spadesuit$), and why we would expect this perturbation of $v_1$ to be an improvement on $v_1$.

Understanding a derivation of the SVD

Tags:

Svd

Related

Recent Posts