Proof of Spectral Theorem

The main reason for posting this question was to answer it myself, thus collecting all this material in a single place for future reference -- and for present use too.

The first item of this proof is that a linear operator on a finite-dimensional complex vector space admits an upper triangular matrix representation w.r.t. some basis. This is proved by induction on $n:=\dim V$, $V$ being the vector space. If $\dim V=1$, the claim is trivial. So suppose $\dim V=n>1$ and the theorem holds for all dimensions up to $n-1$.

First we show our operator $T$ has an eigenvalue. Pick $v\neq0$ and consider $v,Tv,T^2v,T^3v,\dotsc,T^nv$. These cannot be linearly independent, since they are $n+1$ vectors and $\dim V=n$. So there exist $a_0,a_1,\dotsc,a_n\in\mathbb{C}$, not all zero, such that: $$\sum_{i=0}^na_iT^iv=0.$$ Let $m$ be the largest index such that $a_m\neq0$. Then $m\geq1$: if the only nonzero coefficient were $a_0$, the relation would read $a_0v=0$, forcing $a_0=0$ since $v\neq0$. Factor the polynomial: $$a_0+a_1z+\dotsb+a_mz^m=c(z-\lambda_1)\dotsm(z-\lambda_m).$$ Substituting $T$ for $z$, and applying to $v$, we find: $$0=\left(\sum_{i=0}^ma_iT^i\right)v=c(T-\lambda_1I)\dotsm(T-\lambda_mI)v,$$ so $T-\lambda_iI$ is not injective for some $i$. But this equates to $\lambda_i$ being an eigenvalue, since not injective iff nontrivial kernel iff $(T-\lambda_iI)w=0$ for some $w\neq0$ iff $Tw=\lambda_iw$ for some $w\neq0$, i.e. $\lambda_i$ is an eigenvalue.

So going back to our original $T$, let $\lambda$ be any eigenvalue. $T-\lambda I$ is not injective, and by rank-nullity it is therefore not surjective. If $U=\mathrm{Im}(T-\lambda I)$ is the range of that operator, then $\dim U<\dim V$. Also, $U$ is invariant under $T$ since: $$Tu=(T-\lambda I)u+\lambda u,$$ and if $u\in U$ then both summands are in $U$. So $T|_U$ is an operator on $U$, and by the induction hypothesis there exists a basis of $U$ such that $T|_U$ is represented by an upper triangular matrix w.r.t. that basis. So if $k:=\dim U$ and that basis is $\{u_1,\dotsc,u_k\}$, then $Tu_j$ is in the span of $u_1,\dotsc,u_j$ for all $j\leq k$. Extend that basis to a basis of $V$ by adding extra vectors $v_1,\dotsc,v_{n-k}$. For each $i\leq n-k$ we have $Tv_i=(T-\lambda I)v_i+\lambda v_i$, where $(T-\lambda I)v_i\in U$ lies in the span of $u_1,\dotsc,u_k$; hence $Tv_i$ is in the span of $u_1,\dotsc,u_k,v_1,\dotsc,v_i$.
And this gives us upper triangularity of the matrix representing $T$ w.r.t. $u_1,\dotsc,u_k,v_1,\dotsc,v_{n-k}$, QED.
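The eigenvalue-existence argument above can be run numerically. The following NumPy sketch (the random matrix, vector, and seed are arbitrary illustrative choices, not part of the proof) builds the dependent family $v,Tv,\dotsc,T^nv$, extracts a linear dependency, and checks that some root of the resulting polynomial makes $T-\lambda I$ singular:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# v, Tv, ..., T^n v are n+1 vectors in C^n, hence linearly dependent.
powers = np.stack([np.linalg.matrix_power(T, k) @ v for k in range(n + 1)], axis=1)

# A nonzero coefficient vector a with sum_k a_k T^k v = 0 (null space via SVD).
_, _, Vh = np.linalg.svd(powers)
a = Vh[-1].conj()                       # powers @ a ≈ 0
assert np.linalg.norm(powers @ a) < 1e-8

# Roots of the polynomial sum_k a_k z^k: at least one makes T - lambda*I singular.
roots = np.roots(a[::-1])               # np.roots wants the leading coefficient first
sing = [abs(np.linalg.det(T - lam * np.eye(n))) for lam in roots]
assert min(sing) < 1e-6                 # some root is an eigenvalue of T
```

The singular vector for the zero singular value plays the role of the coefficients $a_i$, and the near-zero determinant exhibits the non-injective factor $T-\lambda_iI$.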

The rest of this answer is practically copied off this pdf. First of all, notice how $T$, a linear operator, is uniquely determined by the values of $\langle Tu,v\rangle$ for $u,v\in V$. That is because the inner product is positive definite: if $S$ satisfies $\langle Tu,v\rangle=\langle Su,v\rangle$ for all $u,v\in V$, we first conclude $\langle(T-S)u,v\rangle=0$ for all $u,v\in V$; fixing $u$ and choosing $v=(T-S)u$ gives $\|(T-S)u\|^2=0$, so $(T-S)u=0$. That holds for all $u$, hence $T-S=0$, i.e. $T=S$. This makes it sensible to define an operator via: $$\langle Tu,v\rangle=\langle u,T^\ast v\rangle,$$ for all $u,v\in V$. $T^\ast$ is uniquely determined as seen above, and is called the adjoint of $T$ w.r.t. this inner product. Elementary properties of the operation of taking the adjoint are that $(S+T)^\ast=S^\ast+T^\ast$, $(aS)^\ast=\bar aS^\ast$ in the complex case, the identity is self-adjoint (i.e. coincides with its adjoint), adjoining is an involution (i.e. $(T^\ast)^\ast=T$), $M(T^\ast)=M(T)^\ast$ w.r.t. any orthonormal basis, denoting by $^\ast$ the conjugate transpose of a matrix, and $(ST)^\ast=T^\ast S^\ast$. The linked pdf also proves the eigenvalues of a self-adjoint operator are all real, but this is irrelevant here, so I will leave the proof to that pdf. We define normal operators as those for which $TT^\ast=T^\ast T$, i.e. those commuting with their adjoints. The polarization identity is another interesting result I leave to the pdf. One result we will use is that, with $\|v\|:=\sqrt{\langle v,v\rangle}$, if $T$ is normal then $\|Tv\|=\|T^\ast v\|$ for every $v$. The proof is immediate: \begin{align*} T\text{ is normal}\iff{}&TT^\ast-T^\ast T=0\iff\langle(TT^\ast-T^\ast T)v,v\rangle=0\quad\forall v\in V\iff{} \\ {}\iff{}&\langle T^\ast Tv,v\rangle=\langle TT^\ast v,v\rangle\quad\forall v\in V\iff{} \\ {}\iff{}&\|Tv\|^2=\langle Tv,Tv\rangle=\langle T^\ast Tv,v\rangle=\langle TT^\ast v,v\rangle=\langle T^\ast v,T^\ast v\rangle=\|T^\ast v\|^2\quad\forall v\in V. \end{align*}
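As a concrete sanity check of the norm identity (a NumPy sketch, not from the pdf; the cyclic shift matrix and the Jordan block are my own illustrative choices): the shift is a permutation matrix, hence unitary, hence normal, and the identity holds; a nilpotent Jordan block is not normal, and the identity visibly fails.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Cyclic shift: a permutation matrix, hence unitary, hence normal.
S = np.roll(np.eye(n), 1, axis=0)
assert np.allclose(S @ S.T, S.T @ S)               # S is normal (S real, so S* = S.T)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.isclose(np.linalg.norm(S @ v), np.linalg.norm(S.T @ v))

# A nilpotent Jordan block is not normal, and ||J e_1|| != ||J* e_1||.
J = np.eye(n, k=1)                                  # ones on the superdiagonal
e1 = np.eye(n)[:, 0]
assert not np.allclose(J @ J.T, J.T @ J)
assert np.linalg.norm(J @ e1) == 0.0
assert np.linalg.norm(J.T @ e1) == 1.0
```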

As is subsequently proved, this implies that if $T$ is normal then the kernels of $T$ and $T^\ast$ coincide, that the eigenvalues of $T^\ast$ are the conjugates of those of $T$, and that eigenvectors of a normal operator associated to distinct eigenvalues are orthogonal; for a general operator they are merely linearly independent.

Now the big result: unitary diagonalizability equates to normality. This statement is equivalent to proving an operator $T$ is normal iff it admits an orthonormal eigenbasis, since a change of basis between two orthonormal bases is unitary. So let us assume $T$ is normal. We know any operator can be represented by an upper triangular matrix w.r.t. some basis; applying Gram-Schmidt to that basis yields an orthonormal basis w.r.t. which $T$ is still upper triangular, since the two bases have the same partial spans (this is Schur's theorem). We take such an orthonormal basis $e_1,\dotsc,e_n$ and show the corresponding matrix representation of $T$, $M(T)=(a_{ij})_{i,j=1}^n$, is in fact diagonal. This makes use of the Pythagorean theorem, proved here, and of the norm identity we proved a while ago relating the norm of an image via $T$ to that via $T^\ast$. We work row by row. Upper triangularity gives $Te_1=a_{11}e_1$, and since $M(T^\ast)=M(T)^\ast$ we also know $T^\ast e_1=\sum_{k=1}^n\bar a_{1k}e_k$. So by the Pythagorean theorem and the norm identity: $$|a_{11}|^2=\|Te_1\|^2=\|T^\ast e_1\|^2=\sum_{k=1}^n|a_{1k}|^2,$$ implying $a_{1k}=0$ for all $k>1$, i.e. the first row is diagonal. Now assume rows $1,\dotsc,i-1$ are diagonal. Then the above-diagonal entries of column $i$ vanish, so $Te_i=a_{ii}e_i$ and $T^\ast e_i=\sum_{k=i}^n\bar a_{ik}e_k$, and the same computation $$|a_{ii}|^2=\|Te_i\|^2=\|T^\ast e_i\|^2=\sum_{k=i}^n|a_{ik}|^2$$ forces $a_{ik}=0$ for $k>i$. The above holds for every $i$, proving $M(T)$ is diagonal.
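The row-by-row mechanism can be seen numerically (again an illustrative NumPy sketch with arbitrary random data): for a generic upper triangular matrix, $\|Te_1\|$ only sees $a_{11}$ while $\|T^\ast e_1\|$ sees the whole first row, so the norm identity can only hold when that row is diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A generic upper-triangular matrix: ||T e_1||^2 = |a_11|^2, while
# ||T* e_1||^2 = sum_k |a_1k|^2, so the two norms differ.
T = np.triu(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
e1 = np.eye(n)[:, 0]
assert np.linalg.norm(T @ e1) < np.linalg.norm(T.conj().T @ e1)

# A diagonal matrix is upper triangular *and* normal, and the norms agree.
D = np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n))
assert np.isclose(np.linalg.norm(D @ e1), np.linalg.norm(D.conj().T @ e1))
```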

Now suppose conversely that $T$ is diagonal w.r.t. some orthonormal eigenbasis. Since $M(T^\ast)=M(T)^\ast$, the matrix of $T^\ast$ is diagonal w.r.t. the same basis: the eigenvectors coincide, and the eigenvalues are mutually conjugate. But we know $M(TT^\ast)=M(T)M(T^\ast)$, so: $$M(TT^\ast)=M(T)M(T^\ast)=M(T)M(T)^\ast=M(T)^\ast M(T)=M(T^\ast)M(T)=M(T^\ast T),$$ since diagonal matrices always commute. Thus $TT^\ast=T^\ast T$, for if the matrix representations of two operators w.r.t. some basis coincide, the operators have the same image on every vector, and thus coincide. So if $T$ is unitarily diagonalizable, $T$ is normal.
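Both directions can be illustrated numerically (a NumPy sketch with arbitrary random data, not part of the argument): conjugating a diagonal matrix by a unitary matrix produces a normal matrix, while conjugating the same diagonal matrix by a generic non-unitary matrix produces one that is diagonalizable but typically not normal.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
D = np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Diagonal w.r.t. an orthonormal eigenbasis (unitary change of basis): normal.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
A = Q @ D @ Q.conj().T
assert np.allclose(A @ A.conj().T, A.conj().T @ A)

# Diagonalizable, but w.r.t. a non-orthogonal eigenbasis: generally not normal.
P = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = P @ D @ np.linalg.inv(P)
assert not np.allclose(B @ B.conj().T, B.conj().T @ B)
```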

Update

I just realised the proof implicitly uses the fact that if the quadratic form associated to an operator vanishes identically then the operator is zero, i.e. $\langle Tv,v\rangle=0\,\,\forall v\in V\implies T=0$. This is proved here on p. 147:

$\quad$ (ii) Since $(T(x+y),x+y)=(Tx,x)+(Tx,y)+(Ty,x)+(Ty,y)$, $x,y\in V$, and $(Tv,v)=0$ for all $v\in V$, we have $$\tag{$*$} 0=(Tx,y)+(Ty,x).$$ If $V$ is an inner product space over $\mathbb{R}$ [and $T$ is self-adjoint], then from $(*)$: $$0=(Tx,y)+(Ty,x)=(Tx,y)+(y,Tx)=2(Tx,y).$$ Hence, $(Tx,y)=0$ for all $x,y\in V$, and $T\equiv 0$.

$\quad$ If $V$ is an inner product space over $\Bbb C$, then replacing $y$ by $iy$ in $(*)$, we have $(Tx,iy)+(iTy,x)=0$. Thus for all $x,y\in V$: $$(Tx,y)-(Ty,x)=0.$$ Hence, $(Tx,y)=0$ for all $x,y\in V$, and $T\equiv 0$.
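The complex case amounts to a polarization identity recovering $(Tx,y)$ from the quadratic form. Here is a numerical check (a NumPy sketch; the specific formula $B(x,y)=\frac14\sum_{k=0}^3 i^kQ(x+i^ky)$ is the standard sesquilinear polarization for a product linear in its first slot, stated here as an assumption of the sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def B(u, w):
    # (Tu, w): linear in the first slot, conjugate-linear in the second.
    return np.vdot(w, T @ u)

def Q(u):
    # Quadratic form (Tu, u).
    return B(u, u)

# Complex polarization: B(x, y) = (1/4) * sum_k i^k Q(x + i^k y).
recovered = sum(1j**k * Q(x + 1j**k * y) for k in range(4)) / 4
assert np.isclose(recovered, B(x, y))
```

So over $\Bbb C$ the quadratic form determines the whole sesquilinear form, which is exactly why $Q\equiv0$ forces $T=0$.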


The spectral theorem says that every normal operator$~\phi$ on a finite dimensional complex inner product space$~V$ is diagonalisable, and that its eigenspaces are mutually orthogonal. As a consequence an orthonormal basis of$~V$ consisting of eigenvectors for$~\phi$ can be chosen.

Here is a simple proof. I will start with the special case where $\phi$ is a Hermitian (also called self-adjoint) operator, which is quite easy; easier, indeed, than the case of a self-adjoint operator on a finite dimensional real inner product space (also called Euclidean space) that I discussed here. Then I will use this result to generalise it to the case of normal operators.

A basic fact about adjoints is that for any linear operator $\phi$ on$~V$, whenever a subspace $W$ is stable under$~\phi$, its orthogonal complement $W^\perp$ is stable under its adjoint$~\phi^*$. For if $v\in W^\perp$ and $w\in W$, then $\langle w\mid \phi^*(v)\rangle=\langle \phi(w)\mid v\rangle=0$ since $\phi(w)\in W$, so that $\phi^*(v)\in W^\perp$. Then for a Hermitian operator $\phi$ (so with $\phi^*=\phi$), the orthogonal complement of any $\phi$-stable subspace is again $\phi$-stable.

Now we prove the spectral theorem for Hermitian $\phi$ by induction on $\dim V$. When $\dim V=0$ the unique operator $\phi$ on $V$ is diagonalisable with empty set of eigenvalues, and the result is trivial. Now assuming $\dim V>0$, there is at least one eigenvalue (since the characteristic polynomial of $\phi$ has a root by the fundamental theorem of algebra) so we can choose an eigenvector $v_1$ of$~\phi$. The subspace $W=\langle v_1\rangle$ it spans is $\phi$-stable by the definition of an eigenvector, and so $W^\perp$ is $\phi$-stable as well. We can then restrict $\phi$ to a linear operator on $W^\perp$, which is clearly self-adjoint, so our induction hypothesis gives us an orthonormal basis of $W^\perp$ consisting of eigenvectors for that restriction; call them $(v_2,\ldots,v_n)$. Viewed as elements of $V$, the vectors $v_2,\ldots,v_n$ are eigenvectors of$~\phi$, and clearly the family $(v_1,\ldots,v_n)$ is orthonormal. It is an orthonormal basis of eigenvectors of$~\phi$, and we are done.
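Numerically, this conclusion is exactly what `np.linalg.eigh` delivers for a Hermitian matrix (a sketch with arbitrary random data, not part of the proof): an orthonormal basis of eigenvectors, with real eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (A + A.conj().T) / 2              # a Hermitian matrix

w, V = np.linalg.eigh(H)              # eigh returns an orthonormal eigenbasis
assert np.allclose(V.conj().T @ V, np.eye(n))      # columns are orthonormal
assert np.allclose(H @ V, V @ np.diag(w))          # columns are eigenvectors
assert np.allclose(w.imag, 0)                      # eigenvalues are real
```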


So now I will deduce from this the more general case where $\phi$ is a normal operator. First, note that from the Hermitian case the anti-Hermitian case, i.e., the one where $\phi^*=-\phi$, follows easily. One way is to observe that for anti-Hermitian$~\phi$ one still has that if a subspace $W$ is $\phi$-stable then so is its orthogonal complement $W^\perp$ (just put a minus sign in the above argument), so the proof of the spectral theorem for Hermitian$~\phi$ can be reused word for word. Another way is to observe that $\phi$ is Hermitian if and only if $\def\ii{\mathbf i}\ii\phi$ is anti-Hermitian, so if one is diagonalisable so is the other, with the same eigenspaces but with each eigenvalue of$~\phi$ multiplied by$~\ii$ to give the corresponding eigenvalue of$~\ii\phi$. (One easily shows the eigenvalues are real in the Hermitian case and purely imaginary in the anti-Hermitian case, but that is of no importance here.)

Next observe that any linear operator $\phi$ can be written as the sum of a Hermitian operator $\frac12(\phi+\phi^*)$ and an anti-Hermitian operator $\frac12(\phi-\phi^*)$, which are called its Hermitian and anti-Hermitian parts. Moreover $\phi$ and $\phi^*$ commute (i.e., $\phi$ is a normal operator) if and only if the Hermitian and anti-Hermitian parts of $\phi$ commute.
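A small NumPy sketch of this decomposition (arbitrary random data; the helper name `parts` is my own): the two parts are Hermitian and anti-Hermitian, they sum to $\phi$, and they commute exactly when $\phi$ is normal.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4

def parts(phi):
    # Hermitian and anti-Hermitian parts of phi.
    herm = (phi + phi.conj().T) / 2
    anti = (phi - phi.conj().T) / 2
    return herm, anti

phi = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H, A = parts(phi)
assert np.allclose(H, H.conj().T)          # Hermitian part
assert np.allclose(A, -A.conj().T)         # anti-Hermitian part
assert np.allclose(H + A, phi)             # they sum to phi

# phi is normal  <=>  the two parts commute (a generic phi is neither).
normal = np.allclose(phi @ phi.conj().T, phi.conj().T @ phi)
commute = np.allclose(H @ A, A @ H)
assert normal == commute

# For a normal phi (e.g. the unitary cyclic shift) the parts do commute.
S = np.roll(np.eye(n), 1, axis=0)
Hs, As = parts(S)
assert np.allclose(Hs @ As, As @ Hs)
```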

So for a normal operator$~\phi$ we know that its Hermitian and anti-Hermitian parts are both diagonalisable with mutually orthogonal eigenspaces, and the two parts commute. Now apply the well known fact that two commuting diagonalisable operators can be simultaneously diagonalised; any basis of common eigenvectors is a basis of diagonalisation for$~\phi$. Finally, if one has two eigenvectors of$~\phi$ for distinct eigenvalues, then they are also eigenvectors for the Hermitian and anti-Hermitian parts of$~\phi$, and for at least one of the two they are so for distinct eigenvalues; by virtue of that fact the vectors are orthogonal. This shows that eigenspaces of$~\phi$ are mutually orthogonal.
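The simultaneous diagonalisation can be checked numerically (a NumPy sketch; building the normal matrix as $U\,\mathrm{diag}(d)\,U^\ast$ for a unitary $U$ is an assumption of the example, which makes the columns of $U$ a common eigenbasis):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
phi = U @ np.diag(d) @ U.conj().T          # normal; eigenbasis = columns of U

H = (phi + phi.conj().T) / 2               # Hermitian part
A = (phi - phi.conj().T) / 2               # anti-Hermitian part

def is_diagonal(M):
    return np.allclose(M, np.diag(np.diag(M)))

# The same unitary U simultaneously diagonalises phi and both of its parts.
for M in (phi, H, A):
    assert is_diagonal(U.conj().T @ M @ U)

# Eigenvectors of phi for distinct eigenvalues are orthogonal.
assert np.isclose(np.vdot(U[:, 0], U[:, 1]), 0)
```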


In fact, it remains true for a normal operator$~\phi$ that the orthogonal complement of a $\phi$-stable subspace is always $\phi$-stable, so one could have reused the proof of the Hermitian case once again. However to prove this fact would seem to be more work than the separate argument I gave.