Simultaneous diagonalization of commuting linear transformations
This answer is basically the same as Paul Garrett's. First, I'll state the question as follows.
Let $V$ be a finite-dimensional vector space over a field $K$, and let $S$ and $T$ be diagonalizable endomorphisms of $V$. We say that $S$ and $T$ are simultaneously diagonalizable if (and only if) there is a basis of $V$ which diagonalizes both. The theorem is:
$S$ and $T$ are simultaneously diagonalizable if and only if they commute.
If $S$ and $T$ are simultaneously diagonalizable, they clearly commute. For the converse, I'll just refer to Theorem 5.1 of *The minimal polynomial and some applications* by Keith Conrad.
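Spelled out in matrix terms (the names $P$, $D_S$, $D_T$ are notation introduced here for illustration): if the columns of an invertible matrix $P$ form a basis diagonalizing both, then $S=PD_SP^{-1}$ and $T=PD_TP^{-1}$ with $D_S$, $D_T$ diagonal, and since diagonal matrices commute,
$$ST=PD_SP^{-1}PD_TP^{-1}=PD_SD_TP^{-1}=PD_TD_SP^{-1}=TS.$$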
EDIT. The key statement to prove the above theorem is Theorem 4.11 of Keith Conrad's text, which says:
Let $A: V \to V$ be a linear operator. Then $A$ is diagonalizable if and only if its minimal polynomial in $F[T]$ splits in $F[T]$ and has distinct roots.
[$F$ is the ground field, $T$ is an indeterminate, and $V$ is finite dimensional.]
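As a hedged illustration of Theorem 4.11, here is a minimal Python sketch over $\mathbb{Q}$ (exact arithmetic via `fractions.Fraction`). It assumes the caller already knows every distinct eigenvalue of $A$ lying in the field; the helper names `mat_mul`, `shift` and `is_diagonalizable` are hypothetical. Rather than computing the minimal polynomial itself, it uses the equivalent test $\prod_i (A-\lambda_i I)=0$, which says exactly that the minimal polynomial divides $\prod_i (T-\lambda_i)$, a polynomial that splits and has distinct roots.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(A, lam):
    """Return A - lam * I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def is_diagonalizable(A, eigenvalues):
    """Sketch of the minimal-polynomial criterion.

    Assumption: `eigenvalues` lists every distinct eigenvalue of A that
    lies in the ground field. Then A is diagonalizable over that field
    iff prod(A - lam*I) == 0, i.e. iff the minimal polynomial divides
    prod(T - lam), which splits with distinct roots.
    """
    n = len(A)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for lam in eigenvalues:
        P = mat_mul(P, shift(A, lam))
    return all(x == 0 for row in P for x in row)
```

For example, `is_diagonalizable([[2, 0], [0, 3]], [2, 3])` returns `True`, while the Jordan block `[[1, 1], [0, 1]]` with eigenvalue list `[1]` gives `False`, since $A-I\neq 0$ there.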
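The key point in proving Theorem 4.11 is the equality $$V=E_{\lambda_1}+\cdots+E_{\lambda_r},$$ where the $\lambda_i$ are the distinct eigenvalues of $A$ (the roots of its minimal polynomial) and the $E_{\lambda_i}$ are the corresponding eigenspaces. One can prove this by using Lagrange's interpolation formula: put $$f:=\sum_{i=1}^r\ \prod_{j\not=i}\ \frac{T-\lambda_j}{\lambda_i-\lambda_j}\ \in F[T]$$ and observe that $f(A)$ is the identity of $V$ (indeed $f=1$, since $f$ has degree less than $r$ and takes the value $1$ at each $\lambda_i$). Writing $f_i$ for the $i$-th summand, $f_i(A)$ maps $V$ into $E_{\lambda_i}$, because $(A-\lambda_i)f_i(A)$ is a scalar multiple of $\prod_j(A-\lambda_j)$, which is $0$ since the minimal polynomial is $\prod_j(T-\lambda_j)$. Hence every $v=\sum_i f_i(A)v$ lies in $E_{\lambda_1}+\cdots+E_{\lambda_r}$.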
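A minimal Python sketch of the interpolation argument over $\mathbb{Q}$, assuming the distinct roots $\lambda_i$ of the minimal polynomial are supplied. It builds the matrices $f_i(A)$, where $f_i=\prod_{j\ne i}(T-\lambda_j)/(\lambda_i-\lambda_j)$ is the $i$-th summand of $f$; the name `lagrange_projections` and the example matrix are my own, for illustration only.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def lagrange_projections(A, eigenvalues):
    """Return [f_1(A), ..., f_r(A)] with
    f_i(T) = prod_{j != i} (T - lam_j) / (lam_i - lam_j).

    Assumption: `eigenvalues` are the distinct roots of the minimal
    polynomial of A, so each f_i(A) maps into the lam_i-eigenspace.
    """
    n = len(A)
    projections = []
    for i, li in enumerate(eigenvalues):
        P = identity(n)
        for j, lj in enumerate(eigenvalues):
            if j != i:
                # (A - lam_j I) / (lam_i - lam_j), computed exactly
                factor = [[(A[r][c] - (lj if r == c else 0)) / Fraction(li - lj)
                           for c in range(n)] for r in range(n)]
                P = mat_mul(P, factor)
        projections.append(P)
    return projections
```

With $A=\begin{pmatrix}2&1\\0&3\end{pmatrix}$ and eigenvalues $2,3$, the two matrices returned are $\begin{pmatrix}1&-1\\0&0\end{pmatrix}$ and $\begin{pmatrix}0&1\\0&1\end{pmatrix}$: they sum to the identity, and $Af_i(A)=\lambda_i f_i(A)$, so each one lands in the corresponding eigenspace.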
You've proven (from $ST=TS$) that the $\lambda$-eigenspace $V_\lambda$ of $T$ is $S$-stable. The diagonalizability of $S$ on the whole space is equivalent to its minimal polynomial having no repeated factors. Its minimal polynomial on $V_\lambda$ divides its minimal polynomial on the whole space, so it still has no repeated factors, and $S$ is diagonalizable on that subspace. Since $T$ is diagonalizable, $V$ is the direct sum of the $V_\lambda$, so choosing a basis of $S$-eigenvectors in each $V_\lambda$ gives a basis of $V$ consisting of simultaneous eigenvectors; applied inductively, the same argument handles any finite family of pairwise commuting diagonalizable operators. Note that not every eigenvector of $T$ need be an eigenvector of $S$, because eigenspaces can have dimension greater than one.
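A minimal Python sketch of this construction over $\mathbb{Q}$, assuming $S$ and $T$ commute, are diagonalizable, and that their distinct eigenvalues are supplied. The helper names `nullspace`, `shift` and `simultaneous_eigenbasis` are hypothetical. Instead of literally restricting $S$ to $V_\lambda$, it computes $V_\lambda\cap\ker(S-\mu I)$ as the null space of $T-\lambda I$ stacked on top of $S-\mu I$; these intersections are exactly the eigenspaces of $S|_{V_\lambda}$.

```python
from fractions import Fraction


def nullspace(M):
    """Basis of {x : M x = 0} over Q via reduced row echelon form.
    M is a list of rows and need not be square."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0])
    pivots, r = [], 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -rows[i][free]
        basis.append(v)
    return basis


def shift(M, lam):
    """Return M - lam * I."""
    n = len(M)
    return [[M[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def simultaneous_eigenbasis(S, T, eig_S, eig_T):
    """For each eigenvalue lam of T, diagonalize S on V_lam by collecting
    bases of V_lam ∩ ker(S - mu I) for every eigenvalue mu of S."""
    basis = []
    for lam in eig_T:
        for mu in eig_S:
            stacked = shift(T, lam) + shift(S, mu)  # rows of both matrices
            basis.extend(nullspace(stacked))
    return basis
```

For instance, with $T=\operatorname{diag}(1,1,2)$ and $S=\begin{pmatrix}0&1&0\\1&0&0\\0&0&5\end{pmatrix}$ (which commute), the sketch returns $(1,1,0)$, $(-1,1,0)$, $(0,0,1)$. Here $e_1=(1,0,0)$ is an eigenvector of $T$ but not of $S$, matching the remark that not every eigenvector of $T$ need be one of $S$.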
Edit: Thanks, Arturo M. Yes: over a field that is not necessarily algebraically closed, one must say that "diagonalizable" is equivalent to the minimal polynomial having no repeated factors *and* splitting into linear factors.
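A standard example of why the splitting condition matters: over $\mathbb{R}$, take
$$A=\begin{pmatrix}0&-1\\1&0\end{pmatrix},\qquad A^2=-I.$$
Since $A$ is not a scalar multiple of $I$, its minimal polynomial is $T^2+1$. This has no repeated factor but no real root, so $A$ has no real eigenvector and is not diagonalizable over $\mathbb{R}$; over $\mathbb{C}$ it factors as $(T-i)(T+i)$ and $A$ is diagonalizable.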
Edit 2: $V_\lambda$ being "$S$-stable" means that $SV_\lambda\subset V_\lambda$, that is, $Sv\in V_\lambda$ for all $v\in V_\lambda$.
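For completeness, the verification that $ST=TS$ gives $S$-stability is one line: for $v\in V_\lambda$,
$$T(Sv)=S(Tv)=S(\lambda v)=\lambda\,Sv,$$
so $Sv\in V_\lambda$.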