Why are real symmetric matrices diagonalizable?
Suppose the ground field is $\mathbb C$. It is immediate then that every square matrix can be triangulated, and in fact the triangulation can be achieved by a unitary change of basis (Schur). Now, symmetry certainly implies normality ($A$ is normal if $AA^t=A^tA$ in the real case, and $AA^*=A^*A$ in the complex case). Since normality is preserved by unitary similarity, it follows that if $A$ is symmetric, then the triangular matrix to which it is unitarily similar is normal. But obviously (compute!) a normal triangular matrix must be diagonal, so in fact $A$ is diagonalizable.
So it turns out that the criterion you mentioned for diagonalizability is not the most useful in this case. The one that is useful here is: A matrix is diagonalizable iff it is similar to a diagonal matrix.
Of course, the argument shows more: every normal matrix is diagonalizable. Symmetric matrices are much more special than just being normal, and indeed the argument above does not prove the stronger result that symmetric matrices are orthogonally diagonalizable.
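As a quick numerical sanity check of this argument (a sketch assuming numpy and scipy are available; the example matrices are arbitrary choices): the Schur decomposition triangulates a matrix by a unitary change of basis, and for a symmetric matrix the triangular factor indeed comes out diagonal, whereas for a non-normal matrix it stays merely triangular.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])                  # symmetric, hence normal
T, Z = schur(A, output='complex')             # A = Z T Z*, Z unitary, T upper triangular
print(np.allclose(T, np.diag(np.diag(T))))    # True: the triangular form is in fact diagonal

B = np.array([[1., 5.],
              [0., 2.]])                      # not normal: B @ B.T != B.T @ B
T2, _ = schur(B, output='complex')
print(np.allclose(T2, np.diag(np.diag(T2))))  # False: merely triangular
```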
Comment: To triangulate the matrix, use induction on the order of the matrix. For $1\times 1$ it's trivial. For $n\times n$, first find an eigenvector $v_1$ (one must exist, since the characteristic polynomial has a root over $\mathbb C$), which we may take to be a unit vector. Thinking of the matrix as a linear transformation on a vector space $V$ of dimension $n$, write $V$ as $V=V_1\oplus W$, where $V_1$ is the subspace spanned by $v_1$ and $W=V_1^\perp$. Then $W$ is $(n-1)$-dimensional. Note that $W$ need not be $A$-stable; instead, apply the induction hypothesis to the operator on $W$ obtained by composing $A|_W$ with the projection onto $W$ along $V_1$, to obtain an orthonormal basis $v_2,\ldots, v_n$ of $W$ in which that operator is triangular. It then follows that in the basis $v_1,\ldots, v_n$ the matrix of $A$ is triangular (and since this basis is orthonormal, the triangulation is unitary).
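To make the induction concrete, here is a small sketch of it in code (assuming numpy; the function name schur_triangularize and the helper choices, such as completing the eigenvector to an orthonormal basis via QR of a random completion, are mine, and the eigenvector itself is simply obtained numerically rather than from scratch):

```python
import numpy as np

def schur_triangularize(A):
    """Unitarily triangularize a complex square matrix, mirroring the induction:
    pick a unit eigenvector, complete it to an orthonormal basis, recurse on the
    operator induced on the orthogonal complement. A sketch, not a robust routine."""
    A = A.astype(complex)
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)
    _, V = np.linalg.eig(A)                     # any eigenvector (one exists over C)
    v1 = V[:, 0] / np.linalg.norm(V[:, 0])
    # complete v1 to an orthonormal basis via QR of [v1 | random columns]
    M = np.column_stack([v1, np.random.default_rng(0).standard_normal((n, n - 1))])
    Q, _ = np.linalg.qr(M)
    Q[:, 0] = v1                                # QR may change the phase of column 0
    B = Q.conj().T @ A @ Q                      # block form [[lambda, *], [0, A22]]
    U = np.eye(n, dtype=complex)
    U[1:, 1:] = schur_triangularize(B[1:, 1:])  # induction hypothesis on the block A22
    return Q @ U

A = np.array([[2., 1., 0.], [0., 2., 1.], [1., 0., 2.]])
U = schur_triangularize(A)
T = U.conj().T @ A @ U
print(np.allclose(np.tril(T, -1), 0))           # T is upper triangular
print(np.allclose(U.conj().T @ U, np.eye(3)))   # and U is unitary
```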
Here's some intuition (but not a rigorous proof).
If $A$ is hermitian (with entries in $\mathbb C$), you can easily show that the eigenvalues of $A$ are real and that eigenvectors corresponding to distinct eigenvalues are orthogonal.
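For reference, here is the quick computation in matrix notation (writing $v^*$ for the conjugate transpose): if $Av=\lambda v$ with $v\neq0$, then $$\lambda\,v^*v=v^*Av=(Av)^*v=\overline\lambda\,v^*v,$$ so $\lambda=\overline\lambda$ is real; and if also $Aw=\mu w$ with $\mu\neq\lambda$, then $$\lambda\,w^*v=w^*Av=(Aw)^*v=\mu\,w^*v,$$ forcing $w^*v=0$, i.e. $v$ and $w$ are orthogonal.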
Typically, all the eigenvalues of $A$ are distinct. (It is in some sense a huge coincidence if two eigenvalues turn out to be equal.) So, typically $A$ has an orthonormal basis of eigenvectors.
Even if $A$ has some repeated eigenvalues, perturbing $A$ slightly will probably cause the eigenvalues to become distinct, in which case there is an orthonormal basis of eigenvectors. By thinking of $A$ as a limit of slight perturbations of $A$, each of which has an ON basis of eigenvectors, it seems plausible that $A$ also has an ON basis of eigenvectors.
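A small numerical experiment along these lines (a sketch assuming numpy; the matrix and the size $10^{-8}$ of the perturbation are arbitrary choices):

```python
import numpy as np

A = np.diag([1.0, 1.0, 3.0])            # symmetric, with eigenvalue 1 repeated

E = np.random.default_rng(0).standard_normal((3, 3))
A_eps = A + 1e-8 * (E + E.T) / 2        # tiny symmetric perturbation

w, Q = np.linalg.eigh(A_eps)            # eigenvalues and orthonormal eigenvectors
print(w)                                # the repeated eigenvalue has (generically) split
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: an ON basis of eigenvectors
```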
This question is about the spectral theorem for (finite dimensional) real Euclidean spaces, which says that in such a space any self-adjoint operator is diagonalisable (over the real numbers) with mutually orthogonal eigenspaces (so that orthonormal bases of eigenvectors exist). This is of course a classic result that should be proved in any course on the subject, so you can look this up in any textbook. However there are quite a few questions on this site that more or less touch on this matter, yet no really satisfactory answers, so I thought this is as good a place as any for me to state what seems a good proof to me (the one I teach in my course).
Before starting, I would like to note this is a subtle result, as witnessed by the fact that it becomes false if one replaces the real numbers either by the rational numbers or by the complex numbers (the latter case can be salvaged by throwing in some complex conjugation, but I want to avoid the suggestion that the validity of the stated result somehow depends on that). Nonetheless there are a number of relevant considerations that are independent of the base field; these easy facts I will state as preliminaries before we get to the meat of the subject.
First off, the matrix formulation in the question is just a restatement, in terms of the matrix of the operator with respect to any orthonormal basis, of the result I mentioned: under such expression the adjoint operator gets the transpose matrix, so a self-adjoint operator gets represented by a symmetric matrix. Since the basis used for such an expression bears no particular relation to our operator or the problem at hand (trying to find a basis of eigenvectors) it will be easier to ignore the matrix and reason directly in terms of the operator. The translation by expression in terms of matrices is easy and I will leave this aside; for reference, the claim I am proving translates to the claim that for every real symmetric matrix$~A$, there is an orthogonal matrix $P$ (whose columns describe an orthonormal basis of eigenvectors) such that $P^{-1}AP=P^tAP$ is a diagonal matrix. It is worth noting that the converse is true and obvious: if $D$ is diagonal and $P$ orthogonal, then $A=PDP^t$ is symmetric (since $D^t=D$).
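For a numerical illustration of this matrix formulation (assuming numpy; numpy's eigh routine is specialised to symmetric/Hermitian matrices and returns such a $P$ up to numerical precision):

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])                 # real symmetric

w, P = np.linalg.eigh(A)                     # columns of P: orthonormal eigenvectors
print(np.allclose(P.T @ P, np.eye(3)))       # P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(w)))  # P^{-1} A P = P^t A P is diagonal

# The easy converse: P D P^t is symmetric for any diagonal D and orthogonal P.
D = np.diag([1., 2., 5.])
S = P @ D @ P.T
print(np.allclose(S, S.T))                   # True
```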
A basic fact about adjoints is that for any operator $\phi$ on a Euclidean vector space$~V$, whenever a subspace $W$ is stable under$~\phi$, its orthogonal complement $W^\perp$ is stable under its adjoint$~\phi^*$. For if $v\in W^\perp$ and $w\in W$, then $\langle w\mid \phi^*(v)\rangle=\langle \phi(w)\mid v\rangle=0$ since $\phi(w)\in W$ and $v\in W^\perp$, so that $\phi^*(v)\in W^\perp$. Then for a self-adjoint operator $\phi$ (so with $\phi^*=\phi$), the orthogonal complement of any $\phi$-stable subspace is again $\phi$-stable.
Now our focus will be on proving the following fact.
Lemma. Any self-adjoint operator $\phi$ on a real Euclidean vector space$~V$ of finite nonzero dimension has an eigenvector.
Assuming this for the moment, one easily proves our result by induction on the dimension. In dimension$~0$ the unique operator is diagonalisable, so the base case is trivial. Now assuming $\dim V>0$, get an eigenvector $v_1$ by applying the lemma, which we may scale to be a unit vector. The subspace $W=\langle v_1\rangle$ it spans is $\phi$-stable by the definition of an eigenvector, and so $W^\perp$ is $\phi$-stable as well. We can then restrict $\phi$ to a linear operator on $W^\perp$, which is clearly self-adjoint, so our induction hypothesis gives us an orthonormal basis of $W^\perp$ consisting of eigenvectors for that restriction; call them $(v_2,\ldots,v_n)$. Viewed as elements of $V$, the vectors $v_2,\ldots,v_n$ are eigenvectors of$~\phi$, and clearly the family $(v_1,\ldots,v_n)$ is orthonormal. It is an orthonormal basis of eigenvectors of$~\phi$, and we are done.
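The induction can be mirrored directly in code; the following sketch (assuming numpy, with the function name orthonormal_eigenbasis and the QR-based completion being my own choices, and the "lemma" step supplied numerically) recursively builds the orthonormal eigenbasis by passing to the orthogonal complement at each step:

```python
import numpy as np

def orthonormal_eigenbasis(A):
    """For real symmetric A, follow the induction above: take one unit eigenvector,
    restrict to its orthogonal complement, recurse. A sketch of the proof's
    structure, not a robust numerical method."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1)
    _, V = np.linalg.eigh(A)       # the lemma: one unit eigenvector (found numerically)
    v1 = V[:, 0]
    # orthonormal basis C of the complement v1^perp (complete v1, then drop it)
    M = np.column_stack([v1, np.random.default_rng(1).standard_normal((n, n - 1))])
    Q, _ = np.linalg.qr(M)
    C = Q[:, 1:]
    A_restr = C.T @ A @ C          # restriction of A to v1^perp; self-adjoint again
    B = orthonormal_eigenbasis(A_restr)
    return np.column_stack([v1, C @ B])

A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
P = orthonormal_eigenbasis(A)
D = P.T @ A @ P
print(np.allclose(P.T @ P, np.eye(3)))       # columns form an orthonormal basis
print(np.allclose(D, np.diag(np.diag(D))))   # and they diagonalise A
```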
So that was the easy stuff, the harder stuff is proving the lemma. As said, we need to use that the base field is the real numbers. I give two proofs, one in purely algebraic style, but which is based on the fundamental theorem of algebra and therefore uses the complex numbers, albeit in an indirect way, while the other uses a bit of topology and differential calculus but avoids complex numbers.
My first proof of the lemma is based on the fact that the irreducible polynomials in $\def\R{\Bbb R}\R[X]$ all have degree at most $2$. This comes from decomposing the polynomial$~P$ into a leading coefficient and monic linear factors over the complex numbers (as one can by the fundamental theorem of algebra); any monic factor not already in $\R[X]$ must be $X-z$ with $z$ a non-real complex number, but then $X-\overline z$ is relatively prime to it and also a divisor of$~P$, as is their product $(X-z)(X-\overline z)=X^2-2(\Re z)X+|z|^2$, which lies in $\R[X]$.
Given this we first establish, in the context of the lemma and without using self-adjointness, the existence of a $\phi$-stable subspace of nonzero dimension at most$~2$. Start by taking any monic polynomial $P\in\R[X]$ annihilating$~\phi$; the minimal or characteristic polynomial will do, but we really only need the existence of such a polynomial, which is easy. Factor $P$ into irreducibles in $\R[X]$, say $P=P_1P_2\ldots P_k$. Since $0=P[\phi]=P_1[\phi]\circ\cdots\circ P_k[\phi]$, at least one of the $P_i[\phi]$ has nonzero kernel (indeed the sum of the dimensions of their kernels is at least $\dim V>0$); choose such an$~i$. If $\deg P_i=1$ then the kernel is a nonzero eigenspace, and any eigenvector in it spans a $\phi$-stable subspace of dimension$~1$ (and of course we also directly get the conclusion of the lemma here). So we are left with the case $\deg P_i=2$, in which case for any nonzero vector $v$ of its kernel, $v$ and $\phi(v)$ span a $\phi$-stable subspace of dimension$~2$ (the point is that $\phi^2(v)$ lies in the subspace because $P_i[\phi](v)=0$).
Now that we have a $\phi$-stable subspace$~W$ of nonzero dimension at most$~2$, we may restrict $\phi$ to $W$ and search for an eigenvector there, which will suffice. But this means it suffices to prove the lemma with the additional hypothesis $\dim V\leq2$. Since the case $\dim V=1$ is trivial, it remains to show that the characteristic polynomial of any symmetric real $2\times2$ matrix has a real root; this follows because its discriminant is non-negative, which is left as an easy exercise (it is a sum of squares).
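For completeness, here is that exercise spelled out: writing the symmetric matrix as $\begin{pmatrix}a&b\\ b&c\end{pmatrix}$, its characteristic polynomial is $X^2-(a+c)X+(ac-b^2)$, whose discriminant is $$(a+c)^2-4(ac-b^2)=(a-c)^2+4b^2\geq0,$$ indeed a sum of squares, so the polynomial has a real root.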
(Note that the final argument shows that for $\phi$ self-adjoint, the case $\deg P_i=2$ actually never occurs.)
Here is a second proof of the lemma, less in the algebraic style I used so far, and less self-contained, but which I find more intuitive. It also suggests a practical method to find an eigenvector in the context of the lemma. Consider the real function$~f:V\to\R$ defined by $f:x\mapsto \langle x\mid \phi(x)\rangle$. It is a quadratic function, therefore in particular differentiable and continuous. For its gradient at $p\in V$ one computes, for $v\in V$ and $h\in\R$: $$ f(p+hv)=\langle p+hv\mid \phi(p+hv)\rangle =f(p)+h\bigl(\langle p\mid \phi(v)\rangle +\langle v\mid \phi(p)\rangle\bigr) +h^2f(v) \\=f(p)+2h\langle v\mid \phi(p)\rangle+h^2f(v) $$ (the latter equality by self-adjointness), so that gradient is $2\phi(p)$. Now the point I will not prove is that the restriction of $f$ to the unit sphere $S=\{\,v\in V\mid\langle v\mid v\rangle=1\,\}$ attains its maximum somewhere on that sphere. There are many ways in analysis to show that this is true, where the essential points are that $f$ is continuous and that $S$ is compact. Now if $p\in S$ is such a maximum, then every tangent vector to$~S$ at$~p$ must be orthogonal to the gradient $2\phi(p)$ of $f$ at$~p$ (or else one could increase the value of $f(x)$ near $p$ by varying $x$ along $S$ in the direction of that tangent vector). The tangent space of$~S$ at $p$ is $p^\perp$, so this statement means that $\phi(p)\in(p^\perp)^\perp$. But we know that $(p^\perp)^\perp=\langle p\rangle$, and $\phi(p)\in\langle p\rangle$ (with $p\neq 0$) means precisely that $p$ is an eigenvector of$~\phi$. (One may check it is one for the maximal eigenvalue of$~\phi$.)
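The "practical method" this suggests can be sketched as projected gradient ascent of $f$ on the unit sphere (assuming numpy; the function name top_eigenvector, the step size and the iteration count are arbitrary choices of mine): the maximiser found is an eigenvector for the largest eigenvalue.

```python
import numpy as np

def top_eigenvector(A, steps=500, lr=0.1, seed=0):
    """Maximise f(x) = <x, A x> over the unit sphere by projected gradient ascent:
    move along the component of the gradient 2 A x tangent to the sphere, then
    renormalise. A sketch of the idea, not a tuned numerical method."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(steps):
        g = 2 * A @ x                  # gradient of f at x
        g_tan = g - (g @ x) * x        # project onto the tangent space x^perp
        x += lr * g_tan
        x /= np.linalg.norm(x)         # step back onto the sphere
    return x

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
p = top_eigenvector(A)
lam = p @ A @ p                                 # f(p): the largest eigenvalue
print(np.allclose(A @ p, lam * p, atol=1e-6))   # p is an eigenvector
```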