How to characterize self-adjoint operators in terms of orthogonal diagonalizability
In terms of physics, there's a simple reason why you rule out general normal operators: physical quantities are things that you can measure, so the corresponding eigenvalues should be real. Normal operators in general admit complex eigenvalues.
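A finite-dimensional sanity check of this distinction (my own illustration; the helper names are hypothetical). It is a minimal sketch, assuming 2×2 complex matrices stored as nested tuples. A Hermitian matrix and a 90° rotation are both normal, since each commutes with its conjugate transpose, but only the Hermitian one has real eigenvalues.

```python
import cmath

def adjoint(m):
    """Conjugate transpose of a 2x2 matrix given as nested tuples."""
    return tuple(tuple(complex(m[j][i]).conjugate() for j in range(2)) for i in range(2))

def matmul(x, y):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def is_self_adjoint(m):
    return adjoint(m) == m

def is_normal(m):
    # Normal means m commutes with its adjoint.
    return matmul(m, adjoint(m)) == matmul(adjoint(m), m)

def eigenvalues(m):
    """Roots of the characteristic polynomial t^2 - tr(m) t + det(m)."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + disc) / 2, (tr - disc) / 2)

H = ((2, 1 - 1j), (1 + 1j, 3))   # Hermitian: equal to its own conjugate transpose
R = ((0, -1), (1, 0))            # rotation by 90 degrees: normal but not self-adjoint

for name, m in (("H", H), ("R", R)):
    print(name, is_self_adjoint(m), is_normal(m), eigenvalues(m))
```

The eigenvalues of `H` come out as 4 and 1, while those of `R` are $\pm i$. A rotation is a perfectly good normal operator, but it has no real eigenvalues to "measure".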
If the self-adjoint operator is compact, then you know what the eigenfunctions are: they form the orthonormal basis you get from the spectral theorem. (Kato may have meant his self-adjoint operators to be compact, but I doubt it.) In the more general case, what Kato (I assume) had in mind is perhaps something along the lines of "generalized" eigenfunctions. Two examples:
- On $L^2(\mathbb{R})$, the Laplacian is an unbounded self-adjoint operator (or rather, has a self-adjoint extension, yada yada). From solving the ODE, you see that the functions $e^{ikx}$ satisfy $\triangle e^{ikx} = -k^2 e^{ikx}$, so they look like eigenfunctions; but of course $e^{ikx}$ is not in $L^2(\mathbb{R})$, since $|e^{ikx}| = 1$ everywhere.
- On $L^2([0,1])$, the operator $f(x) \mapsto x f(x)$ is bounded and self-adjoint (but not compact). By inspection, the Dirac distribution $\delta_{x_0}$ "solves" the eigenfunction equation with eigenvalue $x_0$, since $x\,\delta_{x_0} = x_0\,\delta_{x_0}$ as distributions; but of course the delta function is not an element of $L^2$.
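A rough numerical illustration of both examples, under discretization choices of my own: finite differences for the Laplacian and a uniform grid for the multiplication operator. All function names are hypothetical. For (a), the ratio $\triangle u / u$ at a point is close to $-k^2$, while the $L^2$ norm of $e^{ikx}$ over $[-R, R]$ is $\sqrt{2R}$, which blows up as $R \to \infty$. For (b), the discretized operator is diagonal, and a grid "delta" with integral 1 satisfies the eigen-relation exactly but has $L^2$ norm $\sqrt{n}$, which diverges as the grid is refined. So there is no $L^2$ limit.

```python
import cmath
import math

# --- (a) Laplacian on L^2(R): plane waves as generalized eigenfunctions ---

def plane_wave(k):
    """The function x -> e^{ikx}."""
    return lambda x: cmath.exp(1j * k * x)

def second_difference(f, x, h):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def l2_norm_on_interval(f, a, b, n):
    """Midpoint-rule approximation of the L^2 norm of f restricted to [a, b]."""
    h = (b - a) / n
    return math.sqrt(h * sum(abs(f(a + (i + 0.5) * h)) ** 2 for i in range(n)))

# --- (b) (Mf)(x) = x f(x) on L^2([0, 1]), discretized on n equal cells ---

def grid(n):
    """Cell midpoints of a uniform partition of [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]

def multiply_by_x(values):
    """Discretized operator: multiply the value on each cell by the cell midpoint."""
    return [x * v for x, v in zip(grid(len(values)), values)]

def discrete_delta(j, n):
    """Height n on cell j and 0 elsewhere, so it integrates to 1 like delta_{x_j}."""
    return [float(n) if i == j else 0.0 for i in range(n)]

def discrete_l2_norm(values):
    """L^2 norm of a step function with len(values) equal cells on [0, 1]."""
    return math.sqrt(sum(v * v for v in values) / len(values))

k = 3.0
u = plane_wave(k)
print(second_difference(u, 0.7, 1e-3) / u(0.7))  # approximately -k**2 = -9
for R in (10, 100, 1000):
    # The L^2 norm on [-R, R] matches sqrt(2R), so it is unbounded in R.
    print(R, l2_norm_on_interval(u, -R, R, 10_000), math.sqrt(2 * R))

for n in (10, 100, 10_000):
    j = n // 3
    d = discrete_delta(j, n)
    x_j = grid(n)[j]
    assert multiply_by_x(d) == [x_j * v for v in d]  # exact eigen-relation on the grid
    print(n, discrete_l2_norm(d))  # equals sqrt(n), so it diverges as n grows
```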
In fact, in terms of the projection-valued measure formulation, the eigenfunctions are precisely the vectors whose spectral measure is supported on a single point $\lambda$. So if you apply the Lebesgue decomposition theorem to $P_A$, you see that for every $\lambda$ in the pure-point part of $P_A$ (i.e., every atom), the integral of the characteristic function of $\{\lambda\}$, namely $P_A(\{\lambda\}) = \int \chi_{\{\lambda\}}\, dP_A$, is a nonzero projection onto a subspace of your Hilbert space, namely $\ker(A - \lambda I)$. The elements of those subspaces are the eigenfunctions.
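In finite dimensions the spectral measure of a self-adjoint operator is purely atomic, so the pure-point picture is the whole story. Here is a small exact-arithmetic sketch using a matrix and eigenvectors I chose by hand. The names `EIGENSPACES` and `spectral_projection` are hypothetical. For each eigenvalue $\lambda$, $P_A(\{\lambda\})$ is built as the orthogonal projection onto $\ker(A - \lambda I)$. The checks confirm four things: it is a projection, its range consists of eigenvectors, the projections sum to the identity, and $A = \sum_\lambda \lambda\, P_A(\{\lambda\})$.

```python
from fractions import Fraction

# A hand-picked real symmetric (hence self-adjoint) matrix.
A = [[2, 1, 0],
     [1, 2, 0],
     [0, 0, 3]]

# Hand-computed orthogonal eigenbases (an assumption of this example, checked below):
# eigenvalue 1 has eigenspace span{(1, -1, 0)}; eigenvalue 3 has span{(1, 1, 0), (0, 0, 1)}.
EIGENSPACES = {1: [(1, -1, 0)], 3: [(1, 1, 0), (0, 0, 1)]}
N = len(A)

def zeros():
    return [[Fraction(0)] * N for _ in range(N)]

def add(p, q):
    return [[p[i][j] + q[i][j] for j in range(N)] for i in range(N)]

def scale(c, p):
    return [[c * p[i][j] for j in range(N)] for i in range(N)]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def rank_one_projection(v):
    """Orthogonal projection onto span{v}, i.e. v v^T / (v^T v), in exact arithmetic."""
    norm2 = sum(x * x for x in v)
    return [[Fraction(a * b, norm2) for b in v] for a in v]

def spectral_projection(lam):
    """P_A({lam}): zero unless lam is an eigenvalue (an atom of the spectral measure)."""
    p = zeros()
    for v in EIGENSPACES.get(lam, []):
        p = add(p, rank_one_projection(v))
    return p

identity = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
total, reconstructed = zeros(), zeros()
for lam in EIGENSPACES:
    p = spectral_projection(lam)
    assert matmul(p, p) == p              # p is a projection
    assert matmul(A, p) == scale(lam, p)  # its range consists of eigenvectors for lam
    total = add(total, p)
    reconstructed = add(reconstructed, scale(lam, p))

assert total == identity                  # the atoms exhaust the whole space
assert reconstructed == A                 # A = sum over eigenvalues of lam * P_A({lam})
assert spectral_projection(2) == zeros()  # 2 is not an atom, so P_A({2}) = 0
```

In infinite dimensions the same construction only captures the pure-point part. The continuous parts of $P_A$, as in the two examples above, have no atoms and hence no genuine eigenvectors.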
In any case, whenever you see sweeping statements like this in books or articles, you should take them with a grain of salt and treat them more as guiding principles than as precise definitions.