Intuition on the spectral theorem
Regarding the adjoint, suppose you have vector spaces $X$ and $Y$ (over the same field), and a linear map $$ T:X\to Y $$ Write $X^*$ and $Y^*$ for the dual spaces. Then $T$ naturally induces a map $$ T^*:Y^* \to X^* $$ defined by $$ T^*(\phi):=\phi\circ T $$ This makes sense, because if $\phi$ is a linear functional on $Y$, then $\phi\circ T$ is a linear functional on $X$. Moreover, the function $T^*$ is itself a linear transformation. This $T^*$ is called the adjoint of $T$ (there is a slight abuse of notation/terminology here, I'll elaborate on this in a moment). This is an example of functorial behaviour: taking adjoints is a contravariant functor.
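If you want to see this concretely in coordinates: a functional on $\mathbb C^n$ can be represented by a row vector, and the dual map then acts by right multiplication. A minimal numpy sketch of this (my own illustration, not part of the argument above; the random matrices are arbitrary):

```python
import numpy as np

# A linear functional on C^n is a row vector r acting by phi(v) = r @ v.
# If T: C^2 -> C^3 has matrix A (so T v = A v), the induced map
# T*: (C^3)* -> (C^2)* sends phi to phi∘T, i.e. the row r to r @ A.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
r = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # phi in (C^3)*
v = rng.standard_normal(2) + 1j * rng.standard_normal(2)   # v in C^2

assert np.isclose((r @ A) @ v, r @ (A @ v))   # (T* phi)(v) = phi(T v)
```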
Now, suppose that $X$ and $Y$ are finite-dimensional inner product spaces. Then you know that $X$ and $X^*$ can be canonically identified with each other. On the one hand, any $x\in X$ gives rise to a linear functional $\phi_x\in X^*$ defined by $$ \phi_x(v):=\langle v,x\rangle $$ Write $S_X:X\to X^*$ for the map that sends $x$ to $\phi_x$. It is easy to verify that $S_X$ is conjugate linear, i.e. $S_X(x+x')=S_X(x)+S_X(x')$ and $S_X(\alpha x)=\bar \alpha S_X(x)$.
On the other hand, given any $\phi\in X^*$, one can show that there exists (a unique) vector $x_\phi\in X$ such that, for every $v\in X$, $$ \phi(v)=\langle v, x_\phi\rangle $$ This shows that the function $S_X$ above is invertible, so it is "almost" an isomorphism, except for the fact that it is not strictly linear, but conjugate linear.
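Here is a quick numerical illustration of this correspondence, using the convention from the text that $\langle u,v\rangle$ is linear in the first slot and conjugate-linear in the second; the helper `inner` is just my notation for it:

```python
import numpy as np

# Convention from the text: <u, v> is linear in u and conjugate-linear in v.
def inner(u, v):
    return v.conj() @ u

rng = np.random.default_rng(1)
r = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # phi(v) = r @ v
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

x_phi = r.conj()                              # candidate Riesz vector for phi
assert np.isclose(r @ v, inner(v, x_phi))     # phi(v) = <v, x_phi>

# Conjugate linearity of S_X: phi_{a x} = conj(a) * phi_x.
a = 2.0 + 3.0j
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
assert np.isclose(inner(v, a * x), np.conj(a) * inner(v, x))
```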
Now, the same thing can be done with $Y$, and we obtain a conjugate-linear isomorphism $S_Y:Y\to Y^*$.
Consider now the composition $$ Y\overset{S_Y}{\longrightarrow} Y^*\overset{T^*}{\longrightarrow} X^* \overset{S^{-1}_X}{\longrightarrow} X $$ Call this composition $\hat T$, i.e. $\hat T(y)=(S^{-1}_X\circ T^*\circ S_Y)(y)$. You can check that $\hat T$ is linear.
Fix $x\in X$ and $y\in Y$. Put $\phi=(T^*\circ S_Y) y\in X^*$. Now, $S_X^{-1}\phi$ is, by definition, the unique vector $z\in X$ such that $\langle v,z\rangle =\phi (v)$ for every $v\in X$. Therefore, $$ \langle x,\hat Ty\rangle =\langle x,S^{-1}_X\phi\rangle=\phi(x) $$ Now, $\phi=T^*(S_Yy)=(S_Yy)\circ T$, so $$ \phi(x)=(S_Yy)(Tx) $$ Finally, $S_Yy\in Y^*$ is the linear functional $v\mapsto \langle v,y\rangle$. This means that $$ (S_Yy)(Tx)=\langle Tx,y\rangle $$ Putting everything together, we get that $$ \langle x,\hat Ty\rangle =\langle Tx,y\rangle $$ So, $\hat T$ has exactly the defining property that "the adjoint" has in every linear algebra text. In practice, we use $T^*$ to refer to the above $\hat T$, and the original $T^*$ is left behind. I will be following this convention from now on, i.e. all $T^*$ in what follows really means $\hat T$. I should mention that having an inner product is key for all of this: for a general vector space there is no canonical isomorphism between $X$ and $X^*$ (and in infinite dimensions they need not be isomorphic at all).
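In an orthonormal basis, this identity says the matrix of $\hat T$ is the conjugate transpose of the matrix of $T$, which is easy to sanity-check numerically (a sketch assuming the standard bases, which are orthonormal; `inner` is the same helper as above):

```python
import numpy as np

def inner(u, v):                 # <u, v>: linear in u, conjugate-linear in v
    return v.conj() @ u

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))  # matrix of T
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

A_hat = A.conj().T               # conjugate transpose: the matrix of T-hat
assert np.isclose(inner(x, A_hat @ y), inner(A @ x, y))  # <x, T-hat y> = <T x, y>
```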
Regarding your question about looking at normality, recall that, given a linear operator $T:X\to X$, a subspace $W\subset X$ is said to be $T$-invariant if $$ x\in W\implies Tx\in W $$ Define the orthogonal complement $$ W^\perp:=\{x\in X: \forall w\in W,\ \langle x,w\rangle =0\} $$ Note that, if $W$ is $T$-invariant, then $W^\perp$ is $T^*$-invariant. Indeed, fix $x\in W^\perp$. We need to see that $T^*x\in W^\perp$. Let $w\in W$, then $$ \langle T^*x,w\rangle=\langle x,Tw\rangle=0 $$ because $x\in W^\perp$ and $Tw\in W$ (because $W$ is $T$-invariant). Since $w\in W$ was arbitrary, $T^*x\in W^\perp$.
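A toy check of this lemma: if $W=\operatorname{span}(e_0,e_1)$ in $\mathbb C^4$, then $T$-invariance of $W$ means the lower-left block of the matrix is zero, and the conjugate transpose then maps $W^\perp$ into itself. A numpy sketch (the block structure and dimensions are just my choice of example):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A[2:, :2] = 0      # zero lower-left block: W = span(e0, e1) is T-invariant

x = np.zeros(4, dtype=complex)
x[2:] = rng.standard_normal(2) + 1j * rng.standard_normal(2)  # x in W-perp
y = A.conj().T @ x   # apply T*, realized as the conjugate transpose
assert np.allclose(y[:2], 0)   # T* x has no component in W: it stays in W-perp
```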
If $T$ is, for example, self-adjoint (i.e. $T^*=T$), then it follows immediately that $W^\perp$ is $T$-invariant whenever $W$ is. This leads to the following question: can we find an easy property of an operator $T$ guaranteeing that every $T$-invariant subspace has a $T$-invariant orthogonal complement? The answer to this question is yes, and the property is normality, i.e. $T^*T=TT^*$, see here.
How does this relate to being diagonalizable? Well, if $T$ is diagonalizable in an orthonormal basis $B$, then its matrix in $B$ is some diagonal matrix $D$, and the matrix of $T^*$ in $B$ is the conjugate transpose $D^*$, which is again diagonal. Since diagonal matrices commute, $TT^*=T^*T$, so any operator that is diagonalizable in an orthonormal basis is necessarily normal.
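A quick numerical sanity check of this direction, building a matrix that is diagonal in some orthonormal basis and confirming it commutes with its conjugate transpose (the random unitary-times-diagonal construction is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
# U unitary (from a QR factorization), D diagonal with complex entries.
U, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
D = np.diag(rng.standard_normal(4) + 1j * rng.standard_normal(4))

A = U @ D @ U.conj().T        # diagonal in the orthonormal basis given by U
assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # hence normal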
Suppose now that $T$ is normal. Pick an eigenvalue $\lambda$ of $T$. Let $E$ be the associated eigenspace. Clearly, $E$ is $T$-invariant. Write $$ X=E\oplus E^\perp $$ By normality, $E^\perp$ is also $T$-invariant. This means that we can consider the restricted operator $T|_{E^\perp}:E^\perp \to E^\perp$. This new operator is also normal: for normal $T$ the eigenspace $E$ is $T^*$-invariant as well, so $E^\perp$ is invariant under both $T$ and $T^*$, and the adjoint of the restriction is the restriction of the adjoint. But $\dim (E^\perp)<\dim X$, so we can carry out an inductive argument, producing an orthonormal basis of eigenvectors.
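For a concrete instance of the conclusion, here is a normal but non-symmetric matrix, the rotation by $90°$, together with a check that its unit eigenvectors are orthogonal, so it is unitarily diagonalizable over $\mathbb C$ (the example is my own):

```python
import numpy as np

# Rotation by 90 degrees: commutes with its transpose (normal), not symmetric.
A = np.array([[0.0, -1.0], [1.0, 0.0]])
assert np.allclose(A @ A.T, A.T @ A)

evals, V = np.linalg.eig(A)      # the non-real eigenvalue pair ±i
# Distinct eigenvalues of a normal matrix give orthogonal unit eigenvectors,
# so V is unitary and A = V diag(evals) V^H: unitarily diagonalizable over C.
assert np.allclose(V.conj().T @ V, np.eye(2))
assert np.allclose(V @ np.diag(evals) @ V.conj().T, A)
```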
Almost everything about this subject was derived in the opposite order of what you have been taught. That's why it is difficult to answer your question.
The infinite-dimensional case was studied for functions before the finite-dimensional case, and well before the notion of a vector space.
Orthogonality was noticed and defined using integral conditions about 150 years before an inner product was defined, and before finite-dimensional Linear Algebra. These observations led to the notion of a general inner product space.
Linearity came out of the physical condition of superposition of solutions for the Heat Equation and vibrating string problem, not the other way around.
Self-adjointness was defined before there was an inner product, through Lagrange's adjoint equation, which gave, among other things, a reduction-of-order tool for ODEs and a notion of "integral orthogonality."
It's all upside down from the point of view of abstraction. Asking how you might start at the lowest level of abstraction and move naturally toward the more abstract is asking how to motivate the reverse of the historical path that brought us to this point. It wasn't derived that way, and it might never have been.
To give a bit of a shorter answer: in the Hermitian case, observe that if $x$ and $y$ are both eigenvectors of $A$, corresponding to the eigenvalues $\lambda$ and $\mu$ respectively, then:
$$\lambda \langle x, y \rangle = \langle \lambda x, y \rangle = \langle Ax, y \rangle = \langle x, A^*y \rangle = \langle x, A y \rangle = \langle x, \mu y \rangle = \overline\mu \langle x, y \rangle$$
Hence, $(\lambda -\overline\mu) \langle x, y \rangle =0$, implying either $\lambda=\overline\mu$ or $x\perp y$. Choosing $x=y$ (so that $\mu=\lambda$ and $\langle x,x\rangle\neq 0$) we find that $\lambda=\overline\lambda$, so all eigenvalues must be real. Consequently, the eigenspaces corresponding to different eigenvalues are orthogonal to each other.
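Both facts are easy to confirm numerically; in the sketch below the random Hermitian matrix is just an illustrative choice, and generically its eigenvalues are distinct, so the orthogonality check applies:

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                        # a Hermitian matrix: A = A^H

evals, V = np.linalg.eig(A)
assert np.allclose(evals.imag, 0)         # all eigenvalues are real
# Generically the eigenvalues are distinct, so the unit eigenvectors are
# pairwise orthogonal and V is unitary (np.linalg.eigh exploits this).
assert np.allclose(V.conj().T @ V, np.eye(4))
```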
From this observation alone, lots of consequences follow quite naturally. One can easily prove that in this case a full orthogonal basis exists (see e.g. this writeup or try for yourself); likewise, if an orthonormal eigenbasis corresponding to real eigenvalues exists one can easily prove that $A$ must be hermitian.
The normal case is a bit more tricky, but one can play a similar game (may expand later).