To prove the Cayley-Hamilton theorem, why can't we substitute $A$ for $\lambda$ in $p(\lambda) = \det(\lambda I - A)$?
There is another way to see that the proof must be flawed: by looking at the consequences this proof technique would have. If the proof were valid, then we would also have the following generalisation:
Faulty Lemma. Suppose that $A$ and $B$ are $n\times n$ matrices. Let $p_A$ be the characteristic polynomial for $A$. If $B - A$ is singular, then $B$ must be a zero of $p_A$.
Faulty proof: We have $p_A(B) = \det(BI - A) = \det(B - A) = 0$.$$\tag*{$\Box$}$$
This has the following amazing consequence:
Faulty Corollary. Every singular matrix is nilpotent.
Faulty proof: Let $B$ be a singular matrix and let $A$ be the zero matrix. Now we have $p_A(\lambda) = \lambda^n$. Furthermore, by the above we have $p_A(B) = 0$, because $B - A$ is singular. Thus, we have $B^n = 0$ and we see that $B$ is nilpotent.$$\tag*{$\Box$}$$
In particular, this would prove that $$ \pmatrix{1 & 0 \\ 0 & 0}^2 = \pmatrix{0 & 0 \\ 0 & 0}. $$ This goes to show just how wrong the proof is!
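The counterexample can be checked numerically. The following is a minimal sketch in plain Python; the helper names `det2`, `matmul2` and `sub2` are made up for this illustration and only handle $2\times 2$ matrices. It shows that $\det(B - A)$ is indeed $0$ while $p_A(B) = B^2$ is not the zero matrix.

```python
# Hypothetical 2x2 helpers written only for this illustration.
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub2(x, y):
    """Entrywise difference of two 2x2 matrices."""
    return [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]

A = [[0, 0], [0, 0]]  # zero matrix, so p_A(lambda) = lambda^2
B = [[1, 0], [0, 0]]  # singular matrix from the corollary

det_B_minus_A = det2(sub2(B, A))  # a scalar
p_A_of_B = matmul2(B, B)          # a matrix: p_A(B) = B^2

print(det_B_minus_A)  # 0
print(p_A_of_B)       # [[1, 0], [0, 0]]
```

So $B - A$ is singular, yet $B$ is not a zero of $p_A$, which contradicts the Faulty Lemma.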
If $$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$ then $p(\lambda)$ is the determinant of the matrix $$\lambda I - A = \begin{bmatrix} \lambda - 1 & -2 \\ -3 & \lambda - 4 \end{bmatrix}.$$ Now I plug in $A$ for $\lambda$ and get $$\begin{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} - 1 & -2 \\ -3 & \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} - 4 \end{bmatrix}$$ but I don't know what that is, and I certainly don't know how to take its determinant.
So the reason you can't plug $A$ into $\det(\lambda I - A)$ is that the expression only makes sense when $\lambda$ is a scalar. The definition of $p(\lambda)$ isn't really $\det(\lambda I - A)$; rather, $p(\lambda)$ is the polynomial whose value at any scalar $\lambda$ equals the value of $\det(\lambda I - A)$.
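For the $2\times 2$ matrix $A$ above, the correct order of operations can be written out as a worked example: first expand the determinant for scalar $\lambda$ to get a polynomial, and only then substitute $A$. The constant term is read as a multiple of $I$, which is the usual convention for evaluating a polynomial at a matrix (stated here as an assumption, since it is not spelled out above).

$$p(\lambda) = (\lambda - 1)(\lambda - 4) - (-2)(-3) = \lambda^2 - 5\lambda - 2,$$

$$p(A) = A^2 - 5A - 2I = \begin{bmatrix} 7 & 10 \\ 15 & 22 \end{bmatrix} - \begin{bmatrix} 5 & 10 \\ 15 & 20 \end{bmatrix} - \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

The result is the zero matrix, not the scalar $0$.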
On the other hand, I could define a function $P(\lambda) = \det(\lambda - A)$, where I'm now allowed to plug in matrices of the same size as $A$, and I certainly would get zero if I plugged in $A$. But this is a function from matrices to numbers, whereas when I plug matrices into $p(\lambda)$ I get matrices as output. So it doesn't make sense to say that these are equal, and the fact that $P(A) = 0$ doesn't imply that $p(A) = 0$, since $P$ and $p$ aren't the same thing.
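The difference between the two functions can be seen by evaluating both. The sketch below is plain Python for the $2\times 2$ matrix $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$; the function names `P` and `p` mirror the notation above, the helper names are hypothetical, and the coefficients of `p` assume the expansion $\lambda^2 - 5\lambda - 2$ of $\det(\lambda I - A)$.

```python
# Hypothetical 2x2 helpers written only for this illustration.
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
I2 = [[1, 0], [0, 1]]

def P(X):
    """Matrix in, number out: det(X - A)."""
    return det2([[X[i][j] - A[i][j] for j in range(2)] for i in range(2)])

def p(X):
    """Matrix in, matrix out: X^2 - 5X - 2I (characteristic polynomial of A)."""
    X2 = matmul2(X, X)
    return [[X2[i][j] - 5 * X[i][j] - 2 * I2[i][j] for j in range(2)]
            for i in range(2)]

X = [[1, 0], [0, 0]]  # an arbitrary test matrix

print(P(A))  # 0
print(p(A))  # [[0, 0], [0, 0]]
print(P(X))  # -6
print(p(X))  # [[-6, 0], [0, -2]]
```

Both functions vanish at $A$, but one returns a number and the other a matrix, and at other inputs such as `X` they are not even comparable. So $P(A) = 0$ by itself says nothing about $p(A)$.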
Remember that there is a difference between $p(x)$, where $x$ is a scalar, and $p(A)$, where $A$ is a matrix. The next thing you should notice is that if your deduction were valid, it would give $p(A)=0$, where the left-hand side of this equation is a matrix while the right-hand side is the scalar $0$.
What the Cayley-Hamilton theorem says is that $A$ satisfies its own characteristic polynomial, that is, $p(A)$ is the zero matrix. If you have worked with minimal polynomials before, the proof of this statement is a simple task (given all of the previous work, obviously).