How to understand $E=mc^2$?

Your question is a good one. It springs from a slightly incorrect understanding of the logic of the Einstein box argument. It is not necessary to ascribe mass to a photon in order to present the argument, and indeed that would be an incorrect way to proceed. Rather, one asserts that a pulse of electromagnetic radiation carries both energy and momentum, and the relationship between the energy and momentum of such a pulse is $$ E = p c. $$ This formula comes from classical electromagnetism (Maxwell equations etc.), not directly from relativity. (In a modern argument one would assert relativity first and then derive electromagnetism, but I won't get into that.) The rest of the argument is based on conservation laws.

Suppose the box has length $L$ and starts off centred at the origin, with its ends at $\pm L/2$. Take each end wall to have mass $M$ initially, with the rest of the box of negligible mass. First, when the pulse is emitted by one end of the box, that end recoils with momentum $p$ and also gives up energy $E$. For example, it could be thermal energy. In Newtonian physics this need have no effect on the mass of the wall of the box, but if one assumes that, then one ends up in a contradiction. So let's assume instead that when the wall of the box gives up energy $E$, its mass falls a little, by an amount $m$ to be discovered. Then the recoil speed of the wall is $$ v = \frac{p}{M-m} = \frac{E}{c(M-m)} . $$ The pulse of light now propagates to the other end of the box, through a distance $L$, taking time $$ t = L/c . $$ When the other end of the box receives the energy and momentum of the pulse, its energy goes up by $E$ so its mass goes up by $m$ (the quantity we are trying to calculate). So now this end of the box is located at $L/2$ and has mass $M+m$, while the other end has moved a bit, to $-(L/2) - v t$, and has mass $M-m$. The total mass is still $2M$, so the centre of mass $x_{\rm cm}$ of the whole box is now given by $$ \begin{aligned} 2M\,x_{\rm cm} &= (M-m) \left(-\frac{L}{2} - v t\right) + (M+m)\frac{L}{2}\\ &= (M-m) \left(-\frac{L}{2} - \frac{E L}{(M-m)c^2}\right) + (M+m)\frac{L}{2}\\ &= m L - \frac{EL}{c^2} . \end{aligned} $$ But internal changes cannot shift the centre of mass, so we must have $x_{\rm cm} = 0$ and therefore $E = m c^2$.
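As a quick sanity check on the algebra, here is a minimal Python sketch that evaluates the right-hand side of the centre-of-mass expression above for some made-up numbers. The function name `mass_moment` and the values of $M$, $L$, $E$ and $c$ are illustrative assumptions only (arbitrary consistent units, not physical ones); exact fractions are used so that the cancellation is not blurred by rounding.

```python
from fractions import Fraction

def mass_moment(M, L, E, m, c):
    """Mass-weighted sum of the two wall positions after the pulse is absorbed.

    M: initial mass of each end wall, L: box length, E: pulse energy,
    m: assumed change in wall mass, c: speed of light.
    (Hypothetical helper, written only to illustrate the argument.)
    """
    v = E / (c * (M - m))               # recoil speed, v = p/(M-m) with p = E/c
    t = L / c                           # light travel time across the box
    left = (M - m) * (-L / 2 - v * t)   # emitting wall: lighter and displaced
    right = (M + m) * (L / 2)           # absorbing wall: heavier, not yet moved
    return left + right

# Illustrative values in arbitrary consistent units (assumptions)
M, L, E, c = Fraction(10), Fraction(4), Fraction(18), Fraction(3)

newtonian = mass_moment(M, L, E, Fraction(0), c)   # wall mass unchanged
einstein = mass_moment(M, L, E, E / c**2, c)       # wall mass changes by E/c^2

print("m = 0     :", newtonian)   # non-zero: the centre of mass would have shifted
print("m = E/c^2 :", einstein)    # zero: the centre of mass stays put
```

With $m = 0$ the sum comes out as $-EL/c^2$, which is the contradiction mentioned above; only $m = E/c^2$ makes it vanish.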

The above is directly based on the discussion in a book called "The wonderful world of relativity" by myself (publisher Oxford University Press).

If we now look back over the derivation, we see that the mass $m$ is not associated with the pulse of light (or photon if you prefer). Rather, $m$ is the change in the mass of the wall of the box. You are right to quote the formula $$ E^2 = p^2 c^2 + M^2 c^4 , $$ where $M$ now denotes the rest mass of the body in question. This formula applies equally well to bodies with zero rest mass (such as photons) as to bodies with non-zero rest mass (such as molecules). In the above argument, when I said the wall gets a velocity $p/(M-m)$, I was in fact neglecting some small corrections which are negligible in the limit where this velocity is small compared to $c$. If one keeps those small corrections one still gets the answer $E = M c^2$ for a body with zero momentum.
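To see the two limits of the energy-momentum formula side by side, here is a small Python sketch. The helper names `energy` and `velocity` and the numbers are assumptions made for illustration. The relation $v = p c^2 / E$ used for the velocity is the standard relativistic one, which is not spelled out in the text above.

```python
import math

def energy(p, M, c):
    """Total energy from E^2 = p^2 c^2 + M^2 c^4 (M is the rest mass)."""
    return math.sqrt((p * c) ** 2 + (M * c ** 2) ** 2)

def velocity(p, M, c):
    """Relativistic velocity v = p c^2 / E (standard relation, assumed here)."""
    return p * c ** 2 / energy(p, M, c)

c = 3.0  # illustrative value in arbitrary units, not the physical speed of light

# Zero rest mass (a light pulse): the formula reduces to E = p c
print(energy(5.0, 0.0, c), 5.0 * c)

# Zero momentum (a body at rest): the formula reduces to E = M c^2
print(energy(0.0, 2.0, c), 2.0 * c ** 2)

# Slow massive body: the relativistic velocity is very close to the Newtonian p/M
p, M = 1e-3, 1.0
print(velocity(p, M, 1.0), p / M)
```

The last line illustrates why the Newtonian expression for the recoil velocity is adequate in the box argument: when the velocity is tiny compared to $c$, the two values differ only by a relative amount of order $v^2/c^2$.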