How do anomalies affect the field equations of motion?

Short answer: The operator $\hat j$ is meaningless as written, because it contains divergences. In order to manipulate this operator (and take its divergence) one must introduce a regulator which, when removed, happens to leave a finite piece: the anomaly.

The usual narrative is that $\hat j$, defined as $$ \hat j^\mu(x)\overset?=\bar\psi(x)\gamma^\mu\psi(x)\tag1 $$ is very much ill-defined, inasmuch as it contains operators at coinciding spacetime points. (As in the OP, $\psi,\bar\psi$ are operator-valued distributions with coinciding singular support, so their product is undefined, just like $\delta(x)^2$; see also this PSE post). Therefore, any manipulation is unjustified, and it may or may not lead to the "correct" answer.

One way to fix this is to require the equations of motion to be whatever the path integral predicts. This a consistent prescription, and it has the advantage of being manifestly covariant and non-perturbative. This is the point of view held by, for example, DeWitt in his QFT book (and it seems to me this is how physicists think nowadays).

A more down-to-earth approach is to introduce some regulator to your operator $\hat j$, perform all your manipulations using well-defined objects only, and take the physical limit at the very end. In the case above, one lets $$ \hat j^\mu(x):=\lim_{y\to x}\bar\psi(x) W_\gamma(x,y)\gamma^\mu\psi(y)\tag2 $$ where $W$ is a Wilson line with $\partial\gamma=x-y$ (to make $\hat j$ gauge invariant). Proceeding as in the OP, one derives (cf. this PSE post) $$ \mathrm d\hat j=e(F)\tag3 $$ the Euler class of the gauge field, as is well-known. This is a non-perturbative effect, although (being topological, cf. this PSE post) can be detected by a one-loop computation.

More generally, the "equations of motion" of a quantum field are ill-defined, and any formal manipulation is unjustified from a rigorous point of view. Luckily for us, some manipulations turn out to give the "correct" answer anyway (otherwise people would have ditched QFT in the 30s, before understanding how it really works!). A more conservative philosophy is to work with well-defined objects at all times or, even better, to work with distributions as they are supposed to be used (in the so-called causal approach).

Finally, it bears mentioning that the modern approach to anomalies is through topological arguments. A nice review is given by e.g. the first few sections of 1808.00009. Cohomology, bordism groups, and all that.

The devil is in the details. When you say that

...modulo formal issues involving multiplying operator-valued distributions

right after you wrote the equations of motion you ignore a very important point. The right-hand side of the equation

$$ (\partial^2 + m^2) \hat{\phi} = \hat{\phi}^3 $$

is not well defined if you does not regularize and renormalize it. Turns out that in other to do it exactly you need in advance the complete (quantum) solution of the model. One way to circumvent this problem is by organize your computations by perturbation theory of the non-linearity. Then at each order in your computation, for a given regularization, you are going to need to introduce a counter-term to match some renormalization conditions (and avoid nonsensical divergences). The anomaly are going to appear by the fact that there is no regularization, and no finite conter-term, able to preserve the anomalous symmetry. This imply that any attempt to make sense of the equation of motion you wrote are going to break the classical symmetry.

This conclusion does not depends on the perturbation theory. You may try to make sense of this equation by putting it on the lattice and use computers to do non-perturbative computations. By putting the theory on the lattice you are regularizing the theory. The anomaly will appear as an impossibility to put this theory on the computer without breaking (explicitly) the symmetry.

A more abstract way to see what goes wrong is to work with renormalized operators. In free theory, operators like $\phi^3$ need to be normal ordered (renormalized). Once you introduce an interaction, the normal ordering are going to be modified, and you can calculate this modifications perturbativally. This calculation is obtained by looking at $\phi^3$ as a vertex of Feynman diagrams. The normal ordering of free theory is the subtraction of self-contraction of the legs that come out of the vertex. In the interacted case there will be also collision of this legs, with new vertices appearing on this collisions, and new subtractions are going to be required.

About the last equation, this is an identity between operators in the Heisenberg picture. Again, classically you expect that the left-hand side is identically zero as an operator equation. The fact that the current operator suffers from ordering problems in the case of chiral fermions, you are going to get a correction. What is interesting is that this correction is exact at one-loop computation. Usually equations like the first one you wrote will continue to get corrections indefinitely. You can think about this equations (already with the proper corrections) as equation of motions coming from varying the 1PI quantum effective action.

So, in your example of the chiral anomally, the 1PI quantum effective action at one-loop will have a term that explicitly breaks the chiral anomally and contribute with the equations of motion.