My LSTM learns, loss decreases, but Numerical Gradients don't match Analytical Gradients

Solved it! In my check_grad, I need to build the caches that are fed to df_analytical, but in doing so I was also overwriting h and c, which should have stayed np.zeros.

    # Buggy: this builds the caches, but rebinding h and c here advances
    # the initial states away from the zeros the numerical check assumes
    y, outputs, loss, h, c, caches = f(params, h, c, inputs, targets)

    # ...so inside the numerical loop, the perturbed losses were computed
    # from the wrong (already-advanced) h and c:
    _, _, loss_minus, _, _, _ = f(params, h, c, inputs, targets)
    p.flat[pix] = old_val  # restore the perturbed weight

So simply not overwriting h and c fixes it; the LSTM code itself was fine all along.

    # Fixed: discard the returned states so h and c stay zeros
    _, outputs, loss, _, _, caches = f(params, h, c, inputs, targets)
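For completeness, here is roughly what the corrected check_grad looks like. This is a minimal sketch, not my exact code: the signature of f is the one shown above, but the signature of df_analytical (taking the caches and returning one gradient array per parameter), the eps value, and the tolerance are all assumptions for illustration.

    import numpy as np

    def check_grad(f, df_analytical, params, inputs, targets, hidden_size, eps=1e-5):
        # Fresh zero states; these must stay zeros for every call to f below.
        h = np.zeros(hidden_size)
        c = np.zeros(hidden_size)

        # Build the caches once for the analytical gradients, discarding the
        # returned states instead of rebinding h and c.
        _, outputs, loss, _, _, caches = f(params, h, c, inputs, targets)
        grads = df_analytical(params, outputs, caches, targets)  # assumed signature

        for p, dp in zip(params, grads):
            for pix in range(p.size):
                old_val = p.flat[pix]

                # Centered difference: perturb one weight each way, re-running f
                # from the SAME zero h and c both times.
                p.flat[pix] = old_val + eps
                _, _, loss_plus, _, _, _ = f(params, h, c, inputs, targets)
                p.flat[pix] = old_val - eps
                _, _, loss_minus, _, _, _ = f(params, h, c, inputs, targets)
                p.flat[pix] = old_val  # restore the weight

                num_grad = (loss_plus - loss_minus) / (2 * eps)
                denom = max(1e-8, abs(num_grad) + abs(dp.flat[pix]))
                if abs(num_grad - dp.flat[pix]) / denom > 1e-4:
                    print(f"mismatch at flat index {pix}: "
                          f"numerical {num_grad:.6e} vs analytical {dp.flat[pix]:.6e}")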

(In my original question, I had suspected the problem might be this line, but it turned out to be correct:)

    # LSTM cell-state update: forget gate scales the old state,
    # input gate scales the new candidate
    c = f_sigm * c_old + i_sigm * g_tanh
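For context, that line is the standard LSTM cell-state update and is correct as written. Here is a minimal sketch of one forward step under the usual gate definitions; the weight and bias names are illustrative and not from my actual code, only f_sigm, i_sigm, g_tanh, and c_old come from the snippet above.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_old, c_old, Wf, Wi, Wg, Wo, bf, bi, bg, bo):
        # All four gates act on the concatenated [h_old, x] vector.
        z = np.concatenate([h_old, x])
        f_sigm = sigmoid(Wf @ z + bf)   # forget gate
        i_sigm = sigmoid(Wi @ z + bi)   # input gate
        g_tanh = np.tanh(Wg @ z + bg)   # candidate cell state
        o_sigm = sigmoid(Wo @ z + bo)   # output gate

        # The line in question: keep part of the old cell state (forget gate)
        # and add part of the new candidate (input gate).
        c = f_sigm * c_old + i_sigm * g_tanh
        h = o_sigm * np.tanh(c)
        return h, c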