My LSTM learns, loss decreases, but numerical gradients don't match analytical gradients
Solved it! In my `check_grad`, I need to build the `caches` that get served to `df_analytical`, but in doing so I also overwrite the `h` and `c` that should have stayed `np.zeros`:
```python
y, outputs, loss, h, c, caches = f(params, h, c, inputs, targets)  # clobbers h and c!
_, _, loss_minus, _, _, _ = f(params, h, c, inputs, targets)       # perturbed pass now starts from the clobbered state
p.flat[pix] = old_val
```
Because of that first call, every finite-difference pass started from the final `h` and `c` of the sequence instead of from zeros, so the numerical gradient was taken through a different function than the one `df_analytical` differentiates. So, simply not overwriting `h` and `c` fixes it, and the LSTM code was A-OK:
```python
_, outputs, loss, _, _, caches = f(params, h, c, inputs, targets)
```
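For completeness, here is roughly what the whole check loop looks like with the fix in place. This is a sketch under my assumptions: the `f` signature is the one from the snippets above, while the `df_analytical` signature, `eps`, and the relative-error formula are placeholders rather than the exact code from the question.

```python
import numpy as np

def check_grad(f, df_analytical, params, inputs, targets, h0, c0, eps=1e-5):
    # Build the caches once for df_analytical. Note the underscores: the
    # returned h and c are discarded, so the zero initial states h0/c0
    # are NOT overwritten.
    _, outputs, loss, _, _, caches = f(params, h0, c0, inputs, targets)
    grads = df_analytical(params, outputs, caches, inputs, targets)  # assumed signature

    for p, dp in zip(params, grads):
        num = np.zeros_like(p)
        for pix in range(p.size):
            old_val = p.flat[pix]
            p.flat[pix] = old_val + eps         # central difference, +eps side
            _, _, loss_plus, _, _, _ = f(params, h0, c0, inputs, targets)
            p.flat[pix] = old_val - eps         # central difference, -eps side
            _, _, loss_minus, _, _, _ = f(params, h0, c0, inputs, targets)
            p.flat[pix] = old_val               # restore the parameter
            num.flat[pix] = (loss_plus - loss_minus) / (2 * eps)
        # Relative error between numerical and analytical gradients
        rel_err = np.abs(num - dp) / (np.abs(num) + np.abs(dp) + 1e-12)
        print("max relative error:", rel_err.max())
```

Keeping `h0` and `c0` as the same `np.zeros` arrays across every call means both sides of the comparison differentiate the same function of `params`.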
I think the problem might be this line:

```python
c = f_sigm * c_old + i_sigm * g_tanh
```
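For context, that line is the standard LSTM cell-state update, so on its own it is not suspicious. A sketch of the full step it usually sits in (the weight and bias names below are my own placeholders, not the question's variables):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_old, c_old, W, U, b):
    # W, U, b hold the input, recurrent, and bias parameters for the
    # four gates; these names are placeholders, not the original code's.
    i_sigm = sigmoid(W["i"] @ x + U["i"] @ h_old + b["i"])  # input gate
    f_sigm = sigmoid(W["f"] @ x + U["f"] @ h_old + b["f"])  # forget gate
    o_sigm = sigmoid(W["o"] @ x + U["o"] @ h_old + b["o"])  # output gate
    g_tanh = np.tanh(W["g"] @ x + U["g"] @ h_old + b["g"])  # candidate cell
    c = f_sigm * c_old + i_sigm * g_tanh                    # the quoted line
    h = o_sigm * np.tanh(c)
    return h, c
```

As the accepted fix above shows, the mismatch came from the gradient check itself, not from this forward-pass update.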