Multi dimensional input for LSTM in Keras
Keras creates a computational graph that executes the sequence in your bottom picture per feature (but for all units). That means the state value C is always a scalar, one per unit. It does not process all features at once; rather, it processes all units at once and the features separately.
import keras.models as kem
import keras.layers as kel

units = 1      # number of LSTM units (set these to the values you want to test)
timesteps = 1  # sequence length
features = 1   # input dimensionality per timestep

model = kem.Sequential()
lstm = kel.LSTM(units, input_shape=(timesteps, features))
model.add(lstm)
model.summary()

num_params = (4 * features * units) + (4 * units * units) + (4 * units)
print('num_params', num_params)
print('kernel_c', lstm.kernel_c.shape)
print('bias_c', lstm.bias_c.shape)
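(On newer Keras / tf.keras versions the per-gate attributes such as kernel_c no longer live on the layer itself but are slices of the combined weight matrices on lstm.cell; a rough equivalent, assuming the usual i, f, c, o gate ordering along the last axis, would be:)
# Newer Keras / tf.keras: per-gate weights are slices of the combined
# matrices on lstm.cell; gates are ordered i, f, c, o along the last axis.
kernel_c = lstm.cell.kernel[:, 2 * units:3 * units]  # shape (features, units)
bias_c = lstm.cell.bias[2 * units:3 * units]         # shape (units,)
print('kernel_c', kernel_c.shape)
print('bias_c', bias_c.shape)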
In the num_params formula, the factor 4 represents one set of weights for each of the f, i, c, and o internal paths in your bottom picture. The first term is the number of kernel weights, the second the number of recurrent-kernel weights, and the last the number of bias terms, if the bias is used. For
units = 1
timesteps = 1
features = 1
we see
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 1) 12
=================================================================
Total params: 12.0
Trainable params: 12
Non-trainable params: 0.0
_________________________________________________________________
num_params 12
kernel_c (1, 1)
bias_c (1,)
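This matches the formula above: 4*(1*1) + 4*(1*1) + 4*1 = 12, and bias_c having shape (1,) confirms that the state C of the single unit is a scalar.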
and for
units = 1
timesteps = 1
features = 2
we see
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 1) 16
=================================================================
Total params: 16.0
Trainable params: 16
Non-trainable params: 0.0
_________________________________________________________________
num_params 16
kernel_c (2, 1)
bias_c (1,)
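Again the count agrees: 4*(2*1) + 4*(1*1) + 4*1 = 16. Only the kernel term grows with the number of features; kernel_c now maps two features into the same single unit, and the state C is still a scalar.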
Across both runs, bias_c is a proxy for the output shape of the state C. Note that there are different implementations regarding the internal structure of the unit; details are here (http://deeplearning.net/tutorial/lstm.html), and the default implementation uses Eq. 7. Hope this helps.
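For completeness, here is a minimal end-to-end sketch (with made-up random data) showing a multi-feature input flowing through a single-unit LSTM:
import numpy as np
import keras.models as kem
import keras.layers as kel

batch, timesteps, features, units = 8, 5, 2, 1

model = kem.Sequential()
model.add(kel.LSTM(units, input_shape=(timesteps, features)))
model.compile(optimizer='adam', loss='mse')

# Input: (batch, timesteps, features) -- two features per timestep.
x = np.random.random((batch, timesteps, features))
# Target: (batch, units) -- one scalar per sample, matching the single unit.
y = np.random.random((batch, units))

model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x).shape)  # -> (8, 1)
The output shape (8, 1) mirrors the scalar-per-unit state discussed above.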