How to clearly interpret the meaning of the `units` parameter in Keras?
You can (sort of) think of it exactly as you think of fully connected layers. Units are neurons.
The dimension of the output is the number of neurons, as with most of the well known layer types.
The difference is that in LSTMs these neurons will not be completely independent of each other; they will intercommunicate through the mathematical operations under the hood.
Before going further, it might be interesting to take a look at this very complete explanation about LSTMs, their inputs/outputs, and the usage of `stateful=True/False`: Understanding Keras LSTMs. Notice that your input shape should be `input_shape=(look_back, 1)`; the input shape is given as `(time_steps, features)`.
While this is a series of fully connected layers:
- hidden layer 1: 4 units
- hidden layer 2: 4 units
- output layer: 1 unit
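For reference, a minimal Keras sketch of that dense stack (the 8 input features are only an illustrative assumption):

```python
from tensorflow import keras
from tensorflow.keras import layers

# A plain fully connected stack: 4 -> 4 -> 1 neurons.
# The input size (8 features) is just for illustration.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(4, activation="relu"),  # hidden layer 1: 4 units
    layers.Dense(4, activation="relu"),  # hidden layer 2: 4 units
    layers.Dense(1),                     # output layer: 1 unit
])
model.summary()
```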
This is a series of LSTM layers:
where the input data has shape `(batch_size, arbitrary_steps, 3)` (in Keras you would pass `input_shape=(arbitrary_steps, 3)`; the batch dimension is implicit):
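A minimal sketch of this stack, assuming `tensorflow.keras` (the layer sizes match the flow described further below; `None` means an arbitrary number of time steps):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two 4-unit LSTM layers followed by a 1-unit LSTM layer.
# None = arbitrary number of steps, 3 = features per step.
model = keras.Sequential([
    keras.Input(shape=(None, 3)),
    layers.LSTM(4, return_sequences=True),  # layer 1: 4 units
    layers.LSTM(4, return_sequences=True),  # layer 2: 4 units
    layers.LSTM(1),                         # layer 3: 1 unit
])
model.summary()
```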
Each LSTM layer will keep reusing the same units/neurons over and over until all the arbitrary timesteps in the input are processed.
- The output will have shape `(batch, arbitrary_steps, units)` if `return_sequences=True`, or `(batch, units)` if `return_sequences=False`.
- The memory states will have a size of `units`.
- The inputs processed from the last step will have a size of `units`.
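A quick way to check both output shapes (a minimal sketch, assuming `tensorflow.keras` with eager execution):

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.rand(2, 5, 3).astype("float32")  # (batch=2, steps=5, features=3)

print(layers.LSTM(4, return_sequences=True)(x).shape)   # (2, 5, 4)
print(layers.LSTM(4, return_sequences=False)(x).shape)  # (2, 4)
```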
To be really precise, there will be two groups of units: one working on the raw inputs, the other working on the already processed inputs coming from the last step. Due to the internal structure, each group will have a number of parameters 4 times bigger than the number of units (this 4 is fixed: it comes from the LSTM's four internal gates, one set of weights per gate).
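You can verify this factor of 4 by inspecting the layer's weights: each weight matrix has 4 × units columns, one block per gate (a minimal sketch, assuming `tensorflow.keras`):

```python
import numpy as np
from tensorflow.keras import layers

lstm = layers.LSTM(4)
lstm(np.zeros((1, 5, 3), dtype="float32"))  # call once to build: 3 features, 4 units

kernel, recurrent_kernel, bias = lstm.get_weights()
print(kernel.shape)            # (3, 16): input weights, 4 gates * 4 units
print(recurrent_kernel.shape)  # (4, 16): recurrent weights, 4 gates * 4 units
print(bias.shape)              # (16,): one bias per gate and unit
```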
Flow:
- Takes an input with n steps and 3 features
- Layer 1:
  - For each time step in the inputs:
    - Uses 4 units on the inputs to get a size-4 result
    - Uses 4 recurrent units on the outputs of the previous step
  - Outputs the last (`return_sequences=False`) or all (`return_sequences=True`) steps
  - Output features = 4
- Layer 2:
  - Same as layer 1
- Layer 3:
  - For each time step in the inputs:
    - Uses 1 unit on the inputs to get a size-1 result
    - Uses 1 unit on the outputs of the previous step
  - Outputs the last (`return_sequences=False`) or all (`return_sequences=True`) steps
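To make the per-step reuse concrete, here is a rough, manually unrolled equivalent of a single 4-unit layer using `LSTMCell` (a sketch of the idea, not how Keras runs it internally):

```python
import tensorflow as tf

cell = tf.keras.layers.LSTMCell(4)  # the same 4 units are reused at every step
x = tf.random.normal((2, 5, 3))     # (batch, steps, features)

# Initial h and c, each of length units.
state = [tf.zeros((2, 4)), tf.zeros((2, 4))]
for t in range(5):
    # One step: consumes this step's features and the previous [h, c].
    output, state = cell(x[:, t, :], state)

print(output.shape)  # (2, 4): the last step's output
```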
The number of units is the size (length) of the internal vector states, `h` and `c`, of the LSTM. That is, no matter the shape of the input, it is upscaled (by a dense transformation) by the various kernels for the `i`, `f`, and `o` gates. The details of how the resulting latent features are transformed into `h` and `c` are described in the linked post. In your example, the input data of shape `(batch_size, timesteps, input_dim)` will be transformed to `(batch_size, timesteps, 4)` if `return_sequences` is true; otherwise only the last `h` will be emitted, making it `(batch_size, 4)`. I would recommend using a much higher latent dimension, perhaps 128 or 256, for most problems.
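You can see `h` and `c` directly with `return_state=True`; both have length `units` (a minimal sketch, assuming `tensorflow.keras`):

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.rand(2, 5, 3).astype("float32")

# return_state=True also returns the final h and c states.
output, h, c = layers.LSTM(4, return_state=True)(x)
print(output.shape, h.shape, c.shape)  # (2, 4) (2, 4) (2, 4)
```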