What's the difference between "hidden" and "output" in PyTorch LSTM?
I made a diagram. The names follow the PyTorch docs, although I renamed num_layers
to w
.
output
comprises all the hidden states in the last layer ("last" depth-wise, not time-wise). (h_n, c_n)
comprises the hidden states after the last timestep, t = n, so you could potentially feed them into another LSTM.
The batch dimension is not included.
It really depends on a model you use and how you will interpret the model. Output may be:
- a single LSTM cell hidden state
- several LSTM cell hidden states
- all the hidden states outputs
Output, is almost never interpreted directly. If the input is encoded there should be a softmax layer to decode the results.
Note: In language modeling hidden states are used to define the probability of the next word, p(wt+1|w1,...,wt) =softmax(Wht+b).