What's the difference between "hidden" and "output" in PyTorch LSTM?

I made a diagram. The names follow the PyTorch docs, although I renamed num_layers to w.

output comprises the hidden states of the last layer ("last" depth-wise, not time-wise) at every timestep. (h_n, c_n) comprises the hidden and cell states of every layer after the last timestep, t = n, so you could feed them into another LSTM as its initial state.
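A minimal sketch of the shapes involved, assuming a unidirectional nn.LSTM with the default batch_first=False. The sizes are arbitrary illustration values, and w stands for num_layers as in the diagram.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary illustration sizes; w plays the role of num_layers.
seq_len, batch, input_size, hidden_size, w = 5, 3, 10, 20, 2

lstm = nn.LSTM(input_size, hidden_size, num_layers=w)  # unidirectional
x = torch.randn(seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)

print(output.shape)  # (seq_len, batch, hidden_size): last layer, every timestep
print(h_n.shape)     # (w, batch, hidden_size): every layer, last timestep
print(c_n.shape)     # (w, batch, hidden_size): every layer, last timestep

# The two tensors overlap in exactly one place: the last layer at the
# last timestep.
same = torch.allclose(output[-1], h_n[-1])
print(same)
```

For a bidirectional LSTM the relationship is less direct, because output concatenates both directions along the last dimension.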

[Figure: LSTM diagram]

The batch dimension is not included.

It really depends on the model you use and how you interpret it. The output may be:

the hidden state of a single LSTM cell
the hidden states of several LSTM cells
all the hidden states
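As a rough illustration, the three options can be read off the output tensor by slicing along the time dimension. This is only a sketch: the sizes are made up, and it assumes a single-layer, unidirectional nn.LSTM with batch_first=False.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen only for the example.
lstm = nn.LSTM(input_size=8, hidden_size=16)
x = torch.randn(6, 1, 8)  # (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)

last_state = output[-1]   # one hidden state: the final timestep, shape (1, 16)
last_three = output[-3:]  # several hidden states: the last 3 timesteps, shape (3, 1, 16)
all_states = output       # every hidden state, shape (6, 1, 16)

print(last_state.shape, last_three.shape, all_states.shape)
```

Which slice to use depends on the task: a sequence classifier often needs only the final state, while a tagger or language model typically uses every timestep.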

The output is almost never interpreted directly. If the input is encoded, there should be a softmax layer to decode the results.

Note: In language modeling, hidden states are used to define the probability of the next word: $p(w_{t+1} \mid w_1, \ldots, w_t) = \mathrm{softmax}(W h_t + b)$.
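The formula can be sketched in PyTorch with a linear layer standing in for W and b. The vocabulary size, embedding size, and the names embedding and decoder are assumptions made for this example, not part of the original answer.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
vocab_size, embed_size, hidden_size = 50, 16, 32

embedding = nn.Embedding(vocab_size, embed_size)
lstm = nn.LSTM(embed_size, hidden_size)
decoder = nn.Linear(hidden_size, vocab_size)  # computes W h_t + b

tokens = torch.randint(0, vocab_size, (7, 1))  # (seq_len, batch) of word indices

output, _ = lstm(embedding(tokens))    # h_t for every t: (7, 1, hidden_size)
logits = decoder(output)               # (7, 1, vocab_size)
probs = torch.softmax(logits, dim=-1)  # next-word distribution at each timestep

print(probs.shape)
```

When training, it is common to pass the logits to nn.CrossEntropyLoss, which applies the log-softmax internally, rather than calling softmax explicitly.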
