How to build an attention model with Keras?
Attention layers are part of the Keras API of TensorFlow (as of 2.1). Note that they output a tensor of the same shape as your "query" tensor.
This is how to use Luong-style attention:
query_attention = tf.keras.layers.Attention()([query, value])
And Bahdanau-style attention:
query_attention = tf.keras.layers.AdditiveAttention()([query, value])
The version adapted to your code, with the final hidden state state_h as the query (expanded over the time axis so that both inputs are rank 3) and the LSTM output sequence as the values:
query = tf.expand_dims(state_h, 1)                           # (batch, 1, units)
context_vector = tf.keras.layers.Attention()([query, lstm])  # (batch, 1, units)
Check out the official documentation for more information: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention and https://www.tensorflow.org/api_docs/python/tf/keras/layers/AdditiveAttention
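For context, here is a minimal, self-contained sketch of wiring the built-in layer into a small model. All names and dimensions below (vocab_size, max_len, embed_dim, units) are invented for illustration and are not taken from the original question:
import tensorflow as tf

# Hypothetical toy dimensions, chosen only for illustration.
vocab_size, max_len, embed_dim, units = 1000, 50, 64, 128

inputs = tf.keras.Input(shape=(max_len,))
x = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)

# Encoder: return the full output sequence and the final hidden state.
lstm_out, state_h, state_c = tf.keras.layers.LSTM(
    units, return_sequences=True, return_state=True)(x)

# Query: the final hidden state, expanded over the time axis -> (batch, 1, units).
query = tf.keras.layers.Reshape((1, units))(state_h)

# Luong-style dot-product attention: the query attends over the LSTM outputs.
context = tf.keras.layers.Attention()([query, lstm_out])  # (batch, 1, units)
context = tf.keras.layers.Flatten()(context)

outputs = tf.keras.layers.Dense(1, activation="sigmoid")(context)
model = tf.keras.Model(inputs, outputs)
model.summary()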
To answer Arman's specific query: these layers use the post-2018 semantics of queries, values and keys. To map them back to Bahdanau's or Luong's paper, think of the 'query' as the last decoder hidden state. The 'values' are the set of encoder outputs, i.e. all the hidden states of the encoder. The 'query' 'attends' to all the 'values'.
Whichever version of the code or library you are using, note that the 'query' is expanded over the time axis to prepare it for the addition that follows. The tensor being expanded is always the last hidden state of the RNN; the other input is always the values being attended to, i.e. all the hidden states at the encoder end. This simple check of the code tells you what maps to 'query' and what maps to 'values', irrespective of the library or code you are using.
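As a concrete illustration of these shapes (all dimensions here are invented for the example), the last decoder hidden state is expanded over the time axis and then attends to every encoder hidden state:
import tensorflow as tf

batch, enc_steps, units = 4, 10, 32

encoder_outputs = tf.random.normal((batch, enc_steps, units))  # all encoder hidden states - the 'values'
decoder_state = tf.random.normal((batch, units))               # last decoder hidden state - the 'query'

# Expand the query over the time axis so it can be combined with every value.
query = tf.expand_dims(decoder_state, axis=1)                  # (batch, 1, units)

# Bahdanau-style additive attention: the query attends to all the values.
context = tf.keras.layers.AdditiveAttention()([query, encoder_outputs])
print(context.shape)                                           # (4, 1, 32) - same shape as the query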
You can refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e to write your own custom attention layer in fewer than six lines of code.
There is a problem with the way you initialize the attention layer and pass its parameters. You should specify the number of attention units here and change how the inputs are passed in (a minimal sketch of one possible implementation of such a layer is given after the summary below):
context_vector, attention_weights = Attention(32)(lstm, state_h)
The result:
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 200)          0
__________________________________________________________________________________________________
embedding (Embedding)           (None, 200, 128)     32000       input_1[0][0]
__________________________________________________________________________________________________
bi_lstm_0 (Bidirectional)       [(None, 200, 256), ( 263168      embedding[0][0]
__________________________________________________________________________________________________
bidirectional (Bidirectional)   [(None, 200, 256), ( 394240      bi_lstm_0[0][0]
                                                                 bi_lstm_0[0][1]
                                                                 bi_lstm_0[0][2]
                                                                 bi_lstm_0[0][3]
                                                                 bi_lstm_0[0][4]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 256)          0           bidirectional[0][1]
                                                                 bidirectional[0][3]
__________________________________________________________________________________________________
attention (Attention)           [(None, 256), (None, 16481       bidirectional[0][0]
                                                                 concatenate[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 1)            257         attention[0][0]
==================================================================================================
Total params: 706,146
Trainable params: 706,146
Non-trainable params: 0
__________________________________________________________________________________________________
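For reference, here is a minimal sketch of what such a custom Attention(units) layer could look like, modeled on the Bahdanau-style additive attention used in the TensorFlow tutorials. The exact class in the question may differ, but with units=32 and 256-dimensional inputs this layout gives the 16,481 parameters shown in the summary above:
import tensorflow as tf

class Attention(tf.keras.layers.Layer):
    """Bahdanau-style additive attention, used as: Attention(units)(sequence, state)."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, timesteps, dim) - the full LSTM output sequence
        # hidden:   (batch, dim)            - the final hidden state
        hidden_with_time_axis = tf.expand_dims(hidden, 1)               # (batch, 1, dim)
        score = self.V(tf.nn.tanh(
            self.W1(features) + self.W2(hidden_with_time_axis)))        # (batch, timesteps, 1)
        attention_weights = tf.nn.softmax(score, axis=1)                 # (batch, timesteps, 1)
        context_vector = tf.reduce_sum(attention_weights * features, 1)  # (batch, dim)
        return context_vector, attention_weights

# Usage as in the answer above:
# context_vector, attention_weights = Attention(32)(lstm, state_h)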