What is the difference between Keras' MaxPooling1D and GlobalMaxPooling1D functions?
TL;DR: GlobalMaxPooling1D for temporal data takes the max over the steps dimension (feature-wise). So a tensor with shape [10, 4, 10] becomes a tensor with shape [10, 10] after global pooling. MaxPooling1D also takes the max over the steps, but constrained to a pool_size for each stride. So a [10, 4, 10] tensor becomes a [10, 3, 10] tensor after MaxPooling1D(pool_size=2, strides=1).
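You can check those shapes directly; a minimal sketch (assuming the same standalone keras package used in the example at the end of this answer):

```python
import numpy as np
from keras.layers import Input, MaxPooling1D, GlobalMaxPooling1D
from keras.models import Model

# a batch of 10 sequences, each with 4 steps of 10 features
x = np.random.rand(10, 4, 10)
inp = Input(shape=(4, 10))

print(Model(inp, GlobalMaxPooling1D()(inp)).predict(x).shape)
# (10, 10)
print(Model(inp, MaxPooling1D(pool_size=2, strides=1)(inp)).predict(x).shape)
# (10, 3, 10)
```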
Long answer with a graphical aid
Let's say we have a simple sentence with 4 words, and we have some vector encoding for the words (like word2vec embeddings). Of course, you won't normally max pool over an embedding tensor, but this should do for an example. Also, global pooling works across channels, but I'll leave that out of this illustration. Finally, things get slightly more complicated with padding, but we don't need that here either.
Suppose we have MaxPooling1D(pool_size=2, strides=1). Then:

```
the  [[.7, -0.2, .1]  | pool_size is two,               | strides=1 will
boy   [.8, -.3,  .2]  | so look at two words at a time  | move the pool down
will  [.2, -.1,  .4]    and take the max over those       1 word. Now we look
live  [.4, -.4,  .8]]   2 vectors. Here we look at        at 'boy' and 'will'
                        'the' and 'boy'.                  and take the max.
```
So that results in a [1, 3, 3] tensor, with each timestep being the feature-wise max over a pool of 2 word vectors. And since we had 3 pools, we have effectively downsampled our timesteps from 4 to 3.
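Here is that computation spelled out in plain NumPy (a sketch of the pooling arithmetic, not the Keras implementation itself):

```python
import numpy as np

# 4 words x 3 embedding dims, as in the illustration above
x = np.array([[ .7, -.2, .1],   # the
              [ .8, -.3, .2],   # boy
              [ .2, -.1, .4],   # will
              [ .4, -.4, .8]])  # live

# MaxPooling1D(pool_size=2, strides=1): feature-wise max over each
# window of 2 consecutive timesteps
pooled = np.array([x[i:i + 2].max(axis=0) for i in range(3)])
print(pooled)
# [[ 0.8 -0.2  0.2]
#  [ 0.8 -0.1  0.4]
#  [ 0.4 -0.1  0.8]]
```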
However, if we use GlobalMaxPooling1D, we take the feature-wise max over the whole sentence (tensor). Note that this is an element-wise max across all timesteps, so here it gives [.8, -.1, .8], a vector that mixes components from several words rather than the vector of any single word such as 'live'.
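In the same NumPy sketch, the global pool is a single feature-wise max over all timesteps:

```python
import numpy as np

x = np.array([[ .7, -.2, .1],   # the
              [ .8, -.3, .2],   # boy
              [ .2, -.1, .4],   # will
              [ .4, -.4, .8]])  # live

# GlobalMaxPooling1D: feature-wise max over every timestep at once
# (Keras does this along axis=1, since axis 0 is the batch dimension)
print(x.max(axis=0))   # [ 0.8 -0.1  0.8]
```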
Indeed, here is how GlobalMaxPooling1D is defined in Keras:
```python
class GlobalMaxPooling1D(_GlobalPooling1D):
    """Global max pooling operation for temporal data.

    # Input shape
        3D tensor with shape: `(batch_size, steps, features)`.

    # Output shape
        2D tensor with shape:
        `(batch_size, features)`
    """

    def call(self, inputs):
        return K.max(inputs, axis=1)
```
Hopefully that helps; please ask me to clarify anything. Additionally, here is an example that you can play with:
```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM, GlobalMaxPooling1D, MaxPooling1D

# dummy data to play with: 10 sequences, each with 6 steps of 10 features
D = np.random.rand(10, 6, 10)

model = Sequential()
model.add(LSTM(16, input_shape=(6, 10), return_sequences=True))
model.add(MaxPooling1D(pool_size=2, strides=1))
model.add(LSTM(10))
# sigmoid so the output is a probability, matching binary_crossentropy
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd')
# print the summary to see how the dimensions change after each layer is
# applied
print(model.summary())

# try a model with GlobalMaxPooling1D now
model = Sequential()
model.add(LSTM(16, input_shape=(6, 10), return_sequences=True))
model.add(GlobalMaxPooling1D())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd')
print(model.summary())
```
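In the summaries you should see the pooling at work: in the first model, the LSTM output of shape (None, 6, 16) is downsampled by MaxPooling1D(pool_size=2, strides=1) to (None, 5, 16), while in the second model GlobalMaxPooling1D collapses (None, 6, 16) straight down to (None, 16).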