What is the difference between Flatten() and GlobalAveragePooling2D() in Keras?
That both seem to work doesn't mean they do the same thing.
Flatten will take a tensor of any shape and transform it into a one-dimensional tensor (plus the samples dimension), keeping all the values. For example, a tensor of shape (samples, 10, 20, 1) will be flattened to (samples, 10 * 20 * 1) = (samples, 200).
GlobalAveragePooling2D does something different. It averages over the spatial dimensions, so the values are not kept individually but pooled into one average per channel. For example, a tensor of shape (samples, 10, 20, 1) would be output as (samples, 1), assuming the 2nd and 3rd dimensions were spatial (channels last); in recent TensorFlow versions, passing keepdims=True retains the collapsed spatial dimensions as (samples, 1, 1, 1).
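A minimal sketch showing both transforms side by side (the input shape here is just an arbitrary example):
import tensorflow as tf

x = tf.random.normal((4, 10, 20, 1))  # (samples, height, width, channels)

print(tf.keras.layers.Flatten()(x).shape)                 # (4, 200): all 10*20*1 values kept
print(tf.keras.layers.GlobalAveragePooling2D()(x).shape)  # (4, 1): one average per channel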
What a Flatten layer does
After convolutional operations, tf.keras.layers.Flatten will reshape a tensor into (n_samples, height*width*channels), for example turning (16, 28, 28, 3) into (16, 2352). Let's try it:
import tensorflow as tf

# a batch of 100 28x28 images with 3 channels
x = tf.random.uniform(shape=(100, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32)
flat = tf.keras.layers.Flatten()
flat(x).shape
# TensorShape([100, 2352])  # 28 * 28 * 3 = 2352 values per sample
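Under the hood, Flatten is just a reshape that merges every non-batch dimension; a quick sanity check, reusing x and flat from above:
# every value survives the flattening, only the shape changes
print(tf.reduce_all(flat(x) == tf.reshape(x, (100, -1))))
# tf.Tensor(True, shape=(), dtype=bool)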
What a GlobalAveragePooling2D layer does
After convolutional operations, tf.keras.layers.GlobalAveragePooling2D averages all the values over the spatial dimensions, leaving one value per channel (the last axis, for channels-last data). This means that the resulting shape will be (n_samples, n_channels). For instance, if your last convolutional layer had 64 filters, it would turn (16, 7, 7, 64) into (16, 64). Let's test it, after a few convolutional operations:
import tensorflow as tf

x = tf.cast(
    tf.random.uniform(shape=(16, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32),
    tf.float32)

gap = tf.keras.layers.GlobalAveragePooling2D()

for i in range(5):
    # 3x3 kernels with default 'valid' padding: each conv shrinks H and W by 2
    conv = tf.keras.layers.Conv2D(64, 3)
    x = conv(x)
    print(x.shape)

print(gap(x).shape)
(16, 26, 26, 64)
(16, 24, 24, 64)
(16, 22, 22, 64)
(16, 20, 20, 64)
(16, 18, 18, 64)
(16, 64)
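To confirm what the pooling actually computes, you can compare it against an explicit mean over the spatial axes (a sketch reusing x and gap from the snippet above):
# GlobalAveragePooling2D is a mean over the height and width axes (1 and 2 for channels-last)
manual = tf.reduce_mean(x, axis=[1, 2])
print(tf.reduce_all(tf.abs(gap(x) - manual) < 1e-5))
# tf.Tensor(True, shape=(), dtype=bool)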
Which should you use?
The Flatten layer will always produce at least as many features as the GlobalAveragePooling2D layer, so the Dense layer that follows it will have at least as many parameters. If the final tensor shape before flattening is still large, for instance (16, 240, 240, 128), using Flatten will produce an insane number of features: 240*240*128 = 7,372,800. This huge number gets multiplied by the number of units in your next Dense layer to give its parameter count! In that situation, GlobalAveragePooling2D is preferable in most cases. If you used MaxPooling2D and Conv2D so much that your tensor shape before flattening is like (16, 1, 1, 128), it won't make a difference. If you're overfitting, you might want to try GlobalAveragePooling2D.
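To make that parameter difference concrete, here is a sketch counting the parameters of a Dense head attached either way (the 10-unit head and the input shape are arbitrary choices for illustration):
import tensorflow as tf

inputs = tf.keras.Input(shape=(240, 240, 128))

# Dense head on flattened features: 7,372,800 inputs per sample
flat_head = tf.keras.Model(
    inputs, tf.keras.layers.Dense(10)(tf.keras.layers.Flatten()(inputs)))

# Dense head on pooled features: 128 inputs per sample
gap_head = tf.keras.Model(
    inputs, tf.keras.layers.Dense(10)(tf.keras.layers.GlobalAveragePooling2D()(inputs)))

print(flat_head.count_params())  # 73728010  (7,372,800 * 10 weights + 10 biases)
print(gap_head.count_params())   # 1290      (128 * 10 weights + 10 biases)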