Defining a model in Keras (include_top=True)

The documentation sheds some light on this, and you can also check the source code. Having include_top=True means that the fully-connected layers at the end of the model are kept. This is usually what you want if the model should actually perform classification. With include_top=True you can specify the parameter classes (defaults to 1000 for ImageNet), but only when you are not loading the pretrained weights (weights=None). With include_top=False, the model can be used for feature extraction, for example to build an autoencoder or to stack any other model on top of it. According to the documentation, input_shape and pooling should only be specified when include_top is False.
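As a rough sketch of the difference, assuming TensorFlow 2.x with its bundled Keras: the snippet below builds MobileNetV2 (chosen here only because it is small) three ways and prints the output shapes. It uses weights=None so that nothing is downloaded and classes can be changed; the layers are therefore randomly initialized.

```python
# Assumes TensorFlow 2.x; weights=None skips the ImageNet download
# and leaves all layers randomly initialized.
from tensorflow import keras

# include_top=True: the classification layers are kept, one score per class.
# `classes` can differ from 1000 only because weights=None here.
classifier = keras.applications.MobileNetV2(
    include_top=True, weights=None, classes=10)
print(classifier.output_shape)   # (None, 10)

# include_top=False: only the convolutional base, the output is a feature map.
features = keras.applications.MobileNetV2(
    include_top=False, weights=None, input_shape=(160, 160, 3))
print(features.output_shape)     # (None, 5, 5, 1280)

# pooling="avg" applies global average pooling to that feature map.
pooled = keras.applications.MobileNetV2(
    include_top=False, weights=None, input_shape=(160, 160, 3), pooling="avg")
print(pooled.output_shape)       # (None, 1280)
```

The same three calls should work with the other models in keras.applications, although the feature map sizes will differ.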


Most of these models are a series of convolutional layers followed by one or a few dense (or fully connected) layers.

include_top lets you choose whether to keep the final dense layers or not.

  • the convolutional layers work as feature extractors. They identify a series of patterns in the image, and each layer can identify more elaborate patterns by seeing patterns of patterns.

  • the dense layers are capable of interpreting the found patterns in order to classify: this image contains cats, dogs, cars, etc.

About the weights:

  • the weights in a convolutional layer are fixed-size. Their size is kernel size x input channels x filters. Example: a 3x3 kernel with 10 filters. A convolutional layer doesn't care about the spatial size of the input image. It just does the convolutions and presents a resulting image whose size depends on the size of the input image. (Search for some illustrated tutorials about convolutions if this is unclear)

  • now the weights in a dense layer are totally dependent on the input size. It's one weight per element of the input, for each unit of the layer. So this demands that your input always be the same size, or else you won't have properly learned weights.
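The arithmetic behind the two points above can be sketched in plain Python. The helper names are made up for illustration, the counts include one bias per filter or unit, and the dense example assumes the convolution output was flattened and kept the input's spatial size.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel_h x kernel_w x in_channels kernel per filter, plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(input_features, units):
    # One weight per (input element, unit) pair, plus one bias per unit.
    return input_features * units + units

# A 3x3 convolution with 10 filters on an RGB image: same count for any image size.
print(conv2d_params(3, 3, 3, 10))        # 280

# A Dense layer with 10 units fed with the flattened 10-channel output:
print(dense_params(32 * 32 * 10, 10))    # 102410 for a 32x32 image
print(dense_params(64 * 64 * 10, 10))    # 409610 for a 64x64 image
```

The convolution count never mentions the image size, while the dense count changes as soon as the image does.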

Because of this, removing the final dense layers allows you to define the input size yourself (see input_shape in the documentation). The output size will increase or decrease accordingly.

But you lose the interpretation/classification layers. (You can add your own, depending on your task)
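A possible way to add your own layers, sketched under the assumption of TensorFlow 2.x and a hypothetical 5-class task: take the convolutional base without a fixed input size and put a small head on top. MobileNetV2 and weights=None are used only to keep the example light; in practice you would probably load pretrained weights.

```python
import numpy as np
from tensorflow import keras

# Convolutional base only; with no input_shape the spatial dimensions stay undefined.
base = keras.applications.MobileNetV2(include_top=False, weights=None)

# Hypothetical head for a 5-class task: global pooling removes the spatial
# dimensions, so the Dense layer always receives the same number of features.
x = keras.layers.GlobalAveragePooling2D()(base.output)
outputs = keras.layers.Dense(5, activation="softmax")(x)
model = keras.Model(base.input, outputs)

# The same model accepts different image sizes.
small = model.predict(np.zeros((2, 96, 96, 3), dtype="float32"), verbose=0)
large = model.predict(np.zeros((2, 128, 128, 3), dtype="float32"), verbose=0)
print(small.shape, large.shape)   # (2, 5) (2, 5)
```

Using Flatten instead of the global pooling would force you to pass a fixed input_shape to the base.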


Extra info on Poolings and Flatten

Global poolings:

After the last convolutional layers, your outputs are still like images. They have shape (images, X, Y, channels), where X and Y are spatial dimensions of a 2D image.

When your model has GlobalMaxPooling2D or GlobalAveragePooling2D, it will eliminate the spatial dimensions. With Max it will take only the highest value pixel for each channel. With Average it will take the mean value of each channel. The result will be just (images, channels), without spatial dimensions anymore.

  • Advantage: since the spatial dimension is discarded, you can have variable size images
  • Disadvantage: you lose a lot of data if the spatial dimensions are still big at that point. (This might be ok depending on the model and data)
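To make the shapes concrete, here is a small NumPy sketch of what the two global poolings compute for channels-last data. It mimics the reduction rather than calling the Keras layers, and the array sizes are arbitrary.

```python
import numpy as np

# Hypothetical feature maps: 2 images, 4x4 spatial grid, 3 channels.
x = np.arange(2 * 4 * 4 * 3, dtype="float32").reshape(2, 4, 4, 3)

# Reduce over the spatial axes (1 and 2), keep images and channels.
global_max = x.max(axis=(1, 2))    # like GlobalMaxPooling2D
global_avg = x.mean(axis=(1, 2))   # like GlobalAveragePooling2D
print(global_max.shape, global_avg.shape)   # (2, 3) (2, 3)

# A different spatial size gives the same output shape.
y = np.zeros((2, 7, 9, 3), dtype="float32")
print(y.max(axis=(1, 2)).shape)             # (2, 3)
```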

Flatten

With Flatten, the spatial dimensions will not be lost, but they will be transformed into features. From (images, X, Y, channels) to (images, X*Y*channels).

This will require fixed input shapes, because X and Y must be defined, and if you add Dense layers after the flatten, the Dense layer will need a fixed number of features.
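The same kind of NumPy sketch for Flatten (again only mimicking the layer, with arbitrary sizes) shows why the input shape has to be fixed: the number of features changes with the image size.

```python
import numpy as np

def flatten(feature_maps):
    # Keep the image axis, merge X, Y and channels into one feature axis.
    return feature_maps.reshape(feature_maps.shape[0], -1)

a = flatten(np.zeros((2, 4, 4, 3)))
b = flatten(np.zeros((2, 8, 8, 3)))
print(a.shape)   # (2, 48)
print(b.shape)   # (2, 192)
```

A Dense layer whose weights were learned for 48 features cannot be applied to 192 features.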