preprocess_input() method in keras
I found that if your dataset is very different from the one the pre-trained model was trained on, applying that model's preprocessing can actually harm your accuracy. If you do transfer learning and freeze some layers of a pre-trained model (and their weights), simply dividing your original dataset by 255.0 does the job just fine, at least for a large food dataset of 1-2 million samples. Ideally you should know the mean/std of your own dataset and use those instead of the mean/std from the pre-trained model's preprocessing.
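As a minimal sketch of the two options mentioned above (x_train is just a placeholder for your own images, filled with random data here so the snippet runs):

import numpy as np

# placeholder for your own images: (samples, height, width, channels), values in [0, 255]
x_train = np.random.randint(0, 256, size=(8, 224, 224, 3)).astype("float32")

# option 1: simple rescaling to [0, 1]
x_rescaled = x_train / 255.0

# option 2: standardize with the statistics of YOUR dataset,
# not the mean/std baked into the pre-trained model's preprocessing
mean = x_train.mean(axis=(0, 1, 2), keepdims=True)  # per-channel mean
std = x_train.std(axis=(0, 1, 2), keepdims=True)    # per-channel std
x_standardized = (x_train - mean) / (std + 1e-7)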
Keras works with batches of images. So, the first dimension is used for the number of samples (or images) you have.
When you load a single image, you get the shape of one image, which is (size1, size2, channels).
In order to create a batch of images, you need an additional dimension: (samples, size1, size2, channels)
The preprocess_input function is meant to adapt your image to the format the model requires.
Some models use images with values ranging from 0 to 1. Others from -1 to +1. Others use the "caffe" style, which is not scaled to a fixed range but is centered (the channel means are subtracted).
From the source code, ResNet uses the caffe style.
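Roughly, the caffe style amounts to something like the sketch below (not the exact library code; the mean values are the ImageNet channel means commonly used for this mode):

import numpy as np

def caffe_style_preprocess(x):
    # x: float array of shape (samples, height, width, 3), RGB values in [0, 255]
    x = x[..., ::-1]  # RGB -> BGR
    mean = np.array([103.939, 116.779, 123.68])  # ImageNet channel means (BGR order)
    return x - mean   # centered, but not scaled to [0, 1] or [-1, 1]

batch = np.random.uniform(0, 255, size=(1, 224, 224, 3))
out = caffe_style_preprocess(batch)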
You don't need to worry about the internal details of preprocess_input. But ideally, you should load images with the Keras functions for that (so you guarantee that the images you load are compatible with preprocess_input).
This loads an image and resizes it to (224, 224):
img = image.load_img(img_path, target_size=(224, 224))
The img_to_array() function adds channels: x.shape = (224, 224, 3) for RGB and (224, 224, 1) for a grayscale image:
x = image.img_to_array(img)
expand_dims() is used to add the number of images (the batch dimension), so x.shape becomes (1, 224, 224, 3):
x = np.expand_dims(x, axis=0)
preprocess_input subtracts the mean RGB channels of the ImageNet dataset, because the model you are using was trained on that dataset and expects its inputs to be centered the same way. x.shape is still (1, 224, 224, 3):
x = preprocess_input(x)
If you append each x to a list images, then at the end of the loop you need to add images = np.vstack(images) so that you get (n, 224, 224, 3) as the shape of images, where n is the number of images processed.
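Putting the steps together, a complete loop could look like the sketch below. The img_paths list is a placeholder for your own file paths, and the import assumes the ResNet50 application mentioned above:

import numpy as np
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input

img_paths = ["img1.jpg", "img2.jpg"]  # placeholder: your own files

images = []
for img_path in img_paths:
    img = image.load_img(img_path, target_size=(224, 224))  # PIL image, resized
    x = image.img_to_array(img)                              # (224, 224, 3)
    x = np.expand_dims(x, axis=0)                            # (1, 224, 224, 3)
    x = preprocess_input(x)                                  # caffe-style centering
    images.append(x)

images = np.vstack(images)  # (n, 224, 224, 3), where n is the number of images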