How to use .predict_generator() on new Images - Keras
I had some trouble with predict_generator(). Some posts here helped a lot, so I am posting my solution as well in the hope that it helps others. What I do:
- Make predictions on new images using predict_generator()
- Get filename for each prediction
- Store results in a data frame
I make binary predictions à la "cats and dogs" as documented here. However, the logic can be generalised to multiclass cases, where the prediction output has one column per class.
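The difference between the two output shapes can be sketched with NumPy. This assumes the binary model ends in a single sigmoid unit (one column holding the probability of class 1) and the multiclass model ends in a softmax layer. The helper name `probabilities_to_class_indices` and the 0.5 threshold are my own choices, not part of Keras.

```python
import numpy as np

def probabilities_to_class_indices(pred, threshold=0.5):
    """Convert a prediction array to integer class indices.

    Assumes pred has shape (n_samples, 1) for a binary sigmoid output
    or (n_samples, n_classes) for a multiclass softmax output.
    """
    pred = np.asarray(pred)
    if pred.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_samples, n_outputs)")
    if pred.shape[1] == 1:
        # Binary case: the single column holds P(class 1)
        return (pred[:, 0] > threshold).astype(int)
    # Multiclass case: pick the column with the highest probability
    return np.argmax(pred, axis=1)

binary_pred = np.array([[0.1], [0.8], [0.6]])
multi_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
print(probabilities_to_class_indices(binary_pred))  # [0 1 1]
print(probabilities_to_class_indices(multi_pred))   # [0 2]
```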
First, I load my stored model and set up the data generator:
import numpy as np
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model
# Load model
model = load_model('my_model_01.hdf5')
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
"C:/kerasimages/pred/",
target_size=(150, 150),
batch_size=20,
class_mode='binary',
shuffle=False)
Note: it is important to specify shuffle=False in order to preserve the order of filenames and predictions.
Images are stored in C:/kerasimages/pred/images/. The data generator only looks for images in subfolders of C:/kerasimages/pred/ (the directory passed to test_generator). It is important to respect this logic of the data generator, so the subfolder /images/ is required. Each subfolder in C:/kerasimages/pred/ is interpreted as one class by the generator. Here, the generator will report Found x images belonging to 1 classes (since there is only one subfolder). When making predictions, the classes detected by the generator are not relevant.
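If your new images are lying loose in the prediction folder, they have to be moved into a subfolder before flow_from_directory() will see them. A minimal standard-library sketch, assuming the hypothetical helper name `move_images_into_subfolder` and a subfolder called images; the suffix set is an assumed subset of the formats Keras accepts.

```python
import shutil
from pathlib import Path

# Assumed subset of the image formats flow_from_directory() picks up
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

def move_images_into_subfolder(pred_dir, subfolder="images"):
    """Move loose image files in pred_dir into pred_dir/subfolder.

    flow_from_directory() only loads images from subfolders of the
    directory it is given, so loose files would otherwise be ignored.
    Returns the names of the files that were moved.
    """
    pred_dir = Path(pred_dir)
    target = pred_dir / subfolder
    target.mkdir(exist_ok=True)
    moved = []
    # sorted() builds the full listing before any file is moved
    for path in sorted(pred_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.move(str(path), str(target / path.name))
            moved.append(path.name)
    return moved
```

Usage would be something like move_images_into_subfolder("C:/kerasimages/pred/"), after which the generator should report the images as belonging to 1 class.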
Now, I can make predictions using the generator:
# Predict from generator (returns probabilities)
pred=model.predict_generator(test_generator, steps=len(test_generator), verbose=1)
Resetting the generator is not required in this case, because it has just been created and nothing has been drawn from it yet. If the generator has been used before, it may be necessary to reset it with test_generator.reset().
Next, I round the probabilities to get classes and retrieve the filenames:
# Get classes by np.round
cl = np.round(pred)
# Get filenames (setting shuffle=False in the generator is important)
filenames=test_generator.filenames
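One detail about np.round: it rounds halves to the nearest even number, so a probability of exactly 0.5 becomes 0. For values between 0 and 1 this is the same as the threshold p > 0.5, but writing the threshold explicitly makes the cut-off visible and easy to change. A small sketch with made-up probabilities, assuming a model with a single sigmoid output unit:

```python
import numpy as np

# Made-up sigmoid outputs with shape (n_samples, 1)
pred = np.array([[0.2], [0.5], [0.51], [0.9]])

# np.round rounds halves to even, so exactly 0.5 becomes 0.0
cl_round = np.round(pred)

# An explicit threshold gives the same classes here but is adjustable
threshold = 0.5
cl_threshold = (pred > threshold).astype(int)

print(cl_round[:, 0])      # [0. 0. 1. 1.]
print(cl_threshold[:, 0])  # [0 0 1 1]
```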
Finally, results can be stored in a data frame:
# Data frame
results=pd.DataFrame({"file":filenames,"pr":pred[:,0], "class":cl[:,0]})
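test_generator.filenames holds paths relative to the directory passed to flow_from_directory(), so each entry still includes the images subfolder (for example images\cat.1.jpg on Windows). If you only want the bare file names in the file column, here is a small standard-library sketch. The example entries are hypothetical, and it assumes the file names themselves contain no backslashes.

```python
from pathlib import PureWindowsPath

# Hypothetical entries as they might appear in test_generator.filenames
filenames = ["images\\cat.1.jpg", "images\\dog.7.jpg"]

# PureWindowsPath treats both "\\" and "/" as separators,
# so this also works for forward-slash paths
basenames = [PureWindowsPath(f).name for f in filenames]
print(basenames)  # ['cat.1.jpg', 'dog.7.jpg']
```

basenames could then be passed as the "file" column instead of filenames.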
First of all, the test images should be placed in a separate folder inside the test folder. In my case, I created another folder inside the test folder and named it all_classes.
Then I ran the following code:
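# pred_dir is the test folder that contains the all_classes subfolder;
# test_datagen is the ImageDataGenerator used for the test images
test_generator = test_datagen.flow_from_directory(
directory=pred_dir,
target_size=(28, 28),
color_mode="rgb",
batch_size=32,
class_mode=None,  # yield images only, no labels
shuffle=False     # keep predictions in the order of test_generator.filenames
)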
The code above prints:
Found 306 images belonging to 1 class
Most importantly, you have to reset the generator:
test_generator.reset()
Otherwise, if the generator has already been used, the predictions may not line up with the filenames.
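Why a reset can matter is easiest to see with a toy model. The class below is hypothetical and much simpler than Keras's actual iterator, but like it, it keeps a batch counter, so an iterator that has already been advanced starts mid-way through the files and wraps around. Whether this bites in practice may depend on your Keras version; the sketch only shows the mechanism.

```python
class ToyDirectoryIterator:
    """Hypothetical stand-in for a stateful image iterator.

    It remembers which batch comes next, so a partially consumed
    iterator resumes mid-way unless reset() is called.
    """

    def __init__(self, filenames, batch_size):
        self.filenames = filenames
        self.batch_size = batch_size
        self.batch_index = 0

    def __len__(self):
        # Number of batches, rounding up for a final partial batch
        return -(-len(self.filenames) // self.batch_size)

    def reset(self):
        self.batch_index = 0

    def next_batch(self):
        # Wrap around to the first batch after the last one
        start = (self.batch_index % len(self)) * self.batch_size
        self.batch_index += 1
        return self.filenames[start:start + self.batch_size]


def predict_all(it):
    # Pull len(it) batches, mimicking steps=len(generator)
    seen = []
    for _ in range(len(it)):
        seen.extend(it.next_batch())
    return seen


gen = ToyDirectoryIterator(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], batch_size=2)
gen.next_batch()          # an earlier call already consumed one batch
print(predict_all(gen))   # ['c.jpg', 'd.jpg', 'e.jpg', 'a.jpg', 'b.jpg']
gen.reset()
print(predict_all(gen))   # ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg']
```

Without the reset, the outputs come back shifted by one batch, so pairing them with filenames in order would attach predictions to the wrong images.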
Then use the .predict_generator() function:
# steps=len(test_generator) covers all 306 images, including the last partial batch
pred=cnn.predict_generator(test_generator, verbose=1, steps=len(test_generator))
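As a quick check on the step count: dividing 306 images by a batch size of 32 gives a float, while the number of batches needed to cover every image is that quotient rounded up. For a directory iterator, len(test_generator) returns this rounded-up number. A minimal arithmetic sketch using the numbers from this post:

```python
import math

n_images = 306   # from "Found 306 images belonging to 1 class"
batch_size = 32  # batch_size passed to flow_from_directory

# Plain division gives a float, because 306 is not a multiple of 32
print(n_images / batch_size)   # 9.5625

# Round up so the final partial batch (306 - 9 * 32 = 18 images) is included
steps = math.ceil(n_images / batch_size)
print(steps)                   # 10
```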
predict_generator() returns probabilities, so first I need to convert them to class indices. In my case the model had 4 classes (regardless of the single all_classes folder the test generator found), so the class indices were 0, 1, 2 and 3:
predicted_class_indices=np.argmax(pred,axis=1)
Next, I want the names of the classes:
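# class_indices from the training generator maps class name -> index;
# invert it to map index -> class name
labels = train_generator.class_indices
labels = {v: k for k, v in labels.items()}
predictions = [labels[k] for k in predicted_class_indices]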
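The mapping has to come from the training generator: test_generator.class_indices would only contain the single all_classes folder. Here is a self-contained sketch of the inversion with a hypothetical four-class mapping. flow_from_directory() assigns indices to the class subfolders in sorted (alphanumeric) order, which the example mirrors.

```python
# Hypothetical mapping in the form train_generator.class_indices returns:
# class (subfolder) name -> class index
class_indices = {"bird": 0, "cat": 1, "dog": 2, "fish": 3}

# Invert it to index -> name
labels = {index: name for name, index in class_indices.items()}

predicted_class_indices = [2, 0, 3, 1]  # e.g. the result of np.argmax(pred, axis=1)
predictions = [labels[k] for k in predicted_class_indices]
print(predictions)  # ['dog', 'bird', 'fish', 'cat']
```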
predictions now holds the class names instead of the class indices. As a final step, if you want to save the results to a CSV file, arrange them in a data frame that pairs each image name with its predicted class:
filenames=test_generator.filenames
results=pd.DataFrame({"Filename":filenames,
"Predictions":predictions})
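Writing the data frame to disk is one more call. A minimal sketch with pandas' DataFrame.to_csv; the file names, predictions and the output path results.csv are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical values standing in for test_generator.filenames and predictions
filenames = ["all_classes/img_001.png", "all_classes/img_002.png"]
predictions = ["cat", "dog"]

results = pd.DataFrame({"Filename": filenames, "Predictions": predictions})

# index=False stops pandas from writing the row index as an extra column
results.to_csv("results.csv", index=False)  # hypothetical output path

print(pd.read_csv("results.csv"))
```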
Display the data frame and you are done: it holds the predicted class for each of your images.