keras flow_from_directory over or undersample a class
With the current version of Keras, it's not possible to balance your dataset using only Keras built-in methods. flow_from_directory simply builds a list of all files and their classes, shuffles it (if needed), and then iterates over it.
But you could use a different trick: write your own generator that does the balancing in Python:
def balanced_flow_from_directory(flow_from_directory, options):
    for x, y in flow_from_directory:
        yield custom_balance(x, y, options)
Here custom_balance should be a function that, given a batch (x, y), balances it and returns a balanced batch (x', y'). For most applications the batch size doesn't need to stay the same, but there are some unusual use cases (e.g. stateful RNNs) where batches must have a fixed size.
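As an illustration only, custom_balance could undersample each batch down to the size of its rarest class. The sketch below assumes y is one-hot encoded (as with class_mode='categorical') and that options is a dict; the "rng" key it reads is a hypothetical choice made for this example, not part of Keras. A class that is absent from a batch cannot be restored this way, and the returned batch is usually smaller than the input.

```python
import numpy as np

def custom_balance(x, y, options=None):
    """Undersample a batch so every class present appears equally often.

    Assumes y is one-hot with shape (batch, n_classes).
    `options` is a hypothetical dict; only an optional "rng" key is read.
    """
    rng = (options or {}).get("rng") or np.random.default_rng()
    labels = np.argmax(y, axis=1)
    present, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()  # size of the rarest class in this batch
    # Draw n_keep indices without replacement from each class present
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
        for c in present
    ])
    rng.shuffle(keep)  # avoid grouping samples by class
    return x[keep], y[keep]
```

Oversampling the minority classes instead (drawing with replace=True up to the largest class count) would follow the same pattern.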
One thing you can do is set the class_weight parameter when calling model.fit() or model.fit_generator().
You can also easily compute your class weights using the sklearn and numpy libraries as follows:
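from sklearn.utils import class_weight
import numpy as np

# train_generator.classes holds the integer class index of every sample
weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)
# Keras expects a dict mapping class index -> weight
class_weights = dict(enumerate(weights))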
Afterwards, it's as simple as passing class_weights as the class_weight parameter:
model.fit_generator(..., class_weight=class_weights)
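To make the 'balanced' setting less of a black box: sklearn computes each weight as n_samples / (n_classes * count_of_that_class), so rarer classes get larger weights. The NumPy-only sketch below reproduces that formula on toy labels; the helper name balanced_class_weights is made up for this example and is not part of sklearn or Keras.

```python
import numpy as np

def balanced_class_weights(labels):
    """Return {class_index: weight} using n_samples / (n_classes * count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = labels.size / (classes.size * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Toy labels: 6 samples of class 0 and 2 samples of class 1
example = balanced_class_weights([0, 0, 0, 0, 0, 0, 1, 1])
# Class 0 gets 8 / (2 * 6), class 1 gets 8 / (2 * 2)
```

With real data you would pass train_generator.classes instead of the toy list.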