Getting some form of keras multi-processing/threading to work on Windows
In combination with a Sequence, using use_multiprocessing=False and workers (e.g. workers=4) does work.
I just realized that in the example code in the question I was not seeing the speed-up, because the data was being generated too fast. Inserting a time.sleep(2) makes this evident.
import time
import numpy as np
from keras.utils import Sequence, to_categorical
from keras.models import Sequential
from keras.layers import Dense

class DummySequence(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        # simulate slow batch generation so the effect of multiple workers is visible
        time.sleep(2)
        return np.array(batch_x), np.array(batch_y)
x = np.random.random((100, 3))
y = to_categorical(np.random.random(100) > .5).astype(int)
seq = DummySequence(x, y, 10)
model = Sequential()
model.add(Dense(32, input_dim=3))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
print('single worker')
model.fit_generator(generator=seq,
                    steps_per_epoch=10,
                    epochs=2,
                    verbose=2,
                    workers=1)

print('achieves speed-up!')
model.fit_generator(generator=seq,
                    steps_per_epoch=10,
                    epochs=2,
                    verbose=2,
                    workers=4,
                    use_multiprocessing=False)
On my laptop, this produced the following output:
single worker
>>> model.fit_generator(generator=seq,
... steps_per_epoch = 10,
... epochs = 2,
... verbose=2,
... workers=1)
Epoch 1/2
- 20s - loss: 0.6984 - acc: 0.5000
Epoch 2/2
- 20s - loss: 0.6955 - acc: 0.5100
and
achieves speed-up!
>>> model.fit_generator(generator=seq,
... steps_per_epoch = 10,
... epochs = 2,
... verbose=2,
... workers=4,
... use_multiprocessing=False)
Epoch 1/2
- 6s - loss: 0.6904 - acc: 0.5200
Epoch 2/2
- 6s - loss: 0.6900 - acc: 0.5000
Important notes:
You will probably want self.lock = threading.Lock() in __init__, and then with self.lock: in __getitem__. Try to do the absolute bare minimum required within the with self.lock: block; as far as I understand it, that would be any reference to self.xxxx (multi-threading is prevented while the with self.lock: block is running).
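To make that concrete, here is a minimal sketch of what I mean (the class name LockedSequence is just for illustration): only the references to shared state go inside the lock, and the slow work stays outside it.

import threading
import time
import numpy as np
from keras.utils import Sequence

class LockedSequence(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.lock = threading.Lock()   # one lock shared by all worker threads

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        # keep only the references to self.xxxx inside the lock
        with self.lock:
            batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        # the slow part (I/O, preprocessing) happens outside the lock
        time.sleep(2)
        return np.array(batch_x), np.array(batch_y)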
Additionally, if you want multithreading to speed up calculations (i.e. CPU operations are the limit), do not expect any speed-up: the global interpreter lock (GIL) will prevent it. Multithreading will only help you if the limitation is in I/O operations. Apparently, to speed up CPU-bound computations we need true multiprocessing, which keras currently does not support on Windows 10. Perhaps it is possible to hand-craft a multi-processing generator (I have no idea).
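For what it's worth, here is a rough, untested sketch of what such a hand-crafted multi-processing generator could look like (make_batch and pooled_generator are names I made up for the illustration): the batches are computed in a multiprocessing.Pool and handed to fit_generator through a plain Python generator, with workers=0 so keras consumes it on the main thread.

import numpy as np
from multiprocessing import Pool

def make_batch(idx):
    # stand-in for an expensive, CPU-bound batch computation
    x = np.random.random((10, 3))
    y = np.eye(2)[np.random.randint(0, 2, size=10)]
    return x, y

def pooled_generator(n_batches, processes=4):
    pool = Pool(processes)
    while True:
        # imap computes batches in the worker processes, ahead of the training loop
        for batch in pool.imap(make_batch, range(n_batches)):
            yield batch

if __name__ == '__main__':   # required on Windows, which uses the 'spawn' start method
    gen = pooled_generator(10)
    # model.fit_generator(gen, steps_per_epoch=10, epochs=2, verbose=2, workers=0)

Note that on Windows the worker function (make_batch here) must be defined at module top level so the spawned processes can import it.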