KeyError: 'val_loss' when training model
This callback runs at the end of every third epoch (period=3):
checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
                             monitor='val_loss', save_weights_only=True, save_best_only=True, period=3)
The error message is claiming that there is no val_loss in the logs variable when executing:
filepath = self.filepath.format(epoch=epoch + 1, **logs)
This would happen if fit is called without validation_data.
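The mechanism can be reproduced without Keras at all, since it is just Python string formatting against the logs dict (the metric values below are made up for illustration):

```python
# ModelCheckpoint builds the filename by formatting the template with
# the epoch number plus everything in the `logs` dict.
filepath = 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'

# With validation data, fit() adds 'val_loss' to logs and formatting works.
logs = {'loss': 0.412, 'val_loss': 0.523}
print(filepath.format(epoch=3, **logs))  # ep003-loss0.412-val_loss0.523.h5

# Without validation data there is no 'val_loss' key, and format() raises
# the exact error from the question.
logs = {'loss': 0.412}
try:
    filepath.format(epoch=3, **logs)
except KeyError as err:
    print(err)  # 'val_loss'
```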
I would start by simplifying the path name for the model checkpoint; it is probably enough to include the epoch in the name.
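For instance (a plain-Python sketch of the template itself; the directory and extension are up to you):

```python
# An epoch-only template uses no metric keys, so it cannot raise KeyError
# regardless of whether validation data is provided.
simple_path = 'ep{epoch:03d}.h5'
print(simple_path.format(epoch=3))  # ep003.h5

# The training loss 'loss' is always present in logs, so this is also safe:
loss_path = 'ep{epoch:03d}-loss{loss:.3f}.h5'
print(loss_path.format(epoch=3, loss=0.412))  # ep003-loss0.412.h5
```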
This error happens when we do not provide validation data to the model. Also check the parameters of model.fit_generator (or model.fit): train_data, steps_per_epoch, validation_data, validation_steps, epochs, initial_epoch, callbacks.
Use val_accuracy in the filepath and checkpoint monitor. If it still doesn't improve, restart the PC or the Colab runtime.
This answer doesn't apply to the question, but this was at the top of the Google results for keras "KeyError: 'val_loss'", so I'm going to share the solution for my problem.
The error was the same for me: when using val_loss in the checkpoint file name, I would get the following error: KeyError: 'val_loss'. My checkpointer was also monitoring this field, so even if I took the field out of the file name, I would still get this warning from the checkpointer: WARNING:tensorflow:Can save best model only with val_loss available, skipping.
In my case, the issue was that I was upgrading from using Keras and Tensorflow 1 separately to using the Keras that comes with Tensorflow 2. The period param for ModelCheckpoint had been replaced with save_freq. I erroneously assumed that save_freq behaved the same way, so I set save_freq=1, thinking this would save a checkpoint every epoch. However, the docs state:
save_freq: 'epoch' or integer. When using 'epoch', the callback saves the model after each epoch. When using integer, the callback saves the model at end of a batch at which this many samples have been seen since last saving. Note that if the saving isn't aligned to epochs, the monitored metric may potentially be less reliable (it could reflect as little as 1 batch, since the metrics get reset every epoch). Defaults to 'epoch'
Setting save_freq='epoch' solved the issue for me. Note: the OP was still using period=3, so this is definitely not what was causing their problem.
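A sketch of the corrected callback under TF2's tf.keras (the filename template here is illustrative, not the OP's):

```python
from tensorflow.keras.callbacks import ModelCheckpoint

# TF2 replaced `period` with `save_freq`. An integer save_freq counts
# samples/batches seen, not epochs, so save_freq=1 would attempt a save
# almost every batch; 'epoch' restores the per-epoch behavior.
checkpoint = ModelCheckpoint(
    'ep{epoch:03d}-val_loss{val_loss:.3f}.h5',
    monitor='val_loss',
    save_best_only=True,
    save_freq='epoch',  # not save_freq=1
)
```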