Tensorflow: save the model with smallest validation error

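This can be done with the Keras ModelCheckpoint callback: with save_best_only=True it writes the model only when the monitored quantity (here val_loss) improves. With standalone Keras on TensorFlow 1: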

# you should import other functions/libs as needed to build the model

# keras.callbacks is the stable public path; keras.callbacks.callbacks only exists in some Keras releases
from keras.callbacks import ModelCheckpoint

# add checkpoint to save model with lowest val loss
filepath = 'tf1_mnist_cnn.hdf5'
save_checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                                  save_best_only=True, save_weights_only=False,
                                  mode='auto', period=1)

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[save_checkpoint])

TensorFlow 2:

# import other libs as needed for building model
from tensorflow.keras.callbacks import ModelCheckpoint

# add a checkpoint to save the lowest validation loss
filepath = 'tf2_mnist_model.hdf5'

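# the argument is save_freq, not save_frequency; 'epoch' checks once per epoch
# (an integer would mean "every N batches")
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
                             save_best_only=True, save_weights_only=False,
                             mode='auto', save_freq='epoch')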

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[checkpoint])

Complete demo files are here: https://github.com/nateGeorge/slurm_gpu_ubuntu/tree/master/demo_files.

To do this by hand, you need to calculate the classification accuracy on the validation set periodically during training, keep track of the best value seen so far, and write a checkpoint only when the validation accuracy improves on it.
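
The bookkeeping described above can be sketched without any framework. This is a minimal illustration, not the tutorial's code: the BestCheckpoint class, its update method, and the pickle-based save are hypothetical stand-ins for whatever saver your framework provides, and the accuracy values are made up.

```python
import os
import pickle
import tempfile


class BestCheckpoint:
    """Write a checkpoint only when validation accuracy improves.

    Hypothetical helper for illustration; `state` is any picklable object
    standing in for the model weights.
    """

    def __init__(self, path):
        self.path = path
        self.best_accuracy = float("-inf")  # nothing seen yet

    def update(self, val_accuracy, state):
        """Save `state` if `val_accuracy` beats the best seen so far.

        Returns True when a checkpoint was written.
        """
        if val_accuracy <= self.best_accuracy:
            return False
        self.best_accuracy = val_accuracy
        with open(self.path, "wb") as f:
            pickle.dump(state, f)
        return True


def demo():
    """Run the tracker over made-up per-epoch validation accuracies."""
    path = os.path.join(tempfile.mkdtemp(), "best.pkl")
    tracker = BestCheckpoint(path)
    # Made-up numbers, one per epoch; the "weights" are just the epoch index.
    history = [0.71, 0.78, 0.75, 0.78, 0.82, 0.80]
    saved_epochs = [
        epoch for epoch, acc in enumerate(history)
        if tracker.update(acc, {"epoch": epoch})
    ]
    with open(path, "rb") as f:
        restored = pickle.load(f)
    return saved_epochs, tracker.best_accuracy, restored
```

Ties do not trigger a new save here (the comparison is strict), so the file on disk always holds the earliest state that reached the best accuracy. In a real training loop the pickle call would be replaced by the framework's own save routine.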

If the dataset or the model is large, you may have to split the validation set into batches so that the computation fits in memory.
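
One way to evaluate in batches is to count correct predictions per batch and divide by the total number of examples at the end. The sketch below assumes a hypothetical predict_fn callable that maps a batch of inputs to predicted class labels; the parity_classifier toy model and the data are invented for the example.

```python
def batched_accuracy(predict_fn, inputs, labels, batch_size):
    """Classification accuracy computed one batch at a time.

    `predict_fn` is a hypothetical callable that maps a list of inputs to a
    list of predicted class labels (in practice it would wrap a model call).
    """
    if len(inputs) != len(labels):
        raise ValueError("inputs and labels must have the same length")
    if len(inputs) == 0:
        raise ValueError("validation set is empty")
    correct = 0
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size  # slicing clips the final, shorter batch
        predictions = predict_fn(inputs[start:end])
        correct += sum(
            int(p == y) for p, y in zip(predictions, labels[start:end])
        )
    # Divide by the total count, not the number of batches, so a short
    # last batch is weighted correctly.
    return correct / len(inputs)


def parity_classifier(batch):
    """Toy stand-in for a model: predicts 1 for odd inputs, 0 for even."""
    return [x % 2 for x in batch]


inputs = [1, 2, 3, 4, 5, 6, 7]
labels = [1, 0, 1, 0, 0, 0, 1]  # the label for 5 deliberately disagrees
accuracy = batched_accuracy(parity_classifier, inputs, labels, batch_size=3)
```

Averaging per-batch accuracies instead would give the wrong answer whenever the last batch is smaller than the others, which is why the sketch accumulates a raw count.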

This tutorial shows exactly how to do what you want:

https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/04_Save_Restore.ipynb

It is also available as a short video:

https://www.youtube.com/watch?v=Lx8JUJROkh0
