Save and load model optimizer state
You can extract the important lines from the load_model and save_model functions.
For saving optimizer states, in save_model:
# Save optimizer weights.
symbolic_weights = getattr(model.optimizer, 'weights')
if symbolic_weights:
    optimizer_weights_group = f.create_group('optimizer_weights')
    weight_values = K.batch_get_value(symbolic_weights)
For loading optimizer states, in load_model:
# Set optimizer weights.
if 'optimizer_weights' in f:
    # Build train function (to get weight updates).
    if isinstance(model, Sequential):
        model.model._make_train_function()
    else:
        model._make_train_function()
    # ...
    try:
        model.optimizer.set_weights(optimizer_weight_values)
    # ...
Combining the lines above, here's an example:
- First fit the model for 5 epochs.
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

X, y = np.random.rand(100, 50), np.random.randint(2, size=100)
x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=5)
Epoch 1/5
100/100 [==============================] - 0s 4ms/step - loss: 0.7716
Epoch 2/5
100/100 [==============================] - 0s 64us/step - loss: 0.7678
Epoch 3/5
100/100 [==============================] - 0s 82us/step - loss: 0.7665
Epoch 4/5
100/100 [==============================] - 0s 56us/step - loss: 0.7647
Epoch 5/5
100/100 [==============================] - 0s 76us/step - loss: 0.7638
- Now save the weights and optimizer states.
import pickle
from keras import backend as K

model.save_weights('weights.h5')
symbolic_weights = getattr(model.optimizer, 'weights')
weight_values = K.batch_get_value(symbolic_weights)
with open('optimizer.pkl', 'wb') as f:
    pickle.dump(weight_values, f)
- Rebuild the model in another Python session, and load the weights.
import pickle
from keras.layers import Input, Dense
from keras.models import Model

x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.load_weights('weights.h5')
# Build the train function so that the optimizer's weights exist before they are set.
model._make_train_function()
with open('optimizer.pkl', 'rb') as f:
    weight_values = pickle.load(f)
model.optimizer.set_weights(weight_values)
- Continue model training.
model.fit(X, y, epochs=5)
Epoch 1/5
100/100 [==============================] - 0s 674us/step - loss: 0.7629
Epoch 2/5
100/100 [==============================] - 0s 49us/step - loss: 0.7617
Epoch 3/5
100/100 [==============================] - 0s 49us/step - loss: 0.7611
Epoch 4/5
100/100 [==============================] - 0s 55us/step - loss: 0.7601
Epoch 5/5
100/100 [==============================] - 0s 49us/step - loss: 0.7594
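The pickle round trip in the steps above can be wrapped in a few small helpers. This is only a minimal sketch: `dump_weight_values`, `load_weight_values` and `shapes_match` are hypothetical names, not part of Keras, and they work on plain NumPy arrays such as the list returned by K.batch_get_value. The shape check is an assumption about what is worth verifying before calling set_weights on a rebuilt model.

```python
import pickle
import numpy as np


def dump_weight_values(weight_values, path):
    """Pickle a list of optimizer weight arrays (hypothetical helper)."""
    with open(path, 'wb') as f:
        pickle.dump([np.asarray(w) for w in weight_values], f)


def load_weight_values(path):
    """Read back the list written by dump_weight_values."""
    with open(path, 'rb') as f:
        return pickle.load(f)


def shapes_match(saved_values, current_values):
    """Return True if both lists have the same length and per-array shapes."""
    if len(saved_values) != len(current_values):
        return False
    return all(np.shape(s) == np.shape(c)
               for s, c in zip(saved_values, current_values))
```

In the new session, one could compare the loaded list against the optimizer's current weight values after model._make_train_function() and only then call model.optimizer.set_weights, so that a mismatch in architecture shows up as a clear check rather than an error deep inside Keras.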
Completing Alex Trevithick's answer: it is possible to avoid re-calling model.set_weights simply by saving the state of the variables before applying the gradients and then restoring them. This can be useful when loading a model from an h5 file, and looks cleaner (imo).
The saving/loading functions are the following (thanks Alex again):
import os
import numpy as np
import tensorflow as tf

def save_optimizer_state(optimizer, save_path, save_name):
    '''
    Save keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object.
    save_path --- Path to save location.
    save_name --- Name of the .npy file to be created.
    '''
    # Create the folder if it does not exist
    if not os.path.exists(save_path):
        os.makedirs(save_path)
    # Save weights
    np.save(os.path.join(save_path, save_name), optimizer.get_weights())
    return
def load_optimizer_state(optimizer, load_path, load_name, model_train_vars):
    '''
    Loads keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object to be loaded.
    load_path --- Path to save location.
    load_name --- Name of the .npy file to be read.
    model_train_vars --- List of model variables (obtained using Model.trainable_variables)
    '''
    # Load optimizer weights
    opt_weights = np.load(os.path.join(load_path, load_name)+'.npy', allow_pickle=True)
    # Dummy zero gradients
    zero_grads = [tf.zeros_like(w) for w in model_train_vars]
    # Save current state of variables
    saved_vars = [tf.identity(w) for w in model_train_vars]
    # Apply the zero gradients; this is not necessarily a no-op with Adam
    optimizer.apply_gradients(zip(zero_grads, model_train_vars))
    # Restore the variables to their saved state
    for x, y in zip(model_train_vars, saved_vars):
        x.assign(y)
    # Set the weights of the optimizer
    optimizer.set_weights(opt_weights)
    return
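One caveat about calling np.save directly on optimizer.get_weights(): the list can mix a scalar step counter with arrays of different shapes, and newer NumPy releases may refuse to convert such a ragged list into an array implicitly. A hedged workaround is to pack the list into a 1-D object array first. The helper names below (`save_weight_list`, `load_weight_list`) are hypothetical, and plain NumPy arrays stand in for real optimizer weights.

```python
import numpy as np


def save_weight_list(weights, path):
    """Save a list of differently shaped arrays as one object array."""
    packed = np.empty(len(weights), dtype=object)
    for i, w in enumerate(weights):
        # Assign element by element so NumPy never tries to broadcast.
        packed[i] = np.asarray(w)
    np.save(path, packed, allow_pickle=True)


def load_weight_list(path):
    """Load the object array back into a plain Python list of arrays."""
    return list(np.load(path, allow_pickle=True))
```

The list returned by `load_weight_list` could then be passed to optimizer.set_weights in place of opt_weights, assuming the optimizer's own weights have already been created as described above.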
For those who are not using model.compile and instead performing automatic differentiation to apply the gradients manually with optimizer.apply_gradients, I think I have a solution.
First, save the optimizer weights: np.save(path, optimizer.get_weights())
Then, when you are ready to reload the optimizer, show the newly instantiated optimizer the size of the weights it will update by calling optimizer.apply_gradients on a list of tensors of the same size as the variables for which you calculate gradients. It is extremely important to then set the weights of the model AFTER you set the weights of the optimizer, because momentum-based optimizers like Adam will update the weights of the model even if we give them gradients which are zero.
import tensorflow as tf
import numpy as np

model = ...  # instantiate model (functional or subclass of tf.keras.Model)

# Get saved weights
opt_weights = np.load('/path/to/saved/opt/weights.npy', allow_pickle=True)

grad_vars = model.trainable_weights
# This need not be model.trainable_weights; it must be a correctly-ordered list of
# grad_vars corresponding to how you usually call the optimizer.

optimizer = tf.keras.optimizers.Adam(lrate)  # lrate: your learning rate

zero_grads = [tf.zeros_like(w) for w in grad_vars]

# Apply the zero gradients; this is not necessarily a no-op with Adam
optimizer.apply_gradients(zip(zero_grads, grad_vars))

# Set the weights of the optimizer
optimizer.set_weights(opt_weights)

# NOW set the trainable weights of the model
model_weights = np.load('/path/to/saved/model/weights.npy', allow_pickle=True)
model.set_weights(model_weights)
Note that if we try to set the weights before calling apply_gradients for the first time, an error is thrown that the optimizer expects a weight list of length zero.
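To see when a zero-gradient step can move the variables, here is a small NumPy sketch of the textbook Adam update. It is an illustration under stated assumptions, not the Keras implementation: `adam_step` is a hypothetical function and the hyperparameter defaults are chosen only for the example. In this sketch a zero gradient leaves the weights untouched while both moment estimates are zero, but shifts them once the moments carry history, which is why restoring the model weights after the dummy step is the safe order.

```python
import numpy as np


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update; returns the new (w, m, v, t)."""
    t = t + 1
    m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)             # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v, t


w = np.array([1.0, -2.0])
zero_grad = np.zeros_like(w)

# Fresh state: both moments are zero, so the update term is zero.
w_fresh, _, _, _ = adam_step(w, zero_grad, np.zeros_like(w), np.zeros_like(w), t=0)

# State with history: non-zero moments still produce a step.
m_prev = np.array([0.5, -0.5])
v_prev = np.array([0.25, 0.25])
w_warm, _, _, _ = adam_step(w, zero_grad, m_prev, v_prev, t=10)

print("fresh state leaves weights unchanged:", np.array_equal(w_fresh, w))
print("state with history moves weights:", not np.allclose(w_warm, w))
```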