What is the purpose of the add_loss function in Keras?
I was wondering about the same thing, along with related questions such as how to add a loss inside the intermediate layers. Here is what I found; I hope it helps others. It's true that standard Keras loss functions only take two arguments, y_true and y_pred. But in practice there can be cases where we need some external parameter or coefficient while computing with these two values (y_true, y_pred). This can be needed at the last layer as usual, or somewhere in the middle of the model's layers.
model.add_loss()
The accepted answer correctly describes model.add_loss(). A loss added this way can depend on the layer inputs (tensors). According to the official docs, when writing the call method of a custom layer or a subclassed model, we may want to compute scalar quantities that we want to minimize during training (e.g. regularization losses). We can use the add_loss() layer method to keep track of such loss terms, for instance activity regularization losses that depend on the inputs passed when calling a layer. Here's an example of a layer that adds a sparsity regularization loss based on the squared L2 norm of the inputs:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class MyActivityRegularizer(Layer):
    """Layer that creates an activity sparsity regularization loss."""

    def __init__(self, rate=1e-2):
        super(MyActivityRegularizer, self).__init__()
        self.rate = rate

    def call(self, inputs):
        # We use `add_loss` to create a regularization loss
        # that depends on the inputs.
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs
Loss values added via add_loss can be retrieved in the .losses list property of any Layer or Model (they are recursively retrieved from every underlying layer):
from tensorflow.keras import layers

class SparseMLP(Layer):
    """Stack of Linear layers with a sparsity regularization loss."""

    def __init__(self, output_dim):
        super(SparseMLP, self).__init__()
        self.dense_1 = layers.Dense(32, activation=tf.nn.relu)
        self.regularization = MyActivityRegularizer(1e-2)
        self.dense_2 = layers.Dense(output_dim)

    def call(self, inputs):
        x = self.dense_1(inputs)
        x = self.regularization(x)
        return self.dense_2(x)

mlp = SparseMLP(1)
y = mlp(tf.ones((10, 10)))
print(mlp.losses)  # List containing one float32 scalar
Also note that when using model.fit(), such loss terms are handled automatically. When writing a custom training loop, we should retrieve these terms by hand from model.losses, like this:
# `model` and `dataset` are assumed to be defined elsewhere.
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Iterate over the batches of a dataset.
for x, y in dataset:
    with tf.GradientTape() as tape:
        # Forward pass.
        logits = model(x)
        # Loss value for this batch.
        loss_value = loss_fn(y, logits)
        # Add extra loss terms to the loss value.
        loss_value += sum(model.losses)  # < ------------- HERE ---------

    # Update the weights of the model to minimize the loss value.
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
Custom losses
With model.add_loss(), AFAIK, we can attach a loss somewhere in the middle of the network, and we are no longer bound to only two parameters, i.e. y_true and y_pred. But what if we also want to pass an external parameter or coefficient to the loss function at the last layer of the network? Nric's answer is correct. But it can also be implemented by subclassing the tf.keras.losses.Loss class and implementing the following two methods:
__init__(self): accept parameters to pass during the call of your loss function
call(self, y_true, y_pred): use the targets (y_true) and the model predictions (y_pred) to compute the model's loss
Here is an example of a custom MSE built by subclassing the tf.keras.losses.Loss class. Here, too, we are no longer bound to only two parameters, i.e. y_true and y_pred.
import tensorflow as tf
from tensorflow import keras

class CustomMSE(keras.losses.Loss):
    def __init__(self, regularization_factor=0.1, name="custom_mse"):
        super().__init__(name=name)
        self.regularization_factor = regularization_factor

    def call(self, y_true, y_pred):
        # Standard mean squared error between targets and predictions.
        mse = tf.math.reduce_mean(tf.square(y_true - y_pred))
        # Extra term that penalizes predictions far from 0.5.
        reg = tf.math.reduce_mean(tf.square(0.5 - y_pred))
        return mse + reg * self.regularization_factor

model.compile(optimizer=..., loss=CustomMSE())
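To see what the extra term in CustomMSE does numerically, here is a minimal plain-Python sketch of the same formula, without TensorFlow. The helper name custom_mse_value and the sample values are made up for illustration; they are not part of Keras.

```python
def custom_mse_value(y_true, y_pred, regularization_factor=0.1):
    """Plain-Python version of the CustomMSE formula (illustrative helper)."""
    n = len(y_true)
    # Mean squared error between targets and predictions.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # Penalty that grows as predictions move away from 0.5.
    reg = sum((0.5 - p) ** 2 for p in y_pred) / n
    return mse + reg * regularization_factor


y_true = [0.0, 1.0]
# Predictions that match the targets: mse is 0, only the penalty remains.
perfect = custom_mse_value(y_true, [0.0, 1.0])
# Predictions stuck at 0.5: the penalty is 0, only the mse remains.
hedged = custom_mse_value(y_true, [0.5, 0.5])
print(perfect, hedged)
```

Even a perfect prediction gets a small nonzero loss here, because the regularization term pulls predictions toward 0.5; regularization_factor controls how strong that pull is.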
I'll try to answer the original question of why model.add_loss() is being used instead of specifying a custom loss function to model.compile(loss=...).
All loss functions in Keras take two parameters, y_true and y_pred. Have a look at the definitions of the various standard loss functions available in Keras: they all have these two parameters. They are the 'targets' (the Y variable in many textbooks) and the actual output of the model. Most standard loss functions can be written as an expression of these two tensors. But some more complex losses cannot be written that way. This is the case for your VAE example, because the loss function also depends on additional tensors, namely z_log_var and z_mean, which are not available to the loss function. Using model.add_loss() has no such restriction and allows you to write much more complex losses that depend on many other tensors, but it has the inconvenience of being more dependent on the model, whereas the standard loss functions work with just about any model.
(Note: The code proposed in other answers here is somewhat cheating, inasmuch as it just uses global variables to sneak in the additional required dependencies. This makes the loss function not a true function in the mathematical sense. I consider this to be much less clean code, and I expect it to be more error-prone.)
JIH's answer is right of course, but maybe it is useful to add: model.add_loss() has no restrictions, but it also removes the comfort of, for example, passing targets to model.fit().
If you have a loss that depends on additional parameters of the model, of other models, or on external variables, you can still use a Keras-style loss function by wrapping it in an outer function to which you pass all the additional parameters:
def loss_carrier(extra_param1, extra_param2):
    def loss(y_true, y_pred):
        # x = complicated math involving extra_param1, extra_param2, y_true, y_pred
        # Remember to use tensor operations, for example keras.backend.sum,
        # keras.backend.square, keras.backend.mean.
        # Also remember that if extra_param1, extra_param2 are variable tensors instead of simple floats,
        # you need to have them defined as inputs=(main, extra_param1, extra_param2) in your keras.Model instantiation,
        # and have them defined as keras.Input or tf.placeholder with the right shape.
        return x
    return loss

# Call the carrier so that Keras receives the inner two-argument function.
model.compile(optimizer='adam', loss=loss_carrier(extra_param1, extra_param2))
The trick is the return loss line, where you return a function of the form Keras expects, with just the two parameters y_true and y_pred.
This possibly looks more complicated than the model.add_loss version, but the loss stays modular.
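The closure pattern itself can be tried without Keras. The sketch below is plain Python on lists of floats; the parameter names weight and offset and the weighted-MSE formula are placeholders chosen for illustration, not the math from any answer above. It only shows that the outer function captures the extra parameters and hands back a function with the two-argument signature.

```python
import inspect


def loss_carrier(weight, offset):
    """Outer function: captures the extra parameters (hypothetical names)."""

    def loss(y_true, y_pred):
        # Placeholder math: weighted mean squared error plus a constant offset.
        squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
        return weight * sum(squared_errors) / len(squared_errors) + offset

    return loss


# Calling the carrier yields the inner function with weight and offset baked in.
loss_fn = loss_carrier(weight=2.0, offset=0.5)
param_names = list(inspect.signature(loss_fn).parameters)
value = loss_fn([1.0, 2.0], [1.0, 4.0])
print(param_names, value)
```

With a real model, the inner function would use tensor operations instead of Python lists, and the result of calling loss_carrier(...) is what gets passed as loss= to model.compile.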