What would be a good loss function to penalize the magnitude and sign difference
From what I understand, your current loss function is something like:
loss = mean_square_error(y, y_pred)
What you could do, is to add one other component to your loss, being this a component that penalizes negative numbers and does nothing with positive numbers. And you can choose a coefficient for how much you want to penalize it. For that, we can use like a negative shaped ReLU. Something like this:
Let's call "Neg_ReLU" to this component. Then, your loss function will be:
loss = mean_squared_error(y, y_pred) + Neg_ReLU(y_pred)
So for example, if your result is -1, then the total error would be:
mean_squared_error(1, -1) + 1
And if your result is 3, then the total error would be:
mean_squared_error(1, -1) + 0
(See in the above function how Neg_ReLU(3) = 0, and Neg_ReLU(-1) = 1.
If you want to penalize more the negative values, then you can add a coefficient:
coeff_negative_value = 2
loss = mean_squared_error(y, y_pred) + coeff_negative_value * Neg_ReLU
Now the negative values are more penalized.
The ReLU negative function we can build it like this:
tf.nn.relu(tf.math.negative(value))
So summarizing, in the end your total loss will be:
coeff = 1
Neg_ReLU = tf.nn.relu(tf.math.negative(y))
total_loss = mean_squared_error(y, y_pred) + coeff * Neg_ReLU
This turned out to be a really interesting question - thanks for asking it! First, remember that you want your loss functions to be defined entirely of differential operations, so that you can back-propagation though it. This means that any old arbitrary logic won't necessarily do. To restate your problem: you want to find a differentiable function of two variables that increases sharply when the two variables take on values of different signs, and more slowly when they share the same sign. Additionally, you want some control over how sharply these values increase, relative to one another. Thus, we want something with two configurable constants. I started constructing a function that met these needs, but then remembered one you can find in any high school geometry text book: the elliptic paraboloid!
The standard formulation doesn't meet the requirement of sign agreement symmetry, so I had to introduce a rotation. The plot above is the result. Note that it increases more sharply when the signs don't agree, and less sharply when they do, and that the input constants controlling this behaviour are configurable. The code below is all that was needed to define and plot the loss function. I don't think I've ever used a geometric form as a loss function before - really neat.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
def elliptic_paraboloid_loss(x, y, c_diff_sign, c_same_sign):
# Compute a rotated elliptic parabaloid.
t = np.pi / 4
x_rot = (x * np.cos(t)) + (y * np.sin(t))
y_rot = (x * -np.sin(t)) + (y * np.cos(t))
z = ((x_rot**2) / c_diff_sign) + ((y_rot**2) / c_same_sign)
return(z)
c_diff_sign = 4
c_same_sign = 2
a = np.arange(-5, 5, 0.1)
b = np.arange(-5, 5, 0.1)
loss_map = np.zeros((len(a), len(b)))
for i, a_i in enumerate(a):
for j, b_j in enumerate(b):
loss_map[i, j] = elliptic_paraboloid_loss(a_i, b_j, c_diff_sign, c_same_sign)
fig = plt.figure()
ax = fig.gca(projection='3d')
X, Y = np.meshgrid(a, b)
surf = ax.plot_surface(X, Y, loss_map, cmap=cm.coolwarm,
linewidth=0, antialiased=False)
plt.show()