Can the Wasserstein loss be negative?
The Wasserstein loss is an estimate of the Earth Mover's distance, which is a distance between two probability distributions. In TensorFlow it is implemented as d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real), which can obviously give a negative number: that happens as soon as the mean critic score on fake samples falls below the mean score on real samples, i.e. when d_fake moves too far to the other side of the d_real distribution. You can see it on your plot, where during training your real and fake distributions keep changing sides until they converge around zero. So as a performance measure you can use it to see how far the generator currently is from the real data and on which side of it the generator sits.
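For illustration, here is a minimal TensorFlow 2 sketch of that expression; the critic scores are made-up numbers (not from your run), just to show the sign flip directly:

```python
import tensorflow as tf

# Hypothetical critic scores, only to illustrate the sign of the loss.
d_real = tf.constant([0.2, 0.5, 0.9])   # critic outputs on real samples
d_fake = tf.constant([-0.3, 0.1, 0.4])  # critic outputs on generated samples

# Wasserstein critic loss: mean score on fakes minus mean score on reals.
d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real)
print(float(d_loss))  # ~ -0.467: negative because reals score higher on average
```

As soon as the generator overshoots and tf.reduce_mean(d_fake) exceeds tf.reduce_mean(d_real), the same expression turns positive, which is exactly the side-swapping you see in the plot.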
See the distributions plot: [image: real and fake score distributions swapping sides during training before converging around zero]
P.S. Note that what you have is the cross-entropy loss, not the Wasserstein loss. Perhaps this article can help you more, if you haven't read it yet. However, that leaves the other question: how the optimizer is supposed to minimize a loss that is negative, rather than driving it down to zero; see the sketch below.
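To make that contrast concrete, here is a hedged side-by-side sketch with hypothetical logits: the cross-entropy discriminator loss is a sum of negative log-probabilities, so it can never go below zero, while the Wasserstein critic loss is a difference of means and is unbounded below:

```python
import tensorflow as tf

# Hypothetical logits for one batch, just for illustration.
d_real = tf.constant([1.5, 0.8, 2.1])    # outputs on real samples
d_fake = tf.constant([-0.5, 0.3, -1.2])  # outputs on generated samples

# Vanilla GAN discriminator loss (cross-entropy): a sum of negative
# log-probabilities, so it is always >= 0 and 0 would mean a perfect fit.
ce_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.ones_like(d_real), logits=d_real)
    + tf.nn.sigmoid_cross_entropy_with_logits(
        labels=tf.zeros_like(d_fake), logits=d_fake))

# Wasserstein critic loss: a difference of means, unbounded below; "more
# negative" just means the critic separates the two batches more cleanly.
w_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real)

print(float(ce_loss), float(w_loss))  # ce_loss > 0, w_loss < 0 here
```

This is why a cross-entropy loss near zero is meaningful, whereas for the Wasserstein loss the optimizer simply keeps pushing the difference of means down, with no special role for zero.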