In TensorFlow, what is the difference between sampled_softmax_loss and softmax_cross_entropy_with_logits?
If your target vocabulary (in other words, the number of classes you want to predict) is really big, it is very hard to use the regular softmax, because you have to calculate the probability for every word in the dictionary. By using sampled_softmax_loss you only take into account a subset V of your vocabulary when calculating your loss.
Sampled softmax only makes sense if the number of sampled classes (our V) is smaller than the vocabulary size. If your vocabulary (the number of labels) is small, there is no point in using sampled_softmax_loss.
You can see implementation details in this paper: http://arxiv.org/pdf/1412.2007v2.pdf
You can also see where it is used in practice, for example in the sequence-to-sequence translation example.
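As a rough sketch (with made-up sizes such as a 50,000-word vocabulary and 64 sampled classes), the usual pattern is to train with tf.nn.sampled_softmax_loss and fall back to the full softmax for evaluation, since sampling is only a training-time approximation:

    import tensorflow as tf

    # Illustrative sizes only: a large vocabulary, but just 64 sampled classes per step.
    batch_size, hidden_dim, vocab_size, num_sampled = 32, 256, 50000, 64

    inputs = tf.placeholder(tf.float32, [batch_size, hidden_dim])  # last hidden layer activations
    labels = tf.placeholder(tf.int64, [batch_size, 1])             # true word ids (not one-hot)

    # Output projection, passed to the loss so only the sampled rows are touched.
    weights = tf.get_variable("proj_w", [vocab_size, hidden_dim])
    biases = tf.get_variable("proj_b", [vocab_size])

    # Training: loss over the true class plus a small random subset V of the vocabulary.
    train_loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
        weights=weights, biases=biases, labels=labels, inputs=inputs,
        num_sampled=num_sampled, num_classes=vocab_size))

    # Evaluation: the full softmax over the whole vocabulary is still used.
    logits = tf.matmul(inputs, weights, transpose_b=True) + biases
    eval_loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.squeeze(labels, axis=1), logits=logits))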
Sampled:
Sampled, in both cases, means that you don't calculate the loss over everything that is possible as an output. For example, if there are too many words in the dictionary to include all of them at each training step (as is common in NLP problems), you take just a few samples and learn from those.
softmax_cross_entropy_with_logits:
This is the cross entropy: it receives logits (unnormalized scores) as inputs and yields a per-example value that can be used as a loss.
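A minimal sketch of its use, assuming dense one-hot labels (the sizes and tensor names here are hypothetical):

    import tensorflow as tf

    batch_size, num_classes = 32, 10  # small number of classes, so the full softmax is cheap

    logits = tf.placeholder(tf.float32, [batch_size, num_classes])  # unnormalized scores
    labels = tf.placeholder(tf.float32, [batch_size, num_classes])  # one-hot targets

    # Softmax over ALL classes, then cross entropy against the labels; reduce to a scalar loss.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))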
sampled_softmax_loss:
This is a sampled version of softmax_cross_entropy_with_logits: it takes just a few sampled classes (plus the true ones) before computing the cross entropy, rather than computing the full cross entropy over all classes: https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/python/ops/nn_impl.py#L1269
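In code, the difference shows up in the arguments: you pass in the output-layer weights and biases, integer labels (not one-hot), and how many classes to sample. A rough sketch with hypothetical sizes:

    import tensorflow as tf

    batch_size, hidden_dim, vocab_size, num_sampled = 32, 128, 50000, 64  # illustrative

    inputs = tf.placeholder(tf.float32, [batch_size, hidden_dim])  # activations feeding the output layer
    labels = tf.placeholder(tf.int64, [batch_size, 1])             # true class ids, NOT one-hot

    weights = tf.get_variable("out_w", [vocab_size, hidden_dim])   # output projection
    biases = tf.get_variable("out_b", [vocab_size])

    # Cross entropy over the true class plus num_sampled randomly drawn classes,
    # instead of over all vocab_size classes.
    loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
        weights=weights, biases=biases, labels=labels, inputs=inputs,
        num_sampled=num_sampled, num_classes=vocab_size))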