What is the difference between sparse_categorical_crossentropy and categorical_crossentropy?
Simply: categorical_crossentropy (cce) expects the target for each sample as a one-hot array with a 1 at the index of the true category, while sparse_categorical_crossentropy (scce) expects the target as the integer index of the true category.
Consider a classification problem with 5 categories (or classes).
- In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right).
- In the case of scce, the target is simply the index [1], and the loss is computed from the model's predicted probability at that index (.5 here).
Consider now a classification problem with 3 classes.
- In the case of cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it gives more probability to the first class).
- In the case of scce, the target is the index [2], and the model's predicted probability at that index is .4.
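To make those numbers concrete, both losses can be evaluated directly on the two examples above. This is a quick sketch assuming TensorFlow 2.x with eager execution; the predictions are the made-up values from the examples:

import numpy as np
import tensorflow as tf

# 5-class example: one-hot target [0, 1, 0, 0, 0] vs. integer target [1]
pred5 = np.array([[.2, .5, .1, .1, .1]], dtype="float32")
cce5 = tf.keras.losses.categorical_crossentropy(
    np.array([[0., 1., 0., 0., 0.]], dtype="float32"), pred5)
scce5 = tf.keras.losses.sparse_categorical_crossentropy(np.array([1]), pred5)
print(cce5.numpy(), scce5.numpy())   # both ~0.693, i.e. -log(.5)

# 3-class example: one-hot target [0, 0, 1] vs. integer target [2]
pred3 = np.array([[.5, .1, .4]], dtype="float32")
cce3 = tf.keras.losses.categorical_crossentropy(
    np.array([[0., 0., 1.]], dtype="float32"), pred3)
scce3 = tf.keras.losses.sparse_categorical_crossentropy(np.array([2]), pred3)
print(cce3.numpy(), scce3.numpy())   # both ~0.916, i.e. -log(.4)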
Many categorical models are trained with scce targets because they save space, but you lose A LOT of information (for example, in the 2nd example, the prediction for index 0, .5, was very close to the .4 given to the true index 2). I generally prefer cce targets for model reliability.
There are a number of situations where scce is the right choice, including:
- when your classes are mutually exclusive, i.e. you don't care at all about other close-enough predictions,
- when the number of categories is so large that one-hot targets would become overwhelming.
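In Keras this choice only affects how the labels are encoded when you compile and fit the model. A minimal sketch; the architecture, input shape, and optimizer below are placeholders, not something from the original question:

import tensorflow as tf

# Hypothetical 5-class classifier; layers and input shape are only illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# Labels given as integer class indices, e.g. y = [1, 4, 0, ...]:
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Labels given as one-hot vectors, e.g. y = [[0, 1, 0, 0, 0], ...]:
# model.compile(optimizer="adam", loss="categorical_crossentropy",
#               metrics=["accuracy"])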
From the TensorFlow source code, sparse_categorical_crossentropy is defined as categorical crossentropy with integer targets:
def sparse_categorical_crossentropy(target, output, from_logits=False, axis=-1):
"""Categorical crossentropy with integer targets.
Arguments:
target: An integer tensor.
output: A tensor resulting from a softmax
(unless `from_logits` is True, in which
case `output` is expected to be the logits).
from_logits: Boolean, whether `output` is the
result of a softmax, or is a tensor of logits.
axis: Int specifying the channels axis. `axis=-1` corresponds to data
format `channels_last`, and `axis=1` corresponds to data format
`channels_first`.
Returns:
Output tensor.
Raises:
ValueError: if `axis` is neither -1 nor one of the axes of `output`.
"""
From the TensorFlow source code, categorical_crossentropy is defined as categorical crossentropy between an output tensor and a target tensor:
def categorical_crossentropy(target, output, from_logits=False, axis=-1):
"""Categorical crossentropy between an output tensor and a target tensor.
Arguments:
target: A tensor of the same shape as `output`.
output: A tensor resulting from a softmax
(unless `from_logits` is True, in which
case `output` is expected to be the logits).
from_logits: Boolean, whether `output` is the
result of a softmax, or is a tensor of logits.
axis: Int specifying the channels axis. `axis=-1` corresponds to data
format `channels_last`, and `axis=1` corresponds to data format
`channels_first`.
Returns:
Output tensor.
Raises:
ValueError: if `axis` is neither -1 nor one of the axes of `output`.
"""
The meaning of integer targets is that the target labels should be given as a list of integers indicating the class index of each sample, for example:
For sparse_categorical_crossentropy, for class 1 and class 2 targets in a 5-class classification problem, the list should be [1, 2]. Basically, the targets should be in integer form in order to call sparse_categorical_crossentropy. This is called sparse because the target representation requires much less space than one-hot encoding: a batch with b targets and k classes needs b * k space to be represented in one-hot form, whereas the same batch needs only b space in integer form.
For categorical_crossentropy, for class 1 and class 2 targets in a 5-class classification problem, the list should be [[0,1,0,0,0], [0,0,1,0,0]]. Basically, the targets should be in one-hot form in order to call categorical_crossentropy.
The representation of the targets is the only difference; the results will be the same, since both are calculating the categorical crossentropy.
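As a quick check of that claim (again a sketch assuming TensorFlow 2.x; the predictions are made up), the integer targets [1, 2] can be converted to one-hot form with tf.keras.utils.to_categorical, and both losses return the same per-sample values:

import numpy as np
import tensorflow as tf

sparse_targets = np.array([1, 2])                           # class indices, b values
onehot_targets = tf.keras.utils.to_categorical(sparse_targets, num_classes=5)
# onehot_targets == [[0, 1, 0, 0, 0],
#                    [0, 0, 1, 0, 0]]                       # b * k values

preds = np.array([[.2, .5, .1, .1, .1],
                  [.1, .1, .6, .1, .1]], dtype="float32")   # made-up softmax outputs

scce = tf.keras.losses.sparse_categorical_crossentropy(sparse_targets, preds)
cce = tf.keras.losses.categorical_crossentropy(onehot_targets, preds)
print(scce.numpy())  # [0.693 0.511]
print(cce.numpy())   # [0.693 0.511] -- identical; only the target format differs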