Multiple category classification in Caffe
Nice question. I believe there is no single "canonical" answer here and you may find several different approaches to tackle this problem. I'll do my best to show one possible way. It is slightly different than the question you asked, so I'll re-state the problem and suggest a solution.
The problem: given an input image and a set of C classes, indicate for each class whether it is depicted in the image or not.
Inputs: at training time, each input is a pair consisting of an image and a C-dim binary vector that indicates, for each of the C classes, whether it is present in the image or not.
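To make the label format concrete, here is a minimal sketch of building such a C-dim binary vector from a list of class indices. The helper name `multi_hot` and the example values are made up for illustration; they are not part of Caffe.

```python
def multi_hot(present_classes, num_classes):
    """Return a list of 0/1 flags, one per class index (hypothetical helper)."""
    label = [0] * num_classes
    for c in present_classes:
        if not 0 <= c < num_classes:
            raise ValueError(f"class index {c} out of range")
        label[c] = 1
    return label

# Made-up example: C = 5 classes, the image shows classes 0 and 3.
label = multi_hot([0, 3], num_classes=5)
print(label)  # [1, 0, 0, 1, 0]
```

Unlike the usual single integer label, any number of entries (including none) can be set to 1.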
Output: given an image, output a C-dim binary vector (same as the second form suggested in your question).
Making Caffe do the job: to make this work, we need to modify the top layers of the net and use a different loss.
But first, let's understand the usual way Caffe is used and then look into the changes needed.
The way things are now: an image is fed into the net, goes through conv/pooling/... layers, and finally goes through an "InnerProduct" layer with C outputs. These C predictions go into a "Softmax" layer, which normalizes them into a probability distribution over the classes, so the classes compete and the most dominant one suppresses the rest. Once a single class is highlighted, the "SoftmaxWithLoss" layer checks how well the highlighted class matches the single ground-truth class.
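The competition between classes can be seen in a few lines of plain Python. This is only an illustrative sketch of the softmax function itself, not Caffe code, and the scores are made up.

```python
import math

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores: the first two classes both get a strong response.
scores = [4.0, 3.5, -2.0]
probs = softmax(scores)
print(probs)
print(sum(probs))  # the probabilities always share a total mass of 1
```

Even though two classes score highly, they have to split a total probability of 1, so the second class ends up below 0.5. That is fine when exactly one class is correct, but not when several classes can be present at once.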
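What you need: the problem with the existing approach is the "Softmax" layer, which basically selects a single class. I suggest you replace it with a "Sigmoid" layer that maps each of the C outputs independently into an indicator of whether this specific class is present in the image. For training, you should use a "SigmoidCrossEntropyLoss" layer instead of the "SoftmaxWithLoss" layer. Note that "SigmoidCrossEntropyLoss" applies the sigmoid internally, so during training it should be fed the raw "InnerProduct" outputs directly; the separate "Sigmoid" layer is what you use at test time.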
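As a rough sketch of what these two layers compute, here is the per-class sigmoid and a sigmoid cross-entropy loss written in plain Python. This is an illustration of the math, not Caffe's implementation: the function names, the scores, and the 0.5 decision threshold are assumptions, and the loss is summed over the classes of a single image without any batch normalization.

```python
import math

def sigmoid(x):
    """Map one raw score to a value in (0, 1), independently of other classes."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_cross_entropy(scores, targets):
    """Sum of per-class binary cross-entropy, computed from raw scores."""
    total = 0.0
    for x, t in zip(scores, targets):
        # Numerically stable form of -[t*log(s) + (1-t)*log(1-s)], s = sigmoid(x)
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total

# Made-up raw scores for C = 3 classes (same values as a softmax would see).
scores = [4.0, 3.5, -2.0]
probs = [sigmoid(s) for s in scores]

# Assumed threshold of 0.5 to turn probabilities into a binary vector.
predicted = [1 if p > 0.5 else 0 for p in probs]
print(predicted)  # [1, 1, 0]

# The loss is small when the targets agree with the scores, large otherwise.
print(sigmoid_cross_entropy(scores, [1, 1, 0]))
print(sigmoid_cross_entropy(scores, [0, 0, 1]))
```

Because each class gets its own sigmoid, two classes can both be reported as present, which is exactly what the softmax prevents.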