CS231n: How to calculate gradient for Softmax loss function?
I know this is late, but here's my answer:
I'm assuming you are familiar with the cs231n Softmax loss function. We know that:

$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$$
So, just as we did with the SVM loss function, the gradient of the loss with respect to the scores is as follows:

$$\frac{\partial L_i}{\partial f_k} = p_k - \mathbb{1}(y_i = k), \qquad p_k = \frac{e^{f_k}}{\sum_j e^{f_j}}$$
Hope that helped.
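In case a concrete version helps, here is a minimal numpy sketch of that loss and gradient for a single example (the score vector f and label y_i below are just made-up toy values):

    import numpy as np

    # Toy values for illustration: a score vector f for one example
    # and the index y_i of its correct class.
    f = np.array([2.0, -1.0, 0.5])
    y_i = 0

    # Softmax probabilities (shift by the max score for numerical stability).
    f_shifted = f - np.max(f)
    p = np.exp(f_shifted) / np.sum(np.exp(f_shifted))

    # Loss for this example: L_i = -log(p[y_i]).
    L_i = -np.log(p[y_i])

    # Gradient w.r.t. the scores: dL_i/df_k = p_k - 1(k == y_i).
    df = p.copy()
    df[y_i] -= 1.0

    print(L_i, df)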
Not sure if this helps, but:
$\mathbb{1}(y_i = j)$ is really the indicator function, as described here. This forms the expression (j == y[i]) in the code.
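A tiny toy example of that boolean-as-indicator behaviour (the variable names are just for illustration):

    # In Python, a comparison such as (j == y_i) evaluates to a boolean,
    # which acts as 1 when true and 0 when false in arithmetic, so it can
    # stand in directly for the indicator 1(y_i = j).
    j, y_i = 2, 2
    print((j == y_i) * 1.0)   # 1.0
    print((j == 0) * 1.0)     # 0.0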
Also, the gradient of the loss with respect to the weights is:

$$\frac{\partial L_i}{\partial w_j} = \frac{\partial L_i}{\partial f_j}\, x_i$$

where

$$f = W x_i$$

which is the origin of the X[:,i] in the code.
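Putting the two pieces together, a sketch of a naive (looped) softmax loss and gradient could look like the following. This assumes the older assignment layout where W has shape (C, D) and X has shape (D, N) with examples as columns, which is exactly what makes X[:, i] show up; the regularization term is included only for completeness.

    import numpy as np

    def softmax_loss_naive(W, X, y, reg):
        """
        Naive softmax loss and gradient.
        Assumed shapes: W is (C, D), X is (D, N), y is (N,).
        """
        dW = np.zeros_like(W)
        num_classes = W.shape[0]
        num_train = X.shape[1]
        loss = 0.0

        for i in range(num_train):
            f = W.dot(X[:, i])                  # class scores for example i
            f -= np.max(f)                      # numerical stability
            p = np.exp(f) / np.sum(np.exp(f))   # softmax probabilities
            loss += -np.log(p[y[i]])

            for j in range(num_classes):
                # dL_i/dw_j = (p_j - 1(y_i = j)) * x_i
                dW[j, :] += (p[j] - (j == y[i])) * X[:, i]

        # Average over the training set and add L2 regularization
        # (the 0.5 factor is just one common convention).
        loss = loss / num_train + 0.5 * reg * np.sum(W * W)
        dW = dW / num_train + reg * W
        return loss, dW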