CS231n: How to calculate gradient for Softmax loss function?
I know this is late, but here's my answer:
I'm assuming you are familiar with the cs231n Softmax loss function. We know that:

$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$$
So, just as we did with the SVM loss function, the gradient of the loss with respect to the scores is as follows:

$$\frac{\partial L_i}{\partial f_k} = p_k - \mathbb{1}(y_i = k), \qquad p_k = \frac{e^{f_k}}{\sum_j e^{f_j}}$$
Hope that helped.
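In case a concrete version helps, here is a minimal numpy sketch of that loss and gradient for a single example (the score vector f and label y_i below are just made-up toy values):

    import numpy as np

    # Toy values for illustration: a score vector f for one example
    # and the index y_i of its correct class.
    f = np.array([2.0, -1.0, 0.5])
    y_i = 0

    # Softmax probabilities (shift by the max score for numerical stability).
    f_shifted = f - np.max(f)
    p = np.exp(f_shifted) / np.sum(np.exp(f_shifted))

    # Loss for this example: L_i = -log(p[y_i]).
    L_i = -np.log(p[y_i])

    # Gradient w.r.t. the scores: dL_i/df_k = p_k - 1(k == y_i).
    df = p.copy()
    df[y_i] -= 1.0

    print(L_i, df)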
Not sure if this helps, but:
$\mathbb{1}(y_i = j)$ is really the indicator function, as described here. This forms the expression (j == y[i]) in the code.
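A tiny toy example of that boolean-as-indicator behaviour (the variable names are just for illustration):

    # In Python, a comparison such as (j == y_i) evaluates to a boolean,
    # which acts as 1 when true and 0 when false in arithmetic, so it can
    # stand in directly for the indicator 1(y_i = j).
    j, y_i = 2, 2
    print((j == y_i) * 1.0)   # 1.0
    print((j == 0) * 1.0)     # 0.0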
Also, the gradient of the loss with respect to the weights is:

$$\frac{\partial L_i}{\partial w_j} = \frac{\partial L_i}{\partial f_j}\, x_i$$

where

$$f = W x_i$$

which is the origin of the X[:,i] in the code.
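Putting the two pieces together, a sketch of a naive (looped) softmax loss and gradient could look like the following. This assumes the older assignment layout where W has shape (C, D) and X has shape (D, N) with examples as columns, which is exactly what makes X[:, i] show up; the regularization term is included only for completeness.

    import numpy as np

    def softmax_loss_naive(W, X, y, reg):
        """
        Naive softmax loss and gradient.
        Assumed shapes: W is (C, D), X is (D, N), y is (N,).
        """
        dW = np.zeros_like(W)
        num_classes = W.shape[0]
        num_train = X.shape[1]
        loss = 0.0

        for i in range(num_train):
            f = W.dot(X[:, i])                  # class scores for example i
            f -= np.max(f)                      # numerical stability
            p = np.exp(f) / np.sum(np.exp(f))   # softmax probabilities
            loss += -np.log(p[y[i]])

            for j in range(num_classes):
                # dL_i/dw_j = (p_j - 1(y_i = j)) * x_i
                dW[j, :] += (p[j] - (j == y[i])) * X[:, i]

        # Average over the training set and add L2 regularization
        # (the 0.5 factor is just one common convention).
        loss = loss / num_train + 0.5 * reg * np.sum(W * W)
        dW = dW / num_train + reg * W
        return loss, dW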