Backpropagation activation derivative
Found my answer here. The output delta does require multiplication by the derivative of the activation, as in:
d = (a - y) * g'(z)
However, Ng is using the cross-entropy cost function, which results in a delta that cancels the g'(z), giving the d = a - y calculation shown in the video. If a mean squared error cost function is used instead, the derivative of the activation function must be present.
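To see the cancellation concretely, here is a small numerical check (just a sketch in NumPy; the names a, y, z mirror the formulas above and a sigmoid output layer is assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.5, -1.2, 2.0])      # pre-activations of the output layer
y = np.array([1.0,  0.0, 1.0])      # targets
a = sigmoid(z)                      # activations
g_prime = a * (1 - a)               # sigmoid derivative g'(z)

# Cross-entropy cost: dC/da = (a - y) / (a * (1 - a)), so g'(z) cancels
delta_ce = (a - y) / (a * (1 - a)) * g_prime
print(np.allclose(delta_ce, a - y))   # True: the delta reduces to a - y

# Mean squared error cost: dC/da = (a - y), so g'(z) stays in the delta
delta_mse = (a - y) * g_prime
print(np.allclose(delta_mse, a - y))  # False in general
```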
When using neural networks, how you design your network depends on the learning task. A common approach for regression tasks is to use the tanh() activation function for the input and all hidden layers, and a linear activation function for the output layer (img taken from here).
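A rough sketch of that layout in NumPy (the layer sizes are arbitrary placeholders, not taken from the video):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression network: 3 inputs -> 5 tanh hidden units -> 1 linear output
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)

def forward(x):
    h = np.tanh(W1 @ x + b1)   # hidden layer: non-linear tanh activation
    return W2 @ h + b2         # output layer: linear (identity) activation

print(forward(np.array([0.1, -0.4, 0.7])))
```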
I did not find the source, but there is a theorem which states that combining non-linear and linear activation functions allows you to better approximate the target function. Examples of using different activation functions can be found here and here.
There are many different kinds of activation functions which can be used (img taken from here). If you look at the derivatives, you can see that the derivative of the linear function equals 1, which is why it is not mentioned any further. This is also the case in Ng's explanation: if you look at minute 12 of the video, you can see that he is talking about the output layer.
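For reference, here are a few common activations with their derivatives (a sketch; the linear case shows why the g'(z) factor drops out for a linear output layer):

```python
import numpy as np

def linear(z):    return z
def d_linear(z):  return np.ones_like(z)        # derivative is 1 everywhere

def sigmoid(z):   return 1.0 / (1.0 + np.exp(-z))
def d_sigmoid(z): return sigmoid(z) * (1.0 - sigmoid(z))

def d_tanh(z):    return 1.0 - np.tanh(z) ** 2  # derivative of np.tanh

z = np.linspace(-2.0, 2.0, 5)
print(d_linear(z))  # [1. 1. 1. 1. 1.] -> multiplying the delta by g'(z) changes nothing
```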
Concerning the backpropagation algorithm:
"When neuron is located in the output layer of the network, it is supplied with a desired response of its own. We may use e(n) = d(n) - y(n)
to compute the error signal e(n)
associated with this neuron; see Fig. 4.3. Having determined e(n)
, we find it a straightforward matter to compute the local gradient [...] When neuron is located in a hidden layer of the network, there is no specified desired response for that neuron. Accordingly, the error signal for a hidden neuron would have to be determined recursively and working backwards in terms of the error signals of all the neurons to which that hidden neuron is directly connected"
Haykin, Simon S., et al. Neural Networks and Learning Machines. Vol. 3. Upper Saddle River: Pearson Education, 2009. pp. 159-164.
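To illustrate what that recursion looks like in code, here is a rough sketch (my own notation, not Haykin's; it assumes tanh hidden layers and a linear output layer):

```python
import numpy as np

def backprop_deltas(zs, a_out, y, weights):
    """Deltas for each layer, computed backwards through the network.

    zs      : list of pre-activations, one array per non-input layer
    a_out   : activations of the output layer
    y       : target values
    weights : list of weight matrices; weights[l] maps layer l to layer l+1
    Assumes tanh hidden layers and a linear output layer.
    """
    deltas = [a_out - y]  # output layer: the error d(n) - y(n) is used directly
    # hidden layers: no desired response of their own, so work backwards recursively
    for l in range(len(weights) - 1, 0, -1):
        g_prime = 1.0 - np.tanh(zs[l - 1]) ** 2          # derivative of the hidden activation
        deltas.insert(0, (weights[l].T @ deltas[0]) * g_prime)
    return deltas
```

The weight gradients then follow from these deltas and the activations of the preceding layer, which is the "straightforward matter" Haykin refers to.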