Neural Network Cost Function in MATLAB
I think Htheta is a K*2 array. Note that you need to add the bias terms (x0 and a0) in the forward cost-function calculation. The comments in the code below show the array dimensions at each step, under the assumption that you have two nodes at the input, hidden, and output layers.
m = size(X, 1);
sigmoid = @(z) 1 ./ (1 + exp(-z)); % sigmf requires a params argument, so define the sigmoid directly
X = [ones(m,1) X];                 % m*3 in your case
% W1 is 2*3, W2 is 3*2
a2 = sigmoid(W1 * X');             % 2*m
a2 = [ones(m,1) a2'];              % m*3
Htheta = sigmoid(a2 * W2);         % m*2
J = (1/m) * sum(sum((-Y) .* log(Htheta) - (1-Y) .* log(1-Htheta)));
t1 = W1(:, 2:end);                 % drop the bias column
W2 = W2';
t2 = W2(:, 2:end);                 % drop the bias terms of W2 as well
% regularization term
Reg = lambda * (sum(sum(t1.^2)) + sum(sum(t2.^2))) / (2*m);
J = J + Reg;                       % regularized cost
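As a quick sanity check, the snippet above can be run end-to-end on made-up data. The dimensions match the 2-2-2 network assumed in the comments; the random values and one-hot labels here are purely illustrative, not from the original question:

```matlab
% toy 2-2-2 network: 5 examples, 2 features, one-hot labels
X = rand(5, 2);
Y = [1 0; 0 1; 1 0; 0 1; 1 0];
W1 = randn(2, 3);   % hidden-layer weights, bias column first
W2 = randn(3, 2);   % output-layer weights, bias row first
lambda = 1;

sigmoid = @(z) 1 ./ (1 + exp(-z));
m = size(X, 1);
Xb = [ones(m,1) X];                   % m*3
a2 = [ones(m,1) sigmoid(W1 * Xb')'];  % m*3
Htheta = sigmoid(a2 * Xb'*0 + a2 * W2); % m*2 (a2 * W2)
J = (1/m) * sum(sum((-Y) .* log(Htheta) - (1-Y) .* log(1-Htheta)));
Reg = lambda * (sum(sum(W1(:,2:end).^2)) + sum(sum(W2(2:end,:).^2))) / (2*m);
J = J + Reg;                          % scalar, finite and non-negative
```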
I've implemented neural networks using the same error function as the one mentioned above. Unfortunately, I haven't worked with MATLAB for quite some time, but I'm fairly proficient in Octave, which hopefully you can still find useful, since many Octave functions are similar to their MATLAB counterparts.
@sashkello provided a good snippet of code for computing the cost function. However, this code is written with a loop structure, and I would like to offer a vectorized implementation.
In order to evaluate the current theta values, we need to perform a feed-forward (forward propagation) pass through the network. I'm assuming you know how to write the feed-forward code, since you're only concerned with the J(theta) errors. Let F be the m*K matrix holding the results of your forward propagation. Once you've performed the feed-forward pass, you'll need to carry out the cost equation. Note that I'm implementing this in a vectorized manner:
J = (-1/m) * sum(sum(Y .* log(F) + (1-Y) .* log(1-F),2));
This computes the unregularized part of the cost:

J(theta) = -(1/m) * sum_{i=1..m} sum_{k=1..K} [ y_k(i) * log(h(x(i))_k) + (1 - y_k(i)) * log(1 - h(x(i))_k) ]
Now we must add the regularization term, which is (lambda/(2*m)) times the sum of the squares of every theta value, excluding the ones for the bias units.
Typically, we would have an arbitrary number of theta matrices, but in this case we have 2, so we can just perform the sums directly:
J = J + (lambda/(2*m)) * (sum(sum(theta_1(:,2:end).^2,2)) + sum(sum(theta_2(:,2:end).^2,2)));
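Putting the pieces together, the whole computation can be sketched as a single Octave function. The function name nnCost and the layout of theta_1 and theta_2 (each row holds one unit's weights, bias first) are my assumptions, not part of the original question:

```matlab
function J = nnCost(theta_1, theta_2, X, Y, lambda)
  % Vectorized cost: feed-forward, cross-entropy, then regularization.
  m = size(X, 1);
  sigmoid = @(z) 1 ./ (1 + exp(-z));

  % feed-forward: theta_1 is hidden x (inputs+1), theta_2 is K x (hidden+1)
  a1 = [ones(m, 1) X];
  a2 = [ones(m, 1) sigmoid(a1 * theta_1')];
  F  = sigmoid(a2 * theta_2');          % m x K matrix of hypotheses

  % unregularized cross-entropy cost
  J = (-1/m) * sum(sum(Y .* log(F) + (1-Y) .* log(1-F), 2));

  % regularization: skip the first (bias) column of each theta matrix
  J = J + (lambda/(2*m)) * (sum(sum(theta_1(:,2:end).^2, 2)) + ...
                            sum(sum(theta_2(:,2:end).^2, 2)));
end
```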
Notice how each sum only runs from the second column through the rest. This is because the first column corresponds to the theta values we trained for the bias units.
So there's a vectorized implementation of the computation of J.
I hope this helps!