Octave: logistic regression: difference between fmincg and fminunc

You will have to look inside the code of fmincg because it is not part of Octave. After some searching, I found that it is a function file provided by the Machine Learning class at Coursera as part of the homework. Read the comments and answers on this question for a discussion of the algorithms.


In contrast to other answers here suggesting that the primary difference between fmincg and fminunc is accuracy or speed, for some applications the most important difference may be memory efficiency. In programming exercise 4 (i.e., Neural Network Training) of Andrew Ng's Machine Learning class at Coursera, the comment in ex4.m about fmincg is:

%% =================== Part 8: Training NN ===================
% You have now implemented all the code necessary to train a neural
% network. To train your neural network, we will now use "fmincg", which
% is a function which works similarly to "fminunc". Recall that these
% advanced optimizers are able to train our cost functions efficiently as
% long as we provide them with the gradient computations.
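
Both optimizers consume the same kind of cost function: a handle that, given a parameter vector, returns the scalar cost and its gradient. As a minimal sketch (the toy quadratic below is purely illustrative; in ex4.m the handle actually wraps nnCostFunction):

% A cost function usable with either fmincg or fminunc: given a
% parameter vector, it returns the scalar cost J and the gradient grad.
function [J, grad] = exampleCost(theta)
  J = sum((theta - 1) .^ 2);   % toy quadratic with minimum at theta = 1
  grad = 2 * (theta - 1);      % analytic gradient of J
end

% e.g. fminunc(@exampleCost, zeros(3, 1), optimset('GradObj', 'on'))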

Like the original poster, I was also curious how the results of ex4.m might differ if fminunc were used instead of fmincg. So I tried replacing the fmincg call:

options = optimset('MaxIter', 50);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

with the following call to fminunc:

options = optimset('GradObj', 'on', 'MaxIter', 50);
[nn_params, cost, exit_flag] = fminunc(costFunction, initial_nn_params, options);

but got the following error message from a 32-bit build of Octave running on Windows:

error: memory exhausted or requested size too large for range of Octave's index type -- trying to return to prompt

A 32-bit build of MATLAB running on Windows provides a more detailed error message:

Error using find
Out of memory. Type HELP MEMORY for your options.
Error in spones (line 14)
[i,j] = find(S);
Error in color (line 26)
J = spones(J);
Error in sfminbx (line 155)
group = color(Hstr,p);
Error in fminunc (line 408)
[x,FVAL,~,EXITFLAG,OUTPUT,GRAD,HESSIAN] = sfminbx(funfcn,x,l,u, ...
Error in ex4 (line 205)
[nn_params, cost, exit_flag] = fminunc(costFunction, initial_nn_params, options);

The MATLAB memory command on my laptop computer reports:

Maximum possible array: 2046 MB (2.146e+09 bytes) *
Memory available for all arrays: 3402 MB (3.568e+09 bytes) **
Memory used by MATLAB: 373 MB (3.910e+08 bytes)
Physical Memory (RAM): 3561 MB (3.734e+09 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.

I had previously thought that Professor Ng chose fmincg to train the ex4.m neural network (which has 400 input features, 401 including the bias input) in order to increase the training speed. However, I now believe his reason for using fmincg was to make training memory-efficient enough to run on 32-bit builds of Octave/MATLAB. A short discussion of the work needed to get a 64-bit build of Octave running on Windows is here.
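
A rough back-of-the-envelope calculation (assuming ex4's architecture: 400 inputs plus a bias, 25 hidden units, 10 output units) suggests why fminunc runs out of memory while fmincg does not. Judging from the sfminbx/spones trace above, fminunc's large-scale algorithm builds n-by-n structures over the parameter vector, whereas fmincg only ever holds a few length-n vectors:

% Hypothetical sizing for the ex4 network (layer sizes from the exercise)
n = 25 * (400 + 1) + 10 * (25 + 1)  % unrolled parameter count: 10285
n ^ 2 * 8                           % one dense n-by-n double: ~846 MB
n * 8                               % one length-n vector: ~80 KB

A handful of such n-by-n arrays easily exhausts the 2046 MB maximum contiguous array reported above, while a conjugate-gradient method like fmincg needs only a few of the ~80 KB vectors.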


According to Andrew Ng himself, fmincg is used not to get a more accurate result (remember, your cost function will be the same in either case, and your hypothesis no simpler or more complex) but because it is more efficient at performing gradient descent for especially complex hypotheses. He himself seems to use fminunc where the hypothesis has few features, but fmincg where it has hundreds.
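
For reference, the calling patterns are nearly interchangeable; what differs is the size of the parameter vector each is asked to handle. The snippets below follow the patterns used in the course exercises (variable names are from those exercises):

% Few parameters (e.g., ex2's logistic regression): fminunc works fine.
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);

% Thousands of parameters (e.g., ex4's neural network): fmincg scales
% better because it stores only a handful of length-n vectors.
options = optimset('MaxIter', 50);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);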