[x, options, flog, pointlog] = graddesc(f, x, options, gradf) uses
batch gradient descent to find a local minimum of the function
f(x) whose gradient is given by gradf(x). A log of the function values
after each cycle is (optionally) returned in flog, and a log
of the points visited is (optionally) returned in pointlog.
Note that x is a row vector
and f returns a scalar value.
The point at which f has a local minimum
is returned as x. The function value at that point is returned
in options(8).
graddesc(f, x, options, gradf, p1, p2, ...) allows
additional arguments to be passed to f() and gradf().
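As a minimal sketch of a call (the objective quadf and its gradient
quadgrad are illustrative names, assumed to be defined in their own
M-files), this might look like:

    % quadf.m:    function y = quadf(x)
    %               y = 0.5*sum(x.^2);
    % quadgrad.m: function g = quadgrad(x)
    %               g = x;

    options = zeros(1, 18);   % zero entries fall back to the defaults
    x = [1 -2 3];             % initial guess; note that x is a row vector
    [x, options, flog, pointlog] = graddesc('quadf', x, options, 'quadgrad');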
The optional parameters have the following interpretations.
options(1) is set to 1 to display error values; this also logs the
error values in the return argument flog, and the points visited
in the return argument pointlog. If options(1) is set to 0,
then only warning messages are displayed. If options(1) is -1,
then nothing is displayed.
options(2) is the absolute precision required for the value
of x at the solution. If the absolute difference between
the values of x at two successive steps is less than
options(2), then this condition is satisfied.
options(3) is a measure of the precision required of the objective
function at the solution. If the absolute difference between the
objective function values at two successive steps is less than
options(3), then this condition is satisfied.
Both this and the previous condition must be
satisfied for termination.
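For instance, to display the error at each cycle and stop only once
successive steps change both x and f(x) by less than 1e-4 (all values
here are illustrative, and quadf and quadgrad are the example
functions from above), one might set:

    options = zeros(1, 18);
    options(1) = 1;        % display and log error values
    options(2) = 1e-4;     % precision required in x
    options(3) = 1e-4;     % precision required in f(x)
    options(14) = 500;     % allow up to 500 iterations
    [x, options, flog, pointlog] = graddesc('quadf', x, options, 'quadgrad');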
options(7) determines the line minimisation method used. If it
is set to 1 then a line minimiser is used (in the direction of the negative
gradient). If it is 0 (the default), then each parameter update
is a fixed multiple (the learning rate)
of the negative gradient added to a fixed multiple (the momentum) of
the previous parameter update.
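In this default mode each cycle therefore applies an update of roughly
the following form (a sketch of the rule as described above, with the
termination tests omitted, not the toolbox source; eta and mu denote
options(18) and options(17) respectively):

    dxold = zeros(size(x));           % previous parameter update
    for n = 1:options(14)
      grad = feval(gradf, x);         % gradient at the current point
      dx = mu*dxold - eta*grad;       % momentum term plus scaled negative gradient
      x = x + dx;
      dxold = dx;
    end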
options(9) should be set to 1 to check the user-defined gradient
function gradf with gradchek. This is carried out at
the initial parameter vector x.
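For example (again using the illustrative functions from above):

    options(9) = 1;    % check gradf against numerical differences at the initial x
    [x, options] = graddesc('quadf', x, options, 'quadgrad');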
options(10) returns the total number of function evaluations (including
those in any line searches).
options(11) returns the total number of gradient evaluations.
options(14) is the maximum number of iterations; default 100.
options(15) is the precision in parameter space of the line search;
default foptions(2).
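To use the line-minimisation mode described under options(7), the two
settings might be combined as follows (values illustrative):

    options(7) = 1;       % minimise along the negative gradient direction
    options(15) = 1e-6;   % precision in parameter space for the line search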
options(17) is the momentum; default 0.5. It should be scaled by the
inverse of the number of data points.
options(18) is the learning rate; default 0.01. It should be
scaled by the inverse of the number of data points.
The following code fragment shows how these parameters can be used
when training a network with netopt:

    options = zeros(1, 18);
    options(18) = 0.1/size(x, 1);
    net = netopt(net, options, x, t, 'graddesc');

Note how the learning rate is scaled by the inverse of the number of
data points.
See also: conjgrad, linemin, olgd, minbrack, quasinew, scg

Copyright (c) Ian T Nabney (1996-9)