What is the difference between Gradient Descent and Newton's Gradient Descent?
At a local minimum (or maximum) x, the derivative of the target function f vanishes: f'(x) = 0 (assuming sufficient smoothness of f).
Gradient descent tries to find such a minimum x by using information from the first derivative of f: it simply follows the steepest descent from the current point. This is like rolling a ball down the graph of f until it comes to rest (while neglecting inertia).
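A minimal sketch of that idea in Python, assuming a made-up example function f(x) = (x - 2)^2 + 1 and an arbitrary step size and tolerance (none of these come from the answer, they are just for illustration):

```python
# Illustrative 1-D gradient descent on f(x) = (x - 2)**2 + 1 (assumed example).
def f_prime(x):
    # derivative of the example function: f'(x) = 2 * (x - 2)
    return 2.0 * (x - 2.0)

def gradient_descent(x0, lr=0.1, tol=1e-8, max_iter=10_000):
    x = x0
    for i in range(max_iter):
        step = lr * f_prime(x)      # small step along the steepest descent
        x -= step
        if abs(step) < tol:         # stop once the steps become tiny
            return x, i + 1
    return x, max_iter

x_min, iters = gradient_descent(x0=10.0)
print(x_min, iters)  # creeps towards x = 2, taking many small steps to get there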
Newton's method tries to find a point x satisfying f'(x) = 0 by approximating f' with a linear function g and then solving for the root of that function explicitly (this is called Newton's root-finding method). The root of g is not necessarily the root of f', but under many circumstances it is a good guess (the Wikipedia article on Newton's method for root finding has more information on convergence criteria). While approximating f', Newton's method makes use of f'' (the curvature of f). This means it has higher requirements on the smoothness of f, but it also means that (by using more information) it often converges faster.
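A matching sketch of Newton's method on the same assumed example, f(x) = (x - 2)^2 + 1, so f'(x) = 2(x - 2) and f''(x) = 2; again, the function and tolerances are illustrative only:

```python
# Illustrative Newton's method for minimizing f(x) = (x - 2)**2 + 1 (assumed example).
def f_prime(x):
    return 2.0 * (x - 2.0)

def f_double_prime(x):
    return 2.0

def newton(x0, tol=1e-8, max_iter=100):
    x = x0
    for i in range(max_iter):
        # Approximate f' near x by the line g(t) = f'(x) + f''(x) * (t - x)
        # and jump straight to the root of g.
        step = f_prime(x) / f_double_prime(x)
        x -= step
        if abs(step) < tol:
            return x, i + 1
    return x, max_iter

x_min, iters = newton(x0=10.0)
print(x_min, iters)  # for this quadratic f, the very first Newton step lands on x = 2;
                     # the second iteration only confirms convergence
```

Because the example f is quadratic, the linear approximation of f' is exact and one step suffices, whereas the gradient descent sketch above needed dozens of small steps for the same accuracy.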
Put simply: with gradient descent you just take a small step towards where you think the zero is and then recalculate; with Newton's method, you go all the way there.
Edit 2017: the original link is dead, but the Wayback Machine still has it :) https://web.archive.org/web/20151122203025/http://www.cs.colostate.edu/~anderson/cs545/Lectures/week6day2/week6day2.pdf
In this PowerPoint the main ideas are explained simply: http://www.cs.colostate.edu/~anderson/cs545/Lectures/week6day2/week6day2.pdf
I hope this helps :)