Why does Newton's method work?

The method is easiest to justify in one dimension. Say that I have some complicated function $f(x)$ whose root I want to find:

*(figure: a complicated function whose root we want)*

"I don't know how to find its root; it's complicated!" Thus, we use a general idea that has always been used in the design of numerical methods:

Replace a complicated function with a simple approximation.

One of the simplest functions one can deal with is a linear function:

$$y=mx+b$$

In particular, if you want the root of a linear function, it's quite easily figured (as long as $m\neq 0$):

$$x=-\frac{b}{m}$$

Now, it is well known (or at least, it ought to be) that the tangent line of a function is the "best" linear approximation of that function in the vicinity of the point of tangency:

*(figure: the same function with its tangent line drawn at a point)*

The first idea of the Newton-Raphson method is this: since it is easy to find the root of a line, we pretend that our complicated function is a line, find the root of that line, and hope that the line's crossing is a good approximation to the root we actually need.

Mathematically, we replace $f(x)$ by its tangent line at the "starting point" $x=a$ and set that approximation equal to zero:

$$f(a)+f^\prime(a)(x-a)=0$$

Solving for $x$ gives

$$x=a-\frac{f(a)}{f^\prime(a)}$$

Let's call this $x_1$.
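
As a concrete illustration (a standard textbook example, not tied to the figures here): take $f(x)=x^2-2$, whose positive root is $\sqrt{2}\approx 1.4142$, and start at $a=1$. Then

$$x_1=1-\frac{f(1)}{f^\prime(1)}=1-\frac{-1}{2}=1.5,$$

which already lands reasonably close to the true root, though not exactly on it.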

*(figure: the function with the tangent line and the approximate root marked in blue, the true root in red)*

As you can see, the blue point corresponding to the approximation is a bit far off, which brings us to the second idea of Newton-Raphson: if at first you don't succeed, try again:

*(figure: the function with the next tangent line and the improved root approximation)*

As you can see, the new blue point is much nearer to the red point. Mathematically, this corresponds to finding the root of the new tangent line at $x=x_1$:

$$x_2=x_1-\frac{f(x_1)}{f^\prime(x_1)}$$

We can keep playing this game (producing $x_2, x_3, \dots, x_n$) until we reach an $x_n$ for which the correction $\dfrac{f(x_n)}{f^\prime(x_n)}$ is "tiny". We then say that we have converged to an approximation of the root. That is the essence of Newton-Raphson.
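
In code, the whole procedure is only a few lines. Here is a minimal sketch of the iteration just described; the function name `newton`, the tolerance, and the iteration cap are illustrative choices rather than anything prescribed by the method itself.

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Approximate a root of f by repeatedly solving the tangent-line equation."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)   # the correction f(x_n)/f'(x_n) from the text
        x = x - step              # root of the tangent line at the current point
        if abs(step) < tol:       # the correction is "tiny": declare convergence
            return x
    raise RuntimeError("Newton iteration did not converge")

# Example: the positive root of f(x) = x^2 - 2, i.e. sqrt(2)
print(newton(lambda x: x*x - 2, lambda x: 2*x, x0=1.0))  # ~1.4142135623730951
```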

As an aside, the previous discussion should tip you off about what might happen if the tangent line is nearly horizontal, which is one of the disastrous things that can occur while applying the method.
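
To see that concretely (again a standard cautionary example, not from the figures above): take $f(x)=\arctan x$, whose only root is $0$, and start at $x_0=2$, where the graph is already rather flat ($f^\prime(2)=\tfrac{1}{5}$). Then

$$x_1=2-\frac{\arctan 2}{1/5}=2-5\arctan 2\approx -3.54,$$

which is farther from the root than where we started; each subsequent step overshoots even more, and the iterates diverge.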


The animation on the Wikipedia page for Newton's method might be useful.



Most root finding methods work by replacing the function, which is only known at a few points, with a plausible model, and finding the root of the model.

For instance, the chord and regula falsi methods work from two known points and hypothesize a linear behavior in between. Newton uses a single known point plus the direction of the tangent, and also hypothesizes linear behavior. Brent's method uses three points and an inverse quadratic (sideways parabola) interpolation.
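
To make the "two known points, linear behavior in between" idea concrete, here is a sketch of the chord/secant construction; as with the Newton sketch above, the names, tolerance, and stopping rule are my own illustrative choices.

```python
def secant(f, x0, x1, tol=1e-10, max_iter=100):
    """Approximate a root of f from two starting guesses by intersecting chords."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)  # root of the line through (x0,f0), (x1,f1)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2                       # keep the two most recent points
    raise RuntimeError("secant iteration did not converge")

# Example: the positive root of x^2 - 2 again
print(secant(lambda x: x*x - 2, 1.0, 2.0))  # ~1.4142135623730951
```

(Regula falsi uses the same chord, but keeps the two points that bracket the root instead of the two most recent ones.)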


In all cases, the reason it works is simple: the new estimate is closer to the root.

For this property to hold, the function must satisfy certain criteria, which are established within the framework of calculus and essentially mean that the function can be well fitted by the model. In particular, when the function has a Taylor expansion, it locally behaves like a polynomial of some degree.
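
The usual way to make this precise for Newton's method (a sketch of the standard calculus argument, assuming $f$ is twice differentiable and $f^\prime(r)\neq 0$ at the root $r$) is to Taylor-expand around the current iterate $x_n$:

$$0=f(r)=f(x_n)+f^\prime(x_n)(r-x_n)+\tfrac{1}{2}f^{\prime\prime}(\xi)(r-x_n)^2$$

for some $\xi$ between $x_n$ and $r$. Dividing by $f^\prime(x_n)$ and using $x_{n+1}=x_n-\dfrac{f(x_n)}{f^\prime(x_n)}$ gives

$$r-x_{n+1}=-\frac{f^{\prime\prime}(\xi)}{2f^\prime(x_n)}\,(r-x_n)^2,$$

so once the estimate is close to the root, the new error is roughly the square of the old one, which is exactly the "gets closer" behavior described above.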