Why is a derivative defined using limits?

The idea of a derivative-as-limit was introduced in the 17th Century by Newton and Leibniz (Newton's first description of the derivative pre-dates Leibniz's by 20 or so years, but Newton didn't publish at the time, and the modern consensus is that Leibniz built the theory independently). We remember the names Newton and Leibniz in large part because they had the insight to use the concept of a limit to describe instantaneous rates of change. This was a very difficult idea which (perhaps) required the intellectual force of giants such as Newton and Leibniz.

Even so, neither Newton nor Leibniz really used ideas that we would recognize as limits (in the sense of $\varepsilon$-$\delta$ arguments). Instead, they estimated the quantities of interest with an error term, e.g. $$ \frac{(x+o)^2 - x^2}{o}, $$ where $o$ is an "infinitesimal" error term, then performed algebraic manipulations (here the quotient simplifies to $2x + o$) and made the leftover error term disappear with a wave of the hands (leaving the familiar $2x$). While this approach can be made rigorous (see Robinson's Non-standard analysis, cited below, for a more modern approach to infinitesimals), it isn't quite how we usually think of things.

The modern notion of limit came later. I honestly don't know exactly when it was introduced or by whom (it feels like something that Cauchy or one of his contemporaries might have come up with?). I would guess that modern $\varepsilon$-$\delta$ arguments date to the early 19th Century (they were certainly well established by the beginning of the 20th Century, but I don't think that mathematicians like Euler or Fourier used an entirely modern approach). In any event, the definition of a limit was another profound intellectual achievement, and it is only "obvious" in retrospect.

The point being, it should not be surprising that the jump to calculus via limits is difficult for many students. The notion of a derivative as an instantaneous rate of change was a difficult concept which took a couple of hundred years and the attention of some very smart people to develop.

This comic may be relevant.


That being said, there are certain classes of curves that can be discussed in geometric or algebraic terms. We can build the theory in the following manner (note that this is ahistorical, but makes a pretty good narrative for, say, a group of students in a precalculus class).

The motivating question might be the following:

Given a curve in the plane (or in some higher dimensional space?!) and a point on that curve, what does it mean for a line to be tangent to the curve?

For a circle, we have a really good idea of what we want this to mean: the line touches the curve at exactly one point. From this definition, we are able to do a lot: tangents are perpendicular to radii, we can (after coordinatization) define a bunch of trigonometric functions related to tangent lines, etc. This notion of tangency also generalizes fairly well to other conic sections. However, it does not generalize well to arbitrary curves in the plane (or even arbitrary algebraic curves), which is particularly annoying if you are interested in the graphs of functions.

Another idea is the following: when we look at a line tangent to a circle, the line does not cross the circle—it touches at a point, then "bounces off". This isn't a very rigorous idea, but we can make it a little more rigorous. To do this, let's first consider a parabola.

Using our basic geometric ideas, we can define

Definition: We say that a line $\ell$ is tangent to the graph of $f(x) = ax^2 + bx + c$ if

  • $\ell$ is the graph of a function of the form $\ell(x) = mx + k$ for two real constants $m$ and $k$ (i.e. $\ell$ is not a vertical line; please excuse my abuse of notation, namely using $\ell$ both for the line and the function defining the line); and
  • $\ell$ intersects the graph of $f$ at exactly one point.

This first constraint may seem silly, but we want to eliminate the "obviously wrong" vertical lines which intersect the graph at a single point, but which don't really look like the kinds of tangent lines that we would expect.

This idea can be expressed algebraically: if $\ell$ is tangent to $f$ at the point $(r,f(r))$, then we need $(f-\ell)(r) = 0$ (which means that $f$ and $\ell$ intersect when $x=r$), and we need $(f-\ell)(x) \ne 0$ for all other $x$ (the line and parabola intersect exactly once). In other words, the function $(f-\ell)(x) = (ax^2 + bx + c) - (mx + k)$ has exactly one real root, namely $x=r$. By the factor theorem, this implies that there is some constant $C$ such that $$ ax^2 + (b-m)x + (c-k) = (f-\ell)(x) = C(x-r)^2 $$ (the roots of a real quadratic are either both real or a complex-conjugate pair, so a real quadratic with exactly one real root has that root twice). Expanding out the right-hand side and equating coefficients, we have $$ ax^2 + (b-m)x + (c-k) = Cx^2 - 2Crx + Cr^2 \implies \begin{cases} a = C \\ b-m = -2Cr \\ c-k = Cr^2. \end{cases} $$ Solving for $m$ and $k$, we have $$ m = b+2Cr = b+2ar \qquad\text{and}\qquad k = c - Cr^2 = c-ar^2. $$ Therefore the line tangent to the graph of $$ f(x) = ax^2 + bx + c $$ at the point $(r,f(r))$ is the graph of the function $$ \ell(x) = mx + k = (b+2ar)x + (c-ar^2). $$ This Desmos demonstration should be mildly convincing (you can move the point of tangency by clicking and dragging, and adjust the coefficients $a$, $b$, and $c$ using the sliders).
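
If you want to check the coefficient-matching argument mechanically, here is a minimal sketch in Python using sympy (the library and the helper code are my own, not part of the argument above): it demands that $f(x) - (mx + k)$ be identically $C(x-r)^2$ and solves the resulting linear system for $m$, $k$, and $C$.

```python
# A minimal sketch (sympy assumed) of the coefficient-matching argument above:
# force f(x) - (m*x + k) to be identically C*(x - r)**2 and solve for m, k, C.
import sympy as sp

x, a, b, c, r, m, k, C = sp.symbols('x a b c r m k C')

f = a*x**2 + b*x + c          # the parabola
ell = m*x + k                 # the unknown tangent line

# (f - ell)(x) - C*(x - r)**2 must vanish identically, so each coefficient of x is zero.
difference = sp.expand(f - ell - C*(x - r)**2)
equations = [sp.Eq(difference.coeff(x, n), 0) for n in range(3)]

solution = sp.solve(equations, (m, k, C), dict=True)[0]
print(solution[m])            # 2*a*r + b
print(solution[k])            # -a*r**2 + c
```

The printed slope and intercept match the $m = b + 2ar$ and $k = c - ar^2$ obtained by hand above.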

The really slick idea here is that tangency has something to do with the way in which a line intersects the parabola. If we look at the difference function $f-\ell$, the point of intersection is a root of order two. After some experimentation, it is reasonable to propose the following, slightly more general definition of tangency:

Definition: Let $p$ be a polynomial of degree $n$. We say that a line $\ell$ is tangent to $p$ at $(r,p(r))$ if the difference function $p-\ell$ has a root of order at least 2 at $r$. That is, $$ (p-\ell)(x) = (x-r)^2 q(x), $$ where $q$ is a polynomial of degree $n-2$.

This notion of tangency actually works rather well, and it isn't much more difficult to work out than learning limits (once you know how limits work, have an analytic definition of a tangent line, and have proved useful results like the Power Rule, this algebraic version isn't so great, but learning all of that other machinery sounds hard $\ddot\frown$). Generally speaking, you are going to have to multiply out the polynomial $$ (x-r)^2 q(x), $$ which is a relatively tractable problem, then equate coefficients (which reduces the problem to a system of linear equations). If $p$ is of very high degree, this can be tedious, but it requires no knowledge beyond high school algebra (or, perhaps more to the point, it requires no ideas that post-date Newton and Leibniz; Descartes could have figured it out, and, in fact, he did).
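
For a concrete instance of that procedure, here is a hedged sympy sketch (the cubic $p(x) = x^3 - 2x + 1$ is an arbitrary example of my own choosing, not one from the text): it writes $p(x) - (mx + k) = (x-r)^2 q(x)$ with unknown coefficients and solves the linear system that equating coefficients produces.

```python
# A sketch (sympy assumed, example cubic my own) of the "equate coefficients" recipe:
# demand p(x) - (m*x + k) = (x - r)**2 * q(x) identically and solve for the unknowns.
import sympy as sp

x, r, m, k, q0, q1 = sp.symbols('x r m k q0 q1')

p = x**3 - 2*x + 1            # an arbitrary cubic, deg(p) = 3
ell = m*x + k                 # the unknown tangent line at (r, p(r))
q = q1*x + q0                 # q must have degree 3 - 2 = 1

# Each coefficient of x in p - ell - (x - r)**2 * q must vanish: a linear system.
difference = sp.expand(p - ell - (x - r)**2 * q)
equations = [sp.Eq(difference.coeff(x, n), 0) for n in range(4)]
sol = sp.solve(equations, (m, k, q0, q1), dict=True)[0]

print(sol[m])                 # 3*r**2 - 2, i.e. the slope p'(r)
print(sol[k])                 # 1 - 2*r**3, the intercept of the tangent line
```

The same pattern works for any degree: the number of unknowns grows, but the system stays linear.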

This basic definition generalizes very well to rational functions and, using the idea that the graph of an inverse function is the reflection of the graph of the original function across the line $y=x$, can be further generalized to deal with functions involving $n$-th roots. If you want to go really deep down the rabbit hole, you might try to prove something like the implicit function theorem and show that this idea can also give you implicit derivatives of any algebraic curve (I don't know how easy or hard this would be to do; I wonder if it might not require some modern ideas out of algebraic geometry? $\ast$shudder$\ast$... sheaves are scary).

Robinson, Abraham, Non-standard analysis, Princeton, NJ: Princeton Univ. Press. xix, 293 p. (1996). ZBL0843.26012.


To find the gradient of a curve at a given point, you need to keep zooming in:

[Animation: zooming in on curves]

From the above animation, you can guess the gradients of the different curves (either $0$, $1$, or undefined), but in order to be sure, you'd need an $\infty$ zoom.

This process of "zooming in indefinitely" is why you need the concept of limits when defining the derivative.
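
To put a number on that idea, here is a small Python sketch (the curve $y = x^3 - x$ and the point $x = 1$ are illustrative choices of mine, not from the animation): the secant slope over an ever-smaller window settles toward a single value, but no finite zoom level hits it exactly.

```python
# "Zooming in" numerically (curve and point chosen for illustration): the slope of the
# secant line across a window of half-width h approaches the gradient as h shrinks.
def f(x):
    return x**3 - x

a = 1.0
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    secant_slope = (f(a + h) - f(a - h)) / (2 * h)   # slope across the window [a-h, a+h]
    print(f"half-width {h:>7}: slope = {secant_slope:.8f}")
# The printed slopes (3.0, 2.01, 2.0001, ...) approach 2, the gradient at x = 1,
# which is exactly the limit that the derivative is defined to be.
```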

PS: I used the same animation as in one of my previous answers.


Xander Henderson's excellent answer shows how you can define the derivative of a polynomial function without needing to refer to limits; here is another (perhaps simpler?) approach.

Consider any polynomial $p(x)$, and choose a value $a\in\mathbb R$. We all know that you can use long division (i.e. the polynomial division algorithm) to divide $p(x)$ by $x-a$ and obtain a quotient, $q_a(x)$, and a remainder, $r(x)$. The remainder is guaranteed to have lower degree than the divisor, and the divisor $x-a$ has degree $1$, so $r(x)$ must be a constant, which we'll just write as $r$. Then we have $$p(x) = (x-a)q_a(x) + r.$$ Moreover, setting $x=a$ in the above equation leads immediately to the result $r=p(a)$. In other words, the remainder you get when you divide $p(x)$ by $x-a$ is just $p(a)$. This is usually called the "Remainder Theorem."

Now let's take the equation $p(x) = (x-a)q_a(x) + p(a)$ and rearrange it just slightly (dividing by $x-a$, so this form is valid for $x \neq a$):

$$q_a(x) = \frac{p(x) - p(a)}{x-a}$$

This provides a natural geometric interpretation of the polynomial $q_a(x)$: given any point $b \ne a$, $q_a(b)$ is the slope of the line joining $(a, p(a))$ and $(b, p(b))$.

These observations motivate the following definition:

Definition: For any polynomial $p(x)$ and any $a\in \mathbb R$, the derivative of $p(x)$ at $a$, denoted $p'(a)$, is $$p'(a) = q_a(a),$$ where $q_a(x)$ is the quotient obtained by dividing $p(x)$ by $x-a$.

For example, with $p(x) = x^2 - 3x$, if we choose $a=1$ we find that $p(x) = (x-1)(x-2) - 2$, so $q_1(x)=x-2$, and therefore $p'(1) = q_1(1) = -1$. More generally for any $a$ we have $x^2 - 3x = (x-a)(x-3+a) + (a^2-3a)$, so $q_a(x) = x-3+a$ and $p'(a) = q_a(a) = 2a - 3$, exactly as the "usual" definition gives.

Computationally, this makes finding the derivative of even higher-degree polynomials relatively straightforward: just divide $p(x)$ by $x-a$ using polynomial long division, throw away the remainder, and evaluate the quotient at $a$. However, I don't see any reasonable way to extend this notion to transcendental functions.
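
Here is a minimal sketch of that procedure in Python, assuming sympy for the polynomial division (the helper name derivative_by_division is mine); it reproduces the $p(x) = x^2 - 3x$ example above.

```python
# A sketch (sympy assumed) of the division-based derivative: divide p(x) by (x - a),
# discard the remainder, and evaluate the quotient at a.
import sympy as sp

x = sp.symbols('x')

def derivative_by_division(p, a):
    """Derivative of the polynomial p at a, via the quotient q_a(x) of p(x) by (x - a)."""
    quotient, remainder = sp.div(p, x - a, x)   # p = (x - a)*quotient + remainder
    return quotient.subs(x, a)                  # q_a(a)

p = x**2 - 3*x
print(derivative_by_division(p, 1))             # -1, matching q_1(1) above

a = sp.symbols('a')                             # a symbolic point works too
print(sp.expand(derivative_by_division(p, a)))  # 2*a - 3, the "usual" derivative
```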