Inconsistencies in the definition of derivative of a polynomial over a field

  • First issue

If $F$ is a field, we always identify the natural number $n$ with the element $$\underbrace{1+1+\cdots+1}_{n\text{ times}} \in F.$$ This definition extends easily to $\mathbb Z$.

So, to answer the question $$D(x^2)=(1+1)x=x+x$$
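To make this concrete, here is a quick Python sketch (the helper name `nat_in_field` is my own invention, and $\Bbb Z/p$ is modeled as integers mod $p$) showing that the image of $n$ in the field really is $1$ added to itself $n$ times:

```python
# Toy model of the identification n -> 1 + 1 + ... + 1 (n times) in Z/p.
# The helper name nat_in_field is made up for this sketch.

def nat_in_field(n, p):
    """Image of the natural number n in Z/p, built by repeated addition of 1."""
    total = 0
    for _ in range(n):
        total = (total + 1) % p
    return total

# The coefficient in D(x^2) = (1+1)x depends on the field:
print(nat_in_field(2, 3))  # 2: in Z/3, 1+1 = 2, so D(x^2) = 2x
print(nat_in_field(2, 2))  # 0: in Z/2, 1+1 = 0, so D(x^2) = 0
```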

  • Second issue

First, $2x^2+x \neq 0$ when $x=2$.

Also, you are making a very common mistake here. Even if the corresponding functions were the same, the polynomials $2x^2+x$ and $0$ would still be different polynomials over $\mathbb Z_3$, and hence their derivatives need not be equal.

A polynomial is just a formal algebraic expression. Over infinite fields, distinct polynomials define distinct functions, but over finite fields you cannot identify a polynomial with its corresponding function; you have to think of it purely as a polynomial.


First of all, the formula you actually want is $$ D\left(\sum_{i = 0}^n a_i x^i\right) := \sum_{i = 1}^n i a_i x^{i - 1} $$ (you had $n$ in the index of the coefficients in the sum and $n$ instead of $i$ in the derivative). Now, to answer your questions:

First, we cannot assume that $F$ contains the natural numbers as a subset. For example, what is $D(x^2)$ if $F = \Bbb Z/(2)$?

Any commutative unital ring admits a unique map from $\Bbb Z$, determined by sending $1\mapsto 1$. So, if your field $F$ has characteristic $0$, $\Bbb Z$ can literally be thought of as a subset of $F$, via $n\mapsto\underbrace{1 + 1 + \dots + 1}_{n\textrm{ times}}$. If your field has characteristic $p$, then you may think of $\Bbb Z$ as mapping to the field, but the elements of $\Bbb Z$ become identified with their reductions modulo $p$. So if you wanted to be annoyingly precise, you could say let $\iota : \Bbb Z\to F$ be the unique ring homomorphism sending $1\mapsto 1$. Then $$ D\left(\sum_{i = 0}^n a_i x^i\right) := \sum_{i = 1}^n \iota(i) a_i x^{i - 1}. $$ So, $D(x^2) = \iota(2) x = 0$, since $2 \equiv 0\pmod{2}$, and the map $\Bbb Z\to\Bbb Z/(2)$ is reduction modulo $2$.
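This can be sketched in a few lines of Python (a toy model: $\Bbb Z/p$ is represented as integers mod $p$, a polynomial as its list of coefficients $[a_0, a_1, \dots]$, and the names `iota` and `formal_derivative` are mine):

```python
# Sketch of the formal derivative with coefficients pushed through
# iota : Z -> Z/p.  A polynomial is a coefficient list [a_0, a_1, ..., a_n].

def iota(n, p):
    """The unique ring homomorphism Z -> Z/p, i.e. reduction mod p."""
    return n % p

def formal_derivative(coeffs, p):
    """D(sum a_i x^i) = sum iota(i) a_i x^(i-1) over Z/p."""
    return [(iota(i, p) * a) % p for i, a in enumerate(coeffs)][1:]

# x^2 over Z/2 is [0, 0, 1]; D(x^2) = iota(2) * x = 0.
print(formal_derivative([0, 0, 1], 2))  # [0, 0]: the zero polynomial
```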

Furthermore, the derivative may not be well-defined, depending on the definition of equality for polynomials. For example, if $F=\Bbb Z/(3)$, then $x^3+2x=0$ for every $x\in\{0,1,2\}$, but $D(x^3+2x)=3x^2+2=2\neq 0=D(0)$.

The issue here is that you're identifying a polynomial in $F[x]$ with the function $F\to F$ it defines (via $\alpha\mapsto f(\alpha)$). A polynomial is simply a (finite) formal sum with coefficients in $F$, and two polynomials $\sum a_i x^i$ and $\sum b_i x^i$ are equal if and only if $a_i = b_i$ for all $i$. As you noted, two different formal sums may define the same function $F\to F$. However, any polynomial $f\in F[x]$ also defines a function $K\to K$ for any extension field $K$ of $F$, and once you pass to a suitable $K$, two polynomials will be equal if and only if they define the same function (taking $K$ to be the algebraic closure of $F$ will always do the trick, because a polynomial $f$ over a field has exactly $\deg f$ roots counted with multiplicity over the algebraic closure, and thus is determined by its value at $\deg f + 1$ elements of the algebraic closure). So a priori, a polynomial $f\in F[x]$ has more data than the function $F\to F$ it defines!
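Here is a small Python check of this distinction over $F = \Bbb Z/3$ (polynomials modeled as coefficient lists; `evaluate` is an ad hoc helper): $x^3$ and $x$ are unequal as polynomials but equal as functions $F \to F$.

```python
# Over F = Z/3, the polynomials x^3 and x have different coefficient
# lists but define the same function F -> F.

def evaluate(coeffs, x, p):
    """Evaluate sum a_i x^i at x in Z/p."""
    return sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p

x_cubed = [0, 0, 0, 1]   # x^3
x_poly  = [0, 1]         # x

print(x_cubed == x_poly)  # False: unequal as polynomials
print(all(evaluate(x_cubed, t, 3) == evaluate(x_poly, t, 3) for t in range(3)))
# True: equal as functions Z/3 -> Z/3 (by Fermat's little theorem)
```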


Several of the other answers have pointed out the danger of conflating a polynomial $p$ with a polynomial function. When working over the real numbers this distinction isn't particularly important, as there is a one-to-one correspondence between the two types of objects, but over arbitrary rings (even other fields) the distinction is very important. The definition of derivative given in the OP applies to polynomials, and one way of paraphrasing the question is to ask under what conditions it can be applied consistently to polynomial functions.

With that as framing, here are a couple of observations that (I think) have not yet been made by the other answers.

Let $R$ be any ring, and let $p \in R[x]$ be any polynomial with coefficients in $R$. Note that we are thinking of $p$ here as a formal expression of the form $a_n x^n + \cdots + a_0$, where each $a_k \in R$, but we are not thinking of $p$ as a function on $R$. However, $p$ does naturally induce a function $R \to R$, and it is useful to have a notation for that function, so let's call it $\hat{p}$. We now have two different objects, $$p \in R[x]$$ and $$\hat{p} \in R^R$$ where $R^R$ denotes the set of all functions that map $R \to R$. Both $R[x]$ and $R^R$ are rings: in $R[x]$ the addition and multiplication operations are "formal" (i.e. you just use the distributive and associative properties to expand and simplify a combination of polynomials), whereas in $R^R$ the addition and multiplication operations are "pointwise" (e.g., if $f,g \in R^R$ then $f+g$ is defined to be the function that maps $r \in R$ to $f(r) + g(r)$). One can now check that the association $p \mapsto \hat{p}$ is a ring homomorphism.

This homomorphism (which we can call the "functional interpretation map" and denote by $\Phi$) is not, in general, one-to-one. For instance, if $R=\mathbb{Z}_p$ for some prime $p$, then the kernel of $\Phi$ is generated by $x^p - x$. This means that when $x^p - x$ is interpreted as a function, it "acts like" the (constant) zero function. Put another way, we can say that $x^p$ and $x$ are equivalent as functions even though they are distinct as polynomials. More precisely, using the notation above we can write $\widehat{x^p} = \hat{x}$ even though $x^p \ne x$.

This example also shows that the formal derivative rule in the OP does not "work" at the level of functions, because the formal derivative of $x^p$ is $p\cdot x^{p-1} = 0$, whereas the formal derivative of $x$ is $1$, so polynomials that are equivalent when interpreted as functions do not necessarily have equivalent derivatives.
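Both observations can be checked numerically; the sketch below (with $p = 5$ and home-grown helpers) confirms that $x^p - x$ evaluates to $0$ at every element of $\Bbb Z/p$, while its formal derivative is the nonzero constant $-1$:

```python
# Over Z/p, the polynomial x^p - x lies in the kernel of Phi (it
# evaluates to 0 everywhere), yet its formal derivative is not 0.

p = 5

def derivative(coeffs):
    """Formal derivative over Z/p of the coefficient list [a_0, ..., a_n]."""
    return [(i * a) % p for i, a in enumerate(coeffs)][1:]

xp_minus_x = [0, p - 1] + [0] * (p - 2) + [1]   # -x + x^p

# x^p - x acts as the zero function on Z/p ...
print(all(sum(a * pow(t, i, p) for i, a in enumerate(xp_minus_x)) % p == 0
          for t in range(p)))  # True
# ... but D(x^p - x) = p x^(p-1) - 1 = -1, a nonzero constant polynomial:
print(derivative(xp_minus_x))  # [4, 0, 0, 0, 0], i.e. -1 (with trailing zeros)
```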

(For the general question "When is $\Phi$ one-to-one?", see https://mathoverflow.net/questions/160986/rings-for-which-no-polynomial-induces-the-zero-function.)

So the other answers to this question, which point out that the formal derivative definition applies in the context of "formal polynomials" but not necessarily in the context of "polynomial functions", are spot on. Nevertheless, it turns out that there is an alternative way to define the derivative of a polynomial with coefficients in an arbitrary ring that (a) directly engages with (rather than avoids) the interpretation of a polynomial as a function, (b) generalizes the connection between derivatives and difference quotients, and (c) avoids the need to use limits. In the familiar context where $R=\mathbb{R}$, it reproduces the standard theory of differentiation; for general $R$, it reproduces the formal definition given in the OP.

Here's how it works:

Let $p \in R[x]$ be an arbitrary polynomial. Choose any element $a \in R$. Then we can formally divide $p$ by $x-a$ and obtain a quotient $q$ and a remainder $r$. (We can determine $q$ and $r$ using either the long division or the synthetic division algorithm -- both work just fine over arbitrary rings, because the divisor $x-a$ is monic.) Note that $q,r \in R[x]$; we are not yet interpreting these as functions. Furthermore, since $r$ must have lower degree than the divisor $x-a$, it must be a constant, i.e. an element of $R$ itself. (Here I am using the natural embedding of $R$ as a subring of $R[x]$.) We now have the relationship $$p = (x-a)q + r$$ where $p, q \in R[x]$ and $r \in R$.
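The synthetic-division step can be sketched in Python (with $R = \Bbb Z/m$ as a stand-in ring and a made-up helper name); the last value produced by the recurrence is the remainder $r$:

```python
# Synthetic division of a polynomial by (x - a) over Z/m.
# Coefficients are listed highest-degree first.

def divide_by_linear(coeffs_hi, a, m):
    """Divide the polynomial by (x - a); return (quotient, remainder)."""
    q = []
    carry = 0
    for c in coeffs_hi:
        carry = (carry * a + c) % m
        q.append(carry)
    r = q.pop()          # the last value of the recurrence is the remainder
    return q, r

# p = x^2 + 1 over Z/5, divided by (x - 2):
q, r = divide_by_linear([1, 0, 1], 2, 5)
print(q, r)  # [1, 2] 0: quotient x + 2, remainder 0 (2^2 + 1 = 0 in Z/5)
```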

Now let's apply the functional interpretation homomorphism $\Phi$ to this equation. We find that for any $b \in R$, $$\hat{p}(b) = (b-a)\hat{q}(b) + r$$ and in particular $$\hat{p}(a) = r.$$ This is the analogue of what is called (in high school algebra) "the Remainder Theorem". It tells us that we can rewrite the relationship between $p$ and $q$ as $$p = (x-a)q + \hat{p}(a)$$ or as $$p - \hat{p}(a) = (x-a)q$$

Interpreting the above as functions and acting on an arbitrary $b\in R$, we get

$$\hat{p}(b) - \hat{p}(a) = (b-a)\hat{q}(b)$$

Now it is very tempting to rewrite the equation above in the form $\hat{q}(b)=\frac{\hat{p}(b) - \hat{p}(a)}{b-a}$. We can't really do that, since division is not defined in $R$. If $b-a$ happens to be an invertible element then we can do it, and more generally if $R$ is an integral domain then we could embed $R$ as a subring of its field of fractions, but for arbitrary $R$ we need to be more careful. However, the equation $\hat{p}(b) - \hat{p}(a) = (b-a)\hat{q}(b)$ does suggest that we can interpret $\hat{q}(b)$ as the "slope" of the "line" joining $\left( a, \hat{p}(a) \right)$ and $\left( b, \hat{p}(b) \right)$.
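The identity $\hat{p}(b) - \hat{p}(a) = (b-a)\hat{q}(b)$ is easy to verify exhaustively for a small example; the sketch below (again with $R = \Bbb Z/m$ and ad hoc helper names) checks it for $p = 2x^2 + x$ over $\Bbb Z/7$ with $a = 3$:

```python
# Check p(b) - p(a) = (b - a) * q(b) over Z/m, where q is the quotient
# of p by (x - a).  No division is needed, only multiplication.

def horner(coeffs_hi, x, m):
    """Evaluate a polynomial (highest-degree coefficient first) at x mod m."""
    v = 0
    for c in coeffs_hi:
        v = (v * x + c) % m
    return v

def quotient_by_linear(coeffs_hi, a, m):
    """Quotient of p by (x - a) via synthetic division (remainder dropped)."""
    q, carry = [], 0
    for c in coeffs_hi:
        carry = (carry * a + c) % m
        q.append(carry)
    q.pop()
    return q

# p = 2x^2 + x over Z/7, a = 3: check the identity for every b in Z/7.
p_, m, a = [2, 1, 0], 7, 3
q_ = quotient_by_linear(p_, a, m)
print(all((horner(p_, b, m) - horner(p_, a, m)) % m
          == ((b - a) * horner(q_, b, m)) % m for b in range(m)))  # True
```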

If we accept this interpretation as a plausible one -- and note that in the case where $R$ is a field it reduces to the standard idea that "slope is rise over run" -- then it is natural to identify $\hat{q}(a)$ as the slope of the "tangent line" to the graph of $\hat{p}$ at $a$.

If all of this seems overly abstract and formal, here are a few observations to set your mind at ease (each of these is easily verified):

  1. If $p=x^n$, so that $\hat{p}(a) = a^n$, then $\hat{q}(a) = n \cdot a^{n-1}$. In other words the "normal" power rule is recovered under this definition.
  2. For any fixed $a$, the mapping $p \in R[x] \mapsto \hat{q}(a)$ is linear. In other words the "normal" linearity of the derivative operation is preserved by this generalization.
  3. The product rule works, too -- although it is notationally hard to express it, and not particularly worthwhile.
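Observation 1 can be checked directly; the sketch below (ad hoc, with $R = \Bbb Z/m$) divides $x^n$ by $(x-a)$ and evaluates the quotient at $a$, recovering $n \cdot a^{n-1}$:

```python
# For p = x^n, divide by (x - a) over Z/m and evaluate the quotient q
# at a; the result should be n * a^(n-1), i.e. the usual power rule.

def derivative_via_quotient(n, a, m):
    """q(a), where x^n = (x - a) q + a^n over Z/m."""
    coeffs = [1] + [0] * n           # x^n, highest-degree first
    q, carry = [], 0
    for c in coeffs:                 # synthetic division by (x - a)
        carry = (carry * a + c) % m
        q.append(carry)
    q.pop()                          # drop the remainder a^n
    v = 0
    for c in q:                      # Horner evaluation of q at a
        v = (v * a + c) % m
    return v

# Over Z/11 with p = x^4, a = 3: expect 4 * 3^3 = 108 = 9 mod 11.
print(derivative_via_quotient(4, 3, 11))  # 9
```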