Intuitively, why should the coefficient of the derivative of $x^n$ be $n$?

Picture a cube in $n$-space bounded by the coordinate hyperplanes and other hyperplanes parallel to them at a distance $x$ from them. In the case $n=2$, it's easy to draw the picture: the boundaries are the two coordinate axes and two lines parallel to those, and you're looking at an $x$-by-$x$ square.

The volume of the cube is $x^n$. As $x$ changes, how fast does the volume change?

Use what I call the "boundary rule":

\begin{align} & \text{[size of moving boundary]} \times \text{[rate of motion of boundary]} \\[6pt] = {} & \text{[rate of change of size of bounded region].} \end{align}

There are $n$ moving boundaries, each of size $x^{n-1}$. The rate at which each moves is the rate at which $x$ changes. Hence the rate of change of $x^n$ is $nx^{n-1}$ times the rate of change of $x$.
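A quick numerical sanity check of this count may help. The sketch below is only an illustration, not part of the argument; the function names `cube_volume`, `boundary_rule_estimate` and `actual_change` are made up here. It compares the true change in volume with the estimate from $n$ faces, each of size $x^{n-1}$, each pushed outward by a small distance $h$.

```python
def cube_volume(x: float, n: int) -> float:
    """Volume of the n-dimensional cube with side length x."""
    return x ** n


def boundary_rule_estimate(x: float, n: int, h: float) -> float:
    """Boundary-rule estimate: n moving faces, each of size x**(n-1), each moved by h."""
    return n * x ** (n - 1) * h


def actual_change(x: float, n: int, h: float) -> float:
    """True change in volume when the side grows from x to x + h."""
    return cube_volume(x + h, n) - cube_volume(x, n)


if __name__ == "__main__":
    x, n, h = 2.0, 3, 1e-6
    print("actual change:  ", actual_change(x, n, h))
    print("boundary rule:  ", boundary_rule_estimate(x, n, h))
```

The two numbers differ only by terms of order $h^2$, which come from the edges and corners where two or more moving faces overlap.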

PS: The boundary rule can be used to prove the product rule if you use a rectangle rather than a square. The two moving boundaries have lengths $f$ and $g$; the rate of motion of each is the rate of change of the other.
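The rectangle version can also be checked numerically. This is a rough sketch under assumed choices: the side lengths `f` and `g` below are arbitrary example functions picked for illustration, and `boundary_rule_rate` and `finite_difference_rate` are hypothetical helper names.

```python
import math


def f(t: float) -> float:
    """Example width of the rectangle (an arbitrary choice)."""
    return t ** 2 + 1.0


def df(t: float) -> float:
    """Derivative of f."""
    return 2.0 * t


def g(t: float) -> float:
    """Example height of the rectangle (an arbitrary choice)."""
    return math.sin(t) + 2.0


def dg(t: float) -> float:
    """Derivative of g."""
    return math.cos(t)


def area(t: float) -> float:
    """Area of the f-by-g rectangle at time t."""
    return f(t) * g(t)


def boundary_rule_rate(t: float) -> float:
    """Side of length f moves at rate g'; side of length g moves at rate f'."""
    return f(t) * dg(t) + g(t) * df(t)


def finite_difference_rate(t: float, h: float = 1e-6) -> float:
    """Central-difference approximation to the rate of change of the area."""
    return (area(t + h) - area(t - h)) / (2.0 * h)


if __name__ == "__main__":
    t = 0.7
    print(boundary_rule_rate(t), finite_difference_rate(t))
```

The two printed rates should agree to many decimal places, which is the product rule $(fg)' = fg' + gf'$ read off from the two moving sides.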

PPS: The fundamental theorem of calculus also follows from the boundary rule. Let $$ A(x)=\int_a^x f(t)\,dt. $$ The moving boundary is the vertical segment at $t=x$, whose size is $f(x)$; the rate at which the boundary moves is the rate at which $x$ changes; hence $\dfrac{dA(x)}{dx}=f(x)$.
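To see the same thing numerically, one can measure the thin strip of area added between $x$ and $x+h$ and divide by $h$. The sketch below assumes a simple midpoint-rule integrator; `area_under` and `strip_rate` are illustrative names, and `math.cos` is just an example integrand.

```python
import math
from typing import Callable


def area_under(f: Callable[[float], float], a: float, b: float, steps: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def strip_rate(f: Callable[[float], float], x: float, h: float = 1e-3) -> float:
    """Area of the strip swept out as the boundary moves from x to x + h, per unit h."""
    return area_under(f, x, x + h) / h


if __name__ == "__main__":
    x = 1.0
    print("strip area / h:", strip_rate(math.cos, x))
    print("f(x):          ", math.cos(x))
```

As $h$ shrinks, the strip's area per unit width approaches the height $f(x)$ of the moving boundary.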


If you think about $x^n$ as the volume, in $n$ dimensions, of a cube of side $x$, you can ask "how does that grow when $x$ increases?" Answer: count the number of sides of dimension $n-1$. For a square in the plane, with one corner fixed at the origin, the change in area is produced by the upper and right edges, each multiplied by a thickness $\Delta x$, when you change $x$ to $x + \Delta x$. For a cube in 3-space, you have three sides, each of whose areas is multiplied by $\Delta x$ to get the change in volume. For a segment in $\mathbb R^1$, you have the right-hand endpoint moving through a distance $\Delta x$ to get the change in length, and so on.

In general, there are $n$ "sides" of a hypercube in dimension $n$ (with one corner fixed at the origin), corresponding to incrementing each coordinate individually. (This is also where the $n$ in the binomial expansion of $(x + \Delta x)^n$ comes from, of course.)
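The link to the binomial expansion can be made concrete by splitting the shell between the cube of side $x$ and the cube of side $x + \Delta x$ into pieces, grouped by how many coordinates were incremented. In this sketch (with the hypothetical helper `shell_pieces`, and `h` standing in for $\Delta x$), the group with exactly one incremented coordinate is the $n$ slabs described above.

```python
from math import comb


def shell_pieces(x, h, n):
    """Total volume of the shell pieces, keyed by k = number of coordinates incremented.

    There are comb(n, k) pieces with k incremented coordinates,
    each of volume x**(n - k) * h**k.
    """
    return {k: comb(n, k) * x ** (n - k) * h ** k for k in range(1, n + 1)}


if __name__ == "__main__":
    pieces = shell_pieces(2, 1, 3)
    print(pieces)
    print(sum(pieces.values()), (2 + 1) ** 3 - 2 ** 3)
```

For $x = 2$, $\Delta x = 1$, $n = 3$ the groups have volumes 12, 6 and 1, which add up to $3^3 - 2^3 = 19$. Only the first group is linear in $\Delta x$, so it alone survives in the derivative.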


Arguing with infinitesimals and using Newton's binomial theorem: $$\frac {1}{h}((x+h)^n-x^n)=\frac{1}{h}(x^n+nhx^{n-1}+h^2(\ldots )-x^n)= nx^{n-1} + h(\ldots ).$$ The term $h(\ldots)$ is infinitesimal, which leaves $nx^{n-1}$.