What does $\frac{dy}{du}$ mean?

This is a very problematic notation, so I'll first talk informally and then give you the right way to do things. The point is that it has to do with a change of variables. For example, if you have $y=(ax+b)^2$ and we set $u = ax+b$, then $y=u^2$. Thus, if $x$ changes, there is a corresponding change in $u$, and this in turn gives a change in $y$ that you can calculate with the chain rule.

In the usual notation for derivatives, the chain rule would be stated as:

$$\frac{dy}{dx}=\frac{dy}{du} \frac{du}{dx}.$$

Now, this notation is confusing, this notation is bad, and although everyone should know it in order to read the books and articles that use it, people should really move to the modern notation. I'll explain why: it all has to do with the notion of composition of functions.

If we have two functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$, one can form the composition $g \circ f : \mathbb{R} \to \mathbb{R}$, defined by $(g\circ f)(x)=g(f(x))$. So the composition is the result of applying $g$ to the result of applying $f$ to $x$.

In this notation, the chain rule is written as $(g\circ f)'(x)=g'(f(x))f'(x)$, and this notation is much better because it doesn't carry any ambiguities. It says "to take the derivative of a composition, take the derivatives of $g$ and of $f$ normally, then evaluate the derivative of $g$ at $f(x)$ and multiply by the derivative of $f$ at $x$".

The example I gave you would have $g(x)=x^2$, $f(x)=ax+b$ and so

$$(g\circ f)(x)=g(f(x))=(ax+b)^2$$

To differentiate, we have $g'(x) = 2x$ and $f'(x)=a$ so $g'(f(x))=2(ax+b)$ and then:

$$(g\circ f)'(x)=2a(ax+b)$$
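If you want to convince yourself numerically, here is a minimal Python sketch that compares the chain-rule formula with a finite-difference estimate. The values $a=3$, $b=1$, $x=2$, the step size `h`, and the helper name `central_difference` are just assumptions picked for illustration.

```python
# Illustrative check of (g o f)'(x) = g'(f(x)) * f'(x) for f(x) = a*x + b, g(u) = u**2.
# The constants below are arbitrary choices, not part of the original example.
a, b = 3.0, 1.0


def f(x):
    """Inner function f(x) = a*x + b."""
    return a * x + b


def g(u):
    """Outer function g(u) = u**2."""
    return u ** 2


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 2.0

# Derivative of the composition, estimated directly from its values.
numeric = central_difference(lambda x: g(f(x)), x0)

# Chain rule: g'(f(x0)) * f'(x0) = 2*(a*x0 + b) * a.
chain_rule = 2 * f(x0) * a

print(numeric, chain_rule)
```

The two printed numbers should agree up to rounding error, which is exactly what $(g\circ f)'(x)=2a(ax+b)$ predicts.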

The usual notation carries ambiguities. First, notice that the function being defined doesn't depend on the letter used: the letter is just a symbol! So, writing $f(x)=x^2$ or $f(u)=u^2$ is the exact same thing, $x$ and $u$ are just placeholders for real numbers.

In the usual notation, the left hand side talks about the derivative of $y=(ax+b)^2$ and the right hand side about the derivative of $y=u^2$. So, $y$ is representing two different functions and this confuses a lot of people.

All of this is really confusing, and the rigorous framework exists because we don't want ambiguities: learning something rigorous can seem a little harder, but you will be able to understand things without ambiguities like that. So my suggestion is this: get the book Calculus by Michael Spivak. It will teach you how to think about Calculus in a logical way, running away from this kind of confusion.

I hope this helps you somehow. Good luck!

EDIT: The notation $f: \mathbb{R} \to \mathbb{R}$ just means that $f$ is a function with domain $\mathbb{R}$ and codomain $\mathbb{R}$; in other words, $f$ maps real numbers into real numbers. In general, given sets $A$ and $B$, a function that takes elements of $A$ into elements of $B$ is written $f : A \to B$.


As you put it: ${dy\over dx}$ is "the rate of change of $y$ with respect to $x$". But if $y=y(x)$ is a function of $x$, then $y$ is also a function of $x^3$.
For example if $y=x^6$, then as a function of $x$ it is $x\mapsto x^6$, but as a function of $u=x^3$ it is $u\mapsto u^2$.
Sometimes we want to know how the function changes if we vary not $x$, but some invertible (and differentiable) transformation of $x$. E.g. if we want to know the rate of change of $y$ with respect to $x^3$, we have to compute ${dy\over du}$ with $u=x^3$ rather than ${dy\over dx}$.
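
For the example $y=x^6$ with $u=x^3$, the two rates of change work out as follows (just a worked computation to make the difference concrete):

$$\frac{dy}{du}=\frac{d}{du}\,u^2=2u=2x^3, \qquad \frac{dy}{dx}=\frac{d}{dx}\,x^6=6x^5=\underbrace{2x^3}_{dy/du}\cdot\underbrace{3x^2}_{du/dx}.$$

So the two derivatives are different functions, and they are related by exactly the factor ${du\over dx}=3x^2$.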


If $dy/dx=6$ when $x=50$ and $y=17$, that means $y$ is changing $6$ times as fast as $x$ is changing at that point.

Leibniz in the 17th century and Euler in the 18th century thought of it like this: $dy$ is the infinitely small change in $y$ corresponding to an infinitely small change $dx$ in $x$. The reason they have to be infinitely small is that the rates of change may themselves be changing.

Leibniz is the one who introduced this notation in the first place.

If we were comfortable treating $dx$ and $dy$ as actual numbers, then we could say right away that when $x=50$ and $y=17$, $dx/dy=1/6$. But mathematicians have shied away from that approach because of logical difficulties, and in the 19th century they invented more cumbersome ways to show that if $dy/dx=6$ when $x=50$ and $y=17$, then $dx/dy=1/6$ at that point. And those more cumbersome ways are a proof of the chain rule. One way of viewing the chain rule is to say that $\dfrac{dy}{du}\cdot\dfrac{du}{dx} =\dfrac{dy}{dx}$, so if $y$ is changing $2$ times as fast as $u$ and $u$ is changing $3$ times as fast as $x$, then $y$ is changing $6$ times as fast as $x$. Again, if we treat $dy$, $du$, and $dx$ as actual numbers, then this is merely a matter of canceling $du$. But modern proofs of the chain rule are more involved.
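
As a sketch of how the reciprocal rule follows from the chain rule: assume that near that point $x$ can be written as a differentiable function of $y$ (an extra assumption, not something the chain rule gives for free). Differentiating the identity $x = x(y(x))$ with respect to $x$ then gives

$$1=\frac{dx}{dy}\cdot\frac{dy}{dx}, \qquad\text{so}\qquad \frac{dx}{dy}=\frac{1}{dy/dx}=\frac{1}{6}\quad\text{when } x=50,\ y=17.$$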
