When can we not treat differentials as fractions? And when is it perfectly OK?
I'll just make two extended comments.
First, if you'd like to treat $dy/dx$ as a fraction, then you need to do two things:
- (1) Have a clear, precise mathematical definition of what $dy$ and $dx$ are, and
- (2) Have a way of dividing the quantities $dy$ and $dx$.
There are a few ways of answering (1), but the most common answer among mathematicians -- that is, to the question of "what are $dy$ and $dx$ really?" -- is somewhat technical: $dy$ and $dx$ are "differential forms," which are objects more advanced than a typical calculus course allows.
More problematic, though, is (2): differential forms are not things which can be divided. You might protest that surely every mathematical object you can think of can be added, subtracted, multiplied, and divided, but of course that's not true: you cannot (for example) divide a square by a triangle, or $\sqrt{2}$ by an integral sign $\int$.
Second, every single instance in which expressions like $dy/dx$ are treated like fractions -- as you say, in $u$-substitution and related rates -- is really just the chain rule or the linearity of derivatives (i.e., $(f+g)' = f' + g'$ and $(cf)' = cf'$). Every single instance.
So, yes, $dy/dx$ can be treated like a fraction in the sense (and to the extent) that the Chain Rule $dy/dx = (dy/du)(du/dx)$ is a thing that is true, but that's essentially as far as the fraction analogy goes. (In fact, in multivariable calculus, pushing the fraction analogy too far can lead to real issues, but let's not get into this.)
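For concreteness, here is a tiny worked example (the particular functions are my own choice) showing that the "cancellation" picture is nothing more than the chain rule:
$$y = \sin u, \qquad u = x^2 \quad\Longrightarrow\quad \frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} = \cos(u)\cdot 2x = 2x\cos(x^2).$$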
Edit: At the OP's request, here are examples of fraction-like manipulations which are not valid: $$\left( \frac{dy}{dx} \right)^2 = \frac{(dy)^2}{(dx)^2} \ \ \text{ or } \ \ 2^{dy/dx} = \sqrt[dx]{2^{dy}}.$$ Because such manipulations are nonsensical, students are often warned not to treat derivatives like fractions.
Suppose $\Delta x$ is a tiny (but finite and nonzero) real number and $\Delta f$ is the amount that a function $f$ changes when its input changes from $x$ to $x + \Delta x$. Then it's not true that $\Delta f = f'(x) \Delta x$ (with exact equality), but it is true that $\Delta f \approx f'(x) \Delta x$. You are free to manipulate $\Delta x$ and $\Delta f$ however you like, just as you would any real numbers, so long as you remember that the equations you derive are only approximately true. You can hope that "in the limit" you will obtain exactly true equations (as long as you are careful).
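If it helps to see this numerically, here is a minimal sketch (the choice $f(x) = \sin x$ and the step sizes are mine, purely for illustration):

```python
import math

# Take f(x) = sin(x), so f'(x) = cos(x).
f, f_prime = math.sin, math.cos

x = 1.0
for dx in (0.1, 0.01, 0.001):
    delta_f = f(x + dx) - f(x)   # the actual change in f
    approx = f_prime(x) * dx     # the linear approximation f'(x) * dx
    print(dx, delta_f, approx, abs(delta_f - approx))
```

The last column (the error) shrinks roughly like $(\Delta x)^2$, which is the precise sense in which $\Delta f \approx f'(x)\,\Delta x$.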
For example, suppose that $f(x) = g(h(x))$. Then \begin{align} f(x + \Delta x) &= g(h(x+\Delta x)) \\ &\approx g(h(x) + h'(x) \Delta x) \\ &\approx g(h(x)) + g'(h(x)) h'(x) \Delta x, \end{align} which tells us that \begin{equation} \frac{f(x+\Delta x) - f(x)}{\Delta x} \approx g'(h(x)) h'(x). \end{equation} And it certainly seems plausible that if we take the limit as $\Delta x$ approaches $0$ we will obtain exact equality: \begin{equation} f'(x) = g'(h(x)) h'(x). \end{equation}
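Here is the same computation checked numerically, with the concrete (and arbitrary) choice $g = \sin$, $h(x) = x^2$:

```python
import math

g, g_prime = math.sin, math.cos                  # g(u) = sin(u)
h, h_prime = lambda x: x * x, lambda x: 2 * x    # h(x) = x^2
f = lambda x: g(h(x))                            # f = g o h

x = 1.0
exact = g_prime(h(x)) * h_prime(x)               # the chain-rule value g'(h(x)) h'(x)
for dx in (0.1, 0.01, 0.001):
    print(dx, (f(x + dx) - f(x)) / dx, exact)    # difference quotient vs. its limit
```

As $\Delta x$ shrinks, the difference quotient visibly approaches $g'(h(x))\,h'(x)$.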
These kinds of arguments, introducing tiny changes in $x$ and making linear approximations using the derivative, are the essential intuition behind calculus.
Often, arguments like this can be made into rigorous proofs just by keeping track of the errors in the approximations and bounding them somehow.
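For instance, the approximation used above can be restated exactly by carrying the error term along (this is just the definition of differentiability, spelled out): $$\Delta f = f'(x)\,\Delta x + \varepsilon(\Delta x)\,\Delta x, \qquad \varepsilon(\Delta x) \to 0 \text{ as } \Delta x \to 0,$$ and a rigorous proof then amounts to checking that all the $\varepsilon$-terms vanish in the limit.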
First, $dx$ and $dy$ are in fact differential forms: things that, given a point and a vector based at that point, give us a value; they are linear (and, for forms of higher degree, antisymmetric) in the vector argument, and continuous / differentiable / smooth in the point argument.
Now, by Newton-Leibniz, any differential form on $\mathbb{R}$ is of the form $dy = f(x)dx$, where $dx$ is a differential form such that $dx(x, h) = h$ (here, $h$ is a one-dimensional vector - you can treat it as a displacement of $x$).
So we can try to define division by $\frac{dy}{dx} = \frac{f(x)\,dx}{dx} = f(x)$. While this works in one dimension, it fails in higher dimensions.
Suppose that we are on a plane, having two basis differential forms: $dx_1$ and $dx_2$ ($dx_i$ is just projection on the $i$-th coordinate). Again, any differential form is $dy = f_1(x)dx_1 + f_2(x)dx_2$. Divide by $dx_1$: $\frac{dy}{dx_1} = f_1(x) + f_2(x)\frac{dx_2}{dx_1}$. We could say here that $\frac{dx_2}{dx_1}$ is zero, since the components of a vector are independent, but let's actually do the division. Let $h = (h_1, h_2)$ be the displacement vector, then $\frac{dx_2}{dx_1}(x,h) = \frac{h_2}{h_1}$. Wow, this is surely not equal to zero, but measures some kind of relative displacement in coordinates. The point is that it depends on $h$ now, and the result of the division cannot be just a function of $x$.
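A quick numerical sketch of that last point (the displacement vectors below are arbitrary choices of mine):

```python
# dx1 and dx2 simply read off the components of the displacement h.
dx1 = lambda h: h[0]
dx2 = lambda h: h[1]

# "dx2 / dx1" evaluated on different displacements h = (h1, h2):
for h in [(1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]:
    print(h, dx2(h) / dx1(h))   # 0.0, 1.0, 0.5 -- the value depends on h, not just on the point
```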
What one really wants here is some kind of dot product, since, for example, the dot product with a basis vector gives the corresponding coordinate. Here, this "dot product" arises naturally: take the form $dy$ and plug the basis vector into it: $dy(x, e_1) = f_1(x)\,dx_1(e_1) + f_2(x)\,dx_2(e_1) = f_1(x)$ (since $dx_1(e_1) = 1$ and $dx_2(e_1) = 0$). Why $e_1$? Because it is the vector field dual to the form $dx_1$.
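Here is that pairing written out as a short sketch, with arbitrary example coefficients $f_1, f_2$ of my own choosing:

```python
# Arbitrary example coefficient functions on the plane.
f1 = lambda x: x[0] ** 2
f2 = lambda x: x[0] * x[1]

# dy(x, h) = f1(x) * dx1(h) + f2(x) * dx2(h), with dx1(h) = h[0] and dx2(h) = h[1].
dy = lambda x, h: f1(x) * h[0] + f2(x) * h[1]

x = (2.0, 3.0)
e1 = (1.0, 0.0)            # the vector (field) dual to dx1
print(dy(x, e1), f1(x))    # both print 4.0: pairing dy with e1 recovers f1(x)
```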
So, although it looks like a fraction, it's really more of a dot product.