Why is it absolutely mistaken to cancel out differentials?
Physicists might use infinitesimals as in the following derivation of the product rule:
$$\begin{align} f(x+dx)g(x+dx) &= \left(f(x)+ f^\prime(x)dx\right) \cdot \left(g(x)+g^\prime(x)dx\right) \\ &= f(x)g(x)+ f^\prime(x)g(x) dx + f(x)g^\prime(x) dx + f^\prime(x)g^\prime(x) \underbrace{dx^2}_{=0} \\ & = f(x)g(x)+ (f^\prime(x)g(x) + f(x)g^\prime(x)) dx \end{align}$$
The problem is: what is $dx$? The usual answer is that $dx$ is an infinitesimal, i.e. a nonzero number whose distance from zero is smaller than every positive rational number $q\in\mathbb Q^+$. But there are some problems with this explanation:
- $dx$ is mostly used like an ordinary real number. One builds fractions like $\tfrac{dy}{dx}$ and calculates with them as if they were real fractions. But judging by how $dx$ is used, it would be a strange number. When I write $dx^2=0$, I use $dx$ as if it were zero. On the other hand, $dx$ may occur in the denominator of a fraction, which is only allowed for $dx\neq 0$. So sometimes $dx$ behaves like $0$ and sometimes like a nonzero number.
- Due to the Archimedean property, there is no real number satisfying the properties of $dx$. So if the number system you use is $\mathbb R$, the object $dx$ cannot be a number. Because the Archimedean property is built into the real number system used in contemporary analysis, and this theory has no other concept of infinitesimals, one cannot use $dx$ as a number in contemporary analysis.
- Normally nobody gives a mathematically rigorous definition of $dx$ when it is used in a physics course. So the question remains: what is $dx$?
Nowadays there are mathematical theories of infinitesimals. For example, there is non-standard analysis, where the set of real numbers is extended to the set of hyperreal numbers, which contains infinitesimals. In this theory one can do calculations like the one above (with $dx^2$ negligible compared to $dx$ rather than literally zero). So it is possible to cancel out differentials if one shifts the underlying theory from contemporary analysis to something like non-standard analysis.
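The bookkeeping in the product-rule derivation above can be imitated on a computer. The following is a minimal Python sketch using dual numbers, which are a different and much simpler device than the hyperreals: numbers of the form $a + b\,dx$ with the rule $dx^2 = 0$ imposed exactly. The class `Dual` and the sample functions `f` and `g` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A number a + b*dx with the rule dx*dx = 0 (hypothetical helper class)."""
    re: float   # ordinary part a
    eps: float  # coefficient b of dx

    def __add__(self, other):
        return Dual(self.re + other.re, self.eps + other.eps)

    def __mul__(self, other):
        # (a + b dx)(c + d dx) = ac + (ad + bc) dx; the b*d*dx^2 term is dropped
        return Dual(self.re * other.re,
                    self.re * other.eps + self.eps * other.re)


def f(x):
    # sample function f(x) = x^2, so f'(x) = 2x
    return x * x


def g(x):
    # sample function g(x) = x^3, so g'(x) = 3x^2
    return x * x * x


dx = Dual(0.0, 1.0)
x = Dual(2.0, 0.0) + dx      # the point x + dx at x = 2
product = f(x) * g(x)        # f(x + dx) * g(x + dx)

print(dx * dx)               # Dual(re=0.0, eps=0.0): dx^2 is exactly zero
print(product)               # re is f(2)g(2), eps is the coefficient of dx
```

Here `product` comes out as `Dual(re=32.0, eps=80.0)`, matching $f(2)g(2)=32$ and $f'(2)g(2)+f(2)g'(2)=80$. No division is defined in this sketch, and none could be for `dx`: any multiple of $dx$ has ordinary part $0$, so $dx$ has no inverse. This model therefore keeps the rule $dx^2=0$ at the price of giving up fractions with $dx$ in the denominator.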
My opinion: I do not think that there is actually a problem. Normally physicists have a good intuition for infinitesimals. They know how to use them and what problems might occur, and they can work effectively with them. Okay, it would be great if more people were aware of theories like non-standard analysis, but in my opinion one first has to learn the intuition behind a concept before studying its rigorous definition. For example, you first calculate with real numbers in school and get some intuition for them before you go to university and learn what the axioms of the real number system are or how the real numbers can be constructed via Dedekind cuts or Cauchy sequences.
I agree with the accepted answer: differential notation is a very useful tool for calculations, and in most of the situations where physicists and engineers use it, everything works out fine. That said, I'd like to point out a case where being sloppy with differential notation can lead one to an apparent "proof" of a false statement. I have heard this attributed to Cauchy, although I suspect this attribution is a "mathematical urban legend".
Suppose $f_n$ is a sequence of continuous functions which converges to a function $f$ on $[0,1]$. We ask whether $f$ must be continuous. We write
$$\begin{align} |f(x+dx)-f(x)| & =|f(x+dx)-f_n(x+dx)+f_n(x+dx)-f_n(x)+f_n(x)-f(x)| \\ & \leq |f(x+dx)-f_n(x+dx)|+|f_n(x+dx)-f_n(x)|+|f_n(x)-f(x)| \end{align}$$
Informally we now think about infinitely large $n$ and infinitely small $dx$. Then all three terms should be infinitely small (the first and third because of convergence and the second because of continuity). So the original side should be infinitely small, and so $f$ should be continuous.
When we formalize the above argument, everything works out provided the $f_n$ converge uniformly. But if they converge only pointwise, the argument fails: on $[0,1]$, $f_n(x) = x^n$ converges to $0$ for $0 \le x < 1$ and to $1$ for $x = 1$, so the limit is discontinuous even though every $f_n$ is continuous. Note that unlike uniform convergence, pointwise convergence can be reasonably formulated without developing the axiomatic framework of analysis: all that we need is a way to talk about convergence of sequences of numbers, and a notion of continuity.
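The failure of uniform convergence for $f_n(x) = x^n$ can be checked numerically. The sketch below is only an illustration with finite $n$, not a proof; the helper names `f_n`, `f_limit` and `bad_point` are hypothetical. It shows that at a fixed $x<1$ the values shrink to $0$, while for every $n$ there is a point $x_n<1$ at which $f_n$ is still a distance of about $1/2$ from the limit.

```python
def f_n(x, n):
    """The n-th function of the sequence, f_n(x) = x^n."""
    return x ** n


def f_limit(x):
    """Pointwise limit on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


def bad_point(n):
    """A point x_n < 1 chosen so that f_n(x_n) = 1/2 (up to rounding)."""
    return 0.5 ** (1.0 / n)


ns = (10, 100, 1000)

# Pointwise convergence: at the fixed point x = 0.9 the values tend to 0.
pointwise = [f_n(0.9, n) for n in ns]

# No uniform convergence: the gap at x_n stays near 1/2 for every n.
gaps = [abs(f_n(bad_point(n), n) - f_limit(bad_point(n))) for n in ns]

print(pointwise)
print(gaps)
```

The points `bad_point(n)` move toward $1$ as $n$ grows, which is why no single $n$ makes $|f_n - f|$ small on the whole interval.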
The problem with the argument, once we approach it formally, is that we need to pick a single $n$ to control both the first and third terms, and only after doing so do we choose how small $dx$ must be to control the second term. But that means that we choose $n$ before we have chosen $dx$, and without uniform convergence we may need a larger $n$ to control the first term for our chosen $dx$.
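For comparison, here is a sketch of how the order of choices looks in the standard argument when the convergence is assumed to be uniform. The symbols $\varepsilon$, $\delta$ and $h$ (playing the role of $dx$) are introduced here for illustration, and $x$ and $x+h$ are both taken in $[0,1]$:

$$\begin{align} &\text{Given } \varepsilon>0, \text{ first choose } n \text{ with } \sup_{y\in[0,1]}|f_n(y)-f(y)| < \tfrac{\varepsilon}{3}, \\ &\text{then choose } \delta>0 \text{ with } |f_n(x+h)-f_n(x)|<\tfrac{\varepsilon}{3} \text{ whenever } |h|<\delta, \\ &\text{so that } |f(x+h)-f(x)| < \tfrac{\varepsilon}{3}+\tfrac{\varepsilon}{3}+\tfrac{\varepsilon}{3} = \varepsilon \text{ whenever } |h|<\delta. \end{align}$$

The supremum in the first line is what makes the chosen $n$ work at $x+h$ for every $h$ at once. With only pointwise convergence, that line is not available.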
It is hard to even describe this phenomenon in the infinitesimal language! One way to see it is in the hyperreal framework: take $N$ to be an infinite natural number and $dx=1/N$. Then the standard part of $(1-dx)^N$ is not $1$, but rather $e^{-1}$. And now we see the problem: the limit processes $n \to \infty$ and $x \to 1^-$ compete with one another, one trying to pull the result toward $0$ and the other trying to pull the result toward $1$. This effect is missed when we naively say that $n$ is infinite and $dx$ is infinitesimal without saying how they compare to one another.
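The dependence on how $N$ and $dx$ compare can be imitated with large finite $N$. The following sketch is a finite stand-in for the hyperreal computation, not the computation itself, and the helper name `standard_part_estimate` is hypothetical. It evaluates $(1-dx)^N$ for three choices of $dx$: the same scale as $1/N$, much smaller, and much larger.

```python
import math


def standard_part_estimate(N, dx):
    """Finite stand-in for the standard part of (1 - dx)^N."""
    return (1.0 - dx) ** N


rows = []
for N in (10**2, 10**3, 10**4):
    rows.append((
        N,
        standard_part_estimate(N, 1.0 / N),             # dx = 1/N
        standard_part_estimate(N, 1.0 / N**2),          # dx = 1/N^2
        standard_part_estimate(N, 1.0 / math.sqrt(N)),  # dx = 1/sqrt(N)
    ))

for N, same_scale, smaller_dx, larger_dx in rows:
    print(N, same_scale, smaller_dx, larger_dx)

print("e^-1 =", math.exp(-1))
```

As $N$ grows, the column for $dx = 1/N$ approaches $e^{-1}$, the column for $dx = 1/N^2$ approaches $1$, and the column for $dx = 1/\sqrt{N}$ approaches $0$. So "infinite $n$, infinitesimal $dx$" does not determine the answer until the two are related.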