How to treat differentials and infinitesimals?
There is an old tradition, going back all the way to Leibniz himself and carried on widely in physics departments, of thinking of differentials intuitively as "infinitesimal numbers". Through the course of history, big minds have criticized Leibniz for this as informal and unscientific (for instance the otherwise great Bertrand Russell, in Chapter XXXI of "A History of Western Philosophy" (1945)).
But then something profound happened: William Lawvere, one of the deepest thinkers on the foundations of mathematics and of physics, taught the world about topos theory and, within it, about "synthetic differential geometry". Among other things, this is a fully rigorous mathematical context in which the old intuition of Leibniz, and the intuition of plenty of naive physicists, finds a complete formal justification. In synthetic differential geometry those differentials explicitly ("synthetically") exist as infinitesimal elements of the real line.
A basic exposition of how this works is on the nLab at
- differentiation -- Exposition of differentiation via infinitesimals
Notice that this is not just a big machine for reproducing something you already know, as some will inevitably hasten to think. On the contrary, it leads the way to the more sophisticated parts of modern physics. Namely, the "derived" or "higher geometric" version of synthetic differential geometry includes modern D-geometry, which is at the heart of modern topics such as the BV-BRST formalism (see e.g. Paugam's survey) for the quantization of gauge theories, or the geometric Langlands correspondence, hence S-duality in string theory.
(I'm addressing this from the point of view of standard analysis)
I don't think you will have a satisfactory understanding of this until you get to multivariable calculus, because in calculus 2 it's easy to think that $\frac{d}{dx}$ is all you need and that there's no need for $\frac{\partial}{\partial x}$. (This is false, and it is related to why derivatives do not, in general, behave like fractions.) So that's one reason why differentials are not like numbers. There are some ways in which differentials are like numbers, however.
I think the most fundamental bit is this: if you're told that $f\, dx=dy$, it means that $y$ can be approximated as $y(x)=y(x_0)+f\cdot(x-x_0)+O((x-x_0)^2)$ close to the point $x_0$ (this raises another issue*). Since this first-order term is really all that matters after one applies the limiting procedures of calculus, we get an argument for why such loose treatment of differentials is allowable: higher-order terms don't matter. This is a consequence of Taylor's theorem, and it is what allows your physics teacher to treat differentials as very small numbers, because $x-x_0$ plays the role of your "$dx$", and it IS a real number. What lets you do things you can't do with a single real number is that the formula for $y(x)$ holds for all $x$, not just some particular $x$. This lets you apply all the complicated tricks of analysis.
If I get particularly annoyed at improper treatment of differentials and I see someone working through an example where they write, "Now we take the differential of $x^2+x$, giving us $(2x+1)dx$", I may imagine $dx$ being a standard real number, with a little $+O(dx^2)$ tacked on to the side.
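If it helps, here is a minimal numerical sketch in Python (my own illustration, using the $x^2+x$ example just mentioned) of the claim that the leftover after subtracting the first-order term really is $O(dx^2)$:

```python
# Numerical check: for y(x) = x**2 + x, the relation dy = (2x + 1) dx means
# remainder = y(x0 + dx) - y(x0) - (2*x0 + 1)*dx  is O(dx**2),
# i.e. remainder / dx**2 stays bounded as dx -> 0.

def y(x):
    return x**2 + x

x0 = 1.0
f = 2 * x0 + 1  # the "f" in f dx = dy, i.e. y'(x0)

for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    remainder = y(x0 + dx) - y(x0) - f * dx
    print(f"dx = {dx:.0e}  remainder = {remainder:.3e}  ratio = {remainder / dx**2:.3f}")
# For this particular y the ratio is exactly 1: the remainder is dx**2 on the nose.
```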
Your math teacher might argue, "You don't know enough about those theorems to apply them properly, so that's why you can't think of differentials as similar to numbers", while your physics teacher might argue, "The intuition is the really important bit, and you'd have to learn complicated math to see it as $O(dx^2)$. Better to focus on the intuition."
I hope I cleared things up instead of making them seem more complicated.
*(The O notation is another can of worms and can also be used improperly. Using the limit-based form of the notation, I am saying "$y(x)-y(x_0)-f\cdot(x-x_0)=O((x-x_0)^2)$ as $x\to x_0$". Note that one could see this as working against my argument: it's meaningless to say "one value of $x$ satisfies this equation", so when written in this form (which your physics prof. might find more obtuse and your math prof. might find more meaningful) it's less an equation and more a logical statement.)
See also: https://mathoverflow.net/questions/25054/different-ways-of-thinking-about-the-derivative
I think your math teacher is right. One way to see that differentials are not ordinary numbers is to look at their relation to so-called 1-forms. I do not know whether you have already met forms in calculus 2, but they are easy to look up on the internet.
Since you chose the tag "integrals" for your question, let me give you an example based on an integral. Say you have a function $f(x^2+y^2)$ and want to integrate it over some area $A$:
$$\int_A f(x^2+y^2) \, dx \, dy$$
The important thing to realize here is that $dx\,dy$ is actually just an abbreviation for $dx\wedge dy$. This $\wedge$ thingy is an operation (the wedge product, much like multiplication but with slightly different rules) that combines forms (in this case two $1$-forms into a $2$-form). One important rule for wedge products is anticommutation:
$$dx\wedge dy=-dy\wedge dx$$
This makes sure that $dx\wedge dx=0$ (here a physicist could cheat by saying that they neglect everything of order $O(dx^2)$, but that is mixing apples and pears, and frankly misleading). Why would differentials in integrals behave like this, and what is the physical meaning? Well, here you can think about the 'handedness' of a coordinate system. For instance, the integration measure $dx\wedge dy\wedge dz$ is Cartesian 'right-handed'. You can make it 'left-handed' by commuting $dx$ with $dy$ to obtain $-dy\wedge dx\wedge dz$, but then a minus sign appears in front, which makes sure that integration in the 'left-handed' coordinate system still gives the same result as in the original 'right-handed' one.
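The anticommutation rule is easy to play with concretely. Below is a minimal sketch in Python (the representation of a 1-form as a pair of coefficients is my own choice for illustration, not any standard library) showing $dx\wedge dy=-dy\wedge dx$ and $dx\wedge dx=0$:

```python
# A 1-form a dx + b dy on the plane is stored as the pair (a, b).
# The wedge of two 1-forms is a 2-form c dx^dy, stored as the coefficient c.

def wedge(omega, eta):
    """Wedge product of two 1-forms; returns the coefficient of dx^dy."""
    a1, b1 = omega
    a2, b2 = eta
    # This 2x2 determinant is exactly what makes the product antisymmetric.
    return a1 * b2 - b1 * a2

dx = (1, 0)
dy = (0, 1)

print(wedge(dx, dy))  #  1, i.e. dx^dy
print(wedge(dy, dx))  # -1, i.e. dy^dx = -dx^dy
print(wedge(dx, dx))  #  0, i.e. dx^dx = 0
```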
In any case, to come back to the integral above, let's say you prefer polar coordinates for the integration. So you make the following substitution (assuming you already know how to take total differentials):
$$x = r \cos \phi~~~,~~~dx = dr \cos \phi - d\phi\, r \sin \phi$$ $$y = r \sin \phi~~~,~~~dy = dr \sin \phi + d\phi\, r \cos \phi$$
Multiplying out your $dx\wedge dy$ you find what you probably already know and expect:
$$dx\wedge dy = (dr \cos \phi - d\phi\, r \sin \phi)\wedge(dr \sin \phi + d\phi\, r \cos \phi)$$ $$ = \underbrace{dr\wedge dr}_{=0} \sin \phi\cos \phi + dr\wedge d\phi\, r \cos^2 \phi - d\phi\wedge dr\, r \sin^2 \phi - \underbrace{d\phi\wedge d\phi}_{=0}\, r^2 \cos \phi \sin \phi $$ $$=r(dr\wedge d\phi \cos^2 \phi - d\phi\wedge dr \sin^2 \phi)$$ $$=r(dr\wedge d\phi \cos^2 \phi + dr\wedge d\phi \sin^2 \phi)$$ $$=r\, dr\wedge d\phi ( \cos^2 \phi + \sin^2 \phi)$$ $$=r\, dr\wedge d\phi $$
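The entire computation above compresses into one fact: the coefficient relating $dx\wedge dy$ to $dr\wedge d\phi$ is the Jacobian determinant of the coordinate change. Here is a short sketch checking this with SymPy (assuming it is installed):

```python
import sympy as sp

r, phi = sp.symbols('r phi', positive=True)
x = r * sp.cos(phi)
y = r * sp.sin(phi)

# dx^dy = det(J) dr^dphi, where J is the Jacobian of (x, y) w.r.t. (r, phi);
# the rows of J are exactly the total differentials dx and dy written above.
J = sp.Matrix([x, y]).jacobian([r, phi])
print(sp.simplify(J.det()))  # prints r, matching dx^dy = r dr^dphi
```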
With this the integral above expressed in polar coordinates will correctly read:
$$\int_A f(r^2)r\, dr \, d\phi$$
Here we have suppressed the wedge product. It is important to realize that had we not treated the differentials as 1-forms, the transformation of the integration measure $dx\,dy$ into the one involving $dr$ and $d\phi$ would not have worked out properly!
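As a final sanity check, here is a numerical sketch (my own choice of $f(s)=e^{-s}$ and $A$ the unit disk; SciPy is assumed available) confirming that the Cartesian and polar integrals agree precisely because of the extra factor of $r$:

```python
import numpy as np
from scipy.integrate import dblquad

# Cartesian: integrate f(x^2 + y^2) = exp(-(x^2 + y^2)) over the unit disk,
# with y running between -sqrt(1 - x^2) and +sqrt(1 - x^2) for each x.
cart, _ = dblquad(lambda y, x: np.exp(-(x**2 + y**2)),
                  -1, 1,
                  lambda x: -np.sqrt(1 - x**2),
                  lambda x: np.sqrt(1 - x**2))

# Polar: integrate f(r^2) * r = exp(-r^2) * r over r in [0, 1], phi in [0, 2*pi].
polar, _ = dblquad(lambda r, phi: np.exp(-r**2) * r,
                   0, 2 * np.pi,
                   lambda phi: 0,
                   lambda phi: 1)

print(cart, polar)  # both approximately pi*(1 - exp(-1)) ~ 1.9859
```

Dropping the factor $r$ in the polar integrand makes the two numbers disagree, which is the point of the whole exercise.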
I hope this example was down-to-earth enough and gives some feeling for why differentials are not simply very small numbers.