What is the rigorous definition of $dy$ and $dx$?

I'm going to make an excursus that is much more complicated than you actually need for your case, where the dimension is essentially $1$. However, I think that you need the following to better understand what is going on behind the "mumbo jumbo" formalism of $\operatorname{d}x, \operatorname{d}y$ and so on.

Take a linear space $V$ of dimension $n\in\mathbb{N}$ and a basis $\{e_1,...,e_n\}$. You know from your linear algebra course that there exists a unique (dual) basis $\{\varphi_1,...,\varphi_n\}$ of the dual space $V'$ such that: $$\forall i,j\in\{1,...,n\}, \varphi_i(e_j)=\delta_{i,j}.$$ Now go back to $\mathbb{R}^n$ and let $\{e_1,...,e_n\}$ be its standard basis. Then you define $\{\operatorname{d}x_1,...,\operatorname{d}x_n\}$ as the dual basis of $\{e_1,...,e_n\}$.
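To make this concrete (a small example of my own, not part of the construction above): for $n=2$, the dual basis elements simply pick out the coordinates of a vector, i.e. for $v=v_1e_1+v_2e_2$ you have $$\operatorname{d}x_1(v)=v_1,\qquad \operatorname{d}x_2(v)=v_2,$$ which is exactly the condition $\operatorname{d}x_i(e_j)=\delta_{i,j}$ extended by linearity.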

Then you need the concept of the differential of a function: if $\Omega$ is an open subset of $\mathbb{R}^n$, $f :\Omega\rightarrow\mathbb{R}$ and $x\in\Omega$, you say that $f$ is differentiable at $x$ if there exists a linear map $L:\mathbb{R}^n\rightarrow \mathbb{R}$ such that $$f(y)=f(x)+L(y-x)+o(\|y-x\|_2), $$ as $y\rightarrow x$, where $\|\cdot\|_2$ is the Euclidean norm on $\mathbb{R}^n$. You say that $f$ is differentiable if it is differentiable at $x$ for each $x\in\Omega$.
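To see this definition at work on a simple example (my own choice of function): take $n=2$ and $f:\mathbb{R}^2\rightarrow\mathbb{R}$, $f(x_1,x_2)=x_1x_2$. Fix a point $x=(x_1,x_2)$; then for any $y=(y_1,y_2)$, $$f(y)=y_1y_2=x_1x_2+\underbrace{x_2(y_1-x_1)+x_1(y_2-x_2)}_{L(y-x)}+\underbrace{(y_1-x_1)(y_2-x_2)}_{o(\|y-x\|_2)},$$ and the last term really is $o(\|y-x\|_2)$ since $|(y_1-x_1)(y_2-x_2)|\leq\tfrac{1}{2}\|y-x\|_2^2$. So $f$ is differentiable at every $x$, with $L(v)=x_2v_1+x_1v_2$.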

You can prove that if $f$ is differentiable, then for each $x\in\Omega$ the linear map $L$ is unique, in the sense that if $M$ is another linear map that does the same job, then $M=L$. So you are in a position to define the differential of $f$ at $x$ as the linear map $L$. In general, when you change the point $x$, the differential of $f$ at $x$ changes too, so you define a map $$\operatorname{d}f: \Omega\rightarrow (\mathbb{R}^n)'$$ that associates to each $x\in\Omega$ the differential of $f$ at $x$. This map is called the differential of $f$.
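The uniqueness argument is short enough to sketch here: if $L$ and $M$ both satisfy the definition, then subtracting the two expansions gives $(L-M)(y-x)=o(\|y-x\|_2)$ as $y\rightarrow x$. Taking $y=x+tv$ for a fixed $v\in\mathbb{R}^n$ and $t>0$, $$(L-M)(v)=\frac{(L-M)(tv)}{t}=\frac{o(t\|v\|_2)}{t}\xrightarrow[t\rightarrow 0^+]{}0,$$ so $(L-M)(v)=0$ for every $v$, i.e. $M=L$.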

Now, fix a differentiable $f :\Omega \rightarrow \mathbb{R}$. Then $\forall x\in\Omega, \operatorname{d}f(x)\in (\mathbb{R}^n)'$ and so, since $\{\operatorname{d}x_1,...,\operatorname{d}x_n\}$ is a basis for $(\mathbb{R}^n)'$, there exist $a_1:\Omega\rightarrow\mathbb{R},..., a_n:\Omega\rightarrow\mathbb{R}$ such that: $$\forall x \in \Omega, \operatorname{d}f(x)=a_1(x)\operatorname{d}x_1+...+a_n(x)\operatorname{d}x_n.$$ You can prove that $$\frac{\partial{f}}{\partial{x_1}}=a_1,...,\frac{\partial{f}}{\partial{x_n}}=a_n,$$ where $\frac{\partial{f}}{\partial{x_1}},...,\frac{\partial{f}}{\partial{x_n}}$ are the partial derivatives of $f$. So you have: $$\forall x \in \Omega, \operatorname{d}f(x)= \frac{\partial{f}}{\partial{x_1}}(x)\operatorname{d}x_1+...+\frac{\partial{f}}{\partial{x_n}}(x)\operatorname{d}x_n.$$
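For the function $f(x_1,x_2)=x_1x_2$ from the example above, the partial derivatives are $\frac{\partial f}{\partial x_1}(x)=x_2$ and $\frac{\partial f}{\partial x_2}(x)=x_1$, so $$\operatorname{d}f(x)=x_2\operatorname{d}x_1+x_1\operatorname{d}x_2,$$ which is exactly the linear map $L(v)=x_2v_1+x_1v_2$ obtained directly from the definition.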

Now, you define a differential form to be any function: $$F :\Omega \rightarrow (\mathbb{R}^n)'$$ so, in particular, the differential of a differentiable map is a differential form.
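Note that, conversely, not every differential form is the differential of a function. A quick illustration (my own): the form $F(x)=x_2\operatorname{d}x_1$ on $\mathbb{R}^2$ cannot equal $\operatorname{d}f$ for any twice continuously differentiable $f$, because that would require $\frac{\partial f}{\partial x_1}=x_2$ and $\frac{\partial f}{\partial x_2}=0$, whose mixed second derivatives, $1$ and $0$, would have to coincide by Schwarz's theorem.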

You will learn during the course that you can integrate continuous differential forms along $C^1$ curves. Precisely, if $\gamma :[a,b] \rightarrow \Omega$ is a $C^1$ function and $F :\Omega \rightarrow(\mathbb{R}^n)'$ is a continuous differential form, then you define: $$\int_\gamma F := \int_a ^ b F(\gamma(t))(\gamma'(t))\operatorname{d}t,$$ where the right-hand side is a Riemann integral (remember that $F(\gamma(t))\in(\mathbb{R}^n)'$ and that $\gamma'(t)\in\mathbb{R}^n$, so $F(\gamma(t))(\gamma'(t))\in\mathbb{R}$).
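As a worked example of this definition (using the form from the illustration above): take $F(x)=x_2\operatorname{d}x_1$ on $\Omega=\mathbb{R}^2$ and $\gamma:[0,1]\rightarrow\mathbb{R}^2,\ t\mapsto(t,t^2)$. Then $\gamma'(t)=(1,2t)$ and $F(\gamma(t))(\gamma'(t))=t^2\cdot 1=t^2$, so $$\int_\gamma F=\int_0^1 t^2\operatorname{d}t=\frac{1}{3}.$$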

Now, it can be proved that if $f$ is a differentiable function whose differential is continuous, then: $$\int_\gamma\operatorname{d}f = f(\gamma(b))-f(\gamma(a)).$$
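Continuing the running example: for $f(x_1,x_2)=x_1x_2$ we found $\operatorname{d}f(x)=x_2\operatorname{d}x_1+x_1\operatorname{d}x_2$, and along the same curve $\gamma(t)=(t,t^2)$, $$\int_\gamma\operatorname{d}f=\int_0^1\left(t^2\cdot 1+t\cdot 2t\right)\operatorname{d}t=\int_0^1 3t^2\operatorname{d}t=1=f(1,1)-f(0,0),$$ exactly as the formula predicts.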

Finally, we come back to earth. In your case, you have that $n=1$. So let's interpret the equation $$\frac{\operatorname{d}y}{\operatorname{d}x} = f(x,y)$$ in the context of the differential formalism developed above:

  1. $\{\operatorname{d}x\}$ is the dual basis in $(\mathbb{R})'$ of the basis $\{1\}$ in $\mathbb{R}$;
  2. $y$ is a function defined on an open interval $I\subset\mathbb{R}$, i.e. $y:I\rightarrow\mathbb{R}$;
  3. $\operatorname{d}y$ is the differential of the function $y$, and then $\operatorname{d}y : I \rightarrow (\mathbb{R})'$;
  4. Then, as we stated before (see the section about partial derivatives), it holds that the derivative of $y$, i.e. $y'$, satisfies $\forall x\in I, \operatorname{d}y(x) = y'(x)\operatorname{d}x$. Here, the expression $\frac{\operatorname{d}y}{\operatorname{d}x}$ is just a name for $y'$, so, keeping that in mind, $\forall x\in I, \operatorname{d}y(x) = \frac{\operatorname{d}y}{\operatorname{d}x}(x)\operatorname{d}x$;
  5. $f : I\times \mathbb{R}\rightarrow \mathbb{R}$ is a function, and we want that $\forall x \in I, \frac{\operatorname{d}y}{\operatorname{d}x}(x) \doteq y'(x) = f(x,y(x))$;
  6. So you want that $\forall x \in I, \operatorname{d}y(x) \overset{(4)}{=} \frac{\operatorname{d}y}{\operatorname{d}x}(x)\operatorname{d}x \overset{(5)}{=} f(x,y(x)) \operatorname{d}x$ (notice that this is an equation in $(\mathbb{R})'$);
  7. Now, take an interval $[a,b]\subset I$ and integrate the differential form along the curve $\gamma :[a,b]\rightarrow I, t\mapsto t$. On one hand you get: $$\int_\gamma \operatorname{d}y = \int_a ^b \operatorname{d}y(\gamma(t))(\gamma'(t))\operatorname{d}t = \int_a ^b y'(t)\operatorname{d}t = y(b)-y(a),$$ and on the other hand: $$\int_\gamma \operatorname{d}y = \int_\gamma (x\mapsto f(x,y(x)))\operatorname{d}x = \int_a ^b f\left(\gamma(t),y(\gamma(t))\right)\operatorname{d}x(\gamma' (t))\operatorname{d}t = \int_a ^b f(t,y(t))\operatorname{d}t,$$ and so: $$y(b)-y(a) = \int_a ^b f(t,y(t))\operatorname{d}t.$$ (A concrete check of this last identity on an explicit equation is given right below.)
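To see this identity on a concrete equation (an example of my own, with a solution known in closed form): take $f(x,y)=y$, so that the ODE is $y'=y$, and take the solution $y(x)=e^x$ on $I=\mathbb{R}$. Then for any $[a,b]$, $$y(b)-y(a)=e^b-e^a \qquad\text{and}\qquad \int_a^b f(t,y(t))\operatorname{d}t=\int_a^b e^t\operatorname{d}t=e^b-e^a,$$ so the two sides agree, as they must.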

Let me start with a preface: to really get into the "true" rigorous definitions of $\text dx$ and $\text dy$, one needs to have multivariate calculus and linear algebra as prerequisites, and should study "differential geometry", which is the mathematical framework that uses these objects in a rigorous manner. In calculus, we "borrow" their properties and notations without using their formal meaning. In fact, calculus doesn't even require a formal definition of what those objects are, and can be framed in a manner that doesn't use them (using prime notation for derivatives and domain restrictions for integrals). So, if you feel like you're missing some logical steps needed to build calculus from the axioms, you are not. Also worth noting is that there is a mathematical framework called "non-standard analysis" which formalizes $\text dx$ and $\text dy$ as "infinitesimals", but this is not generally how people define them. With that out of the way, I'm going to attempt to tackle this question without getting into the nitty-gritty of it, but I'll point you in the direction of some sources that will do that for you.

In single variable calculus, we look at functions $f: U\to \mathbb{R}$ where $U\subset\mathbb{R}$. That is, our functions take in a single real number and spit out a single real number. We define the way this mapping happens as $f: x\mapsto y$ or simply $y = f(x)$. We can define the derivative of $f$ as the function $f': D\to \mathbb{R}$, with $f': x\mapsto \lim\limits_{h\rightarrow 0} \frac{f(x+h)-f(x)}{h}$, where $D\subset U$ is the set of $x$ where such a limit exists and is finite. This should all be familiar to you, as it is simply calculus as you know it.
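As a quick refresher on how this plays out (my own example): for $f(x)=x^2$, the difference quotient is $\frac{(x+h)^2-x^2}{h}=2x+h$, which tends to $2x$ as $h\rightarrow 0$, so $f':x\mapsto 2x$.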

We define a function $\text dx: \mathbb{R}\to\mathbb{R}$, with $\text dx: h\mapsto h$. Yes, that's just the identity function; intuitively, $\text dx$ measures how much $x$ changes with respect to changes in $x$, so it makes sense. Then, since $y = f(x)$, we define $\text dy(h;x) = f'(x)\text dx(h)$. Just like $\text dx$, this is a function of $h$, but it also depends on $x$, the point where you take the derivative. For a given $x$, this is just a linear function through the origin, of the form $\text dy = a\cdot h$ for some real number $a$. This is consistent with our definition of $\text dx$ when we take $y = f(x) = x$, since then $f'(x)\equiv 1$, so $\text dy(h;x) = \text dx(h)$. In general, we say that for a function $f$ there is a function $\text df$ defined in this way, called the differential of $f$. Then, we can define the antiderivative in a very natural way, $\int \text df = f$. In this way, one way to describe a differential is as something that's meant to be integrated. If we look at the intermediate step, we see $\int \text df(x) = \int f'(x)\,\text dx = f(x)$, which shows us the common notation we use in calculus, and also shows us how $u$-substitution works. Another thing we can do with this notation is go back and solve for $f'(x)$ in terms of the differentials, which gives us $\frac{\text dy(h;x)}{\text dx(h)} = f'(x)$. Since the RHS is independent of $h$, we drop it on the LHS and simply write $\frac{\text dy}{\text dx} (x) = f'(x)$. And, since we're lazy, we oftentimes just write $\frac{\text dy}{\text dx}$. This is a fraction, which justifies using standard techniques to separate the variables in ODEs, but if you aren't careful about keeping that $h$ dependence in mind when moving things around, you might lose something important in the process.
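As a quick concrete case (again my own choice of function): for $y = f(x) = x^3$ we have $f'(x)=3x^2$, so $$\text dy(h;x)=3x^2\,\text dx(h)=3x^2h,\qquad \frac{\text dy(h;x)}{\text dx(h)}=\frac{3x^2h}{h}=3x^2=f'(x),$$ and the $h$-dependence cancels out of the ratio, which is exactly why dropping it is harmless here.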

A lot of this seems kind of obvious in the case of single variable calculus, or even pointless in some sense, but it makes a little more sense when we move into higher dimensions. If $f: U \to \mathbb{R}^m$ where $U\subset \mathbb{R}^n$, we can still define all those concepts, but with a little more work. First, we name the $n$ input variables $x_1,x_2,\dots,x_n$, and similarly for the output $y$'s. Then, we can define $\text dx_1: \mathbb{R}^n \to \mathbb{R}$ as $\text dx_1(h) = h_1$, where $h_1$ is the first component of the vector $h$. Of course, this same definition holds for each dimension, so $\text dx_i(h) = h_i$ for all $i = 1,\dots,n$. Now, instead of being the identity function, $\text dx_i$ is the $i$-th component function. Just like before, $\text dx_i$ measures how much $x_i$ changes with respect to changes in $x$ (all dimensions), so it's just the part of $x$ that moves in the $x_i$ direction. This gives us a way to isolate the change of a function with respect to changes in each of the input directions. We can combine these all into one function $\text dx(h) = h$, whose $i$-th component is $\text dx_i$.

We define the derivative of the $i$-th output direction with respect to the $j$-th input direction as $\frac{\partial f_i}{\partial x_j}(x) = \lim\limits_{t\to 0} \frac{f_i(x+t e_j)-f_i(x)}{t}$, where $e_j$ is the $j$-th standard basis vector, so that $x + te_j$ changes only the $j$-th input. Then, we define $\text dy_i(h;x) = \sum\limits_{j=1}^n \frac{\partial f_i}{\partial x_j}(x)\, \text dx_j(h) = \text D f_i \cdot \text dx(h)$, where $\text Df_i$ is the vector whose $j$-th component is $\frac{\partial f_i}{\partial x_j}(x)$ and $\cdot$ is the dot product. Finally, define $\text dy(h;x)$ to be the vector whose $i$-th component is $\text dy_i(h;x)$. In other words, $\text dy(h;x) = \text Df(x)\, \text dx(h)$, where $\text Df$ is the matrix whose $i,j$-th entry is $\frac{\partial f_i}{\partial x_j}$, called the Jacobian. Since $\text Df(x)$ is a constant matrix with respect to $h$ and $\text dx(h) = h$, we have $\text dy(h;x) = Ah$ for a matrix $A$, which is a linear function in $h$. So, $\text dy$ tells us how much $y$ changes in each direction with respect to changes in $x$, while $\text Df$ tells us how much $y$ changes in each direction with respect to changes in each individual direction of $x$. Then, again, we can define the antiderivative of any differential as $\int \text df = f$. However, the conditions for a general function $g(x)$ to be writable as $\text df(x)$ for some function $f$ are much stricter than they were in one dimension. This is linked to the fact that the classic "separation of variables" that works for one-dimensional ODEs does not work for PDEs or higher dimensional ODEs in general, but there are specific, rarer cases where we can take advantage of the differential of the function.
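If you like, you can check the statement $\text dy(h;x)=\text Df(x)\,\text dx(h)\approx f(x+h)-f(x)$ numerically. Here is a minimal sketch in Python (the particular map $f$ and all the names are my own choices, just for illustration):

```python
import numpy as np

# Check that dy(h; x) = Df(x) @ dx(h) = Df(x) @ h approximates f(x + h) - f(x)
# for a small displacement h, for a sample map f: R^2 -> R^2.

def f(x):
    x1, x2 = x
    return np.array([x1**2 * x2, np.sin(x1) + x2])

def Df(x):
    # Jacobian of f: the (i, j) entry is the partial derivative of f_i
    # with respect to x_j, computed by hand for this particular f.
    x1, x2 = x
    return np.array([[2 * x1 * x2, x1**2],
                     [np.cos(x1),  1.0  ]])

x = np.array([0.7, -1.2])          # the point where we differentiate
h = np.array([1e-4, -2e-4])        # a small change in the input

dy = Df(x) @ h                     # dy(h; x), with dx(h) = h
actual = f(x + h) - f(x)           # the true change in the output

print(dy, actual)                  # the two vectors agree up to O(||h||^2)
```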

This wraps up a very "calculus" way of approaching the idea of differentials. These ideas are the same as what you will find in differential geometry, however, just watered down a bit to be more digestible. If you want a more rigorous formalization that is built from calculus, check out Spivak's "Calculus on Manifolds". If you want to jump right into the differential geometry and basically start at defining differentials, I recommend Do Carmo's "Differential Forms and Applications". To sate your desire for a more rigorous approach to ODEs, check out Grimshaw's "Nonlinear Ordinary Differential Equations" (don't let the title fool you, he spends a decent amount of time on the theory of linear equations as well). All of these texts will require you to have a firm understanding of linear algebra (as in abstract vector spaces, not just matrix manipulation), as their entire purpose is generalizing concepts to higher dimensions, specifically $\mathbb{R}^d$, which holds linear algebra at its core. Spivak is what I would consider an undergraduate text, in that it's really approachable with little effort by anyone who has a solid grounding in linear algebra and multivariate calculus, as well as some experience with proof-based reasoning. Grimshaw is the next step up, and I would put it at "advanced" undergraduate level, while Do Carmo is firmly in the graduate realm. It is technically approachable by undergraduates, but it's not written with limited knowledge of mathematics in mind.

Hopefully you can see that differentials are an interesting way to look at things, but also that you don't really need them to discuss calculus or ODEs at an elementary level, and they tend to be more trouble than they're worth to introduce for most students. However, I agree that it can be confusing that we use notation that suggests differential forms without formally defining them, but it's more of a historical holdover than anything else. You'll see in Grimshaw's text that there isn't really any discussion of differential forms or differentials as a thing at all, because it's not really necessary. For example, the equation $\frac{\text dy}{\text dx} = \frac{f(x)}{g(y)} \Leftrightarrow g(y)\frac{\text dy}{\text dx} = f(x)$ can be antidifferentiated via substitution (or inspection). Also, there is no explicit requirement for $f$ or $g$ to be integrable, Riemann or otherwise, but we often require $y$ to be continuously differentiable (or at least differentiable) on some domain of interest, which will imply that $f$ and $g$ are integrable. If you are interested in the intersection of differential geometry and differential equations, you can look into, for example, Hairer's "Solving Differential Equations on Manifolds" and other similar texts.
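To make that concrete, here is a small worked instance (my own): for $\frac{\text dy}{\text dx}=\frac{x}{y}$, i.e. $f(x)=x$ and $g(y)=y$, the rearranged form $y\frac{\text dy}{\text dx}=x$ says $\frac{\text d}{\text dx}\!\left(\frac{y^2}{2}\right)=x$, so antidifferentiating both sides gives $\frac{y^2}{2}=\frac{x^2}{2}+C$, entirely in terms of ordinary derivatives and with no solitary $\text dy$ or $\text dx$ in sight.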


The existence of solitary $dx$'s and $dy$'s in ODE theory is a historical artifact. ODE theory was developed several centuries ago, at a time when calculus was still defined in terms of manipulations of undefined objects called variously differentials/infinitesimals/fluxions. The popular favor of these objects has waxed and waned over time. They fell into disrepute around the time Weierstrass introduced the rigorous $\epsilon$-$\delta$ definition of a limit, but they came back in the 1960s with the discovery that they could be made fully rigorous using something called non-standard analysis.

Arguments using solitary $dx$'s arise from considering them as infinitesimals, and in my opinion that is the best way to define them rigorously. You can sort of patch up the logic behind $dx$'s by way of things like differential forms, but it still does not allow you to truly do all the manipulations you find in ODE books (like dividing one differential by another).

My perspective on this matter is born out of my own frustration with trying to find answers to questions just like the OP's. People have been trying to put rigor behind infinitesimal arguments in many ways and for many decades, but in my experience there is no satisfactory solution save non-standard analysis. So my recommendation is to check out this field. "Nonstandard Analysis: Theory and Applications" by Arkeryd et al. is my favorite reference (best to get it through a university library, as it has a hefty price tag), and has a brief section on ODEs. I've heard "Nonstandard Analysis for the Working Mathematician" by Loeb and Wolff is good too.