What does it mean when we say 'The difference between two quantities is of first order'?
It means that the difference, when $dx$ is infinitesimally small (in the sense of calculus), is proportional to $dx$ to the first power, $dx^1$.
For example, the Taylor expansion of a function near a given point $x$, $$f(x + dx) = f(x) +dx f'(x) + \frac{1}{2} dx^2 f''(x) +...$$ consists of an infinite number of terms that express the function with shifted argument in terms of the function and its derivatives at the point $x$. Each term contains $dx$ to a different power, known as the order of the term.
To zeroth order in $dx$ (no power of $dx$) the functions are the same (first term on the RHS). At first order, $dx^1$, they differ by $dx f'(x)$ (second term on the RHS). At quadratic order, or order two, they differ by $\frac{1}{2} dx^2 f''(x)$, etc.
For $dx$ infinitesimal (let's suppose we can loosely say $dx \ll 1$), the term of order $dx$ dominates (i.e. is bigger than) the terms of higher order, since those come with an infinitesimal quantity raised to a higher power.
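To see the dominance of the first-order term numerically, here is a minimal sketch. The function `first_order_check` and the choice of $f=\sin$ at $x=1$ are illustrative assumptions, not anything from the question. It compares the actual difference $f(x+dx)-f(x)$ with the first-order estimate $f'(x)\,dx$ and shows that what is left over shrinks like $dx^2$.

```python
import math

def first_order_check(f, fprime, x, dx):
    """Return (actual difference, first-order estimate, remainder)."""
    diff = f(x + dx) - f(x)       # the true difference
    estimate = fprime(x) * dx     # the term of order dx^1
    return diff, estimate, diff - estimate

# Illustrative choice (an assumption): f = sin, f' = cos, at x = 1.
for dx in (1e-1, 1e-2, 1e-3):
    diff, estimate, remainder = first_order_check(math.sin, math.cos, 1.0, dx)
    # remainder / dx^2 should settle near f''(x)/2 as dx shrinks
    print(f"dx={dx:g}  diff={diff:.6e}  f'(x)dx={estimate:.6e}  "
          f"remainder/dx^2={remainder / dx**2:.4f}")
```

If the ratio in the last column levels off as $dx$ shrinks, the leftover really is of second order, which is what "the difference is of first order" is supposed to capture.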
Say you want a transformation that rotates vectors and you pretend you never heard of trigonometric functions.
Sophus Lie had a trick:
But first, there's a surprising trick that works most of the time, called Taylor expansion:
Let $f$ be a real-valued function on some open interval (let's say for now that it contains $0$) where it is differentiable infinitely many times. The expression:
$$g_5(x) \equiv \frac{f(0)(x-0)^0}{0!} + \frac{f'(0)(x-0)^1}{1!} + \frac{f^{(2)}(0)(x-0)^2}{2!} +\frac{f^{(3)}(0)(x-0)^3}{3!} + \frac{f^{(4)}(0)(x-0)^4}{4!} +\frac{f^{(5)}(0)(x-0)^5}{5!}$$
is one member of a sequence $g_n$ that, for well-behaved $f$, satisfies for each number $x$ in the interval: $$f(x)=\lim_{n \to \infty}g_n(x)$$
Pretty cool.
Now comes Euler. There's a special number $e=2.71828...$ and it is cool in an unbounded number of ways. Specifically, though, it has the property:
$$(e^x)'=e^x$$
So using the Taylor trick (and $e^0=1$):
$$e^x=1+ x+\frac{x^2}{2!} + \frac{x^3}{3!} + ....$$
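As a quick sanity check of the series, the sketch below sums the first few terms and compares against Python's built-in `math.exp`. The helper name `exp_partial_sum` is made up for this illustration; it plays the role of $g_n$ for $f=e^x$.

```python
import math

def exp_partial_sum(x, n):
    """Sum of x**k / k! for k = 0..n (the polynomial g_n for f = exp)."""
    total, term = 0.0, 1.0        # term starts at x**0 / 0!
    for k in range(n + 1):
        total += term
        term *= x / (k + 1)       # turn x**k / k! into x**(k+1) / (k+1)!
    return total

for n in (1, 2, 5, 10, 20):
    print(n, exp_partial_sum(1.0, n), math.exp(1.0))
```

Truncating at $n=1$ gives exactly the first-order approximation $1+x$; the later terms are the higher-order corrections.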
Why stop here? It can be generalized with some extra rules to $x \to X$ where $X$ is a matrix!
Take a rotation by an arbitrary angle $\theta$ and express it to first order ($J$ is the matrix $\begin{pmatrix}0&-1\\1&0\end{pmatrix}$, and it behaves algebraically like $i$, since $J^2=-I$):
$I+\theta J$
This is an ordinary rotation very, very close to the identity transformation, but only if the angle is tiny. As an algebraic expression, though, it makes sense for all angles, and that is why it is part of a Lie algebra. Now Lie's trick is to just use the exponential map with $e$ as the base. This is a map to the Lie group, where the result is exact and not only first order as in the algebra.
$e^{\theta J}=I+\theta J- \frac{\theta ^2 I}{2!} - \frac{\theta ^3 J}{3!}+\frac{\theta ^4 I}{4!} + \frac{\theta ^5 J}{5!} + ....$
If you collect the terms with $I$ separately from the terms with $J$, you get $I\cos\theta + J\sin\theta$, which is exactly a full-blown trigonometric rotation!
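Here is a small sketch that checks this numerically without any libraries beyond `math`. The helpers `mat_mul`, `mat_exp` and `rotation` are names invented for this example, and `mat_exp` is just the truncated power series, not a production-quality matrix exponential.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(m, terms=30):
    """Truncated power series I + M + M^2/2! + ... for a 2x2 matrix."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]   # holds M^k / k!
    for k in range(1, terms):
        power = mat_mul(power, m)
        power = [[entry / k for entry in row] for row in power]
        result = [[result[i][j] + power[i][j] for j in range(2)]
                  for i in range(2)]
    return result

def rotation(theta):
    """The usual trigonometric rotation matrix, I*cos(theta) + J*sin(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

theta = 2.0                             # deliberately not a small angle
J = [[0.0, -1.0], [1.0, 0.0]]
theta_J = [[theta * entry for entry in row] for row in J]
print(mat_exp(theta_J))                 # series for e^{theta J}
print(rotation(theta))                  # should agree to rounding error
```

The angle is chosen large on purpose: the first-order form $I+\theta J$ would be a poor approximation here, but the exponential of the same generator still reproduces the rotation.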
So the moral of the story is that in a lot of interesting cases it suffices to take an operator to first order and apply an exponential map!
Note that $\theta$ does not have to be small. It is only handled algebraically as a "small" operation.
About the meaning of the fruitful hint in the book:
It is probably intentional that it is about a $dx$. Pretty sleek. But it can't be taken seriously in the context of modern logic, given the other things that are decided in his elaboration. So the best strategy is not to bother with it at all!
If you do insist, then avoid interpreting $dx$ as infinitesimal; otherwise it is not really doable.
First of all, $dx$ is a differential. Thinking literally and keeping it simple, it is a thing that takes differences and gives values. If you let it tend towards 0 it will be, well, 0. That makes the whole thing pretty vacuous. So disregard the first part; it is just annoying.
I'm allowed to say that as an identity operator it is of 0 order, also logically. So as an operator it just states the fact that dx.
So the difference portion tells you that $j$ has to be first order in $dx$.
Now why?
Firstly, because formally $$j[f(x)]=f(x+dx)$$ has a $dx$ in it, and if you interpret it normally then it depends on the value of $dx$, from which you can infer what you have to get from the $f$ and calculate what it is that $j$ is doing. And then go on with your life :).
Sometimes being first order means that higher powers are just $0$. Not a very satisfying way of understanding in my opinion. Because that is just an instance of what it isn't. There are plenty more.
Taken at face value, though, what it means is that the difference is $f'dx$.
$$f'dx=(f(x)dx^0 + f'(x)dx) - f(x)$$
So:
$$j[f(x)]=f+f'dx=f(x)+df(x)=(fdx)'$$