Definition of a Differential Equation?
"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean—neither more nor less."
I think Arnol'd is correct, but I think he is being unnecessarily confrontational about it. All the books on your list that I am familiar with move almost immediately to a more precise formulation, in which a differential equation is one of the following two things: \[ y^{(n)}(t) = F(t, y(t), y'(t), \dots, y^{(n-1)}(t) ), \] or \[ G(t, y(t), \dots, y^{(n)}(t)) = 0. \]
Here is another example of an equation that I would not want to call a differential equation: \[ y'(t) = y(t-1). \] This meets the heuristic definition, but fails to be of the form I specified above (or of the form Arnol'd considers), because the right-hand side involves $y$ at time $t-1$ rather than at time $t$.
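To make the difference concrete, here is a rough numerical sketch (my own illustration, not taken from any of the books discussed). It integrates $y'(t) = y(t-1)$ with forward Euler; the function name `solve_delay`, the step size, and the two history functions are all assumptions chosen for the example. The point it tries to show is that, unlike an ODE in the forms above, the value $y(0)$ alone does not pin down the solution: you need $y$ on the whole interval $[-1, 0]$.

```python
def solve_delay(history, t_end, h=0.001):
    """Forward Euler for y'(t) = y(t - 1), with y = history on [-1, 0].

    `history` is a hypothetical user-supplied function giving y on [-1, 0].
    Returns the approximate value of y(t_end).
    """
    lag = round(1 / h)  # number of grid steps in one unit of delay
    # Tabulate the history on the grid t = -1, -1 + h, ..., 0.
    ys = [history(-1 + k * h) for k in range(lag + 1)]
    for _ in range(round(t_end / h)):
        # ys[-1] is y(t); ys[-1 - lag] is y(t - 1).
        ys.append(ys[-1] + h * ys[-1 - lag])
    return ys[-1]


# Two histories that agree at t = 0 (both give y(0) = 1) ...
y_a = solve_delay(lambda t: 1.0, 1.0)      # y = 1 on [-1, 0]
y_b = solve_delay(lambda t: 1.0 + t, 1.0)  # y = 1 + t on [-1, 0]
# ... but lead to different values of y(1).
print(y_a, y_b)
```

Solving by hand on $[0,1]$, the first history gives $y(t) = 1 + t$, so $y(1) = 2$, while the second gives $y(t) = 1 + t^2/2$, so $y(1) = 3/2$. The Euler values should land close to these, up to an error of order `h`.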
I now see that Qiaochu has written nearly the same thing above.
Btw, I think Arnold's book is fantastic, but it should be complemented with a more standard treatment of ODEs, if only so that you know what everyone else knows in addition to the topics Arnold focuses on.
EDIT: To answer the 2nd half of the question, I don't know of any books that are as geometric as Arnold. IMO, the big strength of his book is that he makes the geometric intuition jump out at the reader, and downplays the analytical side of things. This complements the more traditional books that focus on the analytical aspects (and on explicit solutions) and lose all the geometry.
Arnold has another book that is somewhat more advanced, Mathematical Methods of Classical Mechanics. I think it's another great book, though it's hard to read. He also has a book called Geometrical Methods in the Theory of Ordinary Differential Equations. This is also a more advanced book, so it is not one you want to look at yet.
A book that I found very compelling was Hirsch and Smale, Differential Equations, Dynamical Systems and Linear Algebra. It's more analytical than Arnold, but is more geometric than most.
EDIT 8 years later: Let me add a recommendation for Strogatz's Nonlinear Dynamics and Chaos. I think it's a beautiful book and wish I could go back in time and give it to my younger self.
Arnold simply means that most books are not being precise. A slightly more precise version of the first few definitions is that a differential equation (in one variable) is an equation of the form $f(t, x, x', x'', \dots) = 0$, where $x$ and all of its derivatives are evaluated at the same point $t$. This rules out Arnold's example.
When I was a student I was taught the following definition:
Let $N\in \mathbb{N}$, $U\subseteq \mathbb{R}^{N+2}$ and $F:U\to \mathbb{R}$.
Then the $N^{th}$ order ordinary differential equation (in implicit form) corresponding to $F$ is the problem of finding all the non-degenerate intervals $I\subseteq \mathbb{R}$ and all the functions $y:I\to \mathbb{R}$ such that the following hold:
Each $I\subseteq \text{proj}_1 U$ (i.e. $I$ is a subset of the projection of $U$ onto the first coordinate direction);
$\text{proj}_N u\neq 0$ for some $u\in U$ (so that the ODE is actually $N^{th}$ order); and,
$(x,y(x),y^\prime (x), \ldots , y^{(N)}(x))\in U$ and $F(x,y(x),y^\prime (x),\ldots ,y^{(N)}(x))=0$ for each $x\in \text{int }I$.
This problem can be denoted for short as: $$F(x,y,y^\prime, \ldots ,y^{(N)})=0\; .$$
If the function $F$ is of the type: $$F(x,y_0,y_1,\ldots ,y_N)=f(x,y_0,y_1,\ldots ,y_{N-1})-y_N$$ then the differential equation is said to be in normal (or explicit) form and it can be denoted for short as: $$y^{(N)}=f(x,y,y^\prime ,\ldots,y^{(N-1)})\; .$$
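As a quick sanity check of the notation, here is a small sketch for one concrete case that I picked myself: $N = 2$ and the equation $y'' = -y$. The names `f`, `F` and `residual` are just illustrative. It builds the implicit-form $F$ from the normal-form $f$ exactly as in the formula above, $F = f - y_N$, and evaluates it along a candidate function.

```python
import math


def f(x, y0, y1):
    """Normal-form right-hand side for y'' = -y (depends only on y0 here)."""
    return -y0


def F(x, y0, y1, y2):
    """Implicit form built from f as F = f - y_N, with N = 2."""
    return f(x, y0, y1) - y2


def residual(x):
    """F evaluated along the candidate y = sin, y' = cos, y'' = -sin."""
    return F(x, math.sin(x), math.cos(x), -math.sin(x))


def residual_bad(x):
    """F evaluated along a non-solution: y = x^2, y' = 2x, y'' = 2."""
    return F(x, x * x, 2 * x, 2.0)


for x in (0.0, 0.5, 1.0, 2.0):
    print(x, residual(x), residual_bad(x))
```

Along $y = \sin$ the residual vanishes at every sample point, so $\sin$ solves the problem on any interval. Along $y = x^2$ it equals $-x^2 - 2$, which is never zero, so $x^2$ does not.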
What do you think about it?