What is the difference between modeling with stochastic differential equations (SDEs) and with ordinary differential equations (ODEs) driven by a random force?
You might want to look into the Wong-Zakai theorem. Essentially, it states that if $\xi_\epsilon$ is a sequence of smooth approximations to white noise, then as $\epsilon \to 0$ the solutions of the random ODE $$ {dx \over dt} = f(x) + g(x) \xi_\epsilon $$ converge to the solutions of the SDE (in Stratonovich form) $$ dx = f(x)\,dt + g(x)\circ dW\;. $$ There are of course technical assumptions, but mollifying white noise with a smooth enough kernel satisfies them, as do piecewise constant approximations of the noise (i.e. piecewise linear approximations of the Wiener process).
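To make the Stratonovich limit concrete, here is a minimal numerical sketch under assumptions that are mine, not the theorem's: $f = 0$, $g(x) = x$, and a piecewise constant $\xi_\epsilon$ built from Wiener increments. The function name `wong_zakai_demo` and its parameters are hypothetical. For this choice the Stratonovich solution is $x_0 e^{W_T}$, while the Itô equation $dx = x\,dW$ has solution $x_0 e^{W_T - T/2}$, so the two limits are easy to tell apart.

```python
import numpy as np

def wong_zakai_demo(n_steps=2000, T=1.0, x0=1.0, seed=0):
    """Solve dx/dt = x * xi_eps(t) with a deterministic ODE method (RK4).

    xi_eps is piecewise constant: on each interval of length h it equals
    dW / h, the slope of the piecewise-linear interpolation of a Wiener path.
    """
    rng = np.random.default_rng(seed)
    h = T / n_steps
    dW = rng.normal(0.0, np.sqrt(h), size=n_steps)  # Wiener increments

    x = x0
    for dw in dW:
        xi = dw / h                 # value of the smoothed noise on this interval
        rhs = lambda y: y * xi      # f = 0, g(x) = x (illustrative assumption)
        # One classical RK4 step; the right-hand side is smooth on the interval.
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

    W_T = dW.sum()
    stratonovich = x0 * np.exp(W_T)          # closed form for dx = x o dW
    ito = x0 * np.exp(W_T - 0.5 * T)         # closed form for dx = x dW
    return x, stratonovich, ito

if __name__ == "__main__":
    x_ode, x_strat, x_ito = wong_zakai_demo()
    print(f"random ODE: {x_ode:.6f}  Stratonovich: {x_strat:.6f}  Ito: {x_ito:.6f}")
```

In this sketch the random-ODE value should track the Stratonovich closed form along the same path, and differ from the Itô one by the factor $e^{T/2}$.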
For reasonably nice coefficients, the solution of an SDE is a Markov process. That is, in fact, how SDEs came about: Itô introduced the stochastic integral as a tool to describe diffusion processes. On the other hand, solutions of random ODEs are typically not Markov, because the distribution of the trajectory after time $t$ depends not only on the current value $X(t)$, but also on the current value of $\xi(t)$. Markov vs. non-Markov is one fundamental difference, but there are others. For example, an important consequence is that solutions of ODEs are differentiable, whereas solutions of SDEs fail even to have bounded variation.
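The bounded-variation point can be seen numerically on the simplest SDE solution, the Wiener process itself. The sketch below (hypothetical function name `variation_by_resolution`, default grid sizes chosen only for illustration) samples one path on a fine grid and measures it at coarser resolutions: the sum of absolute increments keeps growing as the grid is refined, while the sum of squared increments settles near $T$.

```python
import numpy as np

def variation_by_resolution(levels=(8, 12, 16), T=1.0, seed=0):
    """Total and quadratic variation of one Wiener path on nested dyadic grids."""
    rng = np.random.default_rng(seed)
    n_fine = 2 ** max(levels)
    dW = rng.normal(0.0, np.sqrt(T / n_fine), size=n_fine)
    W = np.concatenate(([0.0], np.cumsum(dW)))   # path values on the finest grid

    results = []
    for k in levels:
        stride = n_fine // 2 ** k
        incr = np.diff(W[::stride])              # increments on the grid with 2**k steps
        total_var = np.abs(incr).sum()           # grows without bound under refinement
        quad_var = (incr ** 2).sum()             # approaches T under refinement
        results.append((2 ** k, total_var, quad_var))
    return results

if __name__ == "__main__":
    for n, tv, qv in variation_by_resolution():
        print(f"steps={n:6d}  total variation={tv:9.3f}  quadratic variation={qv:.4f}")
```

For a differentiable trajectory the first column would converge to the arc-length-like quantity $\int |\dot x|\,dt$ and the second would go to zero, which is the opposite behaviour.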
From a numerical standpoint, although you can interpret SDEs as non-autonomous ODEs, the forcing term is not differentiable (the driving Wiener path is almost surely Hölder continuous for every exponent $\alpha < 1/2$, but not for $\alpha = 1/2$), and thus the error estimates one uses to derive numerical methods for ODEs usually do not apply (for example, Runge-Kutta methods of order $k$ assume $k$ derivatives in their derivations). In fact, in just about any case where $g$ is not a constant, the strong order of convergence of deterministic methods applied to stochastic ODEs (SODEs) ends up being $1/2$, which is clearly sub-optimal. Thus the numerical methods for SODEs instead have to use the properties of the "Itô derivative" and stochastic Taylor series to derive new methods for which the higher-order error analyses apply.
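As a rough illustration of the order-$1/2$ claim, the following sketch compares the Euler-Maruyama scheme (forward Euler applied directly to the SDE) with the Milstein scheme, which adds the first correction term from the stochastic Taylor expansion. The test equation, geometric Brownian motion $dX = \mu X\,dt + \sigma X\,dW$ in the Itô sense, and all names and parameter values (`strong_orders`, `mu`, `sigma`, `n_paths`, ...) are my assumptions, chosen because the exact solution is known. The fitted slopes are Monte Carlo estimates, so expect values near $1/2$ and $1$ rather than exactly those numbers.

```python
import numpy as np

def strong_orders(mu=1.0, sigma=1.0, x0=1.0, T=1.0,
                  n_paths=2000, n_fine=1024, seed=0):
    """Estimate strong convergence orders of Euler-Maruyama and Milstein on GBM."""
    rng = np.random.default_rng(seed)
    dt_fine = T / n_fine
    dW = rng.normal(0.0, np.sqrt(dt_fine), size=(n_paths, n_fine))

    # Exact Ito solution at time T, driven by the same Wiener paths.
    W_T = dW.sum(axis=1)
    exact = x0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * W_T)

    dts, em_err, mil_err = [], [], []
    for R in (1, 2, 4, 8, 16):
        dt = R * dt_fine
        n_steps = n_fine // R
        # Coarsen the noise by summing R consecutive fine increments.
        dW_R = dW.reshape(n_paths, n_steps, R).sum(axis=2)

        x_em = np.full(n_paths, x0)
        x_mil = np.full(n_paths, x0)
        for j in range(n_steps):
            dw = dW_R[:, j]
            x_em = x_em + mu * x_em * dt + sigma * x_em * dw
            x_mil = (x_mil + mu * x_mil * dt + sigma * x_mil * dw
                     + 0.5 * sigma ** 2 * x_mil * (dw ** 2 - dt))  # Milstein correction

        dts.append(dt)
        em_err.append(np.mean(np.abs(x_em - exact)))    # strong error at time T
        mil_err.append(np.mean(np.abs(x_mil - exact)))

    # Slope of log(error) against log(dt) estimates the strong order.
    em_order = np.polyfit(np.log(dts), np.log(em_err), 1)[0]
    mil_order = np.polyfit(np.log(dts), np.log(mil_err), 1)[0]
    return em_order, mil_order

if __name__ == "__main__":
    em, mil = strong_orders()
    print(f"Euler-Maruyama order ~ {em:.2f}, Milstein order ~ {mil:.2f}")
```

The Milstein correction involves $g\,g'$, so it vanishes when $g$ is constant, which is consistent with the remark that the loss of order shows up when $g$ is not a constant.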
However, what you pointed out is where a lot of the intuition comes from. If you read work from researchers like Hairer or Kloeden, they frequently build the heuristics for understanding these systems by treating them as non-autonomous dynamical systems whose forcing term has poor regularity (i.e. Hölder $1/2$ or, in the case of space-time white noise, $1/4$). Thus in some sense you can understand things heuristically, as in "the deterministic methods are only order $1/2$ because the forcing term is Hölder continuous with exponent $1/2$", and so understanding the SDE as a type of ODE has value in that sense.