Interpretation of the field strength tensor in Yang-Mills Theory
Here are several "motivations" to consider the curvature of a gauge field. However, keep in mind that the notion of "physical" motivation is somewhat vague in non-Abelian gauge theories, and that "it coincides with electromagnetism in the Abelian case" is really a rather strong motivation already - of course we want the non-Abelian theory to reduce to electromagnetism in the Abelian case.
We want a gauge-invariant object to use in the action. Taking the gauge field $A$ itself or traces of it doesn't work: under a gauge transformation $A\mapsto gAg^{-1} + g\mathrm{d}g^{-1}$, the inhomogeneous $g\mathrm{d}g^{-1}$ term spoils the invariance that the trace would otherwise have under the conjugation $gAg^{-1}$. Since the gauge field was introduced to have a covariant derivative, it seems natural to try to take the covariant derivative of it. Indeed, we find that $\mathrm{d}_A A = \mathrm{d}A + A\wedge A = F$ transforms as $F\mapsto gFg^{-1}$ under gauge transformations, so traces of expressions built from it, such as $\mathrm{tr}(F\wedge{\star}F)$, are gauge-invariant objects we can use to build a gauge-invariant action.
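As a sketch of why the inhomogeneous term drops out, assume the convention $D = \mathrm{d} + A$, so that $A' = gAg^{-1} + g\,\mathrm{d}g^{-1}$, and use $g\,\mathrm{d}g^{-1}\,g = -\mathrm{d}g$, which follows from $\mathrm{d}(gg^{-1}) = 0$. Then the two pieces of $F' = \mathrm{d}A' + A'\wedge A'$ are
$$\mathrm{d}A' = \mathrm{d}g\wedge A\,g^{-1} + g\,\mathrm{d}A\,g^{-1} - gA\wedge\mathrm{d}g^{-1} + \mathrm{d}g\wedge\mathrm{d}g^{-1},$$
$$A'\wedge A' = g\,A\wedge A\,g^{-1} + gA\wedge\mathrm{d}g^{-1} - \mathrm{d}g\wedge A\,g^{-1} - \mathrm{d}g\wedge\mathrm{d}g^{-1},$$
and all terms containing $\mathrm{d}g$ or $\mathrm{d}g^{-1}$ cancel pairwise in the sum, leaving
$$F' = g\,(\mathrm{d}A + A\wedge A)\,g^{-1} = gFg^{-1}.$$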
It's the infinitesimal holonomy. This is the content of the Ambrose-Singer theorem: given a connection with its notion of parallel transport, the holonomy around a closed path, often symbolically written as a path-ordered exponential $\mathcal{P}\mathrm{e}^{\oint A}$, is another natural object to consider. It transforms by conjugation under gauge transformations, so its trace is gauge-invariant. It's physically relevant because Wilson and Polyakov loops are little more than the trace of the holonomy along these loops, and if you shrink such a loop, you find that the deviation of the holonomy from the identity becomes well-approximated by the curvature integrated over the area inside the loop. Basically, this is what is "curved" about the curvature - it tells you how much the parallel transport around an infinitesimal loop based at a point deviates from the identity, i.e. from the "flat" case.
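The shrinking-loop statement can be checked numerically. The following is a minimal sketch, not a general implementation: it assumes the convention $D = \partial + A$, an $\mathfrak{su}(2)$ example with anti-Hermitian generators $t_a = -\mathrm{i}\sigma_a/2$, and a hypothetical constant gauge field $A_x = t_1$, $A_y = t_2$, for which $F_{xy} = [A_x, A_y]$. The helper names (`expm`, `holonomy`, `deviation`) are made up for this illustration.

```python
import numpy as np

# Pauli matrices; t_a = -i * sigma_a / 2 are anti-Hermitian su(2) generators.
SIGMA = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)
T = -0.5j * SIGMA


def expm(M, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small 2x2 matrices)."""
    result = np.eye(M.shape[0], dtype=complex)
    term = np.eye(M.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result


# Assumed (hypothetical) constant gauge field: A_x = t_1, A_y = t_2.
A_x, A_y = T[0], T[1]
# For constant A the derivative terms vanish, so F_xy = [A_x, A_y].
F_xy = A_x @ A_y - A_y @ A_x


def holonomy(eps):
    """Parallel transport around a square of side eps, for D = d + A.

    Along a straight segment of length eps in direction mu, a constant A
    gives the transport matrix exp(-eps * A_mu).
    """
    U = expm(-eps * A_x)          # step in +x
    U = expm(-eps * A_y) @ U      # step in +y
    U = expm(+eps * A_x) @ U      # step in -x
    U = expm(+eps * A_y) @ U      # step in -y
    return U


def deviation(eps):
    """Norm of U - (1 - eps^2 F_xy); expected to shrink like eps^3."""
    return np.linalg.norm(holonomy(eps) - np.eye(2) + eps**2 * F_xy)


if __name__ == "__main__":
    for eps in (0.04, 0.02, 0.01):
        print(eps, np.linalg.norm(holonomy(eps) - np.eye(2)), deviation(eps))
```

With these conventions the holonomy is $1 - \epsilon^2 F_{xy} + O(\epsilon^3)$, so halving the side of the square should reduce the residual printed in the last column by roughly a factor of eight. The overall sign in front of $F_{xy}$ depends on the orientation of the loop and on the sign convention for $A$.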
Classically, the gauge field strength is a curvature of a connection, the same way that the Riemann tensor is. Since $F^a_{\mu\nu}$ lives in the adjoint representation of the gauge group, you can define a 4-index object very analogous to the Riemann tensor:
$$\mathcal{F}^a{}_{b\mu\nu} \equiv F^c_{\mu\nu} f_c{}^a{}_b,$$
where $f_c{}^a{}_b$ are the structure constants defined via
$$[t_c, t_b] = f_c{}^a{}_b \, t_a,$$
modulo factors of $i$ if you care to insert them. In any case, the object $\mathcal{F}^a{}_{b\mu\nu}$ contains information about parallel transport around infinitesimal loops, in the same sense that the Riemann tensor does. But the vector being transported is not a tangent vector; instead it is a vector in gauge space (or "internal" space), whose components are defined via expansion in the generators $t_a$, as in $V = V^a \, t_a$.
So, $\mathcal{F}^a{}_{b\mu\nu}$ describes the change in $V = V^a \, t_a$ as it is parallel-transported (via the covariant derivative $D_\mu \equiv \partial_\mu + A_\mu$, again modulo factors of $i$, etc.) around a small parallelogram in the directions $\partial_\mu, \partial_\nu$.
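To make the index structure concrete, here is a small sketch for $\mathfrak{su}(2)$, assuming anti-Hermitian generators $t_a = -\mathrm{i}\sigma_a/2$ (so that $\mathrm{tr}(t_a t_b) = -\delta_{ab}/2$) and made-up numerical components $F^c_{\mu\nu}$ and $V^b$ for one fixed pair $(\mu,\nu)$. It builds $\mathcal{F}^a{}_b$ from the structure constants and checks that $\mathcal{F}^a{}_b V^b$ are the components of the commutator $[F_{\mu\nu}, V]$, which is the adjoint action that governs the change of $V$ around the small parallelogram (up to sign and orientation conventions).

```python
import numpy as np

# Pauli matrices; t_a = -i * sigma_a / 2 are anti-Hermitian su(2) generators.
SIGMA = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)
T = -0.5j * SIGMA


def commutator(X, Y):
    return X @ Y - Y @ X


def components(M):
    """Expand M = M^a t_a using tr(t_a t_b) = -delta_ab / 2."""
    return np.array([(-2 * np.trace(T[a] @ M)).real for a in range(3)])


# Structure constants f[c, a, b] = f_c^a_b, defined by [t_c, t_b] = f_c^a_b t_a.
f = np.zeros((3, 3, 3))
for c in range(3):
    for b in range(3):
        f[c, :, b] = components(commutator(T[c], T[b]))

# Hypothetical components for one fixed (mu, nu); the values are arbitrary.
F_comp = np.array([0.3, -1.2, 0.7])   # F^c_{mu nu}
V_comp = np.array([1.0, 0.5, -2.0])   # V^b

# The Riemann-like object: curly_F[a, b] = F^c f_c^a_b.
curly_F = np.einsum("c,cab->ab", F_comp, f)

# Acting on the components of V ...
lhs = curly_F @ V_comp

# ... agrees with the components of the matrix commutator [F, V].
F_mat = np.einsum("c,cij->ij", F_comp, T)
V_mat = np.einsum("b,bij->ij", V_comp, T)
rhs = components(commutator(F_mat, V_mat))

if __name__ == "__main__":
    print(np.allclose(lhs, rhs))
```

For $\mathfrak{su}(2)$ in this basis the structure constants reduce to the Levi-Civita symbol, so $\mathcal{F}^a{}_b$ is an antisymmetric $3\times 3$ matrix and its action on $V^b$ is the ordinary cross product of the two component vectors: an infinitesimal rotation in internal space, analogous to the way the Riemann tensor rotates tangent vectors.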