Torsion and parallel transport
Here is another way to think of the relation between torsion and parallel transport, one that some may find more congenial than many of the other interpretations that have been proposed:
Start with a manifold $M$, and a connection $\nabla$ on $TM$. Consider the bundle $\hat TM=\mathbb{R}\oplus TM$ (where, by $\mathbb{R}$, I really mean the trivial bundle $M\times\mathbb{R}$). Define a connection $\hat\nabla$ on $\hat TM$ by the rule $$ \hat\nabla_X\ \begin{pmatrix}a \\ Y\end{pmatrix} = \begin{pmatrix}da(X) \\ aX + \nabla_XY\end{pmatrix} $$ for any function $a$ on $M$ and any vector fields $X$ and $Y$ on $M$.
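In a local frame the definition can be packaged as a single matrix of 1-forms. As a sketch, assume local coordinates $x^i$ (these are not part of the construction itself). Take the frame $e_0=\begin{pmatrix}1\\0\end{pmatrix}$, $e_i=\begin{pmatrix}0\\ \partial_i\end{pmatrix}$ of $\hat TM$, and write $\nabla_{\partial_j}\partial_k=\Gamma^i_{jk}\,\partial_i$. Then $\hat\nabla_X e_0=\begin{pmatrix}0\\X\end{pmatrix}=dx^i(X)\,e_i$ and $\hat\nabla_X e_k=\Gamma^i_{jk}X^j\,e_i$. So the connection matrix $\hat\omega$, defined by $\hat\nabla_X e_\beta=\hat\omega^\alpha_\beta(X)\,e_\alpha$, is $$ \hat\omega=\begin{pmatrix}0&0\\ \theta&\omega\end{pmatrix},\qquad \theta^i=dx^i,\qquad \omega^i_k=\Gamma^i_{jk}\,dx^j, $$ where $\theta$ is the identity map $TM\to TM$ viewed as a $TM$-valued 1-form and $\omega$ is the connection matrix of $\nabla$. A section $\begin{pmatrix}a\\Y\end{pmatrix}$ is therefore $\hat\nabla$-parallel along a curve $\gamma$ exactly when $$ \dot a=0,\qquad \dot Y^i+\Gamma^i_{jk}\,\dot\gamma^j\,Y^k+a\,\dot\gamma^i=0. $$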
I like to think of $\hat TM$ as a sort of 'extended' tangent bundle to $M$. Of course, it canonically contains $TM= 0\oplus TM\subset \mathbb{R}\oplus TM$ as a subbundle, and one sees that, under this identification of $\begin{pmatrix}0\\ Y\end{pmatrix}$ with $Y$, one has $\hat\nabla_X Y = \nabla_XY$. Also, by its definition, $\hat\nabla$-parallel translation in $\hat TM$ along a curve in $M$ will keep the $\mathbb{R}$-component of a section constant, so that the 'affine hyperplane subbundles' $a=const$ in $\hat TM$ are preserved under $\hat\nabla$-parallel translation.
Now, the curvature endomorphism of $\hat\nabla$, namely $$ R^{\hat\nabla}(X,Y) = \hat\nabla_X\hat\nabla_Y-\hat\nabla_Y\hat\nabla_X - \hat\nabla_{[X,Y]}, $$ is just $$ R^{\hat\nabla}(X,Y)\begin{pmatrix}a \\ Z\end{pmatrix} = \begin{pmatrix}0&0\\T^{\nabla}(X,Y) & R^{\nabla}(X,Y)\end{pmatrix} \begin{pmatrix}a \\ Z\end{pmatrix}. $$
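The formula can be checked directly from the definition. Here is the computation, writing $Xa$ for $da(X)$. Since $\hat\nabla_Y\begin{pmatrix}a\\Z\end{pmatrix}=\begin{pmatrix}Ya\\ aY+\nabla_YZ\end{pmatrix}$, one gets $$ \hat\nabla_X\hat\nabla_Y\begin{pmatrix}a\\Z\end{pmatrix}=\begin{pmatrix}XYa\\ (Ya)X+(Xa)Y+a\,\nabla_XY+\nabla_X\nabla_YZ\end{pmatrix}. $$ The terms $(Ya)X+(Xa)Y$ are symmetric in $X,Y$ and cancel on antisymmetrizing. Subtracting $\hat\nabla_{[X,Y]}\begin{pmatrix}a\\Z\end{pmatrix}=\begin{pmatrix}[X,Y]a\\ a[X,Y]+\nabla_{[X,Y]}Z\end{pmatrix}$ then leaves $$ R^{\hat\nabla}(X,Y)\begin{pmatrix}a\\Z\end{pmatrix}=\begin{pmatrix}0\\ a\bigl(\nabla_XY-\nabla_YX-[X,Y]\bigr)+R^\nabla(X,Y)Z\end{pmatrix}=\begin{pmatrix}0\\ a\,T^\nabla(X,Y)+R^\nabla(X,Y)Z\end{pmatrix}. $$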
Thus, the condition that $T^\nabla(X,Y)=0$ is the condition that $\hat\nabla$-parallel translation around closed loops preserve the splitting $\hat TM = \mathbb{R}\oplus TM$, at least to lowest (i.e., second) order. In particular, if one thinks of the hyperplane subbundle $a=1$ as a sort of 'shadow copy' of the actual tangent plane, the torsion measures how its 'zero', i.e., $\begin{pmatrix}1\\0\end{pmatrix}$, moves in that 'shadow tangent plane' when one uses $\hat\nabla$ to parallel translate around in a loop.
Note that the curvature of $\hat\nabla$ thus incorporates both the torsion and the curvature of $\nabla$, and the vanishing of the torsion of $\nabla$ says precisely that $\hat\nabla$-parallel translation preserves, to lowest order, the origins in the 'shadow copies' of the tangent plane.
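A small numerical experiment illustrates the 'moving origin'. It is a sketch under assumptions that are not in the text above:

* $M=\mathbb{R}^2$ with constant Christoffel symbols, using the convention $\nabla_{\partial_j}\partial_k=\Gamma^i_{jk}\partial_i$.
* Sections are transported with the $\hat\nabla$-parallel equations $\dot a=0$, $\dot Y^i=-a\dot\gamma^i-\Gamma^i_{jk}\dot\gamma^jY^k$, which follow from the definition of $\hat\nabla$.
* The equations are integrated with classical RK4.

The origin $\begin{pmatrix}1\\0\end{pmatrix}$ is carried around the square of side $\varepsilon$, traversed along $\partial_1,\partial_2,-\partial_1,-\partial_2$. If only $\Gamma^1_{12}=c$ is non-zero, then $\nabla$ is flat but $T(\partial_1,\partial_2)=c\,\partial_1$. A short hand computation gives the final $TM$-component $-c\varepsilon^2\,\partial_1=-\varepsilon^2\,T(\partial_1,\partial_2)$. Symmetrizing the symbols (so that $T=0$) makes the displacement $O(\varepsilon^3)$. All function names are hypothetical.

```python
# Sketch (assumptions: M = R^2, constant Christoffel symbols, RK4 integration).
# Gamma[i][j][k] stores Gamma^{i+1}_{j+1,k+1}, i.e. nabla_{d_j} d_k = Gamma[i][j][k] d_i.

def rhs(state, vel, Gamma):
    """Time derivative of state = [a, Y^1, Y^2] for a hat-nabla-parallel
    section along a curve with velocity vel:
        a' = 0,   (Y^i)' = -a vel^i - Gamma^i_{jk} vel^j Y^k."""
    a, Y = state[0], state[1:]
    dY = [-a * vel[i]
          - sum(Gamma[i][j][k] * vel[j] * Y[k] for j in range(2) for k in range(2))
          for i in range(2)]
    return [0.0] + dY


def transport_segment(state, vel, length, Gamma, steps=200):
    """Classical RK4 along a straight unit-speed segment. The Christoffel
    symbols are constant, so the base point never enters the equations."""
    h = length / steps
    for _ in range(steps):
        k1 = rhs(state, vel, Gamma)
        k2 = rhs([s + h / 2 * d for s, d in zip(state, k1)], vel, Gamma)
        k3 = rhs([s + h / 2 * d for s, d in zip(state, k2)], vel, Gamma)
        k4 = rhs([s + h * d for s, d in zip(state, k3)], vel, Gamma)
        state = [s + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                 for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)]
    return state


def origin_after_square(eps, Gamma):
    """Transport the 'origin' (a, Y) = (1, 0) around the square of side eps,
    traversed along +d1, +d2, -d1, -d2, and return the final [a, Y^1, Y^2]."""
    state = [1.0, 0.0, 0.0]
    for vel in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]):
        state = transport_segment(state, vel, eps, Gamma)
    return state


def zero_symbols():
    """All Christoffel symbols zero (the flat, torsion-free connection on R^2)."""
    return [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]


if __name__ == "__main__":
    c, eps = 0.7, 0.01
    twisted = zero_symbols()
    twisted[0][0][1] = c                          # Gamma^1_{12} = c: flat, T(d1, d2) = c d1
    symmetric = zero_symbols()
    symmetric[0][0][1] = symmetric[0][1][0] = c   # Gamma^1_{12} = Gamma^1_{21}: T = 0
    for name, G in (("with torsion", twisted), ("torsion-free", symmetric)):
        a, y1, y2 = origin_after_square(eps, G)
        # displacement of the origin, rescaled by eps^2
        print(name, a, y1 / eps**2, y2 / eps**2)
```

In the first case the rescaled displacement should come out close to $(-0.7,0)$, i.e., $-T(\partial_1,\partial_2)$. In the torsion-free case it should be of order $\varepsilon$. In both cases the $\mathbb{R}$-component stays equal to $1$.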
NB: It is a notational accident that, in this interpretation, $T(X,Y)$ represents 'infinitesimal translation' while $R(X,Y)$ represents 'infinitesimal rotation'.
Remark 1: The addition of a line bundle to the tangent bundle may seem somewhat artificial. However, there is a way to present this construction more naturally in a dual formulation: First, note that the vector bundle $J^1(M,\mathbb{R})$ of $1$-jets of smooth functions on $M$ has a natural identification with the vector bundle $\mathbb{R}\times T^*\!M$ by identifying the element $j^1_xf$ with $(f(x), \mathrm{d}f_x)\in \mathbb{R}\times T^*\!M$. Second, note that a connection $\nabla$ on $TM$ naturally extends to a connection on $T^*\!M$, uniquely defined by the condition that $X\bigl(\eta(Y)\bigr) = (\nabla_X\eta)(Y) + \eta(\nabla_XY)$ for all $1$-forms $\eta$ and vector fields $X,Y$ on $M$. Using this identification of $J^1(M,\mathbb{R})$ with $\mathbb{R}\times T^*\!M$, we can define a connection $\hat\nabla$ on $J^1(M,\mathbb{R})$ by $$ \hat\nabla_X(a,\alpha) = \bigl(Xa + \alpha(X),\ \nabla_X\alpha\bigr) $$ for all functions $a$, $1$-forms $\alpha$, and vector fields $X$ on $M$. Then one easily computes that $$ R^{\hat\nabla}(X,Y)(a,\alpha) = \bigl(\alpha\bigl(T^\nabla(X,Y)\bigr),\ R^{\nabla}(X,Y)(\alpha)\bigr), $$ which is the dual of the above formula on the 'extended tangent bundle'.
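For the first component, the computation runs parallel to the one on $\hat TM$. Since $\hat\nabla_Y(a,\alpha)=\bigl(Ya+\alpha(Y),\ \nabla_Y\alpha\bigr)$, the defining property $X\bigl(\alpha(Y)\bigr)=(\nabla_X\alpha)(Y)+\alpha(\nabla_XY)$ of the dual connection gives the first component of $\hat\nabla_X\hat\nabla_Y(a,\alpha)$ as $$ XYa+X\bigl(\alpha(Y)\bigr)+(\nabla_Y\alpha)(X)=XYa+(\nabla_X\alpha)(Y)+(\nabla_Y\alpha)(X)+\alpha(\nabla_XY). $$ The middle two terms are symmetric in $X,Y$. Antisymmetrizing and subtracting the first component $[X,Y]a+\alpha([X,Y])$ of $\hat\nabla_{[X,Y]}(a,\alpha)$ therefore leaves $$ \alpha\bigl(\nabla_XY-\nabla_YX-[X,Y]\bigr)=\alpha\bigl(T^\nabla(X,Y)\bigr). $$ The second component is simply the curvature $R^\nabla(X,Y)(\alpha)$ of the induced connection on $T^*\!M$.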
Remark 2: Finally, going back to the OP's original request for how to think about torsion in terms of the failure of 'quadrilaterals' to close, I worked out the coordinate expressions and looked at the Taylor series. This is, of course, very classical (it is in Schouten's work), but it might be worth making explicit in the following way: Let $\nabla$ be a connection on $TM$, fix a point $p$ and $p$-centered coordinates $x=(x^i)$. Define the connection coefficients $\Gamma^i_{jk}$ by the usual rule $$ \nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\ \partial_k\ , $$ where $\partial_i$ are the coordinate vector fields dual to the $dx^i$, i.e., $dx^i(\partial_j) = \delta^i_j$.
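As a supplement, here are the standard coordinate equations, stated in the convention just fixed. A curve $x(t)$ is a $\nabla$-geodesic exactly when $$ \ddot x^i+\Gamma^i_{jk}(x)\,\dot x^j\dot x^k=0, $$ and a vector field $Y(t)=Y^i(t)\,\partial_i$ along it is $\nabla$-parallel exactly when $$ \dot Y^i+\Gamma^i_{jk}(x)\,\dot x^jY^k=0. $$ Only the symmetric part $\tfrac12(\Gamma^i_{jk}+\Gamma^i_{kj})$ enters the geodesic equation. The parallel-transport equation also sees the antisymmetric part $\tfrac12(\Gamma^i_{jk}-\Gamma^i_{kj})=\tfrac12T^i_{jk}$, where $T^\nabla(\partial_j,\partial_k)=T^i_{jk}\,\partial_i$. Solving both equations by power series at $p$ (where $x=0$) is what produces the Taylor expansion of the 'attempted parallelogram' that follows.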
Let $v,w\in T_pM$ be tangent vectors and write $v=v^i\ \partial_i(p)$ and $w=w^i\ \partial_i(p)$. Let $a(t)=\exp_p(tv)$ be the $\nabla$-geodesic starting at $p$ with initial velocity $v$, let $b(t)\in T_{a(t)}M$ be the $\nabla$-parallel translate of $w$ along $a$, and set $c(s,t) = \exp_{a(t)}(sb(t))$. Then the functions $c^i= x^i(c(s,t))$ have Taylor expansions in $s$ and $t$ of the form $$ c^i(s,t) = tv^i+sw^i - \tfrac12\Gamma^i_{jk}(0)\bigl(t^2v^jv^k+2st\ v^jw^k+s^2w^jw^k\bigr) + R^i_3(s,t), $$ where $R^i_3(s,t)$ vanishes to third order at $(s,t)=(0,0)$. Thus, if switching $(t,v)$ and $(s,w)$ in this construction is to yield the same result up to second order in $s$ and $t$ for all $v$ and $w$, one must have $\Gamma^i_{jk}(0) = \Gamma^i_{kj}(0)$, i.e., the torsion of $\nabla$ must vanish at $p$. In particular, if all the 'attempted parallelograms' close to second order at all points, the torsion of $\nabla$ must vanish.
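Subtracting the expansion from its swapped version makes the failure to close explicit: $$ c^i(s,t)-\tilde c^i(s,t)=-st\,T^i_{jk}(0)\,v^jw^k+O(3), $$ where $\tilde c$ is the point obtained by exchanging the roles of $(t,v)$ and $(s,w)$, and $T^i_{jk}=\Gamma^i_{jk}-\Gamma^i_{kj}$. The sketch below checks this numerically under assumptions that are not in the text: $M=\mathbb{R}^2$, arbitrarily chosen constant Christoffel symbols, and geodesics and parallel transport integrated with classical RK4. All function names are hypothetical.

```python
# Sketch (assumptions: M = R^2, constant Christoffel symbols, RK4 integration).
# Gamma[i][j][k] stores Gamma^i_{jk} with 0-based indices, i.e.
# nabla_{d_j} d_k = Gamma[i][j][k] d_i, the convention of Remark 2.

N = 2        # dimension of M
STEPS = 200  # RK4 steps per integrated curve


def contract(Gamma, u, y):
    """Components Gamma^i_{jk} u^j y^k."""
    return [sum(Gamma[i][j][k] * u[j] * y[k] for j in range(N) for k in range(N))
            for i in range(N)]


def rhs(z, Gamma):
    """z = [x, u, y]: position, velocity, transported vector (each length N).
    Geodesic: x' = u, u' = -Gamma(u, u); parallel transport: y' = -Gamma(u, y)."""
    u, y = z[N:2 * N], z[2 * N:]
    return (u + [-g for g in contract(Gamma, u, u)]
              + [-g for g in contract(Gamma, u, y)])


def rk4(z, Gamma, T):
    """Integrate z' = rhs(z) from time 0 to time T with classical RK4."""
    h = T / STEPS
    for _ in range(STEPS):
        k1 = rhs(z, Gamma)
        k2 = rhs([p + h / 2 * q for p, q in zip(z, k1)], Gamma)
        k3 = rhs([p + h / 2 * q for p, q in zip(z, k2)], Gamma)
        k4 = rhs([p + h * q for p, q in zip(z, k3)], Gamma)
        z = [p + h / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
             for p, q1, q2, q3, q4 in zip(z, k1, k2, k3, k4)]
    return z


def corner(v, w, t, s, Gamma):
    """c(s, t) = exp_{a(t)}(s b(t)), where a(t) = exp_p(t v) with p = 0 and
    b(t) is the parallel translate of w along a."""
    z = rk4([0.0] * N + list(v) + list(w), Gamma, t)   # follow a for time t, carrying w
    a_t, b_t = z[:N], z[2 * N:]
    # geodesic from a(t) with initial velocity s b(t), run for unit time
    z = rk4(a_t + [s * bi for bi in b_t] + [0.0] * N, Gamma, 1.0)
    return z[:N]


def torsion(Gamma, v, w):
    """T(v, w)^i = T^i_{jk} v^j w^k with T^i_{jk} = Gamma^i_{jk} - Gamma^i_{kj}."""
    return [sum((Gamma[i][j][k] - Gamma[i][k][j]) * v[j] * w[k]
                for j in range(N) for k in range(N)) for i in range(N)]


def gap(v, w, s, t, Gamma):
    """c(s, t) minus the point obtained by swapping (t, v) with (s, w)."""
    c = corner(v, w, t, s, Gamma)
    c_swapped = corner(w, v, s, t, Gamma)
    return [ci - di for ci, di in zip(c, c_swapped)]


if __name__ == "__main__":
    # arbitrary sample symbols: Gamma^1_{12} = 0.5, Gamma^2_{11} = 0.3, Gamma^2_{21} = -0.4
    Gamma = [[[0.0, 0.5], [0.0, 0.0]],
             [[0.3, 0.0], [-0.4, 0.0]]]
    v, w, s, t = [1.0, 0.0], [0.0, 1.0], 1e-3, 1e-3
    print("gap / (s t):", [g / (s * t) for g in gap(v, w, s, t, Gamma)])
    print("-T(v, w):   ", [-x for x in torsion(Gamma, v, w)])
```

The two printed vectors should agree up to an error of order $s,t$. Replacing `Gamma` by its symmetrization (so that $T=0$) should make the rescaled gap of the same small order.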
See this question and its answers.
My view is the following: Suppose that $\nabla$ is a linear connection on a vector bundle $E\to M$, and that there is $\sigma\in \Omega^1(M;E)$, a 1-form on $M$ with values in $E$ such that $\sigma_x:T_xM\to E_x$ is a linear isomorphism. This is called a soldering form. It identifies $E$ with $TM$.
The torsion is then $d^{\nabla}\sigma\in\Omega^2(M;E)$, and the usual formula for the torsion is $T(X,Y)=\sigma^{-1}\bigl((d^\nabla\sigma)(X,Y)\bigr)$. Torsion is an obstruction to the soldering form being parallel for $\nabla$. Maybe this explains the picture that space twists along geodesics if the torsion is non-zero. So torsion can be viewed either as a property of the soldering form (choose it better if you want to get rid of torsion), or as a property of $\nabla$ (if you identify $TM$ with $E$ via the given soldering form).
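To see why $\sigma^{-1}\circ d^\nabla\sigma$ is the usual torsion, transport $\nabla$ to $TM$ via $\sigma$. That is, set $\nabla^\sigma_XY:=\sigma^{-1}\bigl(\nabla_X(\sigma(Y))\bigr)$, a notation introduced here only for this check. With the usual formula for the covariant exterior derivative of an $E$-valued 1-form, $$ (d^\nabla\sigma)(X,Y)=\nabla_X\bigl(\sigma(Y)\bigr)-\nabla_Y\bigl(\sigma(X)\bigr)-\sigma([X,Y])=\sigma\bigl(\nabla^\sigma_XY-\nabla^\sigma_YX-[X,Y]\bigr)=\sigma\bigl(T^{\nabla^\sigma}(X,Y)\bigr). $$ For $E=TM$ and $\sigma=\mathrm{Id}_{TM}$ this is just the definition $T(X,Y)=\nabla_XY-\nabla_YX-[X,Y]$.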
This also works for $G$-structures on $M$. Consider a principal $G$-bundle $P\to M$ and a representation $\rho:G\to GL(V)$ where $\dim(V)=\dim(M)$. A soldering form is now a $G$-equivariant and horizontal 1-form $\sigma\in\Omega^1(P,V)^G_{hor}$ which is fiberwise surjective. This induces a form $\bar\sigma\in\Omega^1(M,P\times_G V)$ which is a soldering form in the sense above. You can compute torsion either on $P$ or on $M$, and the two results correspond to each other.
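On $P$ the computation needs a principal connection, which the paragraph leaves implicit. Assume one is given by a form $\omega\in\Omega^1(P,\mathfrak g)^G$, and write $\rho'\colon\mathfrak g\to\mathfrak{gl}(V)$ for the derivative of $\rho$. The torsion is then the covariant exterior derivative of $\sigma$: $$ \Theta=d\sigma+\rho'(\omega)\wedge\sigma,\qquad \Theta(\xi,\eta)=d\sigma(\xi,\eta)+\rho'\bigl(\omega(\xi)\bigr)\sigma(\eta)-\rho'\bigl(\omega(\eta)\bigr)\sigma(\xi). $$ This form again lies in $\Omega^2(P,V)^G_{hor}$. Under the standard isomorphism $\Omega^k(P,V)^G_{hor}\cong\Omega^k(M,P\times_GV)$ it corresponds to $d^\nabla\bar\sigma$ for the induced connection $\nabla$ on $P\times_GV$. In particular, $\Theta(\xi,\eta)=d\sigma(\xi,\eta)$ whenever $\xi,\eta$ are horizontal.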
Edit: Parallel transport and torsion. In 24.2 (page 299ff) of [2] it is stated that parallel transport along the flow lines of a vector field $X$ is given by the flow of the horizontal lift $C(X)$ of the vector field: $Pt(Fl^X,t) = Fl^{C(X)}_t$. Thus, by 3.16, parallel transporting along a parallelogram consisting of flow lines and differentiating twice gives:
$$\tfrac12\,\partial_t^2|_0\, Pt(Fl^Y,-t)\, Pt(Fl^X,-t)\, Pt(Fl^Y,t)\, Pt(Fl^X,t) = \tfrac12\,\partial_t^2|_0\, Fl^{C(Y)}_{-t}\, Fl^{C(X)}_{-t}\, Fl^{C(Y)}_t\, Fl^{C(X)}_t = [C(X),C(Y)]$$
By 24.6 we have $[C(X),C(Y)](Z) - C([X,Y])(Z) = vl(Z, R(X,Y)Z)$, where $vl$ is the vertical lift, so you get curvature and not torsion.
The only way to get torsion is to use 24.2 again, $\nabla_XY = \partial_t|_0\, Pt(Fl^X,t)^* Y$, and to build torsion out of this.
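One way to carry this out, as a sketch, starts from two facts. By the same formula, $\nabla_YX=\partial_t|_0\,Pt(Fl^Y,t)^*X$. The Lie bracket is the analogous derivative of the pullback by the flow itself, $[X,Y]=\partial_t|_0\,(Fl^X_t)^*Y$. Hence $$ T(X,Y)=\nabla_XY-\nabla_YX-[X,Y]=\partial_t|_0\Bigl(Pt(Fl^X,t)^*Y-(Fl^X_t)^*Y-Pt(Fl^Y,t)^*X\Bigr). $$ So the torsion compares the parallel pullback of $Y$ along the flow of $X$ with the pullback by the flow itself, corrected by the term with $X$ and $Y$ interchanged.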
A global formula for torsion in terms of the second tangent bundle can be found in Section 5 of [3].
[2] Peter W. Michor: Topics in Differential Geometry. Graduate Studies in Mathematics, Vol. 93, American Mathematical Society, Providence, 2008.
[3] Peter W. Michor: The Jacobi Flow. Rend. Sem. Mat. Univ. Pol. Torino 54, 4 (1996), 365-372.