Why do we need the covariant derivative along a curve - why are linear connections not sufficient?
I think you are right that one could make sense of $\nabla_{\gamma'(t)}Y$ even if $Y$ is a non-extendible vector field along a curve $\gamma: I\to M$. One could try to do this as follows:
If $\gamma'(t)\neq 0$, then there is a neighbourhood $J$ of $t$ such that $\gamma_{|J}$ is an embedding. We can then find a globally defined vector field $\tilde Y$ on $M$ such that $Y$ and $\tilde Y \circ\gamma$ agree in a neighbourhood of $t$, and define $\nabla_{\gamma'(t)}Y= \nabla_{\gamma'(t)}\tilde Y$, which does not depend on the choice of $\tilde Y$.
If $\gamma'(t)= 0$ we simply define $\nabla_{\gamma'(t)}Y=0$.
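For the first case, a sketch in local coordinates may help to see why the choice of $\tilde Y$ does not matter. The chart $(x^i)$ around $\gamma(t)$, the component functions $\tilde Y^k$, $Y^k$, $\gamma^i$ and the Christoffel symbols $\Gamma^k_{ij}$ are notation introduced here for illustration. Writing $\tilde Y=\tilde Y^k\partial_k$, and using that $\gamma'(t)$ acts on a function $f$ by $\gamma'(t)f=(f\circ\gamma)'(t)$, one gets
$$\nabla_{\gamma'(t)}\tilde Y=\Big((\tilde Y^k\circ\gamma)'(t)+\Gamma^k_{ij}(\gamma(t))\,(\gamma^i)'(t)\,\tilde Y^j(\gamma(t))\Big)\,\partial_k|_{\gamma(t)}.$$
Since $\tilde Y^k\circ\gamma=Y^k$ near $t$, the right-hand side equals
$$\Big((Y^k)'(t)+\Gamma^k_{ij}(\gamma(t))\,(\gamma^i)'(t)\,Y^j(t)\Big)\,\partial_k|_{\gamma(t)},$$
which involves only $Y$ and $\gamma$.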
Now one can show that in the first case this definition agrees with the usual definition of the covariant derivative of $Y$ along $\gamma$. But in the second case it doesn't:
Consider for example $\gamma:I\to\mathbb R^2, t\mapsto(t^2,t^3)$ and $Y(t)=\gamma'(t)$, where $\mathbb R^2$ is equipped with the Levi-Civita connection of the Euclidean metric. Then using standard coordinates on $\mathbb R^2$ we have $Y=2t(\partial_1\circ\gamma)+3t^2(\partial_2\circ\gamma)$. Using the Leibniz rule and the agreement with extendible vector fields (applied to the coordinate fields $\partial_1,\partial_2$), we see that the covariant derivative of $Y$ along $\gamma$ is given by $2(\partial_1\circ\gamma)+6t(\partial_2\circ\gamma)$. In particular, at $t=0$ it is non-zero even though $\gamma'(0)=0$.
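The computation in this example can be spelled out as follows. This is a sketch, writing $D_t$ for the covariant derivative along $\gamma$ (notation not used above) and assuming $0\in I$. Here $Y(t)=\gamma'(t)=2t\,(\partial_1\circ\gamma)+3t^2\,(\partial_2\circ\gamma)$. The coordinate fields $\partial_1,\partial_2$ are parallel for the Euclidean connection, so $D_t(\partial_i\circ\gamma)=\nabla_{\gamma'(t)}\partial_i=0$, and the Leibniz rule gives
$$D_tY=(2t)'\,(\partial_1\circ\gamma)+(3t^2)'\,(\partial_2\circ\gamma)=2\,(\partial_1\circ\gamma)+6t\,(\partial_2\circ\gamma).$$
At $t=0$ this is
$$D_tY(0)=2\,\partial_1|_{(0,0)}\neq 0,$$
whereas the definition proposed for the case $\gamma'(0)=0$ would give $\nabla_{\gamma'(0)}Y=0$.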