Gradient and Jacobian row and column conventions
In general, the derivative of a function $f : \mathbb{R}^n \to \mathbb{R}^m$ at a point $p \in \mathbb{R}^n$, if it exists, is the unique linear transformation $Df(p) \in L(\mathbb{R}^n,\mathbb{R}^m)$ such that $$ \lim_{h \to 0} \frac{\|f(p+h)-f(p)-Df(p)h\|}{\|h\|} = 0; $$ the matrix of $Df(p)$ with respect to the standard orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, called the Jacobian matrix of $f$ at $p$, therefore lies in $M_{m \times n}(\mathbb{R})$.
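As a rough numerical illustration of the shape claim, here is a minimal sketch in plain Python. The map `f`, the helper `jacobian_fd`, and the step size `eps` are hypothetical choices made for this example, not anything from the question. It approximates the Jacobian of a map $\mathbb{R}^2 \to \mathbb{R}^3$ by central differences, and checks that the result has $m = 3$ rows and $n = 2$ columns and that the remainder ratio in the defining limit shrinks as $h \to 0$.

```python
import math

def f(x):
    # Hypothetical example map f : R^2 -> R^3 (n = 2, m = 3).
    x1, x2 = x
    return [x1 * x2, x1 + 2.0 * x2, x1 ** 2]

def jacobian_fd(func, p, eps=1e-6):
    """Central-difference approximation of the Jacobian at p.

    Returns a list of m rows, each with n entries (an m x n matrix).
    """
    n = len(p)
    cols = []
    for j in range(n):
        plus = list(p)
        minus = list(p)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = func(plus), func(minus)
        # Column j holds the partial derivatives with respect to x_j.
        cols.append([(a - b) / (2.0 * eps) for a, b in zip(f_plus, f_minus)])
    m = len(cols[0])
    return [[cols[j][i] for j in range(n)] for i in range(m)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

p = [1.0, 2.0]
J = jacobian_fd(f, p)
shape = (len(J), len(J[0]))  # (m, n)

# Remainder ratio ||f(p+h) - f(p) - J h|| / ||h|| along h = t * (1, 1).
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h = [t, t]
    f_ph = f([p[0] + h[0], p[1] + h[1]])
    f_p = f(p)
    Jh = [sum(J[i][j] * h[j] for j in range(2)) for i in range(3)]
    remainder = [f_ph[i] - f_p[i] - Jh[i] for i in range(3)]
    ratios.append(norm(remainder) / norm(h))

print(shape)
print(ratios)
```

Storing the matrix as a list of rows makes the convention visible: each of the $m$ rows corresponds to one component of $f$, and each of the $n$ columns to one input coordinate.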
Now, suppose that $m=1$, so that $f : \mathbb{R}^n \to \mathbb{R}$. Then if $f$ is differentiable at $p$, $Df(p) \in L(\mathbb{R}^n,\mathbb{R}) = (\mathbb{R}^n)^\ast$ is a linear functional, and hence the Jacobian matrix, as you point out, lies in $M_{1 \times n}(\mathbb{R})$, i.e., is a row vector. However, by the Riesz representation theorem, $\mathbb{R}^n \cong (\mathbb{R}^n)^\ast$ via the map that sends a vector $x \in \mathbb{R}^n$ to the functional $y \mapsto \left\langle x,y \right\rangle$, where $\left\langle \cdot,\cdot \right\rangle$ is the standard inner product. Hence, if $f$ is differentiable at $p$, then the gradient of $f$ at $p$ is the unique (column!) vector $\nabla f(p) \in \mathbb{R}^n$ such that $$ \forall h \in \mathbb{R}^n, \quad Df(p)h = \left\langle \nabla f(p),h\right\rangle = \nabla f(p)^T h; $$ in particular, since the Jacobian matrix of $f$ at $p$ is the unique $1 \times n$ matrix $J$ satisfying $Df(p)h = Jh$ for all $h \in \mathbb{R}^n$, it is precisely $\nabla f(p)^T$.
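To make the row-versus-column distinction concrete, here is a small sketch in plain Python. The function `f`, the helpers `partials` and `transpose`, and the test vectors are hypothetical and chosen only for illustration. It stores the Jacobian of a scalar function as a $1 \times n$ nested list and the gradient as an $n \times 1$ nested list, then checks that one is the transpose of the other and that both give the same value of $Df(p)h$.

```python
def f(x):
    # Hypothetical scalar function f : R^3 -> R.
    x1, x2, x3 = x
    return x1 ** 2 + 3.0 * x2 * x3

def partials(func, p, eps=1e-6):
    """Central-difference approximations of the partial derivatives at p."""
    out = []
    for j in range(len(p)):
        plus = list(p)
        minus = list(p)
        plus[j] += eps
        minus[j] -= eps
        out.append((func(plus) - func(minus)) / (2.0 * eps))
    return out

def transpose(M):
    return [list(row) for row in zip(*M)]

p = [1.0, 2.0, 3.0]
d = partials(f, p)

jacobian = [d]               # 1 x n: a single row
gradient = [[v] for v in d]  # n x 1: a single column

h = [0.5, -1.0, 2.0]
# Df(p)h as a matrix-vector product with the 1 x n Jacobian.
Df_h = sum(jacobian[0][j] * h[j] for j in range(len(h)))
# The same number as the inner product <grad f(p), h>.
inner = sum(gradient[j][0] * h[j] for j in range(len(h)))

print(jacobian)
print(gradient)
print(Df_h, inner)
```

The two containers hold the same $n$ numbers; only the shape, and hence the role as a functional versus a vector, differs.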