Hessian matrix as derivative of gradient
The line "Then $D[\nabla f(x)]$ must be a $1\times n$ matrix" is where your confusion lies.
The derivative operator $D$ applied to a vector-valued function tells us how each component changes in each coordinate direction. Being more explicit with the notation, we have
$$\begin{align}\nabla f(\mathbf x) &= D[f (\mathbf x)]\\ &= \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)\end{align}$$
Now think of applying $D$ to each element of this vector individually:
$$\begin{align}D[\nabla f(\mathbf x)] &= D[D[f(\mathbf x)]]\\ &=\left(D\left[\frac{\partial f}{\partial x_1}\right]^T, \ldots, D\left[\frac{\partial f}{\partial x_n}\right]^T\right).\end{align}$$ Each $D\left[\frac{\partial f}{\partial x_j}\right]$ is itself a $1\times n$ row vector (the gradient of the scalar $\frac{\partial f}{\partial x_j}$), so its transpose is an $n\times 1$ column, and placing the $n$ columns side by side gives the Hessian matrix $$D^2[f(\mathbf x)]=\left(\begin{matrix}\frac{\partial^2 f}{\partial x_1^2} & \ldots & \frac{\partial^2 f}{\partial x_1\partial x_n}\\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1}& \ldots & \frac{\partial^2 f}{\partial x_n^2}\end{matrix}\right),$$ which is indeed $n\times n$.
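You can check this numerically: the Hessian is the Jacobian of the gradient, so a finite-difference Jacobian of $\nabla f$ should match the analytic second derivatives. A minimal sketch (the function $f(x) = x_1^2 x_2 + \sin x_3$ here is just an arbitrary example, not from the question):

```python
import numpy as np

def f(x):
    return x[0]**2 * x[1] + np.sin(x[2])

def grad(x):
    # Analytic gradient D[f](x), a vector of first partials.
    return np.array([2*x[0]*x[1], x[0]**2, np.cos(x[2])])

def hessian(x):
    # Analytic Hessian D^2[f](x); entry (i, j) is d^2 f / dx_i dx_j.
    return np.array([[2*x[1], 2*x[0], 0.0],
                     [2*x[0], 0.0,    0.0],
                     [0.0,    0.0,   -np.sin(x[2])]])

def numeric_jacobian(g, x, h=1e-6):
    # Central-difference Jacobian of a vector field g:
    # column j holds the partials of g with respect to x_j.
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (g(x + e) - g(x - e)) / (2*h)
    return J

x = np.array([1.0, 2.0, 0.5])
# Differentiating the gradient numerically recovers the Hessian.
print(np.allclose(numeric_jacobian(grad, x), hessian(x), atol=1e-5))
```

Note that for this smooth $f$ the result is symmetric, as Clairaut's theorem guarantees for the Hessian of a twice continuously differentiable function.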