Derivative of inner product

For a smooth $f:\mathbb{R}^n\to\mathbb{R}^m$, the differential is a map $df:\mathbb{R}^n\to\mathcal{L}(\mathbb{R}^n,\mathbb{R}^m)$.

Being differentiable is equivalent to: $$ f(x+h)=f(x)+df(x)\cdot h+o(\|h\|) $$

In your case, $f(x)=\langle x,x \rangle_G$ and $m=1$, hence the differential at $x$, $df(x)$, is in $\mathcal{L}(\mathbb{R}^n,\mathbb{R})$: it is a linear form.

Let's be more explicit, using the bilinearity and symmetry of $\langle\cdot,\cdot\rangle_G$: \begin{align*} f(x+h)&= \langle x+h,x+h \rangle_G \\ &= \underbrace{\langle x,x \rangle_G}_{f(x)} + \underbrace{2\langle x,h \rangle_G }_{df(x)\cdot h}+ \underbrace{\langle h,h \rangle_G}_{\in o(\|h\|)} \end{align*}
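If you want to convince yourself numerically that the last term really is $o(\|h\|)$, here is a minimal sketch. It assumes $\langle x,y\rangle_G = x^t G y$ with $G$ symmetric positive definite; the helper `make_spd`, the point `x` and the `direction` are made up for illustration only.

```python
import numpy as np

def make_spd(n, seed=0):
    """Hypothetical helper: build a random symmetric positive-definite matrix."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

def f(x, G):
    """f(x) = <x, x>_G = x^T G x."""
    return x @ G @ x

def remainder_ratio(x, h, G):
    """|f(x+h) - f(x) - 2<x,h>_G| / ||h||, which should tend to 0 with h."""
    linear_part = 2.0 * (x @ G @ h)
    return abs(f(x + h, G) - f(x, G) - linear_part) / np.linalg.norm(h)

G = make_spd(3)
x = np.array([1.0, -2.0, 0.5])
direction = np.array([0.3, 0.1, -0.7])

# Shrink h along a fixed direction and watch the ratio shrink with it.
ratios = [remainder_ratio(x, t * direction, G) for t in (1e-1, 1e-2, 1e-3)]
for t, r in zip((1e-1, 1e-2, 1e-3), ratios):
    print(f"t = {t:g}: remainder / ||h|| = {r:.3e}")
```

Since the remainder is exactly $\langle h,h\rangle_G$, the ratio should drop by roughly a factor of ten each time $h$ is scaled down by ten.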

Hence your differential is defined by $$ df(x)\cdot h = 2\langle x,h \rangle_G = (2x^tG)h $$ where $2x^tG=\left(\partial_{x_1} f,\dots,\partial_{x_n} f\right)$ is your "row" vector.

Note that, because $m=1$, you can also use a vector $\nabla f(x)$ to represent $df(x)$ using the canonical scalar product. This vector is by definition the gradient of $f$:

$$ df(x)\cdot h = \langle \nabla f(x),h \rangle = \langle 2Gx,h \rangle $$ where $\nabla f(x)=2Gx=\left(\begin{array}{c}\partial_{x_1} f \\ \vdots \\\partial_{x_n} f\end{array}\right)$. This is your "column" vector.
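As a sanity check, the formula $\nabla f(x)=2Gx$ can be compared with a finite-difference gradient. This is only a sketch: the matrix `G`, the point `x`, the increment `h` and the helper `numerical_gradient` are illustrative choices, and $G$ is assumed symmetric.

```python
import numpy as np

def f(x, G):
    """f(x) = <x, x>_G = x^T G x."""
    return x @ G @ x

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference approximation of the partial derivatives of func at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

G = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # symmetric by construction
x = np.array([1.0, 3.0])
h = np.array([0.2, -0.4])

grad_column = 2.0 * G @ x    # components of the gradient, 2 G x
row_vector = 2.0 * x @ G     # components of the linear form, 2 x^T G
grad_fd = numerical_gradient(lambda v: f(v, G), x)

print("2Gx          :", grad_column)
print("2x^T G       :", row_vector)
print("finite diff. :", grad_fd)
print("df(x).h      :", row_vector @ h, "=", grad_column @ h)
```

The row and column versions carry the same numbers; they only differ in how you pair them with $h$ (matrix product versus canonical scalar product).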


The difference is that the author of the second reference prefers to arrange the components of the gradient as a row. In the first paragraph they state:

Let $x\in \mathbb{R}^n$ (a column vector) and let $f : \mathbb{R}^n \to \mathbb{R}$. The derivative of $f$ with respect to $x$ is a row vector: $$ \frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_1}, \cdots , \frac{\partial f}{\partial x_n} \right) $$

You can argue this is a better option than the first one (e.g. this answer), but at the end of the day it is just a matter of notation. Pick the one you prefer and stick with it to avoid problems down the line.


More generally, suppose we differentiate any scalar-valued function $f$ of a vector $\mathbf{x}$ with respect to $\mathbf{x}$. By the chain rule, $$df=\sum_i\frac{\partial f}{\partial x_i}dx_i=\boldsymbol{\nabla}f\cdot d\mathbf{x}=\boldsymbol{\nabla}f^T d\mathbf{x}.$$ (Technically, I should write $df=(\boldsymbol{\nabla}f^T d\mathbf{x})_{11}$ to take the unique entry of a $1\times 1$ matrix.)

If you want to define the derivative of $f$ with respect to $\mathbf{x}$ as the $d\mathbf{x}$ coefficient in $df$, you use the last expression, obtaining the row vector $\boldsymbol{\nabla}f^T$. Defining it instead as the left-hand argument of the dot product, giving the column vector $\boldsymbol{\nabla}f$, is an alternative convention.
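To make the two conventions concrete, here is a small sketch with explicit array shapes. The function `f`, its hand-computed gradient `grad_f`, the point `x` and the increment `dx` are all made up for illustration; nothing here comes from the references.

```python
import numpy as np

def f(x):
    """Illustrative scalar-valued function of a 2-vector."""
    return np.sin(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    """Partial derivatives of f, computed by hand."""
    return np.array([np.cos(x[0]) + x[1] ** 2, 2.0 * x[0] * x[1]])

x = np.array([0.5, -1.0])
dx = np.array([1e-5, 2e-5])          # a small increment

grad_col = grad_f(x).reshape(-1, 1)  # column convention: shape (n, 1)
deriv_row = grad_col.T               # row convention: shape (1, n)

# Row convention: a (1, n) @ (n, 1) product gives a 1x1 matrix,
# so take its unique entry.
df_row = (deriv_row @ dx.reshape(-1, 1))[0, 0]

# Column convention: dot product of the gradient with dx.
df_dot = float(np.dot(grad_col.ravel(), dx))

actual = f(x + dx) - f(x)            # true change in f, for comparison

print("shapes:", deriv_row.shape, grad_col.shape)
print("df (row)   :", df_row)
print("df (dot)   :", df_dot)
print("f(x+dx)-f(x):", actual)
```

Both conventions produce the same first-order change $df$; the only difference is whether you store the partial derivatives as a $1\times n$ or an $n\times 1$ array.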