Mean value theorem for the gradient of a convex function

There may be further context I'm missing, but I think this is false in general, even for a convex separable function. Take the example $f(x,y) = x^4 + y^6$. We have

\begin{align*} \nabla f(x,y) &= \begin{bmatrix} 4x^3 \\ 6y^5\end{bmatrix}\\ \nabla^2 f(x,y) &= \begin{bmatrix} 12x^2 & 0 \\ 0 & 30y^4\end{bmatrix}\\ \end{align*}

Now take the two points $(0,0)$ and $(x,y)$ and apply this "mean value theorem", which should produce some point $(tx,ty)$, $t\in [0,1]$, on the line segment between $(0,0)$ and $(x,y)$:

\begin{align*} \nabla f(x,y) - \nabla f(0,0) &= \begin{bmatrix} 4x^3 \\ 6y^5\end{bmatrix}\\ \begin{bmatrix} 4x^3 \\ 6y^5\end{bmatrix} &= \begin{bmatrix} 12t^2x^2 & 0 \\ 0 & 30t^4y^4\end{bmatrix} \left(\begin{bmatrix} x \\ y\end{bmatrix} - \begin{bmatrix} 0 \\ 0\end{bmatrix}\right)\\ &= \begin{bmatrix} 12t^2x^3 \\ 30t^4y^5\end{bmatrix} \end{align*}

Now for this mean value theorem to hold, we need both $t^2 = 1/3$ and $t^4 = 1/5$. This can never be the case: $t^2 = 1/3$ forces $t^4 = 1/9 \neq 1/5$, so no such $t$ exists.
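A quick numerical sanity check of this counterexample (a sketch in NumPy; the test point $(x,y)=(1,1)$ is an arbitrary choice for illustration):

```python
import numpy as np

# f(x, y) = x**4 + y**6; check whether any single t in [0, 1] satisfies
# grad f(x, y) - grad f(0, 0) = Hess f(t*x, t*y) @ ((x, y) - (0, 0)).
x, y = 1.0, 1.0

def grad(x, y):
    return np.array([4 * x**3, 6 * y**5])

def hess(x, y):
    return np.diag([12 * x**2, 30 * y**4])

lhs = grad(x, y) - grad(0.0, 0.0)

# Scan t over a fine grid and record the best achievable residual.
ts = np.linspace(0.0, 1.0, 100001)
residuals = [np.linalg.norm(lhs - hess(t * x, t * y) @ np.array([x, y]))
             for t in ts]
print(min(residuals))  # bounded away from 0: no single t works
```

The minimum residual stays well above zero, confirming that no single $t$ can satisfy both components at once.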

There are mean value theorems beyond the 1D one, e.g. for vector-valued functions (https://en.wikipedia.org/wiki/Mean_value_theorem#Mean_value_theorem_for_vector-valued_functions), but none of this form.

Applying the MVT componentwise will, however, produce a point that is not necessarily on the line, but in the box spanned by the two points (this is what I think the authors had in mind). Reading over the paper, this point being on the line between the two iterates is not really important to the proof. What matters is this point's proximity to the origin, which determines how it compares to $\epsilon$.
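To make the componentwise version concrete for the example above: each component of $\nabla f$ gets its own parameter, $t_1 = 3^{-1/2}$ for the first and $t_2 = 5^{-1/4}$ for the second, so the mean value point $(t_1 x, t_2 y)$ lies in the box but off the segment. A short NumPy check (the test point is arbitrary):

```python
import numpy as np

# Componentwise MVT for f(x, y) = x**4 + y**6 between (0, 0) and (x, y):
# each component i of grad f gets its own parameter t_i, so the mean
# value point (t1*x, t2*y) lies in the box [0, x] x [0, y] but generally
# not on the segment (which would require t1 == t2).
x, y = 1.0, 1.0

t1 = (1.0 / 3.0) ** 0.5   # solves 4 x^3 = 12 t^2 x^3
t2 = (1.0 / 5.0) ** 0.25  # solves 6 y^5 = 30 t^4 y^5

lhs = np.array([4 * x**3, 6 * y**5])
rhs = np.diag([12 * (t1 * x)**2, 30 * (t2 * y)**4]) @ np.array([x, y])

print(np.allclose(lhs, rhs))  # True: equality holds componentwise
print(abs(t1 - t2))           # nonzero: the point is off the segment
```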


Given the $J$ he has, note $$ \nabla J_i = \frac{u_i}{\sqrt{u_i^2 +\epsilon}} $$ Thus $$ \nabla^2 J_{ij} = \frac{\partial^2 J}{\partial u_i \partial u_j}= \delta_{ij} \left ( \frac{1}{\sqrt{u_i^2 +\epsilon}} - \frac{u_i^2}{(u_i^2 +\epsilon)^{3/2}}\right)=\frac{\delta_{ij}\, \epsilon}{(u_i^2 + \epsilon)^{3/2}}$$

The mean value theorem for $\nabla J_i$ is given by $$ \nabla J_i (x) - \nabla J_i (y) = \nabla (\nabla J_i(u_i))\cdot (x-y) $$ with $u_i$ between $x_i$ and $y_i$. Thus $$ \nabla J (x) - \nabla J (y) = \nabla^2 J(u)\cdot ( x-y) $$ Note that $u$ may not be on the line between $x$ and $y$.

Convexity isn't really playing a direct role here, since you have an explicit formula for $J$. But you could probably make an argument in the general case using convexity $\iff \nabla^2 J \succeq 0$ (positive semi-definite).
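A sketch of this componentwise construction in NumPy (the points $x$, $y$ and the value of $\epsilon$ are arbitrary choices for illustration; each $u_i$ is located by a grid search over $[\min(x_i,y_i), \max(x_i,y_i)]$ rather than an exact solve):

```python
import numpy as np

eps = 0.1

def grad_J(u):
    # grad J_i = u_i / sqrt(u_i**2 + eps), from J(u) = sum_i sqrt(u_i**2 + eps)
    return u / np.sqrt(u**2 + eps)

def hess_diag(u):
    # Hessian is diagonal with entries eps / (u_i**2 + eps)**(3/2)
    return eps / (u**2 + eps) ** 1.5

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# Componentwise 1-D MVT: for each i, find u_i between y_i and x_i with
# grad_J(x)_i - grad_J(y)_i = hess_diag(u_i) * (x_i - y_i).
target = (grad_J(x) - grad_J(y)) / (x - y)

u_points, residuals = [], []
for i in range(2):
    lo, hi = sorted((y[i], x[i]))
    grid = np.linspace(lo, hi, 200001)
    u_i = grid[np.argmin(np.abs(hess_diag(grid) - target[i]))]
    u_points.append(u_i)
    residuals.append(abs(hess_diag(u_i) - target[i]))

print(u_points)   # each u_i lies between x_i and y_i
print(residuals)  # each residual is near 0
```

The resulting $u = (u_1, u_2)$ satisfies the componentwise identity but need not lie on the segment between $x$ and $y$, matching the note above.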