How to interpret the Hessian of a function

I think the following interpretation is useful. In the same way that the gradient of a function of many variables is analogous to the first derivative in one variable, I understand the Hessian to be analogous to the second derivative. Given a vector $v \in \mathbb R^n$, the gradient of $f$ allows us to compute the directional derivative of $f$ in the direction of $v$: $$f_v(x) = v^T \nabla f(x).$$ Similarly, the Hessian $Hf(x)$ allows us to compute second directional derivatives: given vectors $u,v\in \mathbb R^n$, $$f_{uv}(x) = u^THf(x)v.$$ For example, $f_{uu}(x) =u^THf(x)u$ is the second derivative of $f$ at $x$ in the direction of $u$.

Edit: $u$ and $v$ should both be taken to be unit vectors.
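
As a quick numerical sanity check of the two formulas above, here is a minimal sketch. The test function `f`, the point `x0`, and the direction `u` are arbitrary choices made for illustration; they do not come from the question. The idea is to compare $u^T\nabla f(x)$ and $u^THf(x)u$ with finite-difference derivatives of the one-variable function $t \mapsto f(x + tu)$ at $t = 0$.

```python
import numpy as np

# Hypothetical test function, chosen only for illustration:
# f(x, y) = x^2 * y + sin(y)
def f(p):
    x, y = p
    return x**2 * y + np.sin(y)

def grad_f(p):
    # Gradient of f, computed by hand
    x, y = p
    return np.array([2 * x * y, x**2 + np.cos(y)])

def hess_f(p):
    # Hessian of f, computed by hand (symmetric)
    x, y = p
    return np.array([[2 * y, 2 * x],
                     [2 * x, -np.sin(y)]])

x0 = np.array([1.0, 2.0])        # arbitrary base point
u = np.array([3.0, 4.0]) / 5.0   # arbitrary unit vector

# Formulas from the answer
first_exact = u @ grad_f(x0)         # u^T grad f(x0)
second_exact = u @ hess_f(x0) @ u    # u^T Hf(x0) u

# Central finite differences of t -> f(x0 + t*u) at t = 0
h = 1e-4
first_fd = (f(x0 + h * u) - f(x0 - h * u)) / (2 * h)
second_fd = (f(x0 + h * u) - 2 * f(x0) + f(x0 - h * u)) / h**2

print("first derivative along u: ", first_exact, first_fd)
print("second derivative along u:", second_exact, second_fd)
```

The two numbers on each printed line should agree up to finite-difference error.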


This is more of a comment on Brulboy's answer than an answer in itself.

If you fix a point $x$, you can usually${}^{[1]}$ diagonalize the Hessian matrix with an orthonormal basis. This means that you can choose an orthonormal (Cartesian) coordinate system centered at $x$ in which your matrix looks like this: $$Hf(x)\equiv \begin{bmatrix} \lambda_1 & & &\\ & \lambda_2 & &\\ & & \ddots &\\& & &\lambda_n\end{bmatrix}.$$ Those $\lambda$s are second derivatives of functions of a single variable. Specifically, if you restrict $f$ to the $j$-th coordinate axis (of the diagonalizing coordinate system), you get a function of one variable whose second derivative at $x$ is exactly $\lambda_j$.

Therefore, when you look at a Hessian matrix at a point and see a mess of numbers, you are simply not choosing the best coordinates for that point.
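
To make the change of coordinates concrete, here is a small sketch. The quadratic `f` and the point `x0` are made-up examples, not taken from the thread. It uses `numpy.linalg.eigh`, which is intended for symmetric matrices and returns the eigenvalues in ascending order together with orthonormal eigenvectors as columns. It then checks that the second derivative of $f$ along each eigenvector matches the corresponding $\lambda_j$.

```python
import numpy as np

# Hypothetical example: f(x, y) = x^2 + 3xy + y^2.
# Its Hessian is the same at every point.
def f(p):
    x, y = p
    return x**2 + 3 * x * y + y**2

H = np.array([[2.0, 3.0],
              [3.0, 2.0]])

# Eigen-decomposition of the symmetric matrix H:
# lam holds the eigenvalues (ascending), the columns of Q are orthonormal eigenvectors.
lam, Q = np.linalg.eigh(H)

# In the coordinate system given by the columns of Q the Hessian is diagonal.
D = Q.T @ H @ Q

x0 = np.array([0.5, -1.0])   # arbitrary base point
h = 1e-3
second_derivs = []
for j in range(2):
    q = Q[:, j]              # j-th axis of the diagonalizing coordinate system
    # Second central difference of t -> f(x0 + t*q) at t = 0
    d2 = (f(x0 + h * q) - 2 * f(x0) + f(x0 - h * q)) / h**2
    second_derivs.append(d2)

print("eigenvalues:              ", lam)   # approximately -1 and 5
print("second derivs along axes: ", second_derivs)
print("Hessian in new coordinates:\n", np.round(D, 12))
```

Here one eigenvalue is negative and one is positive, so the function curves downward along one of the new axes and upward along the other, which is much harder to read off from the original matrix.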


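${}^{[1]}$. By "usually" I mean: the Hessian is symmetric, and hence diagonalizable in an orthonormal basis, whenever the second partial derivatives of $f$ are continuous near $x$. Functions for which this fails do exist, but the examples are somewhat pathological and unlikely to manifest themselves in real life. The matter is discussed thoroughly here.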


From what I have gathered, since the Hessian encodes the second derivatives of a multivariate function, its main use is to help determine whether a critical point is a maximum or a minimum. It also lets you write second-degree Taylor approximations of functions compactly. Perhaps there is more, but this is what I have just learned about the Hessian.
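
For reference, here are the two uses written out, in the notation of the answers above, with $h \in \mathbb R^n$ denoting a small displacement (a symbol introduced here) and assuming $f$ has continuous second partial derivatives near $x$. The second-degree Taylor approximation is $$f(x+h) \approx f(x) + \nabla f(x)^T h + \frac{1}{2}\, h^T Hf(x)\, h.$$ At a critical point, $\nabla f(x) = 0$, so the behaviour of $f$ near $x$ is governed by the sign of the quadratic term: $$h^T Hf(x)\, h > 0 \ \text{for all } h \neq 0 \;\Rightarrow\; \text{strict local minimum}, \qquad h^T Hf(x)\, h < 0 \ \text{for all } h \neq 0 \;\Rightarrow\; \text{strict local maximum}.$$ If the quadratic term takes both signs, $x$ is a saddle point, and if $Hf(x)$ is only semidefinite, the test is inconclusive.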