Why does the Hessian work?
For some reason, the geometric meaning of the Hessian is rarely made explicit. If $f\colon \mathbb{R}^n\to\mathbb{R}$ is a $C^2$ function in a neighborhood of a point $p$, the Hessian matrix $Hf_p$ is the matrix for the quadratic form that is the second directional derivative of $f$ at $p$. What this means is that, if $\textbf{v}$ is an $n$-dimensional vector, then $\textbf{v}^T(Hf_p)\textbf{v}$ is the directional second derivative of $f$ at $p$, i.e. $$ \textbf{v}^T(Hf_p)\textbf{v} \;=\; \frac{d^2}{dt^2}\bigl[f(p+t\textbf{v})\bigr]\biggr|_{t=0} $$ This equation can be derived fairly easily using the multivariable chain rule.
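This identity is easy to check numerically. A small sketch in Python (the polynomial $f$, the point $p$, and the direction $\mathbf{v}$ are all made up for illustration, and the Hessian is entered by hand):

```python
import numpy as np

# Illustrative polynomial: f(x, y) = x^2 y + x y^3
def f(x, y):
    return x**2 * y + x * y**3

# Its Hessian, computed by hand: f_xx = 2y, f_xy = 2x + 3y^2, f_yy = 6xy
def hessian(x, y):
    return np.array([[2*y,          2*x + 3*y**2],
                     [2*x + 3*y**2, 6*x*y       ]])

p = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

# Left side: the quadratic form v^T (Hf_p) v
quad_form = v @ hessian(*p) @ v

# Right side: second derivative of t -> f(p + t v) at t = 0,
# approximated by a central difference
h = 1e-4
g = lambda t: f(*(p + t * v))
second_deriv = (g(h) - 2*g(0) + g(-h)) / h**2

print(quad_form, second_deriv)  # the two numbers agree
```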
For a critical point $p$, the directional second derivatives usually determine whether $p$ is a local minimum, a local maximum, or a saddle point. In particular:
If the directional second derivative is positive in every direction, then $p$ is a local minimum.
If the directional second derivative is negative in every direction, then $p$ is a local maximum.
If the directional second derivative is positive in some directions and negative in other directions, then $p$ is a saddle point.
These statements follow from the usual second-derivative test in single-variable calculus, where the single variable functions in question are the cross-sectional functions $t\mapsto f(p+t\textbf{v})$.
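For instance, with the standard saddle $f(x,y)=x^2-y^2$ at the origin, the cross-sections are the parabolas $t\mapsto (v_1^2-v_2^2)t^2$, which open upward in some directions and downward in others. A quick numerical sketch (Python; finite differences are used just to estimate the cross-sectional second derivatives):

```python
import numpy as np

# Saddle example: f(x, y) = x^2 - y^2, critical point at the origin
f = lambda x, y: x**2 - y**2

def cross_section_second_deriv(v, h=1e-4):
    """Second derivative at t = 0 of the cross-section t -> f(t v)."""
    g = lambda t: f(*(t * np.asarray(v)))
    return (g(h) - 2*g(0) + g(-h)) / h**2

d1 = cross_section_second_deriv([1.0, 0.0])   # along the x-axis: 2 > 0
d2 = cross_section_second_deriv([0.0, 1.0])   # along the y-axis: -2 < 0
print(d1, d2)
```

Since the cross-sections curve up in some directions and down in others, the one-variable second-derivative test already shows the origin is a saddle point.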
All of this relates to certain facts about symmetric matrices from linear algebra:
A symmetric matrix $A$ has the property that $\textbf{v}^TA\textbf{v} > 0$ for all nonzero vectors $\textbf{v}$ if and only if $A$ has all positive eigenvalues. (Such a matrix is called positive definite.)
A symmetric matrix $A$ has the property that $\textbf{v}^TA\textbf{v} < 0$ for all nonzero vectors $\textbf{v}$ if and only if $A$ has all negative eigenvalues. (Such a matrix is called negative definite.)
A symmetric matrix $A$ satisfies $\textbf{v}^TA\textbf{v} > 0$ for some vectors $\textbf{v}$ and $\textbf{v}^TA\textbf{v} < 0$ for others if and only if $A$ has at least one positive eigenvalue and at least one negative eigenvalue. (Such a matrix is called indefinite.)
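These equivalences are easy to play with numerically. A sketch in Python (the matrices are made up; `eigvalsh` is NumPy's eigenvalue routine for symmetric matrices):

```python
import numpy as np

def definiteness(A):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
    if np.all(eig > 0):
        return "positive definite"
    if np.all(eig < 0):
        return "negative definite"
    if eig.min() < 0 < eig.max():
        return "indefinite"
    return "degenerate (some eigenvalue is zero)"

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # eigenvalues (5 ± sqrt(5))/2, both positive
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])   # eigenvalues 3 and -1

print(definiteness(A))  # positive definite
print(definiteness(B))  # indefinite

# Spot-check the quadratic-form characterization with random vectors
vs = np.random.default_rng(0).standard_normal((1000, 2))
assert all(v @ A @ v > 0 for v in vs)
assert any(v @ B @ v > 0 for v in vs) and any(v @ B @ v < 0 for v in vs)
```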
Applying these facts to the Hessian gives:
If $p$ is a critical point for $f$ and $Hf_p$ is positive definite, then $p$ is a local minimum for $f$.
If $p$ is a critical point for $f$ and $Hf_p$ is negative definite, then $p$ is a local maximum for $f$.
If $p$ is a critical point for $f$ and $Hf_p$ has at least one positive eigenvalue and at least one negative eigenvalue, then $p$ is a saddle point for $f$.
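Putting this together, here is a sketch of the whole test in Python (the function $f(x,y) = x^3 - 3x + y^2$ is made up for illustration; its critical points are $(\pm 1, 0)$, and its Hessian, computed by hand, is the matrix with rows $(6x, 0)$ and $(0, 2)$):

```python
import numpy as np

# Illustrative function: f(x, y) = x^3 - 3x + y^2,
# critical points at (1, 0) and (-1, 0), Hessian [[6x, 0], [0, 2]]
def hessian(x, y):
    return np.array([[6.0 * x, 0.0],
                     [0.0,     2.0]])

def classify(H):
    """Second-derivative test via the signs of the Hessian's eigenvalues."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > 0):
        return "local minimum"
    if np.all(eig < 0):
        return "local maximum"
    if eig.min() < 0 < eig.max():
        return "saddle point"
    return "inconclusive"

print(classify(hessian(1, 0)))    # local minimum
print(classify(hessian(-1, 0)))   # saddle point
```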
For a $2\times 2$ matrix, you can determine the signs of the eigenvalues by investigating the trace and the determinant. This is because the trace of a matrix is the sum of its eigenvalues, and the determinant of a matrix is the product of its eigenvalues. In particular:
A $2\times 2$ matrix has two positive eigenvalues if and only if the trace and determinant are both positive.
A $2\times 2$ matrix has two negative eigenvalues if and only if the trace is negative and the determinant is positive.
A $2\times 2$ matrix has one positive and one negative eigenvalue if and only if the determinant is negative.
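As a sketch, the $2\times 2$ test reads off directly from the trace and determinant (Python; the example matrices are made up):

```python
import numpy as np

def classify_2x2(H):
    """Classify a symmetric 2x2 matrix using only its trace and determinant."""
    tr, det = np.trace(H), np.linalg.det(H)
    if det > 0 and tr > 0:
        return "two positive eigenvalues"
    if det > 0 and tr < 0:
        return "two negative eigenvalues"
    if det < 0:
        return "one positive, one negative eigenvalue"
    return "degenerate (det = 0)"

H1 = np.array([[4.0, 1.0],
               [1.0, 3.0]])   # tr = 7, det = 11
H2 = np.array([[1.0, 3.0],
               [3.0, 1.0]])   # tr = 2, det = -8

print(classify_2x2(H1))  # two positive eigenvalues
print(classify_2x2(H2))  # one positive, one negative eigenvalue
```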
Note that these statements don't hold for $3\times 3$ or larger matrices. For example, a $3\times 3$ matrix with eigenvalues $-2,-1,10$ has positive trace ($7$) and positive determinant ($20$), even though it is neither positive definite nor negative definite. For such a matrix, you really have to determine the eigenvalues explicitly, or use something like Sylvester's criterion, to determine whether the Hessian is positive definite, negative definite, or neither.
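Sylvester's criterion checks the leading principal minors: all of them positive means positive definite (for negative definite, their signs alternate starting with a negative $1\times 1$ minor). A sketch applied to a diagonal matrix with the eigenvalues $-2,-1,10$ mentioned above:

```python
import numpy as np

def leading_minors(A):
    """Determinants of the leading principal submatrices A[:k, :k]."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

def is_positive_definite(A):
    """Sylvester's criterion: all leading principal minors are positive."""
    return all(m > 0 for m in leading_minors(A))

# Diagonal matrix with eigenvalues -2, -1, 10: trace 7 and determinant 20
# are both positive, yet the first leading minor (-2) is negative,
# so the matrix is not positive definite.
A = np.diag([-2.0, -1.0, 10.0])
print(is_positive_definite(A))                          # False
print(is_positive_definite(np.diag([1.0, 2.0, 3.0])))   # True
```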
Edit: Since this seems to be my main post about the Hessian, I should mention that the Hessian can also be interpreted as the matrix for a symmetric bilinear form: $$ (\mathbf{v},\mathbf{w}) \,\mapsto\, \mathbf{v}^T(Hf_p)\mathbf{w}. $$ This bilinear form represents the "mixed" second directional derivative of $f$ in the directions of $\mathbf{v}$ and $\mathbf{w}$. That is, if $D_{\mathbf{v}}f$ denotes the directional derivative of $f$ in the direction of $\mathbf{v}$, then $$ \mathbf{v}^T(Hf_p)\mathbf{w} \,=\, D_{\mathbf{v}}D_{\mathbf{w}}f \,=\, D_{\mathbf{w}}D_{\mathbf{v}}f. $$ Equivalently, $$ \mathbf{v}^T(Hf_p)\mathbf{w} \,=\, \frac{\partial^2}{\partial s\,\partial t}\bigl[f(p+s\mathbf{v}+t\mathbf{w})\bigr]\biggr|_{s,t=0}. $$ When $\mathbf{v}=\mathbf{w}$, this reduces to the second directional derivative in the direction of $\mathbf{v}$ described above.
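This, too, is easy to check numerically; a sketch (Python, with a made-up polynomial and its hand-computed Hessian):

```python
import numpy as np

# Illustrative polynomial: f(x, y) = x^2 y, with Hessian [[2y, 2x], [2x, 0]]
f = lambda x, y: x**2 * y

def hessian(x, y):
    return np.array([[2.0 * y, 2.0 * x],
                     [2.0 * x, 0.0    ]])

p = np.array([1.0, 2.0])
v = np.array([1.0, 1.0])
w = np.array([2.0, -1.0])

# The bilinear form v^T (Hf_p) w
bilinear = v @ hessian(*p) @ w

# The mixed partial d^2/(ds dt) of f(p + s v + t w) at s = t = 0,
# approximated by a central-difference cross stencil
h = 1e-4
g = lambda s, t: f(*(p + s*v + t*w))
mixed = (g(h, h) - g(h, -h) - g(-h, h) + g(-h, -h)) / (4 * h**2)

print(bilinear, mixed)  # the two numbers agree
```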