Geometric intuition of conjugate function

I found Bertsekas' explanations quite simple and useful to understand many different things in convex analysis and optimization. You may want to check out his book "Convex Optimization Theory", or his notes for the MIT course, which also cover conjugacy.

The short explanation on page 7 of the notes is as follows:

Dual description of convex functions

  • Define a closed convex function by its epigraph.
  • Describe the epigraph by hyperplanes.
  • Associate hyperplanes with crossing points (the conjugate function).

enter image description here

Primal description: Values $f(x)$. Dual description: Crossing points $f^*(y)$.


To me the best interpretation is economic. Interpret $f(x)$ as the cost to produce the quantity $x$ of some product and interpret $y$ as the market price per unit. It is easy to see that $f^*(y)$ represents the optimal profit at given prices $y$. The quantity $xy$ represents revenue from sales and $f(x)$ represents production costs.

Now for the geometrical interpretation. If you sketch the graph of the costs of production $f(x)$ and assume it convex, continuous, and differentiable, you will see that the point of optimal production, given prices $y$, is given by $y - f'(x)=0$, and this can be found graphically with a ruler, looking for the tangent in the cost curve with the same slope $y$. If you place the ruler in that tangent point, it can be seen that the ruler intersection with the vertical axis will give $-(xy - f(x))$.

This is a very useful calculating device. Provided only with the graph of $f(x)$ and a ruler, the analyst is able to turn the ruler and find what is the optimal profit for each possible price. This can be plot into another piece of paper. Then given any price $y$ he is able to find what was the optimal profit. Without noticing, he has discovered the conjugate function.


I'm going to attempt a very basic and intuitively understandable explanation. Of course this will be grossly oversimplifying things but I understand it's what was asked for.

The point of the convex conjugate is to represent a function $f$ as a set of tangent hyperplanes. The parameters of all the tangent hyperplanes are encoded in the convex conjugate function $f^*$.

Let's make things easier to understand by covering the 1D case. In that case our $f^*$ is called the Legendre transformation and our hyperplanes become simple lines. The generalization to multidimensional functionals is then relatively straightforward.

Here's the Legendre transformation:

$f^*(y)=\sup(yx - f(x))$

The domain of $f^*$ is slope values, the co-domain is y-offsets (as in a simple line equation $ax + b$). Hence $f^*$ encodes a set of lines. First we're going to evaluate $f^*$ at a single point $y$.

$f^*$ at a point $y$ is the largest difference value between $f$ and a line through the origin with slope $y$.

Note that this value might be still negative. In plain English, it's the degree to which $yx$ "surpasses" $f(x)$ in value. If your function $f$ goes "above" $yx$ (for example $f(x) = x^2+1$ and $y=0.5$) then the value $f^*(y)$ will be negative. If your function goes "below" $yx$ your value will be positive.

The "surpass value" is important because in the convex case it is one parameter of a tangent line to $f$: the y-offset.

To imagine $f^*(y)$ not just for a single value $y$ but for the entire domain, picture $f(x)$ and a line through the origin that's rotating around the origin, like a propeller. For each incremental rotation we plot the largest difference value between $f$ and the line into a new coordinate system (where $y$ may go along the x-axis and $f^*(y)$ may go along the y-axis). The resulting graph then shows the whole Legendre transformation of $f$.

The Wikipedia Article on Legendre transformations also has some more good information, such as:

If the convex function $f$ is defined on the whole line and is everywhere differentiable, then $f^*(y)$ can be interpreted as the negative of the y-intercept of the tangent line to the graph of f that has slope $y$.

So $f^*$ can be seen as a map where the input is a slope and the output is a y-offset in the coordinate system of $f$. Thus $f^*$ holds the information how much I need to offset a line with a particular slope such that it becomes a tangent to $f$. The set of all the lines offset in y by their respective amounts covers the area at and below $f$.

It's not too hard to imagine why that particular trick only works for convex functions.

PS: If $f$ is convex there is a much easier definition of the Legendre transformation which avoids the whole supremum thing:

$f^*(f'(x))=x*f'(x)-f(x)$

In plain English: For $x$ We compute $f'(x)$ and define $f^*$ at $f'(x)$ as the negative y intercept of the tangent line.