Is all non-convex optimization heuristic?

If the question is "Are there non-convex global search algorithms with provably nice properties?" then the answer is "Yes, lots." The algorithms I'm familiar with use interval analysis. Here's a seminal paper from 1979: Global Optimization Using Interval Analysis. And here's a book on the topic. The requirements on the function are that it be Lipschitz or smooth to some order, and that it not have a combinatorial explosion of local optima. These techniques aren't as fast as linear or convex programming, but they're solving a harder problem, so you can't hold it against them. And from a practical point of view, for reasonable functions, they converge plenty fast.


I think you will be interested in the work of Parrilo, Lasserre, Putinar, Sturmfels, Nie, Helton, etc., on sums-of-squares and moment methods for polynomial optimization. They look at principled ways of turning general (nonconvex) optimization of polynomials with polynomial constraints into sequences of (convex) semidefinite programs.

The general idea is as follows. Suppose we wish to minimize $f$ over $X$. Assume for simplicity that $X$ is compact and $f$ is continuous. We can turn this into a convex problem by extending it to the set of $\Delta(X)$ of Borel probability measures over $X$ and defining $f(\sigma)=\int f d\sigma$ for $\sigma\in\Delta(X)$. The resulting optimization problem is convex, has the same global optimum value as the original problem, and from any optimal solution $\sigma$ one can easily extract an optimal solution to the original problem (just take any any element of the support of $\sigma$).

The problem of course is that our new convex problem is infinite-dimensional. However, in case all the problem data are polynomial there are a number of nice ways of approximating the infinite-dimensional problem from the outside by semidefinite programs. In particular, we can find sequences of such SDPs in successively higher dimensions whose values increase monotonically to the global minimum value of the original problem.

This is in stark contrast to local methods in which we try to decrease toward the minimum value, and nonconvexity more or less ensures that we cannot do this monotonically. Furthermore, while local methods easily give you upper bounds on the minimum value (just evaluate f anywhere to get such a bound), it is very hard to get any information about matching lower bounds out of local methods.

Of course the downside is that when the SDP gives you a value which is strictly lower than the global minimum then it by definition cannot give you an $x$ where this value is achieved. However, if one of the SDPs in the sequence gives the exact global minimum then a value of $x$ where this is reached can be extracted from the dual SDP, at which point one has matching upper and lower bounds on the optimum, hence a certificate that this $x$ is optimal. Alternatively, once you feel that you have a "good" lower bound you could use local methods to try to find an $x$ which gets close enough to that value for your purposes.

Of course NP-completeness issues ensure that in general there's not much we can say about the convergence rate of such procedures. However, in practice they work amazingly well. Explaining this is an important open problem.

There is an excellent introduction to such techniques on MIT OCW in the form of lecture notes for Parrilo's course (full disclosure: he is one of my thesis advisors).


In some sense, the fundamental difficulty with non-convex optimization is that you very quickly run up against NP-completeness. If $P\ne NP$, then there's not going to be any efficient, general-purpose method to solve non-convex optimization problems or convert them into convex ones.

Having said that, as Carl wrote, of course there are plenty of interesting things to prove about non-convex optimization, if you're willing to give up on a fast algorithm that always works! For example, approximation guarantees, convergence in mild exponential time...