Is there any obvious way to enforce a minimum amount of "positive definiteness" on a matrix?
I'm responding first to your background comment, but it will lead to an approach to your original question. A quasi-Newton method minimizes a smooth function $f:\mathbb R^n \to \mathbb R$ using the iteration $$ \tag{1} x_{k+1} = \arg \min_x f(x_k) + \nabla f(x_k)^T(x - x_k) + \frac12 (x - x_k)^T B_k (x - x_k). $$ Quasi-Newton methods differ in the choice of the matrix $B_k$. (If $B_k = \nabla^2 f(x_k)$, then the above iteration is Newton's method. In quasi-Newton methods, $B_k$ is an approximation to $\nabla^2 f(x_k)$ that can be computed inexpensively.)
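To make the iteration concrete: minimizing the quadratic model in (1) gives the closed-form update $x_{k+1} = x_k - B_k^{-1} \nabla f(x_k)$. Here is a minimal NumPy sketch of one such step; the names `grad_f` and `B_k` are placeholders supplied by the caller, not anything from the question.

```python
import numpy as np

def quasi_newton_step(x_k, grad_f, B_k):
    """One iteration of (1): the quadratic model
    f(x_k) + g^T (x - x_k) + (1/2)(x - x_k)^T B_k (x - x_k)
    is minimized at x_{k+1} = x_k - B_k^{-1} g."""
    g = grad_f(x_k)
    # Solve B_k p = g instead of forming the inverse explicitly.
    p = np.linalg.solve(B_k, g)
    return x_k - p
```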
The approximation in (1) is good when $x$ is close to $x_k$. It would be natural to add a penalty term to the objective function in (1) to discourage $x$ from straying too far from $x_k$: $$ \tag{2} x_{k+1} = \arg \min_x f(x_k) + \nabla f(x_k)^T(x - x_k) + \frac12 (x - x_k)^T B_k (x - x_k) + \frac1{2t} \|x - x_k \|_2^2. $$ The parameter $t > 0$ can be thought of as a "step size" that controls how severely we are penalized for moving away from $x_k$. Including such a penalty term is a common trick in optimization; for example, the proximal gradient method and the Levenberg-Marquardt algorithm can both be interpreted as using this trick.
I'll assume that $B_k$ is symmetric and positive semidefinite, which is typical in quasi-Newton methods. Setting the gradient of the objective function in (2) with respect to $x$ equal to $0$, we obtain $$ \nabla f(x_k) + (B_k + \frac{1}{t} I)(x - x_k) = 0. $$ Here $I$ is the identity matrix. The coefficient matrix $B_k + \frac{1}{t} I$ is guaranteed to be positive definite. The solution to this equation is $$ \tag{3} x_{k+1} = x_k - (B_k + \frac{1}{t} I)^{-1} \nabla f(x_k). $$ If $t$ is very small, then $(B_k + \frac{1}{t}I)^{-1} \approx t I$, and the update (3) is approximately a gradient descent update with step size $t$. On the other hand, if $t$ is large (and $B_k$ is invertible), then $(B_k + \frac{1}{t}I)^{-1} \approx B_k^{-1}$, and the update (3) is approximately a quasi-Newton update. So the iteration (3) is a compromise between a quasi-Newton method and gradient descent.
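For concreteness, here is a minimal NumPy sketch of the damped update (3); again, `grad_f` and `B_k` are placeholders for whatever gradient and quasi-Newton approximation you use.

```python
import numpy as np

def damped_step(x_k, grad_f, B_k, t):
    """The update (3): x_{k+1} = x_k - (B_k + I/t)^{-1} grad f(x_k).
    Small t behaves like gradient descent with step size t;
    large t behaves like a quasi-Newton step."""
    g = grad_f(x_k)
    M = B_k + np.eye(x_k.size) / t  # positive definite for any t > 0
    return x_k - np.linalg.solve(M, g)
```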
The Levenberg-Marquardt algorithm chooses the parameter $t$ adaptively, as follows. If $f(x_{k+1}) < f(x_k)$, then $x_{k+1}$ is accepted and $t$ is increased by a factor of 10. Otherwise, $x_{k+1}$ is rejected, $t$ is reduced by a factor of 10, and $x_{k+1}$ is recomputed. We only accept $x_{k+1}$ once a reduction in the value of $f$ has been achieved. (We don't have to use a factor of 10, but that is a typical choice.)
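A rough sketch of this accept/reject loop, with the same placeholder names (`f`, `grad_f`, `B_k`); the `max_tries` cap is my own addition to keep the sketch finite, not part of the scheme described above.

```python
import numpy as np

def adaptive_step(f, grad_f, B_k, x_k, t, factor=10.0, max_tries=50):
    """One outer iteration with Levenberg-Marquardt-style control of t:
    accept the step and increase t if f decreases, otherwise decrease t
    and recompute the step."""
    g = grad_f(x_k)
    I = np.eye(x_k.size)
    for _ in range(max_tries):
        x_new = x_k - np.linalg.solve(B_k + I / t, g)
        if f(x_new) < f(x_k):
            return x_new, t * factor  # accepted: lean toward quasi-Newton
        t /= factor                   # rejected: lean toward gradient descent
    return x_k, t                     # no decrease found within max_tries
```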
Note: Here is an important question about the above proposed algorithm. Quasi-Newton methods rely on the fact that the inverse of $B_k$ can be computed efficiently. Otherwise, we might as well just use Newton's method. In the algorithm I proposed, can the inverse of $B_k + \frac{1}{t} I$ be computed efficiently? If not, then we might as well just take $B_k = \nabla^2 f(x_k)$.
Can the quasi-Newton strategies to update $B_{k}^{-1}$ efficiently be adapted to update $(B_k + \frac{1}{t} I)^{-1}$ efficiently?
That is a question I will need to ponder...
If you are looking for a generalization of $\max(a,b)$ to matrices, you can use the following:
$$ \max(a,b) = \frac{|a+b|}{2}+\frac{|a-b|}{2}. \tag{1} $$
Now, there is a generalization of the absolute value to matrices, given by $|A|:=\sqrt{A^T A}$ (I'm assuming your matrices are real). With this generalization, Eq. (1) is valid also for square (real) matrices.
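As a sanity check, here is a small NumPy/SciPy sketch of these two definitions; `scipy.linalg.sqrtm` computes the matrix square root, and `np.real_if_close` just strips the tiny imaginary parts that `sqrtm` can leave behind numerically.

```python
import numpy as np
from scipy.linalg import sqrtm

def mat_abs(A):
    """Matrix absolute value |A| := sqrt(A^T A)."""
    return np.real_if_close(sqrtm(A.T @ A))

def mat_max(A, B):
    """Matrix analogue of Eq. (1): |A+B|/2 + |A-B|/2."""
    return (mat_abs(A + B) + mat_abs(A - B)) / 2
```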
Edit
The formula above implicitly assumes $a\ge0, \ b\ge0$, which is not necessarily satisfied (Eq. (1) is valid if $a+b\ge0$). More generally, computing $\max(a,b)$ requires checking the sign of $a-b$. This approach cannot be generalized to matrices, since there are matrices that are neither positive nor negative semidefinite. However, other generalizations are still possible. For example, if we know that $b\ge0$ (which could be the OP's case, as supposedly $F\ge0$), we can use
$$ \max'(a,b) = \theta(a) \left ( \frac{|a+b|}{2}+\frac{|a-b|}{2} \right ) + (1-\theta(a)) b $$
where $\theta(x)$ is the Heaviside step function. This function can be generalized to a matrix argument $x$, provided $x$ is diagonalizable with real eigenvalues (e.g. symmetric), by applying $\theta$ to the eigenvalues in a spectral decomposition.
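Here is one concrete reading of this formula in NumPy/SciPy, assuming $A$ is symmetric (so `eigh` gives the spectral Heaviside) and $B\ge0$. Multiplying $\theta(A)$ on the left is my own choice of operator ordering, which for non-commuting matrices is not the only possible one.

```python
import numpy as np
from scipy.linalg import sqrtm

def mat_abs(A):
    """Matrix absolute value |A| := sqrt(A^T A)."""
    return np.real_if_close(sqrtm(A.T @ A))

def heaviside_sym(X):
    """Heaviside function of a symmetric matrix: apply theta to its eigenvalues
    (theta(0) is taken to be 1 here)."""
    w, V = np.linalg.eigh(X)
    return V @ np.diag(np.heaviside(w, 1.0)) @ V.T

def mat_max_prime(A, B):
    """Matrix analogue of max'(a,b) = theta(a)(|a+b|/2 + |a-b|/2) + (1 - theta(a)) b,
    assuming A symmetric and B positive semidefinite."""
    theta_A = heaviside_sym(A)
    I = np.eye(A.shape[0])
    return theta_A @ (mat_abs(A + B) + mat_abs(A - B)) / 2 + (I - theta_A) @ B
```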