Mirror Image

Mostly AR and Stuff

Interior point method – if all you have is a hammer

The interior point method for nonlinear optimization is often considered complex, highly nontrivial, etc. The fact is that for “simple” nonlinear optimization it’s quite simple, manageable, and can even be explained in #3tweets. For those not familiar with it, there is a simple introduction on Wikipedia, which in turn follows an excellent paper by Margaret H. Wright.
Now about “if all you have is a hammer, everything looks like a nail”. Some applications of the interior point method can be quite unexpected.
Everyone who has worked with the Levenberg-Marquardt minimization algorithm knows how much pain the choice of the small parameter \lambda is. Levenberg-Marquardt can also be seen as a modification of Gauss-Newton with a trust region. The \lambda of Levenberg-Marquardt does correspond to the trust region radius, but that dependence is highly complex and difficult to estimate. You want a trust region of radius r, but what should the value of \lambda be? There is no easy answer to that question; there are some complex methods, or there is testing with subdivision, which is what the original Levenberg-Marquardt implements.
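As a reminder (my illustrative sketch, not code from any particular implementation), the Levenberg-Marquardt step solves the damped normal equations; the step shrinks as \lambda grows, but the radius it yields for a given \lambda is only known after the fact:

import numpy as np

def lm_step(J, r, lam):
    """One Levenberg-Marquardt step: solve (J^T J + lam*I) dx = -J^T r.

    Larger lam gives a shorter, more gradient-like step, but the
    step length ||dx|| produced by a given lam is hard to predict.
    """
    n = J.shape[1]
    dx = np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ r)
    return dx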
Interior point can help here.
If we choose the shape of the trust region for Gauss-Newton as a hypercube, a simplex or the like, we can formulate it as a set of linear inequality constraints. And that is the domain of the interior point method! For the hypercube ||\Delta x||_\infty \leq \epsilon the resulting equations look especially nice:
\begin{pmatrix} W & -I & I \\ -I & D_1 & 0 \\ I & 0 & D_2 \end{pmatrix}
W – the Hessian (here the Gauss-Newton approximation), I – the identity, D_1, D_2 – diagonal matrices
That is a banded arrowhead matrix, and for it the Cholesky decomposition costs only insignificantly more than the decomposition of the original W. The matrix is not positive definite, so Cholesky without square roots (the LDL^T decomposition) should be used.
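To make the structure concrete, here is a minimal sketch (mine, not the post’s code): it assembles the arrowhead matrix with placeholder diagonal blocks D1, D2 standing in for the slack/multiplier ratios that a real primal-dual iteration would produce, and factors it with SciPy’s LDL^T routine:

import numpy as np
from scipy.linalg import ldl

rng = np.random.default_rng(0)
n = 4

# Gauss-Newton approximation of the Hessian: W = J^T J.
J = rng.standard_normal((10, n))
W = J.T @ J

I = np.eye(n)
Z = np.zeros((n, n))
# Placeholder diagonal blocks; in a real primal-dual iteration these
# hold slack/multiplier ratios for the two one-sided bound constraints.
D1 = np.diag(rng.uniform(0.5, 2.0, size=n))
D2 = np.diag(rng.uniform(0.5, 2.0, size=n))

# Arrowhead matrix from the post: a dense W block bordered by
# identity couplings and diagonal tails.
K = np.block([[W, -I, I],
              [-I, D1, Z],
              [I,  Z, D2]])

# Symmetric but in general not positive definite, so factor with
# LDL^T ("Cholesky without square root") rather than plain Cholesky.
L, D, perm = ldl(K)
assert np.allclose(L @ D @ L.T, K)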
Now there is a temptation to use the single constraint ||\Delta x||_2 \leq \epsilon instead of the set of constraints ||\Delta x||_\infty \leq \epsilon, but that will not work. ||\Delta x||_2 \leq \epsilon would have to be linearized to be tractable, but it’s a second-order condition: its linear part is zero, so the linearization doesn’t constrain anything.
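Spelling that out: write the constraint as c(\Delta x) = \epsilon^2 - ||\Delta x||_2^2 \geq 0 and linearize around the current point \Delta x = 0. Since \nabla c(0) = 0, the linearization is

c(0) + \nabla c(0)^T \Delta x = \epsilon^2 \geq 0,

which holds for every \Delta x, so the linearized constraint excludes nothing.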
The same method can be used whenever we have to put a constraint on the value of the Gauss-Newton update and the shape of the constraint is not important (or is polygonal).
Now the last touch: the interior point method has a small parameter of its own, usually called \mu. In the “normal” method there is a nice rule for its update: take \mu = \lambda C (in the notation of the Wikipedia article, C is the value of the constraint and \lambda is the value of the slack variable at the previous iteration). That rule is usually stated explicitly in articles about the interior point method (IPM) for linear programming, but omitted (as obvious, probably) in papers about IPM for nonlinear optimization.
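In code the rule is a one-liner (my sketch; the centering factor sigma is my addition, a common refinement, not part of the post’s plain rule):

import numpy as np

def next_mu(lam, c, sigma=1.0):
    """Barrier update mu = lambda * C, averaged over the m constraints.

    lam:   multiplier/slack values from the previous iteration
    c:     constraint values at the current point
    sigma: optional centering factor < 1 (sigma=1.0 is the plain rule)
    """
    return sigma * np.dot(lam, c) / len(c)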
In our case (IPM for the trust region) we don’t need to update \mu at all: we move the boundary of the region with each iteration, so each \mu is an initial value. Remember, \mu is not the size of the trust region, but the strength of its enforcement.

4 September, 2011 | Posted in computer vision, sci