2011 April 10 « Mirror Image

Minimum sum of distance vs L1 and geometric median

All this post is just a more detailed explanation of the end of the previous post.
Assume we want to estimating a state $x \in R^n$ from $m \gg n$ noisy linear measurements $y \in R^m$ , $y = Ax + z$ , $z$ – noise with outliers, like in the paper by Sharon, Wright and Ma Minimum Sum of Distances Estimator: Robustness and Stability
Sharon at al show that minimum $L_1$ norm estimator, that is
$arg \min_{x} \sum_{i=1}^{m} \| a_i^T x - y_i \|_{1}$
is a robust estimator with stable breakdown point, not depending on the noise level. What Sharon did was to use as cost function the sum of absolute values of all components of errors vector. However there are exists another approach.
In one-dimensional case minimum $L_1$ norm is a median.But there exist generalization of median to $R^n$ – geometric median. In our case it will be
$arg \min_{x} \sum_{i=1}^{m} \| A x - y_i \|_{2}$
That is not a least squares – minimized the sum of $L_2$ norm, not the sum of squares of $L_2$ norm.
Now why is this a stable and robust estimator? If we look at the Jacobian
$\sum_{i=1}^{m} A^T \frac{A x - y_i}{ \| A x - y_i \|_{2}}$
we see it’s asymptotically constant, and it’s norm doesn’t depend on the norm of the outliers. While it’s not a formal proof it’s quite intuitive, and can probably be formalized along the lines of Sharon paper.
While first approach with $L_1$ norm can be solved with linear programming, for example simplex method and interior point method, the second approach with $L_2$ norm can be solved with second order cone programming and …surprise, interior point method again.
For interior point method, in both cases original cost function is replaced with
$\sum f$
And the value of $f$ is defined by constraints. For $L_1$
$f_{i} \ge a_ix-y_i$ , $f_{i} \ge -a_ix+y_i$
Sometimes it’s formulated by splitting absolute value is into the sum of positive and negative parts
$f_{+_{i}} \ge a_ix-y_i$ , $f_{-_{i}} \ge -a_ix+y_i$ , $f_{+_{i}} \ge 0$ , $f_{-_{i}} \ge 0$
And for $L_2$ it’s a simple
$f_i \ge \| A x - y_i \|_{2}$
Formulations are very similar, and stability/performance are similar too (there was a paper about it, just had to dig it out)