## Finds in arxiv, January.

Repost from googleplus stream

Finds in arxiv

**Computer vision**

*Convex Relaxations of SE(2) and SE(3) for Visual Pose Estimation*

Finding camera position from the series of 2D (or depth) images is one of the most common (and difficult) task of computer vision.

The biggest problem here is incorporating rotation into cost(energy) function, that should minimize reprojection(fro 2D) error (seehttp://en.wikipedia.org/wiki/Bundle_adjustment). Usually it’s done by minimization on manifold – locally presenting rotation parameter space as linear subspace. The problem with that approach is that initial approximation should be good enough and minimization goes by small steps pojecting/reprojecting on manifold (SO(3) in the case) which can stuck in local minimum. Where to start if the is no initial approximation? Usually it’s just brute-forced with several initialization attempt. Here in the paper approach is different – no initial approximation needed, solution is straightforward and minimum is global. The idea is instead of minimizing rotation on the sphere minimize it inside the ball. In fact it’s a classical **convexification** approach – increase dimentionality of the problem to make it convex. The pay is a lot higher computational cost of cause

http://arxiv.org/abs/1401.3700

*A Sparse Outliers Iterative Removal Algorithm to Model the Background in the Video Sequences*

Background removal without nuclear norm. Some “good” frames chosen as dictionary and backround represented in it.

http://arxiv.org/abs/1401.6013

**Deep Learning**

*Learning Mid-Level Features and Modeling Neuron Selectivity for Image Classification*

Mid level features is a relatively new concept but a very old practice. It’s a building “efficient” subset of features from the set of features obtained by low-level feature descriptor(like SIFT, SURF, FREAK etc) Before it was done by PCA, later by sparse coding.

What authors built is some kind of hibrid of convolutional network with dictionary learning. Mid level features fed into neuron layer for sparsification and result into classifier. There are some benchmarks but no CIFAR or MINST

http://arxiv.org/abs/1401.5535

**Optimization and CS**

*Alternating projections and coupling slope*

Finding intersection of two non-convex sets with alternating projection. I didn’t new that transversality condition used a lot in convex geometry too.

http://arxiv.org/abs/1401.7569

*Bayesian Pursuit Algorithms*

Bayesian version of hard thresholding. Bayesian approach is in how threshold is chosen and update scaled.

http://arxiv.org/abs/1401.7538

## November finds in #arxiv and NIPS 2013

This is “find in arxiv” reposts form my G+ stream for November.

**NIPS 2013**

*Accelerating Stochastic Gradient Descent using Predictive Variance Reduction*

Stochastic gradient (SGD) is the major tool for Deep Learning. However if you look at the plot of cost function over iteration for SGD you will see that after quite fast descent it becoming extremely slow, and error decrease could even become non-monotonous. Author explain by necessity of trade of between the step size and variance of random factor – more precision require smaller variance but that mean smaller descent step and slower convergence. “Predictive variance” author suggest to mitigate problem is the same old “adding back the noise” trick, used for example in Split Bregman. Worth reading IMHO.

*Predicting Parameters in Deep Learning*

Output of the first layer of ConvNet is quite smooth, and that could be used for dimensionality reduction, using some dictionary, learned or fixed(just some simple kernel). For ConvNet predicting 75% of parameters with fixed dictionary have negligible effect on accuracy.

http://papers.nips.cc/paper/5025-predicting-parameters-in-deep-learning

*Learning a Deep Compact Image Representation for Visual Tracking*

Application of ADMM (Alternating Direction Method of Multipliers, of which Split Bregman again one of the prominent examples) to denoising autoencoder with sparsity.

http://papers.nips.cc/paper/5192-learning-a-deep-compact-image-representation-for-visual-tracking

*Deep Neural Networks for Object Detection*

People from Google are playing with Alex Krizhevsky’s ConvNet

http://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection

**–arxiv (last)–**

*Are all training examples equally valuable?*

It’s intuitively obvious that some training sample are making training process worse. The question is – how to find wich sample should be removed from training? Kind of removing outliers. Authors define “training value” for each sample of binary classifier.

http://arxiv.org/pdf/1311.7080

*Finding sparse solutions of systems of polynomial equations via group-sparsity optimization*

Finding sparse solution of polynomial system with lifting method.

I still not completely understand why quite weak structure constraint is enough for found approximation to be solution with high probability. It would be obviously precise for binary 0-1 solution, but why for general sparse?

http://arxiv.org/abs/1311.5871

*Semi-Supervised Sparse Coding*

Big dataset with small amount of labeled samples – what to do? Use unlabeled samples for sparse representation. And train labeled samples in sparse representation.

http://arxiv.org/abs/1311.6834

From the same author, similar theme – Cross-Domain Sparse Coding

Two domain training – use cross domain data representation to map all the samples from both source and target domains to a data representation space with a common distribution across domains.

http://arxiv.org/abs/1311.7080

*Robust Low-rank Tensor Recovery: Models and Algorithms*

More of tensor decomposition with trace norm

http://arxiv.org/abs/1311.6182

*Complexity of Inexact Proximal Newton methods*

Application of Proximal Newton (BFGS) to subset of coordinates each step – active set coordinate descent.

http://arxiv.org/pdf/1311.6547

*Computational Complexity of Smooth Differential Equations*

Polynomial-memory complexity of ordinary differential equations.

http://arxiv.org/abs/1311.5414

**–arxiv (2)–**

**Deep Learning**

*Visualizing and Understanding Convolutional Neural Networks*

This is exploration of Alex Krizhevsky’s ConvNet

( https://code.google.com/p/cuda-convnet/ )

using “deconvnet” approach – using deconvolution on output of each layer and visualizing it. Results looks interesting – strting from level 3 it’s something like thersholded edge enchantment, or sketch. Also there are evidences supporting “learn once use everywhere” approach – convnet trained on ImageNet is also effective on other datasets

http://arxiv.org/abs/1311.2901

*Unsupervised Learning of Invariant Representations in Hierarchical Architectures*

Another paper on why and how deep learning works.

Attempt to build theoretical framework for invariant features in deep learning. Interesting result – Gabor wavelets are optimal filters for simultaneous scale and translation invariance. Relations to sparsity and scattering transform

http://arxiv.org/abs/1311.4158

**Computer Vision**

*An Experimental Comparison of Trust Region and Level Sets*

Trust regions method for energy-based segmentation.

Trust region is one of the most important tools in optimization, especially non-convex.

http://en.wikipedia.org/wiki/Trust_region

http://arxiv.org/abs/1311.2102

*Blind Deconvolution with Re-weighted Sparsity Promotion*

Using reweighted L2 norm for sparsity in blind deconvolution

http://arxiv.org/abs/1311.4029

**Optimization**

*Online universal gradient methods*

about Nesterov’s universal gradient method (

http://www.optimization-online.org/DB_FILE/2013/04/3833.pdf )

It use Bregman distance and related to ADMM.

The paper is application of universal gradient method to online learning and give bound on regret function.

http://arxiv.org/abs/1311.3832

**CS**

*A Component Lasso*

Approximate covariance matrix with block-diagonal matrix and apply Lasso to each block separately

http://arxiv.org/abs/1311.4472

_FuSSO: Functional Shrinkage and Selection Operator

Lasso in functional space with some orthogonal basis_

http://arxiv.org/abs/1311.2234

*Non-Convex Compressed Sensing Using Partial Support Information*

More of Lp norm for sparse recovery. Reweighted this time.

http://arxiv.org/abs/1311.3773

**–arxiv (1)–**

**Optimization, CS**

Scalable Frames and Convex Geometry

Frame theory is a basis(pun intended) of wavelets theory, compressed sening and overcomplete dictionaries in ML

http://en.wikipedia.org/wiki/Frame_of_a_vector_space

Here is a discussion how to make “tight frame”

http://en.wikipedia.org/wiki/Frame_of_a_vector_space#Tight_frames

from an ordinary frame by scaling *m* of its components

Interesting geometric insight provided – to do it *m*components of frame should make “blunt cone”

http://en.wikipedia.org/wiki/Convex_cone#Blunt_and_pointed_cones

http://arxiv.org/abs/1310.8107

Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization

Some bounds for convergence of dictionary learning. Converge id initial error is O(1/s^2), s- sparcity level

http://arxiv.org/abs/1310.7991

**Robust estimators**

Robustness of ℓ1 minimization against sparse outliers and its implications in Statistics and Signal Recovery

This is another exploration of L1 estimator. It happens (contrary to common sense) that L1 is not especially robust from “breakdown point” point of view if there is no constraint of noise. However it practical usefulness can be explained that it’s very robust to sparse noise

http://arxiv.org/abs/1310.7637

**Numerical**

Local Fourier Analysis of Multigrid Methods with Polynomial Smoothers and Aggressive coarsening

Overrelaxaction with Chebyshev weights on the fine grid, with convergence analysis.

http://arxiv.org/abs/1310.8385

## Finds in arxiv, october

This is duplication of my ongoing G+ series of post on interesting for me papers in arxiv. Older posts are not here but in my G+ thread.

Finds in #arxiv :

*Optimization, numerical & convex, ML*

The Linearized Bregman Method via Split Feasibility Problems: Analysis and Generalizations

Reformulation of Split Bregman/ ADMM as split feasibility problem and algorithm/convergence for generalized split feasibility by Bregman projection. This general formulation include both Split Bregman and Kaczmarz (my comment – randomized Kaczmarz seems could be here too)

http://arxiv.org/abs/1309.2094

Stochastic gradient descent and the randomized Kaczmarz algorithm

Hybrid of randomized Kaczmarz and stochastic gradient descent – into my “to read” pile

http://arxiv.org/abs/1310.5715

Trust–Region Problems with Linear Inequality Constraints: Exact SDP Relaxation, Global Optimality and Robust Optimization

“Extended” trust region for linear inequalities constrains

http://arxiv.org/abs/1309.3000

Conic Geometric Programming

Unifing framwork for conic and geometric programming

http://arxiv.org/abs/1310.0899

http://en.wikipedia.org/wiki/Geometric_programming

http://en.wikipedia.org/wiki/Conic_programming

Gauge optimization, duality, and applications

Another big paper about different, not Lagrange duality, introduced by Freund (1987)

http://arxiv.org/abs/1310.2639

Color Bregman TV

mu parameters in split bregman made adaptive, to exploit coherence of edges in different color channels

http://arxiv.org/abs/1310.3146

Iteration Complexity Analysis of Block Coordinate Descent Methods

Some convergence analysis for BCD and projected gradient BCD

http://arxiv.org/abs/1310.6957

Successive Nonnegative Projection Algorithm for Robust Nonnegative Blind Source Separation

Nonnegative matrix factorization

http://arxiv.org/abs/1310.7529

Scaling SVM and Least Absolute Deviations via Exact Data Reduction

SVN for large-scale problems

http://arxiv.org/abs/1310.7048

Image Restoration using Total Variation with Overlapping Group Sparsity

While title is promising I have doubt about that paper. The method authors suggest is equivalent to adding averaging filter to TV-L1 under L1 norm. There is no comparison to just applying TV-L1 and smoothing filter interchangeably.The method author suggest is very costly, and using median filter instead of averaging would cost the same while obviously more robust.

http://arxiv.org/abs/1310.3447

*Deep learning*

Deep and Wide Multiscale Recursive Networks for Robust Image Labeling

_Open source_ matlab/c package coming soon(not yet available)

http://arxiv.org/abs/1310.0354

Improvements to deep convolutional neural networks for LVCSR

convolutional networks, droput for speech recognition,

http://arxiv.org/abs/1309.1501v1

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

Already discussed on G+ – open source framework in “learn one use everywhere” stile. Learning done off-line on GPU using ConvNet, and recognition is online in pure python.

http://arxiv.org/abs/1310.1531

Statistical mechanics of complex neural systems and high dimensional data

Big textbook-like overview paper on statistical mechanics of learning. I’ve put it in my “to read” pile.

http://arxiv.org/abs/1301.7115

Randomized co-training: from cortical neurons to machine learning and back again

“Selectron” instead of perception – neurons are “specializing” with weights.

http://arxiv.org/abs/1310.6536

Provable Bounds for Learning Some Deep Representations

http://arxiv.org/abs/1310.6343

Citation:”The current paper presents both an interesting family of denoising autoencoders as

*Computer vision*

Online Unsupervised Feature Learning for Visual Tracking

Sparse representation, overcomplete dictionary

http://arxiv.org/abs/1310.1690

From Shading to Local Shape

Shape restoration from local shading – could be very useful in low-feature environment.

http://arxiv.org/abs/1310.2916

Fast 3D Salient Region Detection in Medical Images using GPUs

Finding interest point in 3D images

http://arxiv.org/abs/1310.6736

Object Recognition System Design in Computer Vision: a Universal Approach

Grid-based universal framework for object detection/classification

http://arxiv.org/abs/1310.7170

Gaming :)

Lyapunov-based Low-thrust Optimal Orbit Transfer: An approach in Cartesian coordinates

For space sim enthusiast :)

http://arxiv.org/abs/1310.4201

## Effectivness of compressed sensing in image processing and other stuff

I seldom post in the this blog now, mostly because I’m positing on twitter and G+ a lot lately. I still haven’t figured out which post should go where – blog, G+ or twitter, so it’s kind of chaotic for now.

What of interest is going on: There are two paper on the CVPR11 which claim that compressed sensing(sparse recovery) is not applicable to some of the most important computer vision tasks:

Is face recognition really a Compressive Sensing problem?

Are Sparse Representations Really Relevant for Image Classification?

Both paper claim that space of the natural images(or their important subsets) are not really sparse.

Those claims however dont’t square with claim of high effectiveness of compact signature of Random Ferns.

Could both of those be true? In my opinion – yes. Difference of two approaches is that first two paper assumed *explicit* sparsity – that is they enforced sparsity on the feature vector. Compressed signature approach used *implicit* sparsity – feature vector underling the signature is assumed sparse but is not explicitly reconstructed. Why compressed signature is working while explicit approach didn’t? That could be the case if image space is sparse in the different coordinate system – that is here one is dealing with the *union of subspaces*. Assumption not of the simple sparsity, but of the union of subspaces is called blind compressed sensing.

Now if we look at the space of the natural images it’s easy to see why it is low dimensional. Natural image is the image of some scene, an that scene has limited number of moving object. So dimension of space images of the scene is approximately the sum of degree of freedom of the objects(and camera) of the scene, plus effects of occlusions, illumination and noise. Now if the add strong enough random error to the scene, the image is stop to be the natural image(that is image of any scene). That mean manifold of the images of the scene is isolated – there is no natural images in it’s neighborhood. That hint that up to some error the space of the natural images is at least is *the union of isolated low-dimensional manifolds*. The union of mainfolds is obviously is more complex structure than the union of subspace, but methods of blind compressed sensing could be applicable to it too. Of cause to think about union of manifolds could be necessary only if the space of images is not union of subspace, which is obviously preferable case

## L1, robust statisrics and compressed sensing

Anyone who did 3D reconstruction and camera pose estimation know, that outliers one of the main, if not the main problem there. There are several ways to deal with outliers, RANSAC and trimming are probably most common. Both of them has major drawback though – they based on the initial error estimation. But for example, in pose estimation, situation where the wrong values have initial error order of magnitude less then correct values is quite common. RANSAC and trimming would make situation worse in that case. What really work there are robust estimators, which is, in many cases just statisticians name for reweighted iterative least square.

Now why and how robust estimators works is really interesting. Basically robust estimator is a maximum likelihood estimator for non-normal distribution, that is distribution with “thick” tail. One of the simplest of robust estimators is L1, which correspond to Laplace distribution. Laplace distribution descend more slow than normal distribution, so it’s obviously more robust. And now the really interesting things start. L1 estimator is the fundamental concept of compressed sensing. And compressed sensing is all about finding “sparse” solution, that is solution which are mostly zeros, but with few components are quite big. And what outliers are? They are exactly “sparse” big components of error vector. If we have linear system with noise in right part, and the noise is dominated by small number of really big outliers then, as Terence Tao pointed out, we can multiply both part of the system on the appropriate matrix and get a sparse system of equations for outliers. That would be a classical compressed sensing problem, for which L1 minimization works perfectly. Recently it was proven, using compressed sensing inspired technique, that L1 estimator for system with outliers really behave similar to L1 minimizer for sparse solutions – it has stable breaking point(Sharon, Wright, Ma, Minimum Sum of Distances Estimator: Robustness and Stability).

That make me think about things I really don’t understand – what is connection between other, redescending estimators and L1 estimator? In practical applications redescending estimators often works better than L1. But redescending estimators practically is not much different form trimming. Does it mean that they are just convenient shortcuts, and in general case L1 is more robust?(One drawback of redescending estimator that it can has multiple local minima) Which assumptions about outliers we should do to to choose most appropriate estimator? I would like to read some theory of redescending estimators, their breaking point and especially their relation to L1, but so far not sure even where to start…

(PPS In this post I talk more about Mueller work on redescending M-estimators which partially answer the question)

PS Another interesting (for me that is, for someone else it could be trivial) problem is dimensionality. For 1-dimensional variable L1 and distance is the same. For vector they are not. So for vector-valued variables estimator “minimum sum of distance” estimator is not the same as L1 estimator. Would be L1 more robust than “minimum sum of distance” for vectors? Compressed sensing logic say that it should, but L1 estimator is anisotropic, it depend on coordinate system. That is for L1 to be effective the outliers should be aligned with coordinate system. Here there is the difference between overall dimensionality of the problem -number of samples and “micro” dimensionality – dimensionality of each sample. I’ll try to sort it out later.