Mirror Image

Mostly AR and Stuff

November finds in #arxiv and NIPS 2013

This is “find in arxiv” reposts form my G+ stream for November.

NIPS 2013

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

Stochastic gradient (SGD) is the major tool for Deep Learning. However if you look at the plot of cost function over iteration for   SGD you will see that after quite fast descent it becoming extremely slow, and error decrease could even become non-monotonous.  Author explain by necessity of trade of between the step size and variance of random factor – more precision require smaller variance but that mean smaller descent step and slower convergence. “Predictive variance” author suggest to mitigate problem is the same old “adding back the noise” trick, used for example in Split Bregman. Worth reading IMHO.

http://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction

Predicting Parameters in Deep Learning
Output of the first layer of ConvNet is quite smooth, and that could be used for dimensionality reduction, using some dictionary, learned or fixed(just some simple kernel). For ConvNet predicting 75% of parameters with fixed dictionary have negligible effect on accuracy.
http://papers.nips.cc/paper/5025-predicting-parameters-in-deep-learning

Learning a Deep Compact Image Representation for Visual Tracking
Application of ADMM (Alternating Direction Method of Multipliers, of which Split Bregman again one of the prominent examples) to denoising autoencoder with sparsity.
http://papers.nips.cc/paper/5192-learning-a-deep-compact-image-representation-for-visual-tracking

Deep Neural Networks for Object Detection
People from Google are playing with Alex Krizhevsky’s ConvNet
http://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection

–arxiv (last)–

Are all training examples equally valuable?
It’s intuitively obvious that some training sample are making training process worse. The question is – how to find wich sample should be removed from training? Kind of removing outliers. Authors define “training value” for each sample of binary classifier.
http://arxiv.org/pdf/1311.7080

Finding sparse solutions of systems of polynomial equations via group-sparsity optimization
Finding sparse solution of polynomial system with lifting method.
I still not completely understand why quite weak structure constraint is enough for found approximation to be solution with high probability. It would be obviously precise  for binary 0-1 solution, but why for general sparse?
http://arxiv.org/abs/1311.5871

Semi-Supervised Sparse Coding
Big dataset with small amount of labeled samples – what to do? Use unlabeled samples for sparse representation. And train labeled samples in sparse representation.
http://arxiv.org/abs/1311.6834
From the same author, similar theme – Cross-Domain Sparse Coding
Two domain training – use cross domain data representation to map all the samples from both source and target domains to a data representation space with a common distribution across domains.
http://arxiv.org/abs/1311.7080

Robust Low-rank Tensor Recovery: Models and Algorithms
More of tensor decomposition with trace norm
http://arxiv.org/abs/1311.6182

Complexity of Inexact Proximal Newton methods
Application of Proximal Newton (BFGS) to subset of coordinates each step – active set coordinate descent.
http://arxiv.org/pdf/1311.6547

Computational Complexity of Smooth Differential Equations
Polynomial-memory complexity of ordinary differential equations.
http://arxiv.org/abs/1311.5414

–arxiv (2)–

Deep Learning

Visualizing and Understanding Convolutional Neural Networks
This is exploration of Alex Krizhevsky’s  ConvNet
https://code.google.com/p/cuda-convnet/ )
using “deconvnet” approach – using deconvolution on output of each layer and visualizing it. Results looks interesting – strting from level 3 it’s something like thersholded edge enchantment, or sketch. Also there are evidences supporting “learn once use everywhere” approach – convnet trained on ImageNet is also effective on other datasets
http://arxiv.org/abs/1311.2901

Unsupervised Learning of Invariant Representations in Hierarchical Architectures
Another paper on why and how deep learning works.
Attempt to build theoretical framework for invariant features in deep learning. Interesting result – Gabor wavelets are optimal filters for simultaneous scale and translation invariance. Relations to sparsity and scattering transform
http://arxiv.org/abs/1311.4158

Computer Vision
An Experimental Comparison of Trust Region and Level Sets
Trust regions method for energy-based segmentation.
Trust region is one of the most important tools in optimization, especially non-convex.
http://en.wikipedia.org/wiki/Trust_region
http://arxiv.org/abs/1311.2102

Blind Deconvolution with Re-weighted Sparsity Promotion
Using reweighted L2 norm for sparsity in blind deconvolution
http://arxiv.org/abs/1311.4029

Optimization

Online universal gradient methods
about  Nesterov’s universal gradient method (
http://www.optimization-online.org/DB_FILE/2013/04/3833.pdf )
It use Bregman distance and related to ADMM.
The paper is application of universal gradient method  to online learning and give bound on regret function.
http://arxiv.org/abs/1311.3832

CS
A Component Lasso
Approximate covariance matrix with block-diagonal matrix and apply Lasso to each block separately
http://arxiv.org/abs/1311.4472

_FuSSO: Functional Shrinkage and Selection Operator
Lasso in functional space with some orthogonal basis_
http://arxiv.org/abs/1311.2234

Non-Convex Compressed Sensing Using Partial Support Information
More of Lp norm for sparse recovery. Reweighted this time.
http://arxiv.org/abs/1311.3773

–arxiv  (1)–

Optimization, CS
Scalable Frames and Convex Geometry
Frame theory is a basis(pun intended) of wavelets theory, compressed sening and overcomplete dictionaries in ML
http://en.wikipedia.org/wiki/Frame_of_a_vector_space
Here is a discussion how to make “tight frame”
http://en.wikipedia.org/wiki/Frame_of_a_vector_space#Tight_frames
from an ordinary frame by scaling m of its components
Interesting geometric insight provided – to do it mcomponents of frame should make “blunt cone”
http://en.wikipedia.org/wiki/Convex_cone#Blunt_and_pointed_cones
http://arxiv.org/abs/1310.8107

Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization
Some bounds for convergence of dictionary learning. Converge id initial error is O(1/s^2), s- sparcity level
http://arxiv.org/abs/1310.7991

Robust estimators
Robustness of ℓ1 minimization against sparse outliers and its implications in Statistics and Signal Recovery
This is another exploration of L1 estimator. It happens (contrary to common sense) that L1 is not especially robust from “breakdown point” point of view if there is no constraint of noise. However it practical usefulness can be explained that it’s very robust to sparse noise
http://arxiv.org/abs/1310.7637

Numerical
Local Fourier Analysis of Multigrid Methods with Polynomial Smoothers and Aggressive coarsening
Overrelaxaction with Chebyshev weights on the fine grid, with convergence analysis.
http://arxiv.org/abs/1310.8385

1, December, 2013 - Posted by | arxiv | , , , , ,

Sorry, the comment form is closed at this time.