I have seen an excellent wlakthrough on building Alex Krizhevsky’s cuda-convnet for windows, but difference in configuration and installed packages could be tiresome. So here is complete build of convnet for windows 64:
It require having CUDA compute capability 2.0 or better GPU of cause, Windows 64bit, Visual Studio 64bits and Python 64bit with NumPy installed. The rest of libs and dlls are precomplied. In building it I’ve followed instructions by Yalong Bai (Wyvernbai) from http://www.asiteof.me/archives/50.
Read Readme.md before installation – you may (or may not) require PYTHONPATH environmental variable set.
On side note I’ve used WinPython for both libraries and running the package. WinPython seems a very nice package, which include Spyder IDE. I have some problems with Spyder though – sometimes it behave erratically during debugging/running the code. Could be my inexperience with Spyder though. Another nice package – PythonXY – regretfully can not be used – it has only 32 bit version and will not compile with/run 64 bit modules.
This demo was inspired by Escapist article “I hate magic” by Robert Rath ( http://www.escapistmagazine.com/artic… ) and “continuous game of life” (http://arxiv.org/abs/1111.1567 ) by Stephan Rafler
This is attempt to show how GPGPU could be used to model magic as different physics.
The demo run at 70 fps at laptop with GF GTX 670M. With some tradeoffs it can be made run much faster
require CUDA GPGPU with at least 2.0 Cuda compute capability (practically all modern cards, GF GTX 550 or better)
“Reduce” is one of the most useful samples in NVIDIA CUDA SDK. It’s implementation of highly optimized cuda algorithm for some of the elements of the array of the arbitrary length. It’s hardly possible to make anything better and generic enough with existing GPGPU architecture (if anyone know something as generic but considerably more efficient I’d like to know too). One of the big plus of the reduce algorithm is that it can work for any binary commutative associative operation – like min, max, multiply etc. And NVIDIA sample provide this ability – it’s implemented as reduce on template class, so all one have to do is implement class with overload of addition and assignment operations.
However there is one obstacle – it’s a __volatile__ qualifier in the code. Simple overload of “=” “+=” and “+” operations in class LSum cause compiler error like
error: no operator “+” matches these operands
1> operand types are: LSum + volatile LSum
The answer is add __volatile__ to all class operation, but there is the trick here:
for “=” just
volatile LSum& operator =(volatile LSum &rhs)
is not enough. You should add volatile to the end too, to specify not only input and output, but function itself as volatile.
At the end class looks like:
__device__ LSum& operator+=(volatile LSum &rhs)
__device__ LSum operator+(volatile LSum &rhs)
LSum res = *this;
res += rhs;
__device__ LSum& operator =(const float &LSum)
__device__ volatile LSum& operator =(volatile LSum &rhs) volatile
Split Bregamn method is one of the most used method for solving convex optimization problem with non-smooth regularization. Mostly it used for regularization, and here I want to talk about TV-L1 and similar regularizers ( like )
For simplest possible one-dimentional denoising case
Split Bregman iteration looks like
There is a lot of literature about choosing parameter from (1) for denosing. Methods include Morozov discrepancy principle, L-curve and more – that’s just parameter of Tikhonov regularization.
There is a lot less metods for choosing for Split Bregman.
Theoretically shouldn’t be very important – if Split Bregman converge solution is not depending on . In practice of cause convergence speed and stability depend on a lot. Here I’ll point out some simple consideration which may help to choose
Assume for simplicity that u is twice differentiable almoste everythere, the (1) become
We know if Split Bbregman converge
that mean , here exteranl derivative is a weak derivative.
For the solution b is therefore locally constant, and we even know what constant it is.
For we have
For converged solution
everythere where is not zero, with possible exeption of measure zero set.
Now returning to choice of for Split Bregman iterations.
From (4) we see that for small values b , b grow with step until it reach it’s maximum absolute value
Also if the b is “wrong”, if we want for it to switch value in one iteration should be more than
That give us lower boundary for :
For most of x
What happens if on some interval b reach for two consequtive iterations?
Then on the next iteration form (5) and (3) on that interval
See taht with big enuogh sign of stabilizing with high probaility and we could be close to true solution. The “could” in the last sentence is there because we are inside the interval, and if the values of u on the ends of interval are wrong b “flipping” can propagate inside our interval.
It’s more difficult to estimate upper boundary for .
Obviously for more easy solution of (2) or (5) shouldn’t be too big, so that eigenvalues of operator woudn’t be too close to 1. Because solutions of (2) are inexact in Split Breagman we obviously want having bigger eigenvalues, so that single (or small number of) iteration could suppress error good enough for (2) subproblem.
So in conclusion (and my limited experience) if you can estimate for solution could be a good value to start testing.
Thanks to Igor Carron for pointing me to this workshop – Algebraic Topology and Machine Learning . There is very interesting paper there Persistent Homological Structures in Compressed Sensing and Sparse Likelihood by Moo K. Chung, Hyekyung Lee and Matthew Arnold. The paper is very comprehensive and require only minimal understanding of some algebraic topology concepts (which is exactly where I’m in realation to algebraic topology). Basically it’s application of topological data analysis to compressive sensing. They use such thing as persistent homology and “barcodes”. Before, persistent homology and barcodes were used for such things as extracting solid structure from noisiy point cloud. Barcode is stable to noise dependence of some topological invariants on some parameter. In case of point cloud parameter is the radius of the ball around each point. As radius go from very big to zero topology of union of balls change, and those changes of topology make barcode. Because barcode is stable topological invariant learning barcode is the same as learning topology of solid structure underlying point cloud.
In the paper authors using graphical lasso (glasso) with regularizer to find interdependency between set of sampled variables. However if consider parameter of regularizer as a kind of radius of ball this problem aquire persistent homology and barcode. The correlation matrix is thresholded by and become adjacency matrix of some graph. Barcode is now dependence of topology of that graph on parameter . What is especially spectacular is that to calculate barcode no glasso iteration are needed – barcode obtained by simple thresholding of correlation matrix. Thus barcode easily found and with it topology of correlations of variables. Well, at least that is how I understood the paper.
PS Using this approach for total variation denoising barcode would include dependance of size function from smoothing parameter.
If you are an unfortunate person who have to use Windows 7 a lot (like me) undeletable files happen to you from time to time. It could be result of disk failure, or interrupted boot, or just for no reason at all.
The problem is, sometimes they can’t be deleted even by Administrator. To fix this Administrator have have to “take ownership” of the file in question.
It goes like this:
1. run from the command line
This step is necessary because file permissions could be corrupted
2. login as Administrator.
3. from the command line
takeown /f full_directory_and_path_name
4. from the command line
icacls full_directory_and_path_name /GRANT ADMINISTRATORS:F
5.Now Administrator can delete file. If even that is not helping you can try delete from Safe mode.
This post is inspired by Extremal Principles in Classical, Statistical and Quantum Mechanics in Azimuth blog.
Total Variation used a lot in image processing. Image denoising, optical flow, depth maps processing. The standard form of Total Variation f or norm is minimizing “energy” of the form
(I’m talking about Total Variaton- for now, not ) over all functions
In case of image denoising it would be
where is original image and is denoised image
Part is called “fidelity term” and is “regularizer”
Regularizer part is to provide smoothness of solution and fidelity term is to force smooth solution to resemble original image (that is in case of image denoising)
Now if we return to classical Action, movement of the point is defined by the minimum of functional
, over trajectories where is kinetic energy and is potential energy, or
One-dimensional total variation for image denoising is the same as classical mechanics of the particle, with potential energy defined by iamge and smoothness of denoised image as kinetic energy! For optical flow potential energy is differences between tranformed first image and the second
and kinetic energy is the smoothness of the optical flow.
Of cause the strict equality hold only for one-dimentional image and , and potential energy is quite strange – it depend not on coordinate but on velocity, like some kind of friction.
While it hold some practical meaning, most of practical task have two or more dimensional image and or regulariser. So in term of classical mechanics we have movement in multidimensional time with non-classical kinetic energy
which has uncanny resemblance to Lagrangian of relativistic particle
So total variation in image processing is equivalent to physics of non-classical movement with multidimensional time, in the field with potential energy defined by image. I have no idea what does it signify, but it sounds cool :) . Holographic principle? May be crowd from Azimuth or n-category cafe will give some explanation eventually…
And another, related question: regularizer in Total Variation. There is inherent connection between regularizers and Bayesian priors. What TV-L1 regularizer mean from Bayesian statistics point of view?
PS I’m posting mostly on my google plus now, so this blog is a small part of my posts.
Something I did for Samsung (kernel of tracker). Biggest improvement in SARI 1.5 is the sensors fusion, which allow for a lot more robust tracking.
Here is example of run-time localization and mapping with SARI 1.5:
This is the AR EdiBear game (free in Samsung apps store)
Interior point method for nonlinear optimization often considered as complex, or highly nontrivial etc. The fact is, that for “simple” nonlinear optimization it’s quite simple, manageable and can even be explained in #3tweets. For those not familiar with it there is a simple introduction to it in wikipedia, which in turn follows an excellent paper by Margaret H. Wright.
Now about “if all you have is a hammer, everything looks like a nail”. Some of applications of interior point method could be quite unexpected.
Everyone who worked with Levenberg-Marquardt minimization algorithm know how much pain is the choice of the small parameter . Levenberg-Marquardt can also be seen as modification of Gauss-newton with a trust region. The of Levenberg-Marquardt do correspond to the trust region radius, but that dependence is highly complex and is difficult to estimate. You want trust region of the radius r, but what should be avlue of ? There is no easy answer to that question; there are some complex methods, or there is a testing with subdivision, which is what the original Levenberg-Marquardt implement.
Interior point can help here.
If we choose shape of trust region for Gauss-Newton as hypercube or simplex or like, we can formulate it as set of L1 norm inequality constrains. And that is the domain of interior point method! For hypercube the resulting equations looks especially nice
W – hessian, I – identity, diag – diagonal
That is a banded arrowhead matrix, and for it Cholesky decomposition cost insignificantly more than decomposition of original W. The matrix is not positive definite – Cholesky without square root should be used.
Now there is a temptation to use single constrain instead of set of constrain , but that will not work. should have to be linearized to be tractable, but it’s a second order condition – it’s linear part is zero, so linearization doesn’t constrain anything.
The same method could be used whenever we have to put constrain on the value of Gauss-Newton update, and shape of constrain in not important (or polygonal)
Now last touch – Interior point method has small parameter of it’s own. It’s called usually . In the “normal” method there is a nice rule for it update – take it as (in the notation from wikipedia article – is a value of constraint, is a value of slack variable at the previous iteration) That rule usually explicitly stated in the articles about Interior Point Method(IPM) for Linear Programming, but omitted (as obvious probably) in the papers about IPM for nonlinear optimization
In our case (IPM for trust region) we don’t need update at all – we move boundary of the region with each iteration, so each is an initial value. Have to remember, is not a size of trust region, but strength of it’s enforcement.
I seldom post in the this blog now, mostly because I’m positing on twitter and G+ a lot lately. I still haven’t figured out which post should go where – blog, G+ or twitter, so it’s kind of chaotic for now.
What of interest is going on: There are two paper on the CVPR11 which claim that compressed sensing(sparse recovery) is not applicable to some of the most important computer vision tasks:
Is face recognition really a Compressive Sensing problem?
Are Sparse Representations Really Relevant for Image Classification?
Both paper claim that space of the natural images(or their important subsets) are not really sparse.
Those claims however dont’t square with claim of high effectiveness of compact signature of Random Ferns.
Could both of those be true? In my opinion – yes. Difference of two approaches is that first two paper assumed explicit sparsity – that is they enforced sparsity on the feature vector. Compressed signature approach used implicit sparsity – feature vector underling the signature is assumed sparse but is not explicitly reconstructed. Why compressed signature is working while explicit approach didn’t? That could be the case if image space is sparse in the different coordinate system – that is here one is dealing with the union of subspaces. Assumption not of the simple sparsity, but of the union of subspaces is called blind compressed sensing.
Now if we look at the space of the natural images it’s easy to see why it is low dimensional. Natural image is the image of some scene, an that scene has limited number of moving object. So dimension of space images of the scene is approximately the sum of degree of freedom of the objects(and camera) of the scene, plus effects of occlusions, illumination and noise. Now if the add strong enough random error to the scene, the image is stop to be the natural image(that is image of any scene). That mean manifold of the images of the scene is isolated – there is no natural images in it’s neighborhood. That hint that up to some error the space of the natural images is at least is the union of isolated low-dimensional manifolds. The union of mainfolds is obviously is more complex structure than the union of subspace, but methods of blind compressed sensing could be applicable to it too. Of cause to think about union of manifolds could be necessary only if the space of images is not union of subspace, which is obviously preferable case