I seldom post in the this blog now, mostly because I’m positing on twitter and G+ a lot lately. I still haven’t figured out which post should go where – blog, G+ or twitter, so it’s kind of chaotic for now.
What of interest is going on: There are two paper on the CVPR11 which claim that compressed sensing(sparse recovery) is not applicable to some of the most important computer vision tasks:
Is face recognition really a Compressive Sensing problem?
Are Sparse Representations Really Relevant for Image Classification?
Both paper claim that space of the natural images(or their important subsets) are not really sparse.
Those claims however dont’t square with claim of high effectiveness of compact signature of Random Ferns.
Could both of those be true? In my opinion – yes. Difference of two approaches is that first two paper assumed explicit sparsity – that is they enforced sparsity on the feature vector. Compressed signature approach used implicit sparsity – feature vector underling the signature is assumed sparse but is not explicitly reconstructed. Why compressed signature is working while explicit approach didn’t? That could be the case if image space is sparse in the different coordinate system – that is here one is dealing with the union of subspaces. Assumption not of the simple sparsity, but of the union of subspaces is called blind compressed sensing.
Now if we look at the space of the natural images it’s easy to see why it is low dimensional. Natural image is the image of some scene, an that scene has limited number of moving object. So dimension of space images of the scene is approximately the sum of degree of freedom of the objects(and camera) of the scene, plus effects of occlusions, illumination and noise. Now if the add strong enough random error to the scene, the image is stop to be the natural image(that is image of any scene). That mean manifold of the images of the scene is isolated – there is no natural images in it’s neighborhood. That hint that up to some error the space of the natural images is at least is the union of isolated low-dimensional manifolds. The union of mainfolds is obviously is more complex structure than the union of subspace, but methods of blind compressed sensing could be applicable to it too. Of cause to think about union of manifolds could be necessary only if the space of images is not union of subspace, which is obviously preferable case
Interesting product – camera for computer vision applications, with open sourced DSP
“The entire camera (hardware as well as software) is open source. It features a 752×480 pixel CMOS sensor, 64MB of SDRAM and 4MB of flash, Ethernet and div. IOs.
The camera runs a uClinux and comes with an image processing framework.”
Datasheet is here
Very interesting article in Wired Recipe for Disaster: The Formula That Killed Wall Street . I’m not a statistician, but I’ll try to explain it. The gist of the article is that in the heart of the current financial crisis is the David X. Li formula, which use “Gaussian copula function” for risk estimation. The idea of formula is that if we have to estimate joint probability of two random events, it could be done with simple formula, which use only probability distributions of each event as if they were independent and a single parameter – statistical correlation. So what bankers did – instead of looking into relationships and connections between events they just did calculate one single statistical parameter and used it for risk estimation. Even more – they applied the same formula to the results of those relatively simple calculations and build pyramids of estimations, each next step applying the same simple formula to results of the previous step. As a result, an extremely complex behavior was reduced to the simple linear model, which had little in common with reality.
And now – the illustration from wiki, what exactly this single parameter – correlation is:
Here are several two-variable distributions and their correlation coefficients. It could be seen that for linear relationships correlation capture dependence of variables perfectly (middle raw). For upper row – normal distributions – it capture the essence of dependency. We can say something about other variable if we know one variable and correlation in that case. For complex shapes – lower row – correlation is zero for each. Each of the lower shapes will be represented as the upper central shape (fuzzy ball) with correlation. Correlation capture nil information about how one variable depend on another for the lower shapes. Correlation allow representation of any shape only as fuzzy ellipse. Li’s formula reduce dimensionality. The thing is, dimensionality – topological property, and you don’t mess with topological properties easily. Imagine bankers using fuzzy ball instead of ring for risk estimation…
Now to the image processing. Most of feature detection in image processing is done for grayscale image. Original image is usually RGB, but before features extraction it converted to grayscale.
However the original image is colored, why not to use colors for feature detection ? For example detect features in each color channel separately?
The thing is, the pictures in each color channel are very similar.
The extraction of blobs in each channel in most cases will triple the job without gaining of significant new information – all the channels will give about the same blobs.
Nevertheless it’s obvious, there is some nontrivial information about the image, encoded in colors.
Why blob detection for each color don’t give access to it ?
The reason is the same as for current financial crisis – dimensionality. Treating each color channel separately we replace five-dimensional RGB+coordinates space with three three-dimensional color+coordinates spaces. Relationships between color channels are lost. Topology of color structure is lost.
To actually use color information, statistical relationships between colors of the image should be explored – something like three dimensional color bins histogram, essentially converting image from RGB to indexed color.