Mirror Image

Mostly AR and Stuff

Compressive Sensing and Computer Vision

Thanks to Igor Carron for pointing out this video lecture
Compressive Sensing for Computer Vision: Hype vs Hope
It starts with a comprehensible explanation of what compressive sensing is about (BTW, the wiki article on compressive sensing is wholly inadequate).
Basically it’s about treating the lower-dimensional signal (image) as the projection, by a rectangular matrix, of a mostly zero high-dimensional vector. It happens that this sparse high-dimensional vector can be restored if the matrix is almost orthonormal when acting on sparse vectors (the Restricted Isometry Property). Discrete Fourier Transform and random matrices have that property.
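Written out as a rough sketch (the symbols y, \Phi, x, s and \delta_{s} are assumed notation, not taken from the lecture): the observed signal y is a projection of an s-sparse vector x, the matrix \Phi nearly preserves the length of every s-sparse vector, and x is typically recovered by L^{1} minimization.

```latex
y = \Phi x, \qquad \Phi \in \mathbb{R}^{m \times n}, \quad m < n, \quad \|x\|_{0} \le s

(1 - \delta_{s}) \|x\|_{2}^{2} \le \|\Phi x\|_{2}^{2} \le (1 + \delta_{s}) \|x\|_{2}^{2} \quad \text{for all } s\text{-sparse } x

\hat{x} = \arg\min_{z} \|z\|_{1} \quad \text{subject to} \quad \Phi z = y
```

The middle line is the Restricted Isometry Property with constant \delta_{s}; the smaller \delta_{s} is, the closer \Phi is to orthonormal on sparse vectors.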
This sparse vector can be considered a classification space for the original signal, so the application of compressive sensing to computer vision is mostly about classification or recognition. Since the methods used by CS are convex and linear programming, they are not run-time methods and would not help much in real-time tracking. There is a piece of CS-inspired advice at the end of the lecture: try replacing L^{2} norm optimization with L^{1} norm optimization. That could actually be helpful in some cases. If L^{1} is approximated by iteratively reweighted L^{2}, it’s essentially the same as robustification of the least squares method.
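A minimal sketch of the iteratively reweighted L^{2} idea, assuming NumPy and a toy line-fitting problem with one gross outlier. The function name irls_l1, the floor eps and the test data are made up for illustration and are not from the lecture.

```python
import numpy as np

def irls_l1(A, b, iters=50, eps=1e-6):
    """Approximate argmin_x ||A x - b||_1 by iteratively reweighted least squares."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]      # plain L2 solution as a start
    for _ in range(iters):
        r = A @ x - b
        # Weight 1/|r| turns the weighted sum of r^2 into (roughly) a sum of |r|;
        # eps is an assumed floor that avoids division by zero.
        w = 1.0 / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        x = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return x

# Toy data: points on y = 2 t + 1, with one gross outlier at t = 7
t = np.arange(10, dtype=float)
y = 2.0 * t + 1.0
y[7] += 50.0
A = np.column_stack([t, np.ones_like(t)])

x_l2 = np.linalg.lstsq(A, y, rcond=None)[0]   # pulled away by the outlier
x_l1 = irls_l1(A, y)                          # stays close to slope 2, offset 1
```

Points with a large residual get a small weight on the next pass, which is exactly what a robust least squares scheme does.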

7, December, 2009 Posted by mirror2image | Coding AR | 3 Comments

Testing a new descriptor.

I’m trying a new descriptor, inspired by SURF and SIFT. I want to use gradients instead of Haar transforms of intensity, but with lower dimensionality than SURF. I also don’t need rotation/scale invariance, because I’m using incremental tracking.

20, November, 2009 Posted by mirror2image | Coding AR | 2 Comments

What I would say to Nokia about mobile AR (if it would listen)

#augmentedreality
I have been struck off the list of the Nokia Augmented Reality co-creation session, so here is the gist of what I was intending to say about AR-friendly mobile devices.
I will not repeat the obvious here (requirements for CPU, FPU, RAM etc.) but will concentrate on things that are often missed.
I. Hardware side
1. Battery life is the most important thing here. AR applications eat battery extremely fast: full CPU load, memory access, a working camera and, on top of that, wireless data access, GPS and e-compass.
It’s not realistic to expect a dramatic improvement in battery life in the near future, though fuel cells and air-fueled batteries give some hope. Thinking short term, a dual battery is the most realistic solution. AR-capable devices tend to be quite heavy and not very slim anyway, so a second battery will not make a dramatic difference (the iPhone could be an exception here).
Now how to make the most of it? Make the batteries hot-swappable, with separate slots, and provide a separate battery charger. If the user is indoors, he/she can remove the empty battery and put it on charge while the device is running on the second one.
2. Heating. Up until now no one has paid attention to the heating of mobile devices, mostly because CPU-heavy apps are still very few (maybe only 3D games). An AR application produces even more heat than a 3D game, and the device could become quite hot. So heatsinks and heat pumps are on the agenda.
3. Camera. For AR the speed of the camera is more important than the resolution. A slow camera produces blurred images, which are extremely hard to process (extracting features, edges etc.).
Position of the camera. Most users hold the device horizontally while using AR. A specific of mobile AR is that the user is simultaneously getting input from peripheral vision. To produce a picture consistent with peripheral vision, the camera should be in the center of the device, not on the extreme edge as in the N900.
Lack of skew, off-center, radial and rolling shutter distortions of the camera is another factor. In this respect Nokia phone cameras are quite good for now, unlike the iPhone.
4. Buttons. A touchscreen is not very helpful for AR: all screen real estate should be dedicated to the representation of the environment. While it’s quite possible to make a completely gesture-driven AR interface, buttons are still helpful. There should be at least one easily accessible button on the front panel. The N95 with the slider out to the right is an almost perfect setup: one big button on the front panel and some on the slider on the opposite side. The N900, with buttons only on the slider, the slider sliding only down and no buttons on the front panel, is an example of unhelpful button placement.

II. Software side
1. Fragmentation.
Platform fragmentation is the bane of mobile developers, especially if several new models are launched every quarter. One of the reasons for the phenomenal success of the iPhone application platform is that there is no fragmentation whatsoever. With the huge zoo of models it is practically impossible to support all those in the suitable hardware range. That is especially difficult with AR apps, which are closely coupled to the camera’s technical specification, display size and aspect ratio etc. If manufacturers want to make it easy for devs, they should concentrate on one AR-friendly line of devices, with binary, or at least source code, compatibility between models.
2. Easy access to the DSP in the API. It would effectively give the developer a second CPU.
3. Access to raw data from the camera. Why raw data from the camera is not accessible from the ordinary API and is only available to selected elite developer houses is a mystery to me. Right now, for example, on Symbian OS the camera viewfinder converts data to YUV422, then from YUV422 to BMP, and the ordinary viewfinder API has access to BMP only. Quite an overhead.
4. An API to access internal camera parameters (focus distance etc.). Otherwise every device has to be calibrated by the developer.

10, November, 2009 Posted by mirror2image | Augmented Reality, Mobile | 9 Comments

Symbian Signed again.

A discussion is going on at symbian.org. It looks like new Symbian Signed rules are in the works (my guess is that they will be implemented no earlier than Symbian^4). Symbian Signed may become cheaper, and a new class of publisher ID may become available to anyone with a credit card.

29, October, 2009 Posted by mirror2image | Mobile | No Comments Yet

Openness – Maemo vs Android

A great post from coll900 about the comparative openness of Maemo and Android for developers and users. Maemo is designated a clear win. The one point missing in the original post is platform fragmentation. Android tries to get around fragmentation by using a Java virtual machine (albeit with non-standard bytecode). However, native code will not be binary transferable between devices. That is especially relevant for augmented reality and other CPU-heavy apps. Here is a question: will Maemo be any better? For some mysterious reason Nokia is afflicted by an irresistible drive to fragment its own software platform as much as possible. If Nokia manages to gather enough strength of will to keep Maemo on a single but mass-produced device line, like Apple with the iPhone, Maemo could become a developer’s dream and a serious competitor to the iPhone. However, if Nokia keeps its bad habit of producing a zoo of semi-decent, not-quite-compatible devices, introducing a new just-a-little-different device every quarter just to break whatever compatibility still remains, Maemo, with all its openness, will not have a practical advantage over Android.

PS. It looks like there will not be any Maemo fragmentation. A source at Nokia told Reuters that there will be one Maemo device, at least for the next year. That is good news, actually.

28, October, 2009 Posted by mirror2image | Mobile | 1 Comment

Fast Fourier Transform for P2P networking

A very unusual application of FFT in this arXiv paper. Butterfly diagrams for radix-n FFT allow building a P2P network with maximum diversity, reliability and flexibility and minimum complexity.

19, October, 2009 Posted by mirror2image | Uncategorized | No Comments Yet

Solution – free gauge

Looks like the problem was not the large Gauss-Newton residual. The problem was gauge fixing.
Most bundle adjustment algorithms are not inherently gauge invariant (for details check Triggs, “Bundle adjustment – a modern synthesis”, chapter 9 “Gauge Freedom”). Practically, that means the method has one or more free parameters which can be chosen arbitrarily (for example scale), but which influence the solution in a non-invariant way (they would not influence the solution if the algorithm were gauge invariant). Gauge fixing is the choice of values for those free parameters. There exists at least one gauge invariant bundle adjustment method (a generalization of Levenberg-Marquardt with complete matrix correction instead of diagonal-only correction), but it is an order of magnitude more computationally expensive.
I fixed the coordinates of one of the 3D points for gauge fixing. Because the method is not gauge invariant, the solution depends on the choice of that fixed point. The problem occurs when the chosen point is “bad”: the error of the feature point detector for this point is so big that it contradicts the rest of the picture. A mismatch in the point correspondence can cause the same problem.
In my case, fixing the coordinates of the chosen point caused “accumulation” of residual error in that point. This is easy to explain: other points can decrease reprojection error both by moving/rotating the camera and by shifting their coordinates, but the fixed point can do it only by moving/rotating the camera. It looks like if the point was “bad” from the start it can become even worse on the next iteration as the error accumulates, a positive feedback loop that makes the method unstable. These are of course only my observations; I didn’t do any formal analysis.
The obvious solution is to redistribute residual error among all the points, which means dropping gauge fixing and using a free gauge. A free gauge causes arbitrary scaling of the result, but the result can be rescaled later. However, there is a cost. Free gauge means the matrix is singular (not invertible), so the Gauss-Newton method cannot work. So I had to switch to the less efficient and more computationally expensive Levenberg-Marquardt. For now it seems to be working.
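To illustrate why damping helps here, a toy sketch (not bundle adjustment, and not my actual code): the residual below depends only on the product a*b, so there is a scale-like gauge freedom and J^{T}J has rank 1. The problem, the starting point and the factor-of-10 damping schedule are all assumptions made for the example.

```python
import numpy as np

def residual(p):
    a, b = p
    return np.array([a * b - 6.0])    # only the product a*b is observable

def jacobian(p):
    a, b = p
    return np.array([[b, a]])

p = np.array([1.0, 1.0])
H0 = jacobian(p).T @ jacobian(p)      # [[1, 1], [1, 1]]: singular, a pure Gauss-Newton step is undefined

lam = 1e-3                            # Levenberg-Marquardt damping parameter
for _ in range(50):
    r, J = residual(p), jacobian(p)
    if abs(r[0]) < 1e-12:
        break
    H = J.T @ J
    # Adding lam * I makes the system solvable despite the gauge freedom
    step = np.linalg.solve(H + lam * np.eye(2), -J.T @ r)
    if np.sum(residual(p + step) ** 2) < np.sum(r ** 2):
        p, lam = p + step, lam * 0.1  # accept the step, reduce damping
    else:
        lam *= 10.0                   # reject the step, increase damping
```

The result lands on one particular point of the family a*b = 6; which point depends on the start, which is the toy analogue of the arbitrary scale mentioned above.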
PS The free gauge matrix is not singular, just not well-defined, and has a degenerate minimum. So constrained optimization may still work.
PPS Gauge invariance is also an important concept in physics and geometry.
PPPS While messing with quasi-Newton: it seems there is an error in chapter 10.2 of “Numerical Optimization” by Nocedal & Wright. In the secant equation, instead of S_{k+1}(x_{k+1} - x_{k}) = J^{T}_{k+1}r_{k+1} - J^{T}_{k}r_{k} it should be S_{k+1}(x_{k+1} - x_{k}) = J^{T}_{k+1}r_{k+1} - J^{T}_{k}r_{k+1}

11, October, 2009 Posted by mirror2image | Coding AR, computer vision | No Comments Yet

Problems

During the tests I found out that bundle adjustment is failing on some “bad frames”. There are two ways to deal with it: reject the bad frames, or try to understand what is happening: who set up us a bomb? :-) Any problem is also an opportunity to understand the subject better. For now I suspect Gauss-Newton is failing due to too large a residual. Just adding the Hessian to J^{T}J does not help: I’m getting a negative eigenvalue. So now I’m trying quasi-Newton from the excellent book by Nocedal & Wright. If that does not help I’ll try the hybrid Fletcher method.

PS It looks like the problem was not the large residual.

6, October, 2009 Posted by mirror2image | Coding AR, Uncategorized | No Comments Yet

Some phase correlation tricks

When doing phase correlation on low-resolution, or extremely low-resolution (below 32×32, say), images, noise can become a serious problem, up to making the result completely useless. Fortunately there are some tricks which help in this situation. Some of them I stumbled upon myself, and some I picked up from relevant papers.
The first is obvious: pass the image through a smoothing filter. A pretty simple window filter computed from an integral image can help here.
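A minimal sketch of such a window (box) filter, assuming NumPy and a 2D grayscale image. The function name box_filter and the choice to shrink the window at the image borders are assumptions made for the example.

```python
import numpy as np

def box_filter(img, radius):
    """Mean filter over a (2*radius+1) x (2*radius+1) window using an integral image."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Integral image padded with a zero row and column: ii[y, x] = sum(img[:y, :x])
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    ys, xs = np.arange(h), np.arange(w)
    # Window bounds, clipped so the window shrinks at the borders
    y0 = np.clip(ys - radius, 0, h)[:, None]
    y1 = np.clip(ys + radius + 1, 0, h)[:, None]
    x0 = np.clip(xs - radius, 0, w)[None, :]
    x1 = np.clip(xs + radius + 1, 0, w)[None, :]
    # Sum over any rectangle costs four lookups, whatever the window size
    window_sum = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    area = (y1 - y0) * (x1 - x0)
    return window_sum / area
```

The cost per pixel does not depend on the radius, which is the reason to use an integral image instead of a direct convolution.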
The second: check the consistency of the result. A histogram of the cross-power spectrum can help here. There is a wheel within the wheel here, which I found out the hard way: discard the lower and right sectors of the cross-power spectrum when building the histogram. They are produced from the high-frequency parts of the spectrum and are almost always noise, even if the cross-power spectrum itself is quite sane.
Now more academic tricks:
You can extract sub-pixel information from the cross-power spectrum. There are a lot of ways to do it, just google/citeseer for it. Some are fast and unreliable, some slow and reliable.
The last one is really nice; I picked it up from the Carneiro & Jepson paper about phase-based features.
For the cross-power spectrum calculation, instead of
\frac{F_{1}\cdot F_{2}^{*}} {\left| F_{1}\cdot F_{2}^{*} \right|}
use
\frac{F_{1}\cdot F_{2}^{*}} {a + \left| F_{1}\cdot F_{2}^{*} \right|}
where a is a small positive parameter.
This way harmonics with small amplitude are effectively excluded from the calculation. This is pretty logical: harmonics with near-zero amplitude have an undefined phase and are almost pure noise.
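A minimal sketch of phase correlation with this regularized normalization, assuming NumPy and two equally sized grayscale images. The function name, the default a = 1e-3 and the random test frame are assumptions for the example; in practice a has to be chosen relative to the scale of the spectrum.

```python
import numpy as np

def phase_correlation(img1, img2, a=1e-3):
    """Return the integer (dy, dx) circular shift that maps img2 onto img1."""
    F1 = np.fft.fft2(img1)
    F2 = np.fft.fft2(img2)
    cross = F1 * np.conj(F2)
    # a > 0 damps harmonics whose amplitude is close to zero (their phase is mostly noise)
    R = cross / (a + np.abs(cross))
    corr = np.real(np.fft.ifft2(R))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around)
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Usage on a synthetic 32x32 frame shifted by 3 rows down and 2 columns left
rng = np.random.default_rng(0)
frame = rng.random((32, 32))
moved = np.roll(frame, shift=(3, -2), axis=(0, 1))
shift = phase_correlation(moved, frame)
```

This only finds the primary peak with integer precision; sub-pixel refinement and the consistency checks described above would sit on top of it.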

PS
Another problem with extra low-resolution phase correlation is that sometimes the motion vector appears not as the primary but as a secondary peak, due to ambiguity in the relation between the images. I have yet to find out what to do in this situation…

29, August, 2009 Posted by mirror2image | Coding AR | No Comments Yet

Importance of phase

Here are some nice pictures illustrating the importance of Fourier phase

27, August, 2009 Posted by mirror2image | Coding AR | No Comments Yet