I once asked: what are 3D registration, reconstruction and pose estimation about, optimization or statistics? The more I think about it, the more convinced I am that they are at least 80% statistics. Specific optimization tricks like Tikhonov regularization often have a statistical underpinning. Stability of optimization is robust statistics (yes, I know, I repeat that way too often). Formulating the cost function means formulating the error distribution, and it also determines the convergence speed.
Now some (almost) unrelated AR stuff:
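As an illustration of the Tikhonov point, here is a minimal sketch assuming a linear model b = A x + Gaussian noise (std sigma) and a zero-mean Gaussian prior on x (std tau). The regularized least-squares solution with lambda = sigma²/tau² coincides with the MAP estimate, which can be computed by stacking the prior as extra "observations" x_i = 0. The data and the sigma/tau values are arbitrary choices for the demo.

```python
import numpy as np

def tikhonov(A, b, lam):
    """Closed-form Tikhonov (ridge) solution of min ||Ax - b||^2 + lam * ||x||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def map_gaussian_prior(A, b, sigma, tau):
    """MAP estimate for b = A x + N(0, sigma^2 I) with prior x ~ N(0, tau^2 I).

    The prior acts like extra measurements "x_i = 0" with std tau, so we stack
    them under the real rows (each row scaled by 1/std) and solve plain least squares.
    """
    n = A.shape[1]
    A_aug = np.vstack([A / sigma, np.eye(n) / tau])
    b_aug = np.concatenate([b / sigma, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))
x_true = rng.normal(size=5)
sigma, tau = 0.5, 2.0
b = A @ x_true + sigma * rng.normal(size=30)

# With lam = sigma^2 / tau^2 both estimators minimize the same objective
x_ridge = tikhonov(A, b, lam=sigma**2 / tau**2)
x_map = map_gaussian_prior(A, b, sigma, tau)
print(np.allclose(x_ridge, x_map))  # True
```

In the same spirit, a squared-error cost corresponds to Gaussian residuals; swapping it for another loss is a statement about a different error distribution.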
I already mentioned on Twitter that a version of the markerless tracker I did a lot of work on is part of the Samsung AR SDK (SARI) for Android and Bada. It was shown at AP2011 (the presentation also includes nice Bada code). The AR SDK presentation is here.
Some videos from the presentation: the Edi Bear game demo, with non-reference tracking at the end of the video, and less trivial elements of SLAM tracking. Another application of the SARI SDK is PBI (this one seems to use an earlier version).
Genetic algorithms, and especially their subset, genetic programming, have always fascinated me. My interest was fueled by on-and-off work with global optimization, and because GAs are just plain cool. One of the most interesting things about GAs is that they work quite well on some "practical" problems, while there is no comprehensive theoretical explanation of why they should work so well. (Of course, they are not always so useful. There was work on generating feature descriptors with GAs, and the results were less than impressive.)
Historically, the first and best-known explanation of GA efficiency was the building block hypothesis. The building block hypothesis is very intuitive. It says that there exist "building blocks": small parts of the genome with high fitness. A GA works by randomly searching for those building blocks and then combining them, until the global optimum is found. Searching is mostly done by mutation, and combining the found building blocks by crossover (an analog of the exchange of genetic material in real biological reproduction).
However, the building block hypothesis has a big problem, and that problem is the crossover operator. If the building block hypothesis is true, a GA should work better when the integrity of building blocks is preserved as much as possible, that is, when there are only a few "cut and splice" points in the sequence. But in practice a GA with "uniform" crossover (massive uniform mixing of two genomes) works better than a GA with few crossover points.
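A minimal sketch of the mutation/crossover/selection loop described above, assuming the toy "OneMax" fitness (count of 1 bits), tournament selection, one-point crossover, bit-flip mutation at rate 1/n and elitism. All of these choices and parameter values are illustrative assumptions, not a specific published GA.

```python
import random

def onemax(genome):
    """Toy fitness: number of 1 bits in the genome."""
    return sum(genome)

def mutate(genome, rate, rng):
    """Bit-flip mutation: each gene flips independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]

def one_point_crossover(a, b, rng):
    """Cut both parents at one random point and splice the head of a to the tail of b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def tournament(pop, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=onemax)

def run_ga(n_bits=32, pop_size=40, generations=60, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(onemax, pop))]
    for _ in range(generations):
        elite = max(pop, key=onemax)  # elitism: the best individual survives unchanged
        children = [elite]
        while len(children) < pop_size:
            child = one_point_crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, 1.0 / n_bits, rng))
        pop = children
        history.append(max(map(onemax, pop)))
    return history

history = run_ga()
print(history[0], history[-1])
```

Because of elitism the best fitness per generation can never decrease; how fast it climbs depends on the assumed parameters.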
Recently a new theory of GA efficiency has appeared that tries to deal with the uniform crossover problem: the "generative fixation" hypothesis. The idea behind "generative fixation" is that a GA works in a continuous manner, fixing stable groups of genes with high fitness and continuing the search on the rest of the genome, reducing the search space step by step. From an optimization point of view, a GA in that case works in a manner similar to the conjugate gradient method, reducing (or trying to reduce) the dimensionality of the search space at each step. Now about uniform crossover, and why it works better: the subspace to which the search space is reduced should be stable (in the stability-theory sense), so small perturbations won't cause the solution to diverge. Uniform crossover of two close solutions produces a solution that is still near the attractive subspace. The positive effect of uniform crossover is that it randomizes the solution without leaving the subspace already found. That randomization clears out useless "stuck" genes (also called "hitchhikers") and helps to escape local minima.
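The "stays within the found subspace" argument can be checked directly. A minimal sketch, assuming bit-string genomes and the standard uniform crossover (each gene taken from either parent with probability 1/2): every position where two parents agree is inherited unchanged, and only the positions where they differ get randomized. The genome length and the 6 differing positions are arbitrary demo values.

```python
import random

def uniform_crossover(a, b, rng):
    """Take each gene from parent a or parent b with probability 1/2."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def hamming(a, b):
    """Number of positions where two genomes differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(42)
n = 64
parent_a = [rng.randint(0, 1) for _ in range(n)]
# A "close" second parent: differs from parent_a in exactly 6 positions
flip = set(rng.sample(range(n), 6))
parent_b = [1 - g if i in flip else g for i, g in enumerate(parent_a)]

child = uniform_crossover(parent_a, parent_b, rng)

# Positions where the parents agree (the "fixed" part) are inherited unchanged...
agree = [i for i in range(n) if parent_a[i] == parent_b[i]]
print(all(child[i] == parent_a[i] for i in agree))  # True
# ...and only the 6 disagreeing positions are mixed: each counts once in this sum
print(hamming(child, parent_a) + hamming(child, parent_b))  # 6
```

One-point crossover would also keep agreeing positions, but it mixes the differing ones in contiguous runs rather than independently, so it randomizes less.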
An interesting question is what happens if the subspace is not made of "fixed bits", and is not even linear, that is, if it's a manifold. In that case (if the hypothesis is true) the found genes will not be "fixed" but will "drift" in a systematic manner, according to the projection of the manifold onto the semi-fixed bits.
Now to the efficiency of GAs on "practical" tasks. If the "generative fixation" theory is correct, the "practical" tasks for which GAs work well could be problems for which dimensionality reduction is natural, for example if the solution belongs to a low-dimensional attractive manifold. (Addendum 7/11) That means GAs shouldn't work well for problems that allow only combinatorial search. It follows that if a GA works for compressed sensing problems, it should comply with the Donoho-Tanner phase transition diagram.
Overall I like this new hypothesis, because it brings GAs back into the family of mathematically natural optimization algorithms. That doesn't mean the hypothesis is true, of course. I hope there will be some interest, more work, testing and analysis. What is clear is that the current building block hypothesis is not beyond question.
A simple Google search turned up a paper by Beyer, "An Alternative Explanation for the Manner in which Genetic Algorithms Operate", with a quite similar explanation of how uniform crossover works.
Via Marketwire. Here it is, the Wrap 920AR:
* 1/3-inch wide VGA Digital Image Sensor
* Resolution: 752 x 480 per lens
* Frame rate: 60 fps
* High-speed USB 2.0
* Some kind of 6-DoF tracker (probably a 3-axis accelerometer and/or e-compass; I don't have hopes for a gyroscope)
* Supported by Vuzix Software Developer Program
$799.99, expected availability is 2nd quarter of 2010.
The Wrap 920AR's stereo camera assembly and 6-DoF tracker will also be available separately for upgrading existing Wrap video eyewear. Here is the Wrap 920AR at the Vuzix homepage.
“A technology that is ’20 years away’ will be 20 years away indefinitely.”
Thanks to Igor Carron for pointing out this video lecture
Compressive Sensing for Computer Vision: Hype vs Hope
It starts with a comprehensible explanation of what compressive sensing is about (BTW, the Wikipedia article on compressive sensing is wholly inadequate).
Basically, it's about treating the lower-dimensional signal (image) as the projection, by a rectangular matrix, of a mostly-zero high-dimensional vector. It turns out that this sparse high-dimensional vector can be restored if the matrix acts almost as an isometry on sparse vectors (the Restricted Isometry Property). Random matrices and randomly selected rows of the Discrete Fourier Transform have that property with high probability.
This sparse vector can be considered a classification space for the original signal, so the application of compressive sensing to computer vision is mostly about classification or recognition. As the methods used by CS are convex optimization and linear programming, they are not run-time methods and would not help much in real-time tracking. There is CS-inspired advice at the end of the lecture about trying to replace L2-norm optimization with L1-norm optimization. That could actually be helpful in some cases. If approximated by iteratively reweighted least squares, it's essentially the same as robustification of the least squares method.
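A minimal sketch of the recovery idea, assuming a random Gaussian measurement matrix and an exactly sparse signal. Note the swap: the lecture's methods are convex (L1 / linear programming), while this sketch uses Orthogonal Matching Pursuit, a greedy alternative, only because it needs nothing beyond NumPy. Sizes and seed are arbitrary.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: greedily recover a k-sparse x with y = Phi @ x."""
    residual = y.copy()
    support = []
    x = np.zeros(Phi.shape[1])
    for _ in range(k):
        # pick the column most correlated with what is still unexplained
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        # least-squares fit on the current support, then update the residual
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n, m, k = 128, 80, 4                        # ambient dim, measurements, sparsity
x_true = np.zeros(n)
idx = rng.choice(n, k, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
Phi = rng.normal(size=(m, n)) / np.sqrt(m)  # random Gaussian matrix: RIP w.h.p.
y = Phi @ x_true                            # the low-dimensional "signal"

x_hat = omp(Phi, y, k)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Recovery is only probabilistic: with too few measurements m for a given sparsity k it fails, which is exactly what phase-transition diagrams describe.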
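The IRLS remark can be illustrated with a line fit. A minimal sketch, assuming a straight-line model with a few gross outliers: reweighting each residual by 1/|r| turns the weighted sum of squares into a sum of absolute values, so repeated weighted least squares approximates the L1 fit. The data, `eps` and iteration count are assumptions for the demo.

```python
import numpy as np

def fit_line_irls_l1(x, y, iters=200, eps=1e-6):
    """Approximate an L1 (least absolute deviations) line fit by IRLS."""
    A = np.column_stack([x, np.ones_like(x)])
    w = np.ones_like(y)  # first pass is ordinary least squares
    for _ in range(iters):
        sw = np.sqrt(w)
        params, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = y - A @ params
        # w_i = 1/|r_i| makes sum w_i r_i^2 equal sum |r_i|;
        # eps keeps weights finite when a residual reaches zero
        w = 1.0 / np.maximum(np.abs(r), eps)
    return params  # (slope, intercept)

x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[[3, 10, 15]] += [30.0, -25.0, 40.0]         # three gross outliers

slope_ls, icpt_ls = np.polyfit(x, y, 1)       # plain least squares (L2)
slope_l1, icpt_l1 = fit_line_irls_l1(x, y)    # L1 via IRLS
print(f"L2: {slope_ls:.3f}, {icpt_ls:.3f}   L1: {slope_l1:.3f}, {icpt_l1:.3f}")
```

Replacing 1/|r| with another weight function (Huber, Tukey, etc.) gives the usual robust M-estimators, which is the "robustification" connection.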
Trying a new descriptor, inspired by SURF and SIFT. I want to use gradients instead of Haar transforms of intensity, but with lower dimensionality than SURF. I also don't need rotation/scale invariance, because I'm using incremental tracking.
I have been struck off the list of the Nokia Augmented Reality co-creation session, so here is the gist of what I was intending to say about AR-friendly mobile devices.
I will not repeat the obvious here (requirements for CPU, FPU, RAM, etc.) but will concentrate on things that are often missed.
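One possible shape for such a descriptor, as a hypothetical sketch: SURF-style per-cell sums (sum dx, sum dy, sum |dx|, sum |dy|), but computed from plain image gradients on a coarser grid, with no orientation or scale normalization. The 3x3 grid (36 values versus SURF's 64) and the function name are assumptions, not the descriptor actually being tried.

```python
import numpy as np

def gradient_descriptor(patch, grid=3):
    """Hypothetical SURF-like descriptor from plain gradients, no rotation/scale invariance.

    Each of the grid x grid cells contributes (sum dx, sum dy, sum |dx|, sum |dy|),
    giving 4 * grid**2 values; the vector is L2-normalized for contrast invariance.
    """
    patch = np.asarray(patch, dtype=float)
    dy, dx = np.gradient(patch)  # np.gradient returns d/drow, then d/dcol
    h, w = patch.shape
    feats = []
    for rows in np.array_split(np.arange(h), grid):
        for cols in np.array_split(np.arange(w), grid):
            cx = dx[np.ix_(rows, cols)]
            cy = dy[np.ix_(rows, cols)]
            feats += [cx.sum(), cy.sum(), np.abs(cx).sum(), np.abs(cy).sum()]
    v = np.array(feats)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

patch = np.add.outer(np.arange(16.0), 2 * np.arange(16.0))  # synthetic ramp patch
d = gradient_descriptor(patch)
print(d.shape)  # (36,)
```

Since the tracker is incremental, the patch is assumed to be already roughly aligned with the previous frame, which is why skipping orientation assignment is acceptable here.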
I. Hardware side
1. Battery life is the most important thing here. AR applications eat the battery extremely fast: full CPU load, memory access, a working camera and, on top of that, wireless data access, GPS and e-compass.
It's not realistic to expect a dramatic improvement in battery life in the near future, though fuel cells and air-fueled batteries give some hope. In the short term, a dual battery is the most realistic solution. AR-capable devices tend to be quite heavy and not very slim anyway, so a second battery will not make a dramatic difference (the iPhone could be an exception here).
How to make the most of it? Make the batteries hot-swappable, with separate slots, and provide a separate battery charger. If the user is indoors, he or she can remove the empty battery and put it on charge while the device runs on the second one.
2. Heating. Until now no one has paid attention to the heating of mobile devices, mostly because CPU-heavy apps are still very few (maybe only 3D games). AR applications produce even more heat than 3D games, and the device can become quite hot. So heatsinks and heat pumps are on the agenda.
3. Camera. For AR, the speed of the camera is more important than its resolution. Speed is the most important factor: a slow camera produces blurred images, which are extremely hard to process (extracting features, edges, etc.).
Position of the camera. Most users hold the device horizontally while using AR. A specific feature of mobile AR is that the user simultaneously receives input from peripheral vision. To produce a picture consistent with peripheral vision, the camera should be in the center of the device, not on the extreme edge as in the N900.
Lack of skew, off-center, radial and rolling shutter distortions in the camera is another factor. In this respect, Nokia phone cameras are quite good for now, unlike the iPhone's.
4. Buttons. The touchscreen is not very helpful for AR, since all screen real estate should be dedicated to representing the environment. While it's quite possible to make a completely gesture-driven AR interface, buttons are still helpful. There should be at least one easily accessible button on the front panel. The N95, with the slider out to the right, is an almost perfect setup: one big button on the front panel and some on the slider on the opposite side. The N900, with buttons only on the slider, a slider that slides only down and no buttons on the front panel, is an example of unhelpful button placement.
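A back-of-the-envelope way to see why exposure time matters more than resolution, assuming "slow" means a long exposure and the dominant motion is hand rotation: for small angles an image point moves roughly focal_px × angle pixels during one exposure. The 60°/s rotation and 500 px focal length are assumed, typical-looking numbers, not measurements of any device.

```python
import math

def motion_blur_px(angular_speed_deg_s, exposure_s, focal_px):
    """Approximate blur length (pixels) for pure camera rotation during one exposure.

    Small-angle approximation: image displacement ~ focal length (px) * rotation (rad).
    """
    return focal_px * math.radians(angular_speed_deg_s) * exposure_s

# Assumed values: a modest 60 deg/s hand rotation and a ~500 px focal length
for exposure in (1 / 15, 1 / 30, 1 / 250):
    print(f"1/{round(1 / exposure)} s -> {motion_blur_px(60, exposure, 500):.1f} px")
```

At 1/30 s the blur is already more than a dozen pixels, far larger than the few-pixel support of typical corner or edge detectors, while a short exposure keeps it near two pixels regardless of sensor resolution.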
II. Software side
1. Platform fragmentation is the bane of mobile developers, especially when several new models are launched every quarter. One of the reasons for the phenomenal success of the iPhone application platform is that there is no fragmentation whatsoever. With a huge zoo of models, it is practically impossible to support all those in the suitable hardware range. That is especially difficult with AR apps, which are closely coupled to the camera's technical specifications, display size and aspect ratio, etc. If manufacturers want to make it easy for devs, they should concentrate on one AR-friendly line of devices, with binary, or at least source code, compatibility between models.
2. Easy access to the DSP in the API. It would effectively give the developer a second CPU.
3. Access to raw data from the camera. Why raw camera data is not accessible from the ordinary API, and is only available to selected elite developer houses, is a mystery to me. Right now, for example, on Symbian OS the camera viewfinder converts data to YUV422, then from YUV422 to BMP, and the ordinary viewfinder API has access to BMP only. Quite an overhead.
4. An API to access internal camera parameters (focus distance, etc.). Otherwise every device has to be calibrated by the developer.
A discussion is going on at symbian.org. It looks like new Symbian Signed rules are in the works (my guess is they will be implemented no earlier than Symbian^4). Symbian Signed may become cheaper, and a new class of publisher ID may become available to anyone with a credit card.
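To show how little work the tracker would need with raw access, here is a minimal sketch assuming the frame arrives as packed YUYV (one common YUV 4:2:2 byte order: Y0 U Y1 V). A feature tracker usually needs only intensity, which is simply every even byte; other 4:2:2 orders such as UYVY would need a different offset.

```python
import numpy as np

def yuyv_to_gray(buf, width, height):
    """Extract the luma (Y) plane from a packed YUYV (YUV 4:2:2) frame.

    Each pair of pixels is stored as Y0 U Y1 V, so every even byte is a luma sample.
    No color conversion is needed for intensity-based tracking.
    """
    data = np.frombuffer(buf, dtype=np.uint8)
    if data.size != width * height * 2:
        raise ValueError("YUYV frame must have 2 bytes per pixel")
    return data[0::2].reshape(height, width)

# Tiny 4x2 synthetic frame: luma values 0..7, chroma bytes fixed at 128
luma = np.arange(8, dtype=np.uint8)
frame = np.empty(16, dtype=np.uint8)
frame[0::2] = luma
frame[1::2] = 128
gray = yuyv_to_gray(frame.tobytes(), width=4, height=2)
print(gray)
```

Compared with converting the whole frame to RGB/BMP first, this is a strided view of the existing buffer, which is the overhead argument in a nutshell.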
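As an example of what exposed parameters would buy, here is a minimal sketch assuming a pinhole model with no skew and the principal point at the image center: given the lens focal length and physical sensor size, a first-guess intrinsic matrix follows directly. The camera numbers below are hypothetical, and real lenses still deviate from this model (distortion, off-center principal point), so full calibration can refine it.

```python
import numpy as np

def intrinsics_from_specs(focal_mm, sensor_w_mm, sensor_h_mm, width_px, height_px):
    """First-guess pinhole camera matrix K from focal length and sensor size.

    Assumes no skew and the principal point at the image center.
    """
    fx = focal_mm * width_px / sensor_w_mm   # focal length in horizontal pixels
    fy = focal_mm * height_px / sensor_h_mm  # focal length in vertical pixels
    cx, cy = width_px / 2.0, height_px / 2.0
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# Hypothetical phone camera: 4.0 mm lens, 4.8 x 3.6 mm sensor, 640 x 480 preview
K = intrinsics_from_specs(4.0, 4.8, 3.6, 640, 480)
print(K)
```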
A great post from coll900 about the comparative openness of Maemo and Android for developers and users. Maemo is declared a clear winner. The one point missing from the original post is platform fragmentation. Android tries to get around fragmentation by using a Java virtual machine (albeit with non-standard bytecode). However, native code will not be binary-portable between devices. That is especially relevant for augmented reality and other CPU-heavy apps. Here is a question: will Maemo be any better? For some mysterious reason, Nokia is afflicted by an irresistible drive to fragment its own software platform as much as possible. If Nokia manages to gather enough willpower to keep Maemo on a single but mass-produced device line, like Apple with the iPhone, Maemo could become a developer's dream and a serious competitor to the iPhone. However, if Nokia keeps its bad habit of producing a zoo of semi-decent, not-quite-compatible devices, introducing a new just-a-little-different device every quarter just to break whatever compatibility still remains, Maemo, with all its openness, will have no practical advantage over Android.
PS. It looks like there will not be any Maemo fragmentation. A source at Nokia told Reuters that there will be one Maemo device, at least for next year. That is actually good news.