Bundle Adjustment on Mars with Rovers
Just found out – Mars rovers used bundle adjustment for their localization and rock modeling:
“Purpose of algorithm:
To perform autonomous long-range rover localization based on bundle adjustment (BA) technology.
Processing steps of the algorithm include interest point extraction and matching, intra- and inter- stereo tie point selection, automatic cross-site tie point selection by rock extraction, modeling and matching, and bundle adjustment”
Video Surveillance is Useless
Found this interesting slide presentation from Peter Kovesi, inventor of the phase congruency edge detector. It basically says that at the current level of technology video surveillance is useless for face identification. What follows is that it is actually harmful, because it gives a wrong impression of its reliability.
Also on his page – some fun animations, or How to Animate Impossible Objects

PS The Fourier phase approach to feature detection looks really promising, especially if a modification with low computational cost can be found.
Why 3d markerless tracking is difficult for mobile augmented reality
I often hear from users that they don’t like markers, and they wonder why there are relatively few markerless AR applications around. First I want to say that there is no excuse for using markers in a static scene with an immobile camera, or if a desktop computer is used. Brute force tracking methods like bundle adjustment and fundamental matrix estimation are well developed and have been used for years in computer vision and photogrammetry. However, those methods in their original form could hardly produce an acceptable frame rate on mobile devices. On the other hand, marker trackers on mobile devices can be made fast, stable and robust.
So why are markers easy and markerless trackers not?
The problem is the structure, or “shape”, of the point cloud generated by the feature detector of the markerless tracker. The problem with structure is that the depth coordinate of the points is not easily calculated. It is even more difficult because frames taken from a mobile device have a narrow baseline – they are taken from positions close to one another, so “stereo” depth perception is quite rough. This is called the structure from motion problem.
In the case of a marker tracker all feature points of the marker are on the same plane, and that allows the position of the camera to be calculated (up to a constant scale factor) from a single frame. Essentially, if all the points produced by the detector are on the same plane, for example when they come from pictures lying on a table, the structure from motion problem goes away. A planar cloud of points is essentially the same as a set of markers – for example, any four points can be considered a marker and the same algorithm applies. The structure from motion problem is why there is no easy step from a “planar only” tracker to a real 3D markerless tracker.
However, not everything is so bad for a mobile markerless tracker. If the tracking environment is indoors or a cityscape, there are a lot of rectangles, parallel lines and other planar structures around. Those could be used as an initial approximation for one of the structure from motion algorithms, and/or as substitutes for markers.
Another approach, of course, is to find some variation of a structure from motion method which is fast and works on mobile devices. Some variation of the bundle adjustment algorithm looks most promising to me.
PS The PTAM tracker, which has been ported to the iPhone, uses yet another approach – instead of running bundle adjustment for each frame, bundle adjustment runs asynchronously in a separate thread, and a simpler method is used for frame to frame tracking.
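To make the “any four points could be considered as marker” remark concrete, here is a minimal sketch of the first step of planar tracking: estimating the homography between the marker plane and the image from four correspondences. It is plain Python, not code from any real tracker; the function names and the sample coordinates are my own assumptions, and recovering the actual camera pose from the homography (which also needs the camera intrinsics) is left out.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(plane_pts, image_pts):
    """Return a 3x3 matrix H (with H[2][2] = 1) mapping plane (X, Y) to image (u, v)."""
    a, b = [], []
    for (X, Y), (u, v) in zip(plane_pts, image_pts):
        # u = (h11 X + h12 Y + h13) / (h31 X + h32 Y + 1), and the same for v.
        a.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y])
        b.append(u)
        a.append([0, 0, 0, X, Y, 1, -v * X, -v * Y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(H, pt):
    """Apply H to a plane point and return image coordinates."""
    X, Y = pt
    w = H[2][0] * X + H[2][1] * Y + H[2][2]
    return ((H[0][0] * X + H[0][1] * Y + H[0][2]) / w,
            (H[1][0] * X + H[1][1] * Y + H[1][2]) / w)


# Hypothetical data: a unit square on the plane and where its corners appear in the image.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
corners = [(10, 10), (110, 20), (100, 120), (5, 100)]
H = homography_from_4_points(square, corners)
```

With four points the system is exactly determined, so this works from a single frame. A general 3D point cloud has no such shortcut, because every point brings its own unknown depth.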
Oriented descriptors vs upright
I have tested oriented SURF descriptors against upright descriptors on images from an approximately horizontally oriented camera, and got lower feature density for oriented than for upright. Repeatability of oriented was worse too…
Tracking cityscape
One of the big problems in image registration/structure from motion/3D tracking is using the global information of the image. Feature/blob extractors like SIFT, SURF or FAST use only local information around the point. Region detectors like MSER use area information, but MSER is not good at tracking textures and not quite stable in complex scenes. Edge detection provides some non-local information, but requires processing the edges. That could be computationally heavy, but looks promising anyway. There are a lot of methods which use global information – all kinds of texture segmentation, epitomes, snakes/appearance models – but those are computationally heavy and not suitable for mobile devices. The question is how to incorporate global information from the image into the tracker with a minimal number of operations. One way is to optimise the tracker for a specific environment – for example, use the property of a cityscape that it has a lot of planar structures and straight lines. Such a multiplanar tracker wouldn’t work in a forest or park, but could be a working compromise.
From financial crisis to image processing: Ignore Topology At Your Own Risk.
Very interesting article in Wired: Recipe for Disaster: The Formula That Killed Wall Street. I’m not a statistician, but I’ll try to explain it. The gist of the article is that at the heart of the current financial crisis is the David X. Li formula, which uses a “Gaussian copula function” for risk estimation. The idea of the formula is that if we have to estimate the joint probability of two random events, it can be done with a simple formula which uses only the probability distributions of each event, as if they were independent, and a single parameter – statistical correlation. So what bankers did – instead of looking into relationships and connections between events, they just calculated one single statistical parameter and used it for risk estimation. Even more – they applied the same formula to the results of those relatively simple calculations and built pyramids of estimations, each step applying the same simple formula to the results of the previous step. As a result, extremely complex behavior was reduced to a simple linear model, which had little in common with reality.
And now – the illustration from wiki of what exactly this single parameter, correlation, is:
![]()
Here are several two-variable distributions and their correlation coefficients. It can be seen that for linear relationships correlation captures the dependence of the variables perfectly (middle row). For the upper row – normal distributions – it captures the essence of the dependency: in that case we can say something about one variable if we know the other variable and the correlation. For the complex shapes in the lower row the correlation is zero for each. Each of the lower shapes would be represented as the upper central shape (a fuzzy ball) with zero correlation. Correlation captures no information about how one variable depends on the other for the lower shapes. Correlation allows any shape to be represented only as a fuzzy ellipse. Li’s formula reduces dimensionality. The thing is, dimensionality is a topological property, and you don’t mess with topological properties easily. Imagine bankers using a fuzzy ball instead of a ring for risk estimation…
Now to image processing. Most feature detection in image processing is done on grayscale images. The original image is usually RGB, but before feature extraction it is converted to grayscale.
However, the original image is colored, so why not use colors for feature detection? For example, detect features in each color channel separately?
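A quick way to check the ring case by hand is the sketch below. It computes the Pearson correlation for points placed evenly on a circle and for points on a straight line. The sample data is made up for illustration, not taken from the wiki figure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


n = 360
angles = [2 * math.pi * k / n for k in range(n)]

# Ring: y is tied to x by x^2 + y^2 = 1, a very strong dependence.
ring_x = [math.cos(a) for a in angles]
ring_y = [math.sin(a) for a in angles]

# Line: y = 2x + 1, the case correlation was made for.
line_x = [k / n for k in range(n)]
line_y = [2 * x + 1 for x in line_x]

r_ring = pearson(ring_x, ring_y)  # zero up to rounding error
r_line = pearson(line_x, line_y)  # one up to rounding error
```

On the ring, knowing x pins y down to two possible values, yet the coefficient says there is nothing to know.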
The thing is, the pictures in each color channel are very similar.
![]()
Extracting blobs in each channel will in most cases triple the work without gaining significant new information – all the channels will give about the same blobs.
Nevertheless, it’s obvious that there is some nontrivial information about the image encoded in its colors.
Why doesn’t blob detection for each color give access to it?
The reason is the same as for the current financial crisis – dimensionality. By treating each color channel separately we replace the five-dimensional RGB+coordinates space with three three-dimensional color+coordinates spaces. Relationships between color channels are lost. The topology of the color structure is lost.
To actually use color information, the statistical relationships between the colors of the image should be explored – something like a histogram with three-dimensional color bins, essentially converting the image from RGB to indexed color.
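Here is a small sketch of what gets lost. It compares per-channel histograms with a joint three-dimensional color histogram for two tiny made-up “images” (just lists of RGB pixels). The bin count and the pixel values are arbitrary assumptions chosen to make the effect obvious.

```python
from collections import Counter


def bin_index(value, bins):
    """Map an 8-bit channel value to a bin in [0, bins - 1]."""
    return value * bins // 256


def channel_histograms(pixels, bins=4):
    """Three independent one-dimensional histograms, one per channel."""
    return [Counter(bin_index(p[c], bins) for p in pixels) for c in range(3)]


def joint_histogram(pixels, bins=4):
    """One three-dimensional histogram; each key is an indexed color."""
    return Counter(
        (bin_index(r, bins), bin_index(g, bins), bin_index(b, bins))
        for r, g, b in pixels
    )


# Red and cyan pixels versus white and black pixels.
image_a = [(255, 0, 0), (0, 255, 255)]
image_b = [(255, 255, 255), (0, 0, 0)]

same_per_channel = channel_histograms(image_a) == channel_histograms(image_b)
same_joint = joint_histogram(image_a) == joint_histogram(image_b)
```

Channel by channel the two images are indistinguishable: each channel holds one dark and one bright pixel. Only the joint histogram sees that one image is red and cyan and the other is black and white.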
Markerless tracking with FAST
Testing outdoor markerless tracking with a FAST/SURF feature detector.
The plane of the camera is not parallel to the ground, which makes it difficult to estimate precision by eye.

FAST with SURF descriptor
Features detected with multistage FAST and fitted with SURF descriptors

A less strict threshold gives a lot more correspondences, but also some false positives

Multiscale FAST detector
Experimenting with a multiscale FAST detector on images from a cell phone camera.

so far so good…
Testing FAST feature detector
Testing the FAST feature detector on Mikolajczyk’s dataset. Here scale space actually seems useful. With the “brick wall” dataset, repeatability goes from 0.3 to 0.7 with scale going from 0 to 2^3 and the threshold/barrier lowered from 40 to 20.
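For reference, this is roughly how a repeatability number can be computed when the ground-truth homography between two images is known, as it is for this dataset. It is a simplified, point-based sketch and not the dataset’s official evaluation protocol: it ignores region overlap and the restriction to the part of the scene visible in both images, and the tolerance, the greedy matching and the sample keypoints are all my own assumptions.

```python
import math


def warp(H, pt):
    """Map a point from image A to image B with a 3x3 homography."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def repeatability(kps_a, kps_b, H, tol=1.5):
    """Share of keypoints found again within tol pixels after warping by H."""
    if not kps_a or not kps_b:
        return 0.0
    unused = list(kps_b)
    repeated = 0
    for p in kps_a:
        q = warp(H, p)
        best = min(unused, key=lambda k: math.dist(k, q), default=None)
        if best is not None and math.dist(best, q) <= tol:
            repeated += 1
            unused.remove(best)  # a keypoint in B is matched at most once
    return repeated / min(len(kps_a), len(kps_b))


# Hypothetical keypoints; image B is image A shifted 5 pixels to the right.
H_shift = [[1, 0, 5], [0, 1, 0], [0, 0, 1]]
kps_a = [(0, 0), (10, 10), (20, 5), (30, 30)]
kps_b = [(5, 0), (15.5, 10), (100, 100)]
score = repeatability(kps_a, kps_b, H_shift)
```

Lowering the detector threshold gives more keypoints in both images, which usually raises the count of repeated points but also changes the denominator, so the two effects have to be read together.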