Mirror Image

Mostly AR and Stuff

Why 3d markerless tracking is difficult for mobile augmented reality

I often hear from users that they don’t like markers, and they wonder why there are relatively few markerless AR applications around. First, I want to say that there is no excuse for using markers in a static scene with an immobile camera, or when a desktop computer is used. Brute-force tracking methods such as bundle adjustment and fundamental matrix estimation are well developed and have been used for years in computer vision and photogrammetry. However, those methods in their original form can hardly produce an acceptable frame rate on mobile devices. Marker trackers on mobile devices, on the other hand, can be made fast, stable and robust.
So why are markers easy while markerless tracking is not?
The problem is the structure, or “shape”, of the point cloud generated by the feature detector of the markerless tracker. The depth coordinate of the points is not easily calculated. This is even more difficult because frames taken from a mobile device have a narrow baseline: they are taken from positions close to one another, so “stereo” depth perception is quite rough. This is called the structure from motion problem.
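To put a rough number on the narrow baseline problem, here is a minimal sketch that assumes the simplest possible model: two parallel (rectified) pinhole cameras, where depth is Z = f * B / d. The focal length, baselines and the one-pixel matching error below are illustrative assumptions, not measurements from any real tracker.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by two parallel pinhole cameras: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd.

    Obtained by differentiating Z = f * B / d with respect to d.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    focal_px = 500.0   # assumed focal length in pixels
    depth_m = 2.0      # assumed distance to the feature point
    for baseline_m in (0.01, 0.10, 0.50):
        err = depth_error(focal_px, baseline_m, depth_m)
        print(f"baseline {baseline_m:.2f} m -> depth error about {err:.3f} m")
```

The error grows inversely with the baseline, so under this model a hand-held phone that moves only a centimetre between frames gets a much rougher depth estimate than a camera pair placed half a metre apart.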
In the case of a marker tracker, all feature points of the marker are on the same plane, and that allows the position of the camera to be calculated (up to a constant scale factor) from a single frame. Essentially, if all the points produced by the detector are on the same plane, for example points from pictures lying on a table, the structure from motion problem goes away. A planar point cloud is essentially the same as a set of markers: any four points could be treated as a marker, and the same algorithm could be applied. The structure from motion problem is why there is no easy step from a “planar only” tracker to a real 3d markerless tracker.
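As an illustration of why the planar case is easy, here is a minimal sketch of estimating a homography (the 3x3 plane-to-image mapping) from exactly four point correspondences. It is plain Python with a small Gaussian elimination, and the function names are made up for this example; a real tracker would use more points and a robust estimator. With known camera intrinsics the camera pose can then be extracted from the homography, which is not shown here.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Return 3x3 H (with h33 fixed to 1) mapping plane points to image points.

    Each correspondence (x, y) -> (u, v) gives two linear equations
    in the eight unknown entries of H.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a plane point through H and divide by the homogeneous coordinate."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


if __name__ == "__main__":
    # Corners of a unit-square "marker" and where they appear in the image
    # (the image coordinates are invented for illustration).
    marker = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    image = [(100.0, 120.0), (300.0, 110.0), (320.0, 290.0), (90.0, 310.0)]
    h = homography_from_4_points(marker, image)
    print(apply_homography(h, (0.5, 0.5)))  # image position of the marker centre
```

No depth has to be estimated here: eight equations from a single frame fix the whole mapping, which is exactly what the general, non-planar point cloud does not allow.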
However, not everything is so bad for the mobile markerless tracker. If the tracking environment is indoors or a cityscape, there are a lot of rectangles, parallel lines and other planar structures around. Those could be used as an initial approximation for one of the structure from motion algorithms, and/or as substitutes for markers.
Another approach, of course, is to find some variation of a structure from motion method that is fast and works on mobile devices. Some variation of the bundle adjustment algorithm looks most promising to me.
PS The PTAM tracker, which has been ported to the iPhone, uses yet another approach: instead of running bundle adjustment for each frame, bundle adjustment runs asynchronously in a separate thread, and a simpler method is used for frame-to-frame tracking.
PPS And the last thing, from 2011:

30, March, 2009 - Posted by | Coding AR


  1. I am releasing a series of blog posts about AR that you might be interested in – http://bit.ly/B0VOc – would love your feedback

    Comment by freedimensional | 2, May, 2009

  2. Interesting blog, thank you.

    Comment by Petrov Alexander | 10, June, 2009

  3. […] Also, a techy read:  Why 3d markerless tracking is difficult for mobile augmented reality […]

    Pingback by Week 2 summary – video analyzing on iPhone at augmenting | 21, September, 2009

  4. Interesting post!

    I think though that the question of marker vs. non-marker should be seen differently:

    Due to the way markers are encoded, a marker can actually provide information. This is in fact the original reason for using barcodes on products: they tell the computer which product it is. A good (non-AR) use case can be seen in Japan with the omnipresence of QR codes.

    Non-marker tracking needs knowledge about the scene, either known beforehand (model-based tracking) or gathered on the fly (SLAM).

    This makes a major difference between marker vs. non-marker. Everything else is just a technical detail: Of course it is more difficult to do non-marker, because markers are designed to make it easy! But the difficulty in non-marker tracking will go away as research progresses.

    There is a small inaccuracy in the original post from Sergey: Of course one can calculate the pose from a single image even if the object is non-planar. The planarity makes a few things simpler, because one can use a homography to describe the object, but that is again just a minor detail. Model-based tracking from non-planar objects has been done for a long time now. We’ve recently shown a very fast 3D object tracker running on mobile phones:

    Comment by Daniel | 25, September, 2009
