I often hear from users that they don’t like markers, and they wonder why there are so relatively few markerless AR systems around. First I want to say that there is no excuse for using markers in a static scene with an immobile camera, or when a desktop computer is used. Brute-force tracking methods such as bundle adjustment and fundamental matrix estimation are well developed and have been used for years in computer vision and photogrammetry. However, those methods in their original form could hardly produce an acceptable frame rate on mobile devices. On the other hand, marker trackers on mobile devices can be made fast, stable and robust.
So why are markers easy and markerless trackers not?
The problem is the structure, or “shape”, of the point cloud generated by the feature detector of the markerless tracker. The depth coordinate of the points is not easily calculated. It is even more difficult because camera frames taken from a mobile device have a narrow baseline: the frames are taken from positions close to one another, so “stereo” depth perception is quite rough. This is called the structure from motion problem.
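To put a rough number on the narrow baseline effect, here is a minimal sketch that assumes an idealised rectified two-view (stereo) model, which is simpler than real structure from motion. The function names and the sample values (focal length in pixels, baseline in metres) are illustrative assumptions, not measurements from any device.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point in an idealised rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=1.0):
    """First-order depth uncertainty caused by a disparity (matching) error.

    Differentiating Z = f * B / d gives |dZ| ~= Z^2 / (f * B) * |dd|,
    so the error grows as the baseline B shrinks.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    # Hypothetical numbers: 500 px focal length, point 2 m away, 1 px matching error.
    for baseline in (0.02, 0.2):
        err = depth_error(500.0, baseline, 2.0)
        print(f"baseline {baseline:.2f} m -> depth error about {err:.2f} m")
```

Under these assumed numbers a 2 cm baseline gives a depth error of about 0.4 m for a point 2 m away, while a 20 cm baseline gives about 0.04 m, which is why depth from closely spaced frames is so rough.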
In the case of a marker tracker, all feature points of the marker are on the same plane, and that allows the position of the camera to be calculated (up to a constant scale factor) from a single frame. Essentially, if all the points produced by the detector are on the same plane, for example points from pictures lying on a table, the structure from motion problem goes away. A planar cloud of points is essentially the same as a set of markers: any four points could be considered a marker and the same algorithm could be applied. The structure from motion problem is why there is no easy step from a “planar only” tracker to a real 3d markerless tracker.
However, not everything is so bad for the mobile markerless tracker. If the tracking environment is indoors or a cityscape, there are a lot of rectangles, parallel lines and other planar structures around. Those could be used as an initial approximation for one of the structure from motion algorithms, and/or as substitutes for markers.
Another approach, of course, is to find some variation of the structure from motion method which is fast and works on mobile. Some variation of the bundle adjustment algorithm looks most promising to me.
PS The PTAM tracker, which has been ported to the iPhone, uses yet another approach: instead of running bundle adjustment for each frame, bundle adjustment runs asynchronously in a separate thread, and a simpler method is used for frame-to-frame tracking.
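As an illustration of the single-frame planar case, the sketch below estimates a homography from four coplanar points with the standard direct linear transform (DLT) and then recovers the camera rotation and translation from it. It assumes a known intrinsic matrix `K`, noise-free points and plane coordinates given in marker units (hence the scale ambiguity). The function names are mine, and this is not necessarily what any particular marker tracker does.

```python
import numpy as np


def estimate_homography(plane_pts, image_pts):
    """DLT: find H (3x3, defined up to scale) with image ~ H @ plane, from >= 4 pairs."""
    rows = []
    for (x, y), (u, v) in zip(plane_pts, image_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)


def pose_from_homography(H, K):
    """Recover R, t from H ~ K [r1 r2 t] for points on the plane Z = 0.

    The translation is expressed in the units of the plane coordinates,
    so it is only known up to the (unknown) physical marker size.
    """
    A = np.linalg.inv(K) @ H
    A = A / np.linalg.norm(A[:, 0])  # first rotation column must have unit length
    if A[2, 2] < 0:                  # keep the plane in front of the camera (t_z > 0)
        A = -A
    r1, r2, t = A[:, 0], A[:, 1], A[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)      # snap to the nearest orthonormal matrix
    return U @ Vt, t
```

With exactly four points the system has eight equations for the nine entries of `H`, which is enough because `H` is only defined up to scale. With noisy detections one would normally use more points and a robust estimator on top of this.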
PPS And the last thing, from 2011:
Something I’ve picked up at The n-Category Café. Algebra and geometry are analogous to syntax and semantics, with algebra corresponding to syntax and geometry to semantics. This broad statement has a precise meaning, which can be expressed as a duality between Boolean algebras and specific topological spaces, which is used in the study of the formal semantics of computer languages.
Which platform suits mobile AR better? Each has its pluses and minuses. I’m trying to make an overall estimation, not only from the prototype development point of view.
1. iPhone
+ beautiful phone
+! no platform fragmentation
+ application store
+ growing market share
+ 3d accelerator, GPS, accelerometer
+ active developer community
-!! No official camera API for now; direct access to the camera requires an undocumented API
– slow camera on the existing model (better in the next model?)
– CPU underclocked to 412 MHz on the existing model (better in the next model?)
2. Android
+ Open sourced
+ good CPU on the existing model (528 MHz for the G1)
+ 3d accelerator, GPS, accelerometer for existing model
+ active developer community
+ application store
+ completely open model for developers available.
-! officially Java only (10-100 times slower than native code for numerical tasks); installing a native code app requires a hack on the consumer model.
– low market penetration for now (will it get better?)
3. Symbian
+! Big market share
+ some models have a good CPU (up to 600 MHz)
+ some models have fast camera
+ some models have 3d accelerator, GPS, accelerometer and even electronic compass
+ application store coming soon for Nokia models
+ will be open source soon
+ situation with Symbian Signed may improve in the future.
-! platform fragmentation; different OS versions are only partially compatible.
– Symbian Signed prevents self-signed applications from accessing GPS/accelerometer on early versions (S60 FP3)
-! For signed apps, each binary version must be paid for and signed separately, which requires an expensive Publisher ID
– No self-signed applications are allowed in the app store.
– steep learning curve
– Market share is shrinking now, eaten by iPhone
Not many specific pluses or minuses.
– Small market share
5. Other flavors of Linux – situation is not clear yet.
Also Symbian OS. 600 MHz CPU, 3d accelerator, accelerometer, proximity sensor, GPS, touchscreen. There seems to be some kind of hardware image processor too, but with a closed API, so that is not really useful.
Here is the full spec.
PS. In relation to this article, if a smartphone needs a better display for AR: what a mobile AR device needs first
In relation to tracking cityscapes, I did some planar segmentation tests. I segmented FAST-generated corners with a simple 5-point projective invariant.
In some cases the 5-point invariant gives a rough approximation:
In some cases the outliers are quite bad: some points have very close projective invariants but still lie in different planes.
So the simple method doesn’t quite work…
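For reference, a 5-point projective invariant of the kind mentioned above can be sketched as follows. This uses the textbook construction from ratios of 3x3 determinants of homogeneous point coordinates, and assumes that no three of the five points are collinear. The exact invariant and segmentation procedure used in my test are not reproduced here, and the function names are illustrative.

```python
import numpy as np


def five_point_invariants(points):
    """Two projective invariants of five coplanar 2D points (no three collinear).

    With M_ijk = det[p_i p_j p_k] in homogeneous coordinates:
      I1 = M_431 * M_521 / (M_421 * M_531)
      I2 = M_421 * M_532 / (M_432 * M_521)
    Each point index appears equally often in numerator and denominator,
    so per-point scale factors and det(H) cancel under a homography H.
    """
    pts_h = np.column_stack([np.asarray(points, dtype=float), np.ones(5)])

    def m(i, j, k):  # 1-based indices, to match the formulas above
        return np.linalg.det(np.column_stack([pts_h[i - 1], pts_h[j - 1], pts_h[k - 1]]))

    i1 = m(4, 3, 1) * m(5, 2, 1) / (m(4, 2, 1) * m(5, 3, 1))
    i2 = m(4, 2, 1) * m(5, 3, 2) / (m(4, 3, 2) * m(5, 2, 1))
    return i1, i2


def apply_homography(H, points):
    """Map 2D points through a 3x3 homography and dehomogenise."""
    pts_h = np.column_stack([np.asarray(points, dtype=float), np.ones(len(points))])
    mapped = pts_h @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]
```

Five points on one plane keep the same pair of values in every view, so comparing the values between frames is a cheap coplanarity test. As noted above, it is only a necessary condition: points from different planes can produce nearly identical values.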
David Wood from the Symbian Foundation answered a question about Symbian Signed (the mandatory digital signature for all Symbian OS apps with advanced capabilities) in the comments to his blog post about the Symbian Release Plan:
“>What about Symbian Signed? Are there any plans to drop it, or at least relax it…?
A number of options for improving the operation of Symbian Signed are under active consideration.”
So it seems the Symbian Foundation is listening to the laments of developers and end users, and things could get better soon.
I have tested oriented SURF descriptors vs upright descriptors on images from an approximately horizontally oriented camera and got a lower feature density for oriented than for upright. Repeatability of the oriented descriptors was worse too…
One of the big problems in image registration/structure from motion/3d tracking is using global information from the image. Feature/blob extractors like SIFT, SURF or FAST use only local information around the point. Region detectors like MSER use area information, but MSER is not good at tracking textures and is not quite stable in complex scenes. Edge detection provides some non-local information, but requires processing the edges. That could be computationally heavy, but looks promising anyway. There are a lot of methods that use global information, such as all kinds of texture segmentation, epitomes, and snakes/appearance models, but those are computationally heavy and not suitable for mobiles. The question is how to incorporate global information from the image into the tracker with a minimal number of operations. One way is to optimise the tracker for a specific environment, for example by using the properties of a cityscape: a lot of planar structures and straight lines. Such a multiplanar tracker wouldn’t work in a forest or park, but could be a workable compromise.