Something I did for Samsung (the kernel of the tracker). The biggest improvement in SARI 1.5 is sensor fusion, which allows much more robust tracking.
Here is an example of run-time localization and mapping with SARI 1.5:
This is the AR EdiBear game (free in the Samsung Apps store).
I once asked whether 3D registration/reconstruction/pose estimation is about optimization or statistics. The more I think about it, the more convinced I am that it is at least 80% statistics. Even tricks that look specific to optimization, like Tikhonov regularization, often have a statistical underpinning. Stability of optimization is robust statistics (yes, I know, I repeat this way too often). Formulating the cost function means formulating the error distribution, and it also defines the convergence speed.
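The Tikhonov remark has a standard one-line justification. Assuming, for illustration, a linear model b = Ax + e with i.i.d. Gaussian noise of variance σ² and a zero-mean Gaussian prior of variance τ² on x, the MAP estimate is exactly Tikhonov-regularized least squares:

```latex
% Assumed model: b = A x + e, with e ~ N(0, \sigma^2 I) and prior x ~ N(0, \tau^2 I)
\hat{x}_{\mathrm{MAP}}
  = \arg\max_x \; p(b \mid x)\, p(x)
  = \arg\min_x \left[ \frac{1}{2\sigma^2}\lVert A x - b \rVert^2
                    + \frac{1}{2\tau^2}\lVert x \rVert^2 \right]
  = \arg\min_x \; \lVert A x - b \rVert^2 + \lambda \lVert x \rVert^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2}
```

In the same spirit, the cost function encodes the assumed error distribution: a squared loss is the negative log-likelihood of Gaussian errors, while an absolute-value loss corresponds to Laplacian errors, whose heavier tails make the estimate less sensitive to outliers.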
Now some (almost) unrelated AR stuff:
I already mentioned on Twitter that a version of the markerless tracker I did a lot of work on is part of the Samsung AR SDK (SARI) for Android and Bada. It was shown at AP2011 (the presentation also includes some nice Bada code). The AR SDK presentation is here.
Some videos from the presentation: the Edi Bear game demo, with non-reference tracking at the end of the video, and some less trivial elements of SLAM tracking. Another application of the SARI SDK is PBI (this one seems to use an earlier version).
A smartphone with a stereo camera is not exactly a new concept.
But 3D registration, rangefinding, and augmented reality would be a lot more robust and efficient with a stereo camera.
Of course it should be implemented properly: the distance between the lenses should be as large as possible, preferably with the lenses near opposite ends of the phone. A wider baseline increases 3D precision.
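The baseline argument can be quantified with the standard rectified-stereo relation Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels). Differentiating gives a depth uncertainty of roughly ΔZ ≈ Z²·Δd/(f·B), so doubling the baseline halves the depth error at a given distance. A small sketch; the function name, focal length, distance and half-pixel disparity error are illustrative assumptions, not measurements of any real phone:

```python
def depth_error(z_m, focal_px, baseline_m, disparity_err_px=0.5):
    """First-order depth uncertainty (meters) of a rectified stereo pair.

    From Z = f * B / d we get |dZ/dd| = Z**2 / (f * B), so a disparity
    error of disparity_err_px pixels maps to roughly this depth error.
    """
    return z_m ** 2 * disparity_err_px / (focal_px * baseline_m)


# Illustrative (assumed) numbers: object 2 m away, 1000 px focal length.
for baseline_m in (0.01, 0.05, 0.12):
    err_cm = depth_error(2.0, 1000.0, baseline_m) * 100
    print(f"baseline {baseline_m * 100:.0f} cm -> depth error about {err_cm:.1f} cm")
```

With these assumed numbers, going from a 1 cm to a 12 cm baseline shrinks the error from about 20 cm to under 2 cm.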
A special geek model could have the second camera on a retractable extender for even more precision.
A stereo camera would make markerless AR tracking trivial: the 3D structure of the scene could be triangulated in one step from a single stereo frame.
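For a rectified stereo pair, that one-step triangulation is just a per-point formula. A minimal sketch, assuming an ideal rectified pinhole pair with identical intrinsics (focal length f, principal point (cx, cy)) and the left camera as the reference frame; the function name is mine, and a real phone rig would need calibration and rectification first:

```python
def triangulate_rectified(u_left, v_left, u_right, f, cx, cy, baseline):
    """Triangulate one point from a rectified stereo correspondence.

    Assumes both cameras share intrinsics (f, cx, cy), the right camera is
    shifted by `baseline` along the x axis, and matched points lie on the
    same image row. Returns (X, Y, Z) in the left camera frame.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("point must have positive disparity")
    z = f * baseline / disparity
    x = (u_left - cx) * z / f
    y = (v_left - cy) * z / f
    return x, y, z


# Round trip: project a known point into both cameras, then recover it.
f, cx, cy, b = 800.0, 320.0, 240.0, 0.1
X, Y, Z = 0.3, -0.2, 2.0
u_l = f * X / Z + cx
v_l = f * Y / Z + cy
u_r = f * (X - b) / Z + cx  # right camera sits at x = +b in the left frame
print(triangulate_rectified(u_l, v_l, u_r, f, cx, cy, b))
```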
Here is some more narrow-baseline local bundle adjustment, from only two camera frames.
Outliers are drawn in red. Some points are not detected as outliers but are still not localized properly.
Multiscale FAST is used for detection. No descriptors were used for point correspondence; instead, incremental tracking with a sliding-window search on average gradient responses was used (there are three tracked frames between those two). I think those bad points could possibly be isolated with some geometric consistency rules, presuming the landscape is smooth.
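Since FAST is the detector here, a minimal sketch of its single-scale segment test (Rosten and Drummond) may help: a pixel is a corner if at least n contiguous pixels on the 16-pixel Bresenham circle of radius 3 are all brighter than the center plus a threshold t, or all darker than the center minus t. This is an unoptimized illustration (no decision tree, no non-maximum suppression); a multiscale version would run it on each level of an image pyramid. The threshold t=20 and n=9 are assumed defaults, not the tracker's settings:

```python
# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3, in order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, t=20, n=9):
    """FAST segment test on a grayscale image given as a list of rows."""
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # 1: look for a brighter arc, -1: a darker arc
        run = 0
        # Walk the ring twice so arcs wrapping past the last index are counted.
        for value in ring + ring:
            if sign * (value - center) > t:
                run += 1
                if run >= n:
                    return True
            else:
                run = 0
    return False


def detect_fast(img, t=20, n=9):
    """Return (x, y) of all FAST corners, skipping the 3-pixel border."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, t, n)]


# Synthetic image: a bright square filling the lower-right quadrant.
img = [[200 if (x >= 10 and y >= 10) else 0 for x in range(20)] for y in range(20)]
corners = detect_fast(img)
print(corners)
```

Without non-maximum suppression the detections cluster around the true corner at (10, 10), while points on the straight edges are rejected.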
Here I want to talk about matching in image registration. We do registration in 3D or 2D using feature points. The stage after extracting feature points from the image is finding corresponding points in two (or more) images. Usually this is done with descriptors like SIFT, SURF, DAISY, etc. Sometimes randomized trees are used instead. Whatever method is used, it usually has around 0.5% false positives. False positives create outliers in the registration algorithm. That is not a big problem for planar trackers or model/marker trackers, but it can be a problem for structure from motion. If CPU power is not limited, the problem is not very serious: heavy-duty algorithms like full-sequence bundle adjustment and RANSAC cope with outliers pretty well. However, even on high-end mobile phones such algorithms are problematic. Some tricks can help: Georg Klein put full-sequence bundle adjustment into a separate thread in the PTAM tracker to run asynchronously. But I’m trying to do local, 2-4 frame bundle adjustment here. The problem of false positives is especially difficult for images of patterned environments, where some image parts are similar or repeated.
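To illustrate why RANSAC tolerates occasional false-positive matches, here is a minimal sketch that estimates a 2D similarity transform (rotation, uniform scale, translation) between matched points. Points are treated as complex numbers, so the model w = a·z + b is fixed by just two correspondences. The function names, iteration count and inlier threshold are assumptions; a structure-from-motion pipeline would fit an essential or fundamental matrix instead, but the hypothesize-and-verify loop is the same:

```python
import cmath
import random


def fit_similarity(z1, z2, w1, w2):
    """Similarity w = a*z + b from two point pairs (points as complex numbers)."""
    a = (w1 - w2) / (z1 - z2)  # rotation and scale in one complex factor
    return a, w1 - a * z1


def ransac_similarity(src, dst, iters=200, tol=1.0, seed=0):
    """Return (a, b, inlier_indices) of the hypothesis with the most inliers."""
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(iters):
        i, j = rng.sample(range(len(src)), 2)
        if src[i] == src[j]:
            continue  # degenerate sample: cannot define a similarity
        a, b = fit_similarity(src[i], src[j], dst[i], dst[j])
        inliers = [k for k, (z, w) in enumerate(zip(src, dst)) if abs(a * z + b - w) < tol]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best


# Synthetic matches: rotate 30 degrees, scale 1.5, shift by (5, -2).
a_true = 1.5 * cmath.exp(1j * cmath.pi / 6)
b_true = 5 - 2j
src = [complex(x, y) for x in range(0, 50, 10) for y in range(0, 50, 10)]
dst = [a_true * z + b_true for z in src]
dst[3] += 40 + 25j  # two simulated false-positive matches
dst[17] -= 30j
a, b, inliers = ransac_similarity(src, dst)
print("rejected:", sorted(set(range(len(src))) - set(inliers)))
```

Each hypothesis is cheap, and a single clean sample is enough to recover the motion, which is what makes the method robust; the price is many hypotheses when the outlier ratio or model size grows.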
Here the mismatched correspondence is marked with a blue line (points 15-28).
As you can see, it is not easy for any descriptor to tell the difference between points 13 (correct) and 15 (wrong) in the left image; their neighborhoods are practically the same:
Such situations can easily happen not only indoors, but also in cityscapes, industrial settings, and other regular environments.
One solution in such cases is to increase the descriptor radius and process a bigger patch around the point, but that creates problems of its own, for example too many false negatives.
Another approach is to use the geometric consistency of the image point positions.
There are at least two ways to do it.
One is to consider the displacements of corresponding points between frames. Here is an example from the paper by Kanazawa et al., “Robust Image Matching Preserving Global Geometric Consistency”:
This method first gathers local displacement statistics around each point, then filters out outliers and applies a smoothing filter. Here are the original matches, the matches after the consistency check, and the matches after the smoothing filter.
However, this method works best for dense, regular sets of feature points. For small, sparse sets of points it does not improve the situation much.
Here is the second approach: build a graph out of the feature points for each frame.
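The displacement-statistics idea can be sketched in a few lines. This is a simplified illustration of the general idea, not Kanazawa et al.'s actual algorithm: for each match, compare its displacement with the median displacement of its k nearest neighbors (measured in the first image), reject it if the difference exceeds a threshold, and then smooth the surviving displacements by averaging over neighbors. The function names, k, the threshold and the use of a median are my assumptions:

```python
import statistics


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (excluding i)."""
    px, py = points[i]
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: (points[j][0] - px) ** 2 + (points[j][1] - py) ** 2)
    return others[:k]


def consistent_matches(p1, p2, k=4, max_dev=3.0):
    """Keep matches whose displacement agrees with the local median displacement."""
    disp = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(p1, p2)]
    keep = []
    for i in range(len(p1)):
        nbrs = knn(p1, i, k)
        mx = statistics.median(disp[j][0] for j in nbrs)
        my = statistics.median(disp[j][1] for j in nbrs)
        if ((disp[i][0] - mx) ** 2 + (disp[i][1] - my) ** 2) ** 0.5 <= max_dev:
            keep.append(i)
    return keep


def smooth_displacements(p1, p2, keep, k=4):
    """Average each kept displacement with those of its kept neighbors."""
    pts = [p1[i] for i in keep]
    disp = [(p2[i][0] - p1[i][0], p2[i][1] - p1[i][1]) for i in keep]
    smoothed = []
    for n in range(len(keep)):
        group = [n] + knn(pts, n, k)
        smoothed.append((sum(disp[g][0] for g in group) / len(group),
                         sum(disp[g][1] for g in group) / len(group)))
    return smoothed


# Synthetic example: 4x4 grid moved by (2, 1), with one mismatched point.
p1 = [(x, y) for x in range(0, 40, 10) for y in range(0, 40, 10)]
p2 = [(x + 2, y + 1) for x, y in p1]
p2[5] = (p2[5][0] + 15, p2[5][1] - 8)
keep = consistent_matches(p1, p2)
print(keep)  # index 5 is rejected
```

The median makes the local statistic robust as long as only a minority of each neighborhood is wrong, which is also why the approach weakens for small, sparse point sets.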
The local topological structure of the two graphs differs because of false positives. It is easy to find the graph vertices/edges that cause the inconsistency (edges marked blue). They can be found, for example, by comparing the signs of cross products between edges. After the offending vertices are found, they are removed:
There are different ways to build a graph out of feature points. The simplest is nearest neighbors, but maybe Delaunay triangulation or DSP can do better.
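The cross-product test can be sketched as follows. This is one simple realization I chose for illustration, not necessarily the exact procedure used here: build a k-nearest-neighbor graph in the first frame, and for every vertex and every pair of its neighbors compare the sign of the 2D cross product (p_j − p_i) × (p_m − p_i) in both frames. A sign flip means the local orientation is inconsistent; the vertex involved in the most flips is removed, and the process repeats. The function names, k and the stopping rule are assumptions:

```python
def cross(o, a, b):
    """z component of (a - o) x (b - o) for 2D points."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def knn_graph(pts, ids, k):
    """Map each active id to its k nearest active ids (distances in pts)."""
    graph = {}
    for i in ids:
        others = sorted((j for j in ids if j != i),
                        key=lambda j: (pts[j][0] - pts[i][0]) ** 2 + (pts[j][1] - pts[i][1]) ** 2)
        graph[i] = others[:k]
    return graph


def remove_inconsistent(p1, p2, k=4):
    """Iteratively drop the vertex with the most orientation flips between frames."""
    active = list(range(len(p1)))
    while len(active) > 3:
        flips = dict.fromkeys(active, 0)
        for i, nbrs in knn_graph(p1, active, k).items():
            for a in range(len(nbrs)):
                for b in range(a + 1, len(nbrs)):
                    j, m = nbrs[a], nbrs[b]
                    # Collinear triples give 0 and are never counted as flips.
                    if cross(p1[i], p1[j], p1[m]) * cross(p2[i], p2[j], p2[m]) < 0:
                        for v in (i, j, m):
                            flips[v] += 1
        worst = max(active, key=lambda v: flips[v])
        if flips[worst] == 0:
            break  # graphs are topologically consistent
        active.remove(worst)
    return active


# Square plus an inner point (index 4) that jumps across both diagonals in frame 2.
p1 = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 2)]
p2 = [(x + 3, y + 1) for x, y in p1[:4]] + [(8, 9)]
print(remove_inconsistent(p1, p2))  # expected [0, 1, 2, 3]
```

Every flipped triple contains a bad vertex, but its two partners get blamed too, so "most flips" is only a heuristic; ties between an outlier and an inlier are possible in unlucky configurations.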
Though Levenberg-Marquardt works, I’m still trying to save Gauss-Newton, especially as I’ve read a paper saying that Gauss-Newton with a dogleg trust region works well for bundle adjustment. I’ll probably try direct substitution with a Cholesky rank-1 update, and constrained optimization.
It looks like the problem was not the large Gauss-Newton residual; the problem was gauge fixing.
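For reference, here is a minimal sketch of the dogleg step described in Nocedal and Wright (chapter 4) for a Gauss-Newton model with B = JᵀJ and gradient g = Jᵀr. It is written for a 2-parameter problem so the linear solve stays explicit; in bundle adjustment the Gauss-Newton step would come from the sparse normal equations instead. B is assumed positive definite, and the function name and example values are mine:

```python
import math


def dogleg_step(B, g, radius):
    """Dogleg step for the model m(p) = g.p + 0.5 p.B.p (2x2 case).

    B: 2x2 symmetric positive definite matrix (e.g. J^T J), g: gradient.
    """
    (b11, b12), (b21, b22) = B
    det = b11 * b22 - b12 * b21
    # Full Gauss-Newton step p_b = -B^{-1} g.
    p_b = (-(b22 * g[0] - b12 * g[1]) / det, -(-b21 * g[0] + b11 * g[1]) / det)
    if math.hypot(*p_b) <= radius:
        return p_b
    # Cauchy point: minimizer of the model along the steepest-descent direction.
    g_b_g = g[0] * (b11 * g[0] + b12 * g[1]) + g[1] * (b21 * g[0] + b22 * g[1])
    alpha = (g[0] ** 2 + g[1] ** 2) / g_b_g
    p_u = (-alpha * g[0], -alpha * g[1])
    if math.hypot(*p_u) >= radius:
        scale = radius / math.hypot(*g)  # truncated steepest descent
        return (-scale * g[0], -scale * g[1])
    # Walk from p_u toward p_b until hitting the boundary:
    # solve |p_u + t (p_b - p_u)| = radius for t in [0, 1].
    d = (p_b[0] - p_u[0], p_b[1] - p_u[1])
    a = d[0] ** 2 + d[1] ** 2
    b = 2 * (p_u[0] * d[0] + p_u[1] * d[1])
    c = p_u[0] ** 2 + p_u[1] ** 2 - radius ** 2
    t = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return (p_u[0] + t * d[0], p_u[1] + t * d[1])


B = ((4.0, 1.0), (1.0, 3.0))  # assumed J^T J of a tiny problem
g = (1.0, 2.0)
for radius in (1.0, 0.6, 0.3):
    step = dogleg_step(B, g, radius)
    print(radius, step, math.hypot(*step))
```

With a large radius this is the plain Gauss-Newton step; as the radius shrinks it bends toward steepest descent, which is what keeps Gauss-Newton from overshooting when the quadratic model is poor.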
Most bundle adjustment algorithms are inherently not gauge invariant (for details see Triggs et al., “Bundle adjustment – a modern synthesis”, chapter 9, “Gauge Freedom”). In practice this means the method has one or more free parameters that can be chosen arbitrarily (for example, scale) but that influence the solution in a non-invariant way (a gauge-invariant algorithm’s solution would not depend on them). Gauge fixing is the choice of values for those free parameters. There exists at least one gauge-invariant bundle adjustment method (a generalization of Levenberg-Marquardt with full-matrix correction instead of diagonal-only correction), but it is an order of magnitude more computationally expensive.
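A quick numerical illustration of the scale gauge, assuming calibrated pinhole cameras with unit focal length and rotations about the y axis only (toy values of my choosing): scaling all 3D points and all camera translations by the same factor leaves every reprojection, and hence the bundle adjustment cost, unchanged. A global rotation and translation of the scene behave the same way, which is why a calibrated reconstruction has a 7-parameter similarity gauge:

```python
import math


def project(angle_y, t, X):
    """Project 3D point X with a rotation about y by angle_y and translation t."""
    c, s = math.cos(angle_y), math.sin(angle_y)
    xc = c * X[0] + s * X[2] + t[0]
    yc = X[1] + t[1]
    zc = -s * X[0] + c * X[2] + t[2]
    return xc / zc, yc / zc


def reprojections(cameras, points, scale=1.0):
    """All projections after scaling points and camera translations by `scale`."""
    return [project(a, [scale * ti for ti in t], [scale * xi for xi in X])
            for a, t in cameras for X in points]


cameras = [(0.0, (0.0, 0.0, 0.0)), (0.1, (-0.5, 0.0, 0.2))]
points = [(0.3, -0.2, 4.0), (-1.0, 0.5, 5.0), (0.8, 0.9, 6.0)]
base = reprojections(cameras, points)
scaled = reprojections(cameras, points, scale=3.0)
# Differences are at the level of floating-point round-off.
print(max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(base, scaled)))
```

Since R(sX) + st = s(RX + t), the scale cancels in the perspective division, so nothing in the reprojection error can pin it down; some gauge choice (or a free gauge) is unavoidable.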
I used fixing the coordinates of one of the 3D points for gauge fixing. Because the method is not gauge invariant, the solution depends on the choice of that fixed point. The problem occurs when the chosen point is “bad”: the feature detector error for this point is so big that it contradicts the rest of the picture. A mismatch in the point correspondence can cause the same problem.
In my case, fixing the coordinates of the chosen point caused “accumulation” of residual error in that point. This is easy to explain: other points can decrease their reprojection error both by moving/rotating the camera and by shifting their coordinates, but the fixed point can do it only by moving/rotating the camera. It looks like if the point was “bad” from the start, it can become even worse on the next iteration as the error accumulates, a positive feedback loop that makes the method unstable. Of course, these are only my observations; I didn’t do any formal analysis.
The obvious solution is to redistribute the residual error among all the points, which means dropping gauge fixing and using a free gauge. A free gauge causes arbitrary scaling of the result, but the result can be rescaled later. However, there is a cost. A free gauge means the matrix is singular (not invertible), so the Gauss-Newton method cannot work. So I had to switch to the less efficient and more computationally expensive Levenberg-Marquardt. For now it seems to work.
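The singular-matrix issue and the Levenberg-Marquardt workaround can be seen on a toy problem with an exact gauge freedom. This is my own illustration, not the tracker’s bundle adjustment: estimate 1D positions x_0..x_3 from noisy relative measurements x_j − x_i ≈ d_ij. Shifting all positions by the same constant changes no residual, so in this toy problem JᵀJ is exactly singular and a plain Gauss-Newton solve fails, while the damped matrix JᵀJ + λI of the simple identity-damped Levenberg-Marquardt variant is invertible. The gauge is then fixed after the fact (the analogue of rescaling) by moving the mean to zero. All names and measurement values are hypothetical:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting; raises on a singular matrix."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def normal_equations(x, meas, damping=0.0):
    """Return (J^T J + damping * I, -J^T r) for residuals r = x_j - x_i - d_ij."""
    n = len(x)
    A = [[damping if r == c else 0.0 for c in range(n)] for r in range(n)]
    g = [0.0] * n
    for i, j, d in meas:
        res = x[j] - x[i] - d
        jac = ((i, -1.0), (j, 1.0))  # nonzero Jacobian entries of this residual
        for a, sa in jac:
            g[a] -= sa * res
            for b, sb in jac:
                A[a][b] += sa * sb
    return A, g


def cost(x, meas):
    return sum((x[j] - x[i] - d) ** 2 for i, j, d in meas)


def levenberg_marquardt(x, meas, lam=1e-3, iters=50):
    """Identity-damped LM; the gauge is fixed afterwards by zeroing the mean."""
    for _ in range(iters):
        A, g = normal_equations(x, meas, damping=lam)
        trial = [xi + si for xi, si in zip(x, solve(A, g))]
        if cost(trial, meas) < cost(x, meas):
            x, lam = trial, max(lam / 10, 1e-6)  # floor keeps the system invertible
        else:
            lam *= 10
    mean = sum(x) / len(x)
    return [xi - mean for xi in x]


meas = [(0, 1, 1.0), (1, 2, 2.1), (2, 3, 0.9), (0, 2, 3.0), (1, 3, 3.0)]
try:
    solve(*normal_equations([0.0] * 4, meas))  # plain Gauss-Newton step
except ValueError as err:
    print("Gauss-Newton:", err)
print("Levenberg-Marquardt:", levenberg_marquardt([0.0] * 4, meas))
```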
PS: The free-gauge matrix is not singular, just not well-defined, and it has a degenerate minimum. So constrained optimization may still work.
PPS: Gauge invariance is also an important concept in physics and geometry.
PPPS: While messing with quasi-Newton methods, I noticed what seems to be an error in chapter 10.2 of “Numerical Optimization” by Nocedal & Wright. In the secant equation, instead of should be
During testing I found that bundle adjustment fails on some “bad frames”. There are two ways to deal with it: reject bad frames, or try to understand what happened (who set up us the bomb? :-)). Any problem is also an opportunity to understand the subject better. For now I suspect Gauss-Newton is failing due to too large a residual. Just adding the Hessian to does not help; I’m getting a negative eigenvalue. So now I’m trying quasi-Newton methods from the excellent book by Nocedal & Wright. If that doesn’t help, I’ll try Fletcher’s hybrid method.
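For the quasi-Newton route, the core of BFGS (as presented in Nocedal and Wright, chapter 6) is the inverse-Hessian update H⁺ = (I − ρ s yᵀ) H (I − ρ y sᵀ) + ρ s sᵀ with ρ = 1/(yᵀs), where s is the step and y the change in gradient. By construction the updated matrix satisfies the secant equation H⁺ y = s, and it stays positive definite as long as yᵀs > 0, which is one way to sidestep the negative-eigenvalue problem of an exact Hessian. A minimal dense sketch on plain lists; the function name and example vectors are mine:

```python
def bfgs_update(H, s, y):
    """Return the BFGS-updated inverse Hessian approximation.

    H: n x n list of lists, s: step x_{k+1} - x_k, y: grad_{k+1} - grad_k.
    Requires the curvature condition y.s > 0.
    """
    n = len(s)
    ys = sum(yi * si for yi, si in zip(y, s))
    if ys <= 0:
        raise ValueError("curvature condition y.s > 0 violated; skip the update")
    rho = 1.0 / ys
    # V = I - rho * y s^T, so the update is V^T H V + rho * s s^T.
    V = [[(1.0 if i == j else 0.0) - rho * y[i] * s[j] for j in range(n)] for i in range(n)]
    HV = [[sum(H[i][k] * V[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(V[k][i] * HV[k][j] for k in range(n)) + rho * s[i] * s[j]
             for j in range(n)] for i in range(n)]


H = [[1.0, 0.0], [0.0, 1.0]]
s = [0.5, -0.2]
y = [1.0, 0.3]
H_new = bfgs_update(H, s, y)
print([sum(H_new[i][j] * y[j] for j in range(2)) for i in range(2)])  # approximately s
```

Because V y = 0, the first term vanishes on y and the secant equation holds exactly, up to round-off, for any starting H.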
The code of the markerless tracker is finished for the emulator. For now it is in a minimal configuration, without some optimizations and bells and whistles like combined point-edge pose estimation. Next comes bug squashing and testing with different video feeds for some time. The modified bundle adjustment is the nicest part; it seems pretty stable and robust.
I had a discussion with Lester Madden in the LinkedIn MAR group about the concept of locality in AR: each AR object should be attached to a specific location and be accessible only from that location.
I’ll try to explain it in more depth here.
Augmented graffiti, augmented reality mail/drop boxes and billboards, user-built reality overlays: all of these should be attached to a specific location. This locality could be enforced, so that only local data would be available (filtered in) at a specific location. Data locality prevents the user from drowning in augmented noise generated all over the world, and reduces the possibility of spam.
For example, you could have a neighborhood billboard, leave a note for friends in the park, and so on. The data of all those AR objects could be accessed only locally, for both reading and writing: to read the billboard or post a message on it, you would have to go to it.
The user should get the data/content only if he is physically present at the specific location. In the same way, the poster/producer of the data or AR object should physically visit each location where it is placed.
If locality is enforced, to place a note for your friend in the park you have to visit the park, and there is no way around it.
Locality could be enforced with location-based encryption. I think this encryption could be built on geometric hashing. The user scans the environment and performs 3D registration with his mobile or wearable device. The encryption key is generated by the mobile device from the scanned 3D model of the environment.
If the user wants to get data attached to the location, he accesses the server, retrieves the local data, and decrypts it with that key.
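A toy version of the key-generation step, under loud assumptions: the scan is reduced to a small set of 3D landmark positions, the geometric hash is the sorted list of quantized pairwise distances (which does not change under rotation and translation of the whole model), and the key is a SHA-256 digest of that list, split into a device half and a server half. The function names and the 0.5 quantization step are hypothetical. Real scans are noisy, and a distance that lands near a quantization boundary will flip the key, which is exactly why a robust geometric hash is the hard part:

```python
import hashlib
import itertools
import math


def geometric_hash(points, step=0.5):
    """Sorted quantized pairwise distances: invariant to rigid motion of the scan."""
    dists = (math.dist(p, q) for p, q in itertools.combinations(points, 2))
    return sorted(round(d / step) for d in dists)


def location_key(points, step=0.5):
    """Derive a 256-bit key from the geometric hash (hypothetical scheme)."""
    payload = ",".join(map(str, geometric_hash(points, step))).encode()
    return hashlib.sha256(payload).digest()


def split_key(key):
    """Split the key into a device-side half and a server-side half."""
    return key[:16], key[16:]


scan = [(0, 0, 0), (4, 0, 0), (0, 3, 0), (1, 1, 5)]
# The same scene scanned from another pose: rotated 90 degrees about z and shifted.
moved = [(-y + 10, x - 2, z + 1) for x, y, z in scan]
print(location_key(scan) == location_key(moved))  # True
device_half, server_half = split_key(location_key(scan))
```

Sorting discards which distance belongs to which pair, so the hash is cheap and pose-independent but also weak: it ignores point identities and mirror images. A practical scheme would need something closer to a noise-tolerant, locality-sensitive hash.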
In the opposite direction, if the user wants to attach some object or data to a location, the mobile device encrypts the data with part of the hash key and sends the other part of the key to the server. Before storing the data, the server does a uniqueness check: nearby data already stored on the server are checked, and the new data is allowed in only if its key is at some minimum distance from the keys of all the other stored data. After that, the server encrypts the new data with the second part of the key and stores it.
Each object is encrypted with two keys, one of which is server-side. The server has no access to the content of the data, but it has access to part of the location hash key. That way no two objects or pieces of data are attached to exactly the same location, and cluttering of AR objects could be reduced. More importantly, if the poster has to physically visit the location where he wants to place an AR object, he should have at least some relation to that location, and is not some spammer from the other end of the world.
If a spammer forges a location key without actually visiting the place, it will most probably correspond to a non-existent location, and no one will be hit by his data.
All of this is, of course, only a rough outline of how enforced locality could work. Building a robust algorithm for extracting the geometric hash could be non-trivial.