What I would say to Nokia about mobile AR (if it would listen)
#augmentedreality
I have been struck off the list of the Nokia Augmented Reality co-creation session, so here is the gist of what I was intending to say about AR-friendly mobile devices.
I will not repeat the obvious here (requirements for CPU, FPU, RAM, etc.) but will concentrate on things which are often missed.
I. Hardware side
1. Battery life is the most important thing here. AR applications drain the battery extremely fast – full CPU load, constant memory access, a running camera, and on top of that wireless data access, GPS and e-compass.
It’s not realistic to expect a dramatic improvement in battery life in the near future, though fuel cells and air-fuelled batteries give some hope. Thinking short term, a dual battery is the most realistic solution. AR-capable devices tend to be quite heavy and not especially slim anyway, so a second battery will not make a dramatic difference (the iPhone could be an exception here).
Now, how to make the most of it? Make the batteries hot-swappable, with separate slots, and provide a separate battery charger. If the user is indoors, he or she can remove the empty battery and put it on charge while the device runs on the second one.
2. Heating. Up until now no one has paid attention to the heating of mobile devices, mostly because CPU-heavy apps are still very few (maybe only 3D games). An AR application produces even more heat than a 3D game, and the device can become quite hot. So heatsinks and heat pipes are on the agenda.
3. Camera. For AR, the speed of the camera is more important than its resolution: a slow camera produces blurred images which are extremely hard to process (extracting features, edges, etc.).
Position of the camera. Most users hold the device horizontally while using AR. A peculiarity of mobile AR is that the user simultaneously gets input from peripheral vision. To produce a picture consistent with peripheral vision, the camera should be in the center of the device, not on the extreme edge as in the N900.
Absence of skew, off-center, radial and rolling-shutter distortions in the camera is another factor (see the distortion model sketched after this list). In this respect Nokia phone cameras are quite good for now, unlike the iPhone’s.
4. Buttons. A touchscreen is not very helpful for AR: all screen real estate should be dedicated to representing the environment. While it is quite possible to make a completely gesture-driven AR interface, buttons are still helpful. There should be at least one easily accessible button on the front panel. The N95 with the slider out to the right is an almost perfect setup – one big button on the front panel and more on the slider on the opposite side. The N900 – buttons only on the slider, a slider that opens only downward, and no buttons on the front panel – is an example of unhelpful button placement.
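To make “radial distortion” concrete, here is the standard lens model from the computer-vision literature (nothing Nokia-specific): a point at normalised image coordinates $(x_u, y_u)$ is displaced to

$$x_d = x_u\,(1 + k_1 r^2 + k_2 r^4), \qquad y_d = y_u\,(1 + k_1 r^2 + k_2 r^4), \qquad r^2 = x_u^2 + y_u^2 .$$

A camera with negligible distortion is one where $k_1, k_2 \approx 0$, so features project through a plain pinhole model and no per-device correction pass is needed.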
II. Software side
1. Fragmentation.
Platform fragmentation is the bane of mobile developers, especially when several new models are launched every quarter. One of the reasons for the phenomenal success of the iPhone application platform is that there is no fragmentation whatsoever. With a huge zoo of models it is practically impossible to support all those in the suitable hardware range. That is especially difficult for AR apps, which are closely coupled to camera specifications, display size and aspect ratio, etc. If manufacturers want to make life easy for developers, they should concentrate on one AR-friendly line of devices, with binary, or at least source-code, compatibility between models.
2. Easy access to the DSP in the API. It would effectively give the developer a second CPU.
3. Access to raw data from the camera. Why raw camera data is not accessible from the ordinary API, and is only available to selected elite developer houses, is a mystery to me. Right now, for example, the Symbian OS camera viewfinder converts the data to YUV422, then from YUV422 to BMP, and the ordinary viewfinder API has access to the BMP only. Quite an overhead – see the conversion sketch after this list.
4. An API to access internal camera parameters – focus distance, etc. Otherwise every device has to be calibrated by the developer.
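To illustrate the overhead: with direct access to a packed YUV422 (YUYV) buffer, a greyscale image for feature extraction is a single linear pass over the bytes, with no colour-space math at all. A minimal sketch, assuming the YUYV byte order (the function below is illustrative, not part of any Symbian API):

```cpp
#include <stdint.h>
#include <stddef.h>

// Illustrative only, not a Symbian API. Packed YUV422 (YUYV) stores two
// pixels as the four bytes Y0 U Y1 V, so the luma of pixel i is simply
// the byte at offset 2*i: greyscale extraction is one pass, one copy.
void Yuv422ToGrey(const uint8_t* yuyv, uint8_t* grey, size_t pixelCount)
{
    for (size_t i = 0; i < pixelCount; ++i) {
        grey[i] = yuyv[2 * i]; // Y component of pixel i
    }
}
```

Compare that with the current path: the driver converts YUV422 to an RGB BMP, and the AR application then converts that BMP back to greyscale – two full-frame colour-space conversions where a byte copy would do.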
9 Comments
While this post is interesting and raises many good questions, there are a few things that are not correct. Accessing the camera preview data actually works quite well on Symbian (no need to go via file compression!). It is the iPhone which causes real problems here, having no camera API. Second: even though it would be very interesting to access internals such as focus distance, cameras still have to be calibrated (principal point, focal length, radial distortion, etc.). Of course, that calibration data could be read out of the camera (this would require the phone maker to calibrate it and store the data inside the camera).
>Accessing the camera preview data actually works quite well on Symbian (no need to go via file compression!).
Wrong. The open API accesses only the BMP from ViewFinderFrameReady, which is converted by the driver from YUV422/YUV411 (I actually wrote JPEG instead of YUV422 at first – my mistake). There is no open access even to the YUV422, and no access to the raw data format.
>Second: Even though it would be very interesting to access internals such as focus distance, cameras still have to be calibrated
Not correct again, or at least not always correct. About focal length I have already written, and all the rest – principal point, radial distortion – are negligible (for Nokia phones at least) in my experience.
>Of course, that calibration data could be read out of the camera (would require the phone creator to calibrate it and store it inside the camera).
That is what I actually meant, but of course I don’t mean that each individual phone has to be calibrated. A factory specification is quite enough.
On the camera: OK, yes, the camera frame is delivered as a BMP, but who cares? Accessing the camera image is easy and very efficient. Why is it a problem that you don’t get the YUV422? Getting the image in grayscale format would be faster, but you’d have to convert it to RGB for rendering anyway. On Windows CE one often gets the camera image in YUV12 format, but we did not notice any advantage or disadvantage over getting it as RGB565 – all these format conversions can be done very efficiently.
On the calibration: Actually, for perfect results every single camera needs to be calibrated separately. This could of course be done by the device creator, but I agree that it is overkill in practice.
I see a considerable delay in getting the BMP from the viewfinder, and after that I’m converting it to greyscale. On the other hand, getting greyscale from YUV422 is just one operation, so there is definitely overhead here. And I’d like to get the raw camera format anyway – some detail could be lost in the raw->YUV422 conversion. While those details are high-frequency, they could be helpful for phase correlation. Anyway, I’d like to check it.
StartVideoCapture or something similar allows you to select a format – on the N95 and the Samsung I8910 HD it gives YUV12 (I think) – I don’t have the code on this computer. Using the NEON processor in the i8910 I can convert that into both a greyscale image for image processing and an RGB565 image for display in under 2 ms (320×240).
That’s what’s going on in the video at http://www.playar.net – it’s still 15fps, I haven’t optimised the later stages much yet.
Simon
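For concreteness, the arithmetic behind the conversion Simon describes looks roughly like the sketch below – scalar code, assuming “YUV12” means the planar YUV420 layout (a full-resolution Y plane plus quarter-resolution U and V planes) and the standard BT.601 integer coefficients; his NEON version vectorises the same math:

```cpp
#include <stdint.h>

// Scalar sketch of the two conversions described above (illustrative,
// not anyone's shipping code). Greyscale for tracking is just the Y
// plane; RGB565 for display needs the full YUV -> RGB math.
static inline uint8_t Clamp8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

void Yuv420ToGreyAndRgb565(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                           int width, int height,
                           uint8_t* grey, uint16_t* rgb565)
{
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            const int i  = row * width + col;
            const int ci = (row / 2) * (width / 2) + (col / 2); // chroma index

            grey[i] = y[i]; // tracking input: luma plane, copied as-is

            // Integer BT.601 YUV -> RGB (fixed point, >> 8)
            const int c = y[i]  - 16;
            const int d = u[ci] - 128;
            const int e = v[ci] - 128;
            const int r = Clamp8((298 * c + 409 * e + 128) >> 8);
            const int g = Clamp8((298 * c - 100 * d - 208 * e + 128) >> 8);
            const int b = Clamp8((298 * c + 516 * d + 128) >> 8);

            // Pack into 5-6-5 bits for the display
            rgb565[i] = (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
        }
    }
}
```

Note that the greyscale image for tracking falls out essentially for free; only the display copy pays for the colour conversion.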
Simon: Your tracker is pretty good. Which method have you implemented?
Hmm, I don’t use StartVideoCapture; I use the viewfinder, which uses ViewFinderFrameReady and has only a BMP as a parameter. MCameraObserver2 is not supported on the N95.
Daniel: You probably recognise the target too :)
It’s my Histogrammed Intensity Patches stuff – published in BMVC. I think Gerhard showed you the BMVC paper (assuming you’re the Daniel I think you are…). Hoping to have some cool demos to show by WARM.
My academic homepage is here:
http://mi.eng.cam.ac.uk/~sjt59/hips.html
It’s doing detection every frame – I’m still clinging onto that concept despite the great results you’ve got with the patch tracker.
—
In terms of the other topic of the thread, I guess there are two different categorisations possible – “model-based vs. non-model-based” or “marker vs. markerless”. I am doing “markerless model-based” detection, which means there is no bundle adjustment and a single frame contains all the information needed for a match, but a model of the target is required. However, the target can be “natural” rather than fixed markers, and it still works with partial occlusion.
—
The N95 has MCameraObserver, which still allows you to do video capture. You can select up to 640×480 @ 30 fps on the N95 (good luck processing all those pixels though!). ViewFinderFrameReady seems to have a slower frame rate.
Simon
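For reference, the capture path Simon describes looks roughly like the following sketch, written from memory of the Symbian ECam API (ecam.h). The format enum and the size/rate indices are assumptions to be checked against a real device, and error handling and cleanup are omitted:

```cpp
#include <ecam.h> // Symbian ECam camera API

// Rough outline of frame capture via MCameraObserver, as discussed above.
class CFrameGrabber : public CBase, public MCameraObserver
{
public:
    void ConstructL()
    {
        iCamera = CCamera::NewL(*this, 0 /* first camera */);
        iCamera->Reserve(); // continues asynchronously in ReserveComplete()
    }

    // --- MCameraObserver callbacks ---
    void ReserveComplete(TInt aError) { if (!aError) iCamera->PowerOn(); }

    void PowerOnComplete(TInt aError)
    {
        if (aError) return;
        // Size/rate indices enumerate the modes the device advertises;
        // index 0 is assumed here for brevity (e.g. 640x480 @ 30 fps).
        // Leaving calls would be TRAP-ped in real code, since the
        // observer callbacks are non-leaving.
        TRAPD(err,
            iCamera->PrepareVideoCaptureL(CCamera::EFormatYUV420Planar,
                                          0 /* size index */, 0 /* rate index */,
                                          2 /* buffers */, 1 /* frames/buffer */));
        if (!err) iCamera->StartVideoCapture();
    }

    void FrameBufferReady(MFrameBuffer* aBuffer, TInt aError)
    {
        if (!aError) {
            TDesC8* frame = NULL;
            TRAPD(err, frame = aBuffer->DataL(0)); // planar YUV bytes
            if (!err && frame) {
                // ... run tracking on the Y plane here ...
            }
        }
        aBuffer->Release(); // hand the buffer back to the driver
    }

    // Unused in this capture path
    void ViewFinderFrameReady(CFbsBitmap& /*aFrame*/) {}
    void ImageReady(CFbsBitmap* /*aBitmap*/, HBufC8* /*aData*/, TInt /*aError*/) {}

private:
    CCamera* iCamera;
};
```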
Now I know which Simon you are. :)
Congratulations on that paper. This is really cool stuff!
I didn’t know that you already had it running on the phone too.