What I would say to Nokia about mobile AR (if it would listen)
#augmentedreality
I have been struck off the list of the Nokia Augmented Reality co-creation session, so here is a gist of what I was intending to say about AR-friendly mobile devices.
I will not repeat obvious here (requirements for CPU, FPU, RAM etc.) but concentrate on things which are often missed.
I. Hardware side
1. Battery life is the most important thing here. AR applications are eating battery extremely fast – full CPU load, memory access, working camera and on top of it wireless data access, GPS and e-compass.
It’s not realistic to expect dramatic improvement in the battery life in near future, though fuel cells and air-fueled batteries give some hope. If one think short term the dual battery is the most realistic solution. AR-capable devices tend to be quite heavy and not quite slim anyway, so second battery will not make dramatic difference (iPhone could be exception here).
Now how to make maximum out of it? Make batteries hot-swappable with separate slots and provide separate battery charger. If user indoor he/she can remove empty battery and put it on charge while device is running on the second.
2. Heating. Up until now no one was paying attention to the heating of mobile devices, mostly because CPU-heavy apps are very few now (may be only 3d games). AR application produce even more heat than 3d game and device could become quite hot. So heatsinks and heatpumps are on the agenda.
3. Camera. For AR the speed of the camera is more important than the resolution. Speed is the most important factor, slow camera produce blurred images which are extremely hard to process (extract features, edges etc)
Position of the camera. Most of the users are holding device horizontally while using AR. Specific of the mobile AR is that simultaneously user is getting input from the peripheral vision. To produce picture consistent with peripheral vision camera should be in the center of the device, not on the extreme edge like in N900.
Lack of skewing, off-center, radial and rolling shutter distortions of the camera is another factor. In this respect Nokia phone cameras are quite good for now, unlike iPhone.
4. Buttons. Touchscreen is not very helpful to AR, all screen real estate should be dedicated to the environment representation. While it’s quite possible to make completely gesture-driven AR interface buttons are still helpful. There should be at least one easily accessible button on the front panel. N95 with slider out to the right is the almost perfect setup – one big button on front panel and some on the slider on the opposite side. N900 with buttons only on the slider, slider sliding only down and no buttons on the front panel is the example of unhelpful buttons placement.
II. Software side
1. Fragmentation.
Platform fragmentation is the bane of mobile developers. Especially if several new models launched every quarter. One of the reasons of the phenomenal success of iPhone application platform is that there is no fragmentation whatsoever. Whit the huge zoo of models it practically impossible support all that are in the suitable hardware range. That is especially difficult with AR apps, which are closely coupled with camera technical specification, display size and ratio etc. If manufacturers want to make it easy for devs they should concentrate on one AR-friendly line of devices, with binary, or at least source code compatibility between models.
2. Easy access to DSP in API. It would effectively give developer a second CPU.
3. Access to raw data from camera. Why row data from camera are not accessible from ordinary API and only available to selected elite developer houses is a mistery to me. Right now, for example for Symbain OS camera viewfinder convert data to YUV422, from YUV422 to BMP and ordinary viewfinder API have access to BMP only. Quite overhead.
4. API to access internal camera parameters – focus distance etc. Otherwise every device have to be calibrated by developer.
Symbian Signed again.
A discussion is going on at the symbian.org. It looks like a new symbain signed rules are in the work (my guess they will be implemented no earlier than symbian^4). Symbian signed may become cheaper and a new class of publisher ID may become available for anyone with a credit card.
Openness – Maemo vs Android
A great post from coll900 about comparative openness Maemo and Android for developers and users. Maemo designated as a clear win. The one point missing in the original post is a platform fragmentation. Android try to get around fragmentation using Java virtual machine (albeit with non-standard bytecodes). However native code will not be binary transferable between devices. That is especially relevant for augmented reality and other cpu-heavy apps. Here is a question – will Maemo be any better? For some mysterious reasons Nokia afflicted by irresistible drive to fragment it’s own software platform as much as possible. If Nokia manage to gather enough strength of will to keep Maemo on a single but mass-produced device line, like Apple with iPhone, Maemo could become developers dream and a serious competitor to iPhone. However if Nokia keeps its bad habit of producing zoo of semi-decent not-quite-compatible devices, with introduction of a new just-little-different device every quarter, just to break whatever compatibility still remaining, Maemo, with all its openess will not have practical advantage over Android.
Problems
During the tests I’ve found out that bundle adjustment is failing on some “bad frames”. There two ways to deal with it – reject bad frames or try to understand what happen – who set up us a bomb?
.Any problem is also an opportunity to understand subject better. For now I suspect Gauss-Newton is failing due to too big residue. Just adding Hessian to does not help – I’m getting negative eigenvalue. So now I’m trying quasi-Newton from the excellent book by Nocedal&Wright. If it will not help I’ll try hybrid Fletcher method.
Symbian Multimarker Tracking Library
#augmentedreality
Demo-version of binary Symbian multimarker tracking library SMMT available for download.
SMMT library is a SLAM multimarker tracker for Symbian. Library can work on Symbian S60 9.1 devices like Nokia N73 and Symbian 9.2 like Nokia N95, N82. It may also work on some other later versions. This version support only landscape 320×240 resolution for algorithmical reason – size used in the optimization.
This is slightly more advanced version of the tracker used in AR Tower Defense game.
PS corrupted file fixed
Another prospective AR device
Nokia RX-51. It reported having OMAP3 600Mhz CPU with hardware 3D, camera, phone connectivity – it’s not a pure tablet like N800, GPS, accelerometers, and most testy – Maemo 5 Linux
No reports of electronic compass though.
Augmented reality on S60 – basics
Blair MacIntyre asked on ARForum how to get video out of the Symbian Image data structre and upload it into OpenGL ES texture. So here how I did for my games:
I get viewfinder RGB bitmap, access it’s rgb data and use glTextureImage2D to upload it into background texture, which I stretch on the background rectangle. On top of the background rectangle I draw 3d models.
This code snipped for 320×240 screen and OpenGL ES 1+ (wordpress completly screwed tabs)
PS Here is binary static library for multimarker tracking for S60 which use that method.
#define VFWIDTH 320
#define VFHEIGHT 240
Two textures used for background, because texture size should be 2^n: 256×256 and 256×64
#define BKG_TXT_SIZEY0 256
#define BKG_TXT_SIZEY1 64
Nokia camera example could be used the as the base.
1. Overwrite ViewFinderFrameReady function
void CCameraCaptureEngine::ViewFinderFrameReady(CFbsBitmap& aFrame)
{
iController->ProcessFrame(&aFrame);
}
2. iController->ProcessFrame call CCameraAppBaseContaine->ProcessFrame
void CCameraAppBaseContainer::ProcessFrame(CFbsBitmap* pFrame)
{
// here RGB buffer for background is filled
iGLEngine->FillRGBBuffer(pFrame);
//and greyscale buffer for tracking is filled
iTracker->FillGreyBuffer(pFrame);
//traking
TBool aCaptureSuccess = iTracker->Capture();
//physics
if(aCaptureSuccess)
{
iPhEngine->Tick();
}
//rendering
glClear( GL_DEPTH_BUFFER_BIT);
iGLEngine->SetViewMatrix(iTracker->iViewMatrix);
iGLEngine->Render();
iGLEngine->Swap();
};
void CGLengine::Swap()
{
eglSwapBuffers( m_display, m_surface);
};
3. now how buffers filled: RGB buffers filled ind binded to textures
inline unsigned int byte_swap(unsigned int v) { return (v<<16) | (v&0xff00) | ((v >> 16)&0xff); }
void CGLengine::FillRGBBuffer(CFbsBitmap* pFrame)
{
pFrame->LockHeap(ETrue);
unsigned int* ptr_vf = (unsigned int*)pFrame->DataAddress();
FillBkgTxt(ptr_vf);
pFrame->UnlockHeap(ETrue); // unlock global heap
BindRGBBuffer(m_bkgTxtID0, m_rgbxBuffer0, BKG_TXT_SIZEY0);
BindRGBBuffer(m_bkgTxtID1, m_rgbxBuffer1, BKG_TXT_SIZEY1);
}
void CGLengine::FillBkgTxt(unsigned int* ptr_vf)
{
unsigned int* ptr_dst0 = m_rgbxBuffer0 +
(BKG_TXT_SIZEY0-VFHEIGHT)*BKG_TXT_SIZEY0;
unsigned int* ptr_dst1 = m_rgbxBuffer1 +
(BKG_TXT_SIZEY0-VFHEIGHT)*BKG_TXT_SIZEY1;
for(int j =0; j < VFHEIGHT; j++)
for(int i =0; i < BKG_TXT_SIZEY0; i++)
{
ptr_dst0[i + j*BKG_TXT_SIZEY0] = byte_swap(ptr_vf[i + j*VFWIDTH]);
}
ptr_vf += BKG_TXT_SIZEY0;
for(int j =0; j < VFHEIGHT; j++)
for(int i =0; i < BKG_TXT_SIZEY1; i++)
{
ptr_dst1[i + j*BKG_TXT_SIZEY1] = byte_swap(ptr_vf[i + j*VFWIDTH]);
}
}
void CGLengine::BindRGBBuffer(TInt aTxtID, GLvoid* aPtr, TInt aYSize)
{
glBindTexture( GL_TEXTURE_2D, aTxtID);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, aYSize, BKG_TXT_SIZEY0, 0,
GL_RGBA, GL_UNSIGNED_BYTE, aPtr);
}
4. Greysacle buffer filled, smoothed by integral image :
void CTracker::FillGreyBuffer(CFbsBitmap* pFrame)
{
pFrame->LockHeap(ETrue);
unsigned int* ptr = (unsigned int*)pFrame->DataAddress();
if(m_bIntegralImg)
{
// calculate integral image values
unsigned int rs = 0;
for(int j=0; j < VFWIDTH; j++)
{
// cumulative row sum
rs = rs+ Raw2Grey(ptr[j]);
m_integral[j] = rs;
}
for(int i=1; i< VFHEIGHT; i++)
{
unsigned int rs = 0;
for(int j=0; j = VFWIDTH)
{
m_integral[i*VFWIDTH+j] = m_integral[(i-1)*VFWIDTH+j]+rs;
}
}
}
iRectData.iData[0] = m_integral[1*VFWIDTH+1]>>2;
int aX, aY;
for(aY = 1; aY >2;
iRectData.iData[MAX_SIZE_X-1 + aY*MAX_SIZE_X] = Area(2*MAX_SIZE_X-2, 2*aY, 2, 2)>>2;
}
for(aX = 1; aX >2;
iRectData.iData[aX + (MAX_SIZE_Y-1)*MAX_SIZE_X] = Area(2*aX, 2*MAX_SIZE_Y-2, 2, 2)>>2;
}
for(aY = 1; aY < MAX_SIZE_Y-1; aY++)
for(aX = 1; aX >4;
}
}
else
{
if(V2RX == 2 && V2RY ==2)
for(int j =0; j < MAX_SIZE_Y; j++)
for(int i =0; i >2;
}
else
for(int j =0; j < MAX_SIZE_Y; j++)
for(int i =0; i UnlockHeap(ETrue); // unlock global heap
}
Background could be rendered like this
#define GLUNITY (1<<16)
static const TInt quadTextureCoords[4 * 2] =
{
0, GLUNITY,
0, 0,
GLUNITY, 0,
GLUNITY, GLUNITY
};
static const GLubyte quadTriangles[2 * 3] =
{
0,1,2,
0,2,3
};
static const GLfloat quadVertices0[4 * 3] =
{
0, 0, 0,
0, BKG_TXT_SIZEY0, 0,
BKG_TXT_SIZEY0, BKG_TXT_SIZEY0, 0,
BKG_TXT_SIZEY0, 0, 0
};
static const GLfloat quadVertices1[4 * 3] =
{
BKG_TXT_SIZEY0, 0, 0,
BKG_TXT_SIZEY0, BKG_TXT_SIZEY0, 0,
BKG_TXT_SIZEY0+BKG_TXT_SIZEY1, BKG_TXT_SIZEY0, 0,
BKG_TXT_SIZEY0+BKG_TXT_SIZEY1, 0, 0
};
void CGLengine::RenderBkgQuad()
{
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrthof(0, VFWIDTH, 0, VFHEIGHT, -1.0, 1.0);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glViewport(0, 0, VFWIDTH, VFHEIGHT);
glClear( GL_DEPTH_BUFFER_BIT);
glDisable(GL_BLEND);
glDisable(GL_ALPHA_TEST);
glDisable(GL_DEPTH_TEST);
glDisable(GL_CULL_FACE);
glColor4x(GLUNITY, GLUNITY, GLUNITY, GLUNITY);
glBindTexture( GL_TEXTURE_2D, m_bkgTxtID0);
glVertexPointer( 3, GL_FLOAT, 0, quadVertices0 );
glTexCoordPointer( 2, GL_FIXED, 0, quadTextureCoords );
glDrawElements( GL_TRIANGLES, 2 * 3, GL_UNSIGNED_BYTE, quadTriangles );
glBindTexture( GL_TEXTURE_2D, m_bkgTxtID1);
glVertexPointer( 3, GL_FLOAT, 0, quadVertices1 );
glTexCoordPointer( 2, GL_FIXED, 0, quadTextureCoords );
glDrawElements( GL_TRIANGLES, 2 * 3, GL_UNSIGNED_BYTE, quadTriangles );
glEnable(GL_CULL_FACE);
glEnable(GL_BLEND);
glEnable(GL_DEPTH_TEST);
glEnable(GL_ALPHA_TEST);
}
Marker vs markerless (bundle adjustment)
#augmentedreality
Here is a sample of image registration with fiduciary marker (actually the marker I used in my games) vs registration with bundle adjustment. Blue lines are points heights (relatively to marker plane) calculated using marker registration and triangulation. White lines are the same using bundle adjustment(modified). Points extracted with multiscale FAST and fitted with log-polar Fourier descriptors for correspondence (actually SURF descriptor produce the same correspondence).

As you can see markerless is in no way worse then markers, at least on this example ))).
Computer vision accelerator in FPGA for smartphone
#augmentedreality
Tony Chun form Intel integrated platform research group talk about “methodology” of putting computer vision algorithm(or speech recognition) into hardware. He specifically mention smartphone and mobile augmented reality. Tony suggest that this accelerator should be programmable, with some software language to make it flexible. It’s not clear if he is talking about FPGA prototype, or putting FPGA into smartphone. Idea to use FPGA chip for mobile CV task is not new, for example in this LinkedIn discussion Stanislav Sinyagin suggested some specific hardware to play with.
Thanks artimes.rouli.net for pointing this one.
Augmented Reality on Android – now with NDK
With release of native code kit Android now looks more like a functional AR platform. NDK allow for native C/C++ libraries, and complete application seems need java wrapper still. It’s not clear to me still how accessible are video and OpenGL API from NDK – have to look into it.
On related note – there are rumors about pretty powerful 1Ghz phone for Android 2.0