Mirror Image

Mostly AR and Stuff

How the Kinect depth sensor works – stereo triangulation?

Kinect uses a depth sensor produced by PrimeSense, but how exactly it works is not obvious at first glance. Someone here, who claimed to be a specialist, assured me that the PrimeSense sensor is a time-of-flight depth camera. Well, he was wrong. In fact, PrimeSense explicitly says they are not using time-of-flight but something they call “light coding”, and that they use a standard off-the-shelf CMOS sensor, which is not capable of extracting the time of return from modulated light.
Daniel Reetz did excellent work taking IR photos of the Kinect laser emitter and analyzing its characteristics. He confirms PrimeSense’s statement: the IR laser is not modulated. All the laser does is project a static pseudorandom pattern of specks onto the environment. PrimeSense uses only one IR sensor. How is it possible to extract depth information from a single IR image of the speck pattern? Stereo triangulation requires two images to get the depth of each point (speck). Here is the trick: actually there is not one image, but two. One image is what we see in the photo: the image of the specks captured by the IR sensor. The second image is invisible: it is the hardwired pattern of specks which the laser projects. That second image should be hardcoded into the chip logic. The two images are not equivalent: there is some distance between the laser and the sensor, so the images correspond to different camera positions, and that allows stereo triangulation to be used to calculate the depth of each speck.
The difference from ordinary stereo triangulation is that the second image is “virtual”: the position of the second point y_2 is already hardcoded into memory. Because the laser and the sensor are aligned, the task becomes even easier: all one has to do is measure the horizontal offset of the speck in the first image relative to its hardcoded position (after correcting for lens distortion, of course).
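The horizontal-offset measurement can be sketched in a few lines of Python. This is a minimal sketch, not PrimeSense's actual algorithm: it assumes the hardcoded pattern can be treated as the image of an ideal pinhole camera sitting at the laser position, with the same focal length as the IR sensor and already rectified against it. The function name, the sign convention and the numbers used (focal length 580 px, baseline 7.5 cm) are illustrative assumptions, not measured Kinect parameters.

```python
def depth_from_offset(x_observed_px, x_reference_px, focal_px, baseline_m):
    """Depth of one speck from its horizontal offset (rectified pair).

    x_observed_px  -- column of the speck in the IR sensor image
    x_reference_px -- column of the same speck in the hardcoded pattern
    focal_px       -- focal length in pixels (assumed equal for both views)
    baseline_m     -- laser-to-sensor distance in meters
    """
    # Assumed sign convention: nearer specks give a larger positive offset.
    disparity_px = x_observed_px - x_reference_px
    if disparity_px <= 0:
        raise ValueError("offset must be positive for a point in front of the sensor")
    # Standard rectified-stereo relation: Z = f * b / d.
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers, for illustration only.
near = depth_from_offset(329.0, 300.0, focal_px=580.0, baseline_m=0.075)
far = depth_from_offset(310.0, 300.0, focal_px=580.0, baseline_m=0.075)
print(near, far)
```

With these assumed numbers, an offset of 29 px corresponds to a depth of roughly 1.5 m, and a smaller offset corresponds to a more distant speck.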
That also explains the pseudorandom pattern of the specks. A pseudorandom pattern makes matching specks between the two images easier, as each speck has a locally distinct neighborhood. Can it be called a “structured light” sensor? With some stretch of the definition. Structured light usually projects a grid of regular lines instead of pseudorandom points. At least PrimeSense objects to calling their method “structured light”.
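Why a pseudorandom pattern helps matching can be shown with a toy one-dimensional example. This is only a sketch under simplifying assumptions: one image row is reduced to a list of 0/1 values, the "observed" row is just the reference row shifted by a known amount, and the matcher is a plain window comparison. The helper names are hypothetical and nothing here is taken from the PrimeSense chip.

```python
import random


def make_pattern(length, seed):
    """Pseudorandom row of specks: 1 = speck, 0 = no speck."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def shift_right(row, shift):
    """Simulate the observed row: the reference row moved right by `shift`."""
    return [0] * shift + row[:len(row) - shift]


def find_offset(reference, observed, center, half_window, max_offset):
    """Smallest shift in [0, max_offset] whose observed window best matches
    the reference window around `center` (fewest differing cells)."""
    lo, hi = center - half_window, center + half_window + 1
    ref_window = reference[lo:hi]
    best_shift, best_cost = None, None
    for shift in range(max_offset + 1):
        if hi + shift > len(observed):
            break
        obs_window = observed[lo + shift:hi + shift]
        cost = sum(a != b for a, b in zip(ref_window, obs_window))
        if best_cost is None or cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


true_shift = 7

# Pseudorandom row: the neighborhood of each speck is locally distinct.
random_row = make_pattern(200, seed=1)
random_result = find_offset(random_row, shift_right(random_row, true_shift),
                            center=100, half_window=15, max_offset=20)

# Regular row with period 4: every fourth shift looks identical.
regular_row = [1, 0, 0, 0] * 50
regular_result = find_offset(regular_row, shift_right(regular_row, true_shift),
                             center=100, half_window=15, max_offset=20)

print(random_result, regular_result)
```

The pseudorandom row lets the matcher recover the true shift, while the regular row produces an equally good match at a wrong shift, because shifts that differ by a multiple of the period cannot be told apart.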


30, November, 2010 - Posted by | Uncategorized


  1. Now the question is, how can this technique be used to provide some sort of compressive sensing measurement ?


    Comment by Igor Carron | 30, November, 2010

  2. I don’t see how it can be related to CS directly – the measurements are completely deterministic – there is no measurement mixing here, only one laser and only one off-the-shelf sensor. So the question is – what kind of non-trivial effects can you get with one non-modulated laser and one standard IR sensor?

    Comment by mirror2image | 30, November, 2010

  3. It’s not direct measurement, I am thinking more along the lines of using this set up for a purpose which it is not intended for.

    Still thinking…


    Comment by Igor Carron | 30, November, 2010

  4. In that case think about adding mirror(s) and/or a second Kinect to the setup :) I’m not good at wave optics – can you get an interference pattern with that kind of equipment?

    Comment by mirror2image | 30, November, 2010

  5. [...] than only measuring the color value as RGB we know from a normal webcam). Update: it actually uses stereo triangulation with the help of an IR pattern. Why is this piece of hardware so great? Because it’s so damn cheap and starts nothing less [...]

    Pingback by Introducing the Kinect | augmented.org | 13, December, 2010

  6. There is certainly a distance between the laser and the sensor, but the sensor is the only ‘eye’. The laser can’t read incoming signals. The RGB cam in Kinect is ‘optional’ and doesn’t aid in 3D image reconstruction. And if there is a hardwired pattern, it cannot be considered the second image.

    Comment by Punit | 16, March, 2011

  7. @Punit you probably didn’t understand what I wrote. Of course the laser doesn’t read anything. The pattern from memory *is* the second image used in triangulation, and the first image is the image from the IR sensor. For triangulation it does not matter whether an image originates from a sensor or from memory. The only data used in triangulation are the coordinates of the feature points and the correspondence between the two sets of points. The first set of feature points is the hardcoded pattern and the second set is obtained from the sensor’s IR image.

    Comment by mirror2image | 16, March, 2011

  8. I think there is no need for a second image, only to know the line of each projected point. We need to be able to identify each single projected point (line), and I think this is achieved by the so-called light coding (probably each projected line has a unique code of pulsed light). The sensor will capture the points where those lines intersect objects, so each point in the sensor image will pulse in a unique way (the specific pulsed-light code of that line), and we can recognize it and identify its line of projection. So, for each line, we know the (fixed) line in 3D space and we know the point of intersection shown in the 2D image: the distance can be computed (each line has its own equation, differing from the others only by some parameter, I think). This is the way I think it should work. Maybe this is what you meant by calling it a “hardwired image”? Very interesting topic, excuse my weak English, bye :)

    Comment by Venom | 2, April, 2011

  9. [...]  http://mirror2image.wordpress.com/2010/11/30/how-kinect-works-stereo-triangulation/ [...]

    Pingback by Research 2011 Summer · 차세대 미디어 기술 소스 정리 (도지원) | 23, July, 2011

  10. Please also note the interesting discussion present in this site:

    Comment by Richard | 27, July, 2011
