How Kinect depth sensor works – stereo triangulation?
Kinect use depth sensor produced by PrimeSense, but how exactly it works is not obvious from the first glance. Some person here, claimed to be specialist, assured me that PrimeSense sensor is using time-of-flight depth camera. Well, he was wrong. In fact PrimeSense explicitly saying they are not using time-of-flight, but something they call “light coding” and use standard off-the shelf CMOS sensor which is not capable extract time of return from modulated light.
Daniel Reetz made excellent works of making IR photos of Kinect laser emitter and analyzing it’s characteristics. He confirm PrimeSense statement – IR laser is not modulated. All that laser do is project static pseudorandom pattern of specs on the environment. PrimeSense use only one IR sensor. How it possible to extract depth information from the single IR image of the spec pattern? Stereo triangulation require two images to get depth of each point(spec). Here is the trick: actually there not one, but two images. One image is what we see on the photo – image of the specs captured by IR sensor. The second image is invisible – it’s a hardwired pattern of specs which laser project. That second image should be hardcoded into chip logic. Those images are not equivalent – there is some distance between laser and sensor, so images correspond to different camera positions, and that allow to use stereo triangulation to calculate each spec depth.
The difference here is that the second image is “virtual” – position of the second point y_2 is already hardcoded into memory. Because laser and sensor are aligned that make task even more easy: all one have to do is to measure horizontal offset of the spec on the first image relative to hardcoded position(after correcting lens distortion of cause).
That also explain pseudorandom pattern of the specs. Pseudorandom patten make matching of specs in two images more easy, as each spec have locally different neighborhood. Can it be called “structured light” sensor? With some stretch of definition. Structured light usually project grid of regular lines instead of pseudorandom points. At least PrimeSense object to calling their method “structured light”.
Sorry, the comment form is closed at this time.