Sunday, November 20, 2011

Gotchas with the Kinect SDK for Windows

Playing with the Kinect SDK for Windows, and having a ball, but the doco is (understandably) a bit rubbish in places. Or to be more specific: it lacks critical details about the exact form a parameter takes, in precisely the places where that detail matters.

Anyway, this is my list of gotchas so far:

Depth Data Inverted when Player Index Tracking Enabled

Bizarrely, whether you initialize and open your depth image stream with ImageType.Depth or ImageType.DepthAndPlayerIndex determines whether what you get back is the ‘right way round’ or horizontally inverted.

Inverted is generally more useful, because it matches the ‘mirror image’ video stream. So why isn’t the stream always like that? Seems like an unnecessary inconsistency to me, and one you might want to spell out in the doco.
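To make it concrete, here’s a minimal sketch of the two ways of opening the stream, using the beta-era Microsoft.Research.Kinect.Nui API as I remember it (so double-check the exact names against your version). The only difference is which enum values you pass, yet the orientation of the frames changes:

```csharp
using Microsoft.Research.Kinect.Nui;

class DepthStreamDemo
{
    static void Main()
    {
        Runtime nui = new Runtime();

        // With plain ImageType.Depth the frames come through the
        // 'right way round':
        nui.Initialize(RuntimeOptions.UseDepth);
        nui.DepthStream.Open(ImageStreamType.Depth, 2,
            ImageResolution.Resolution320x240, ImageType.Depth);

        // ...whereas with player index tracking enabled, the same scene
        // arrives horizontally inverted (mirrored):
        // nui.Initialize(RuntimeOptions.UseDepthAndPlayerIndex);
        // nui.DepthStream.Open(ImageStreamType.Depth, 2,
        //     ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);

        nui.Uninitialize();
    }
}
```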

Different Depth Data Pixel Values when Player Index Tracking Enabled

When you do turn player index tracking on, the depth stream ‘pixels’ are left-shifted 3 bits, leaving the lower 3 bits for the player index. This is documented, and I understand you’ve got to put the player index somewhere, but why not make the format consistent in both cases, and just leave the lower bits zero if tracking isn’t enabled? Better still, why not put the (optional) player index in the high bits?
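In other words, decoding a depth pixel looks something like this. The helper names are mine, not the SDK’s; each pixel is two bytes, little-endian, in the frame’s PlanarImage.Bits array:

```csharp
// Hypothetical helpers showing the two layouts. 'bits' is the Bits array
// of the depth frame's PlanarImage; each pixel is two bytes, little-endian.
static int DepthInMillimetres(byte[] bits, int x, int y, int width,
                              bool playerIndexEnabled)
{
    int i = (y * width + x) * 2;
    int raw = (bits[i + 1] << 8) | bits[i];

    // With ImageType.DepthAndPlayerIndex the depth is shifted up 3 bits
    // to make room for the player index in the low 3 bits.
    return playerIndexEnabled ? raw >> 3 : raw;
}

static int PlayerIndex(byte[] bits, int x, int y, int width)
{
    // Only meaningful when the stream was opened with DepthAndPlayerIndex;
    // 0 means 'no player at this pixel'.
    return bits[(y * width + x) * 2] & 0x07;
}
```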

This is especially irritating because...

GetColorPixelCoordinatesFromDepthPixel() Requires Bit-Shifted Input

The nuiCamera.GetColorPixelCoordinatesFromDepthPixel() mapping method expects its ‘depthValue’ parameter to be in the format it would have been in if you had player index tracking enabled. If you don’t, you’ll have to left-shift 3 bits yourself just to make it work. So depending on how you set up the runtime, the pixels from one part of the API can or can’t be passed to another part of the API. That’s poor form, if you ask me.

Not that you’ll find that in the doco of course, least of all the parameter doco.
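Here’s the workaround as a sketch. The signature is the beta SDK’s as best I recall it, so treat it with suspicion; ‘rawDepth’ is the value read straight out of the depth frame:

```csharp
using Microsoft.Research.Kinect.Nui;

// Sketch: map one depth pixel into video coordinates, papering over the
// format inconsistency. If the runtime was opened WITHOUT player index
// tracking, the raw value has to be shifted left 3 bits first to match
// the format GetColorPixelCoordinatesFromDepthPixel() expects.
static void MapDepthToColor(Runtime nui, int depthX, int depthY, int rawDepth,
                            bool playerIndexEnabled,
                            out int colorX, out int colorY)
{
    short depthValue = playerIndexEnabled
        ? (short)rawDepth                 // already in the shifted format
        : (short)(rawDepth << 3);         // fake the shift ourselves

    nui.NuiCamera.GetColorPixelCoordinatesFromDepthPixel(
        ImageResolution.Resolution640x480, new ImageViewArea(),
        depthX, depthY, depthValue,
        out colorX, out colorY);
}
```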

No GetDepthPixelFromColorPixelCoordinates Method

Ok, so I understand that the depth-to-video coordinate space translation is a lossy one, but I still don’t see why this method doesn’t exist.

I picked up the Kinect SDK and the first thing I wanted to do was depth-clipping background removal. The easy way to do that is to loop through the video pixels, find the corresponding depth pixel for each, and check its depth. And you can’t do that.
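Something like this, in other words. Note that GetDepthPixelCoordinatesFromColorPixel is entirely made up; it’s precisely the method the SDK doesn’t provide:

```csharp
// The loop I wanted to write. GetDepthPixelCoordinatesFromColorPixel is
// invented -- it's exactly the method that's missing from the SDK.
for (int cy = 0; cy < 480; cy++)
{
    for (int cx = 0; cx < 640; cx++)
    {
        int dx, dy;
        nui.NuiCamera.GetDepthPixelCoordinatesFromColorPixel(
            ImageResolution.Resolution320x240, new ImageViewArea(),
            cx, cy, out dx, out dy);

        int mm = DepthInMillimetres(depthBits, dx, dy, 320, false);
        bool background = (mm == 0 || mm > 1500);   // clip beyond 1.5m
        videoPixels[(cy * 640 + cx) * 4 + 3] = background ? (byte)0 : (byte)255;
    }
}
```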

Instead you have to loop through the depth pixels and call the API method to translate each one into video coordinates. But because there are fewer depth pixels than video pixels (320x240 against 640x480, typically), you have to paint each one out as a 2x2 block, and even then there will be plenty of video pixels you never process. So many, in fact, that you have to run the loop twice: once to set all the video pixels to some kind of default state, and again to put the depth ‘on’ for those that map to depth pixels.
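Put together, the workaround looks roughly like this. It’s a sketch reusing the hypothetical DepthInMillimetres and MapDepthToColor helpers from above, and the BGRA video layout is an assumption too:

```csharp
// Two passes: blank everything, then stamp the surviving depth pixels back
// on as 2x2 blocks. videoPixels holds the 640x480 BGRA video frame's bytes,
// depthBits the 320x240 depth frame's bytes (player index tracking off).
static void ClipBackground(Runtime nui, byte[] videoPixels, byte[] depthBits)
{
    const int clipMm = 1500;                    // drop anything beyond 1.5m

    // Pass 1: default every video pixel to 'removed' (alpha = 0)
    for (int i = 3; i < videoPixels.Length; i += 4)
        videoPixels[i] = 0;

    // Pass 2: walk the (smaller) depth frame, translating as we go
    for (int dy = 0; dy < 240; dy++)
    {
        for (int dx = 0; dx < 320; dx++)
        {
            int mm = DepthInMillimetres(depthBits, dx, dy, 320, false);
            if (mm == 0 || mm > clipMm)         // 0 = unknown depth
                continue;

            int cx, cy;
            MapDepthToColor(nui, dx, dy, mm, false, out cx, out cy);

            // Each depth pixel covers roughly a 2x2 block of video pixels
            for (int oy = 0; oy < 2; oy++)
                for (int ox = 0; ox < 2; ox++)
                {
                    int px = cx + ox, py = cy + oy;
                    if (px >= 0 && px < 640 && py >= 0 && py < 480)
                        videoPixels[(py * 640 + px) * 4 + 3] = 255;  // keep
                }
        }
    }
}
```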

Just didn’t feel right.
