A Look At Challenges In Creating The Next Generation Kinect For The Xbox One

6 min. read

Published on October 3, 2013

published on October 3, 2013

Readers help support MSpoweruser. We may get a commission if you buy through our links.

I think Playstation is going to seriously regret not bundling their version of Kinect with the PS4. Maybe they knew it would not be as good as Kinect 2.0 anyways.

Cyrus Bamji, Microsoft partner hardware architect for Microsoft’s Silicon Valley-based Architecture and Silicon Management group, and members of his team were trying to incorporate a time-of-flight camera into Xbox One.

A time-of-flight camera emits light signals and then measures how long it takes them to return. That needs to be accurate to 1/10,000,000,000 of a second; the speed of light. With such measurements, the camera is able to differentiate light reflecting from objects in a room and the surrounding environment. That provides an accurate depth estimation that enables the shape of those objects to be computed.

That speed-of-light capability would be a major advancement for the Kinect sensor portion of Xbox One, being released to 13 launch markets next month. The new Kinect, a key differentiator for Xbox One against its competition, needed to capture a larger field of view with greater accuracy and higher resolution. An infrared sensor will enable object identification requiring little to no light, and improved hand-pose recognition, giving gamers and more casual users the ability to control the console with their hands.

“When we take a relatively new technology, such as time-of-flight, and put it into a commercial product, there are a whole bunch of things that happen,” he says. “There are things that we didn’t know how important they were until the product was made. For example, we know theoretically that motion blur in time of flight is a big problem, but just how important is only discoverable when you’re building a product with it and that product needs to deliver an excellent experience.”

Accurate depth measurement in diverse scenes with the new camera’s high resolution and a wider field of view also pose user-experience issues, making it difficult to keep small objects, such as a finger, from fading into the background, for instance. While those features delivered more versatile device performance, they also created issues of their own in real-life scenarios, such as the need for accurate depth measurement in diverse, high-resolution scenes. That, as well as improving the wider field of view and the motion blur, required clean data—quickly. Xbox One had to be ready for the 2013 holiday season.

The analog nature of the time-of-flight data posed challenges to delivering such a solution.

“The time-of-flight data coming out of our sensor is per pixel, per frame, and there is a lot more analog information,” Acharya says. “Another issue was that the foreground objects close to the background objects would melt into the background—again, due to the analog nature of how our sensor provides the depth data for pixels that land on edges.”

“This resulted in a lot of information, and to make it easier for foreground/background extraction and scene segmentation, use by software and game developers, the requirement was to clean up this data simultaneously by adding software algorithms in the pipe, yet without incurring a performance hit. This was crucial. We started with various work streams and, in the end, settled on making optimization to the parameters in the system to overcome the issue.”

The collaborators wanted to deliver a clear separation of foreground and background even if the objects are close to each other. That, too, proved difficult. And then there was motion blur.

“Motion blur,” Acharya explains, “is a parameter that needs to be minimized and is not technology-specific. The time-of-flight camera uses global shutter, which has helped reduce motion blur significantly—from 65 milliseconds in the original Kinect to fewer than 14 milliseconds now.”

Other challenges presented themselves. For one thing, processing time became an issue. In the academic literature about time-of-flight systems, processing time wasn’t an issue. In the laboratory environment, the technology worked fine. But Xbox One needs to process a whopping 6.5 million pixels per second. And only a small part of Xbox One’s computing power could be harnessed for this task. The lion’s share is reserved, understandably, for essentials such as gaming, skeleton tracking, face recognition, and audio.

“You need to do very, very light computation for each pixel,” Krupka says, “and this is one of the things that made the problem challenging and different from the typical approach in the academic literature in this field.”

Remarkably, it all came together, and that means that while entertainment lovers worldwide will soon find themselves delighted by the Xbox One experience, so, too, will those eager to develop for the platform. Reducing that edge-data noise makes the data developer-ready, and being able to segment clearly between the foreground and the background solves a complex computational problem. The data is clean, and it can be absorbed more easily by game developers.

Another fascinating feature of the Kinect sensing device in Xbox One stems from its infrared sensor, which can identify objects in a completely darkened room. It can recognize people and track bodies even without any light visible to the naked eye. It can identify a hand pose from four meters away, see the fingers of a child, and remember your identity even minus room illumination.

The wider field of view makes it possible for more players to play an Xbox One game at the same time. With the new console, as many as six players can crowd into one scene. A tall adult can play with a small child without either being squeezed out of the picture. Users get a better experience if they’re standing close by, farther away, or on the periphery of the room.

And the improved hand-pose recognition enables users to interact with the Xbox One just by using their hands—no controller necessary. Thanks to the infrared camera, hand activities can be identified at any illumination or with none at all. Prior hand-pose solutions were able to deliver speed or accuracy, but not both. The hand-pose solution jointly devised by the Xbox team and Microsoft Research can do both.

Source: The Official Microsoft Blog

Leave a Reply