MonoFusion is a Microsoft Research project that creates 3-D scans of arbitrary environments in real time using only a single RGB camera as the input sensor, a camera that may already be present in a tablet or phone. Note that no additional input hardware, such as a depth sensor, is required for this technology. If Microsoft brings this technology to Lumia devices, a user could generate a compelling 3-D model in seconds, for use in augmented reality, 3-D printing, or computer-aided design.
MonoFusion allows a user to build dense 3D reconstructions of their environment in real time, utilizing only a single, off-the-shelf web camera as the input sensor. The camera could be one already available in a tablet, phone, or a standalone device. No additional input hardware is required. This removes the need for power-intensive active sensors that do not work robustly in natural outdoor lighting. Using the camera's input stream, we first estimate the 6DoF camera pose using a sparse tracking method. These poses are then used for efficient dense stereo matching between the input frame and a previously extracted key frame. The resulting dense depth maps are directly fused into a voxel-based implicit model using a computationally inexpensive method, and surfaces are extracted per frame. The system is able to recover from tracking failures as well as filter out geometrically inconsistent noise from the 3D reconstruction. Our method is both simple to implement and efficient, making such systems even more accessible. This paper details the algorithmic components that make up our system and a GPU implementation of our approach. Qualitative results demonstrate high-quality reconstructions, visually comparable even to active depth sensor-based systems such as KinectFusion.
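To make the fusion step concrete, here is a minimal NumPy sketch of how a depth map can be fused into a voxel-based implicit model as a truncated signed distance field (TSDF) with a weighted running average, the computationally inexpensive rule used by KinectFusion-style systems. This is an illustration under stated assumptions, not MonoFusion's actual implementation (which runs on the GPU); the function name `fuse_depth_into_tsdf` and all parameters are hypothetical.

```python
import numpy as np

def fuse_depth_into_tsdf(tsdf, weights, depth, K, pose, voxel_size, origin, trunc):
    """Fuse one depth map into a TSDF voxel grid via a weighted running average.

    tsdf, weights -- (X, Y, Z) arrays holding the implicit model (updated in place)
    depth         -- (H, W) depth map in metres (0 marks invalid pixels)
    K             -- 3x3 camera intrinsics; pose -- 4x4 camera-to-world transform
    voxel_size    -- voxel edge length (m); origin -- world position of voxel (0,0,0)
    trunc         -- truncation distance of the signed distance field (m)
    """
    H, W = depth.shape
    # World coordinates of every voxel centre.
    ix, iy, iz = np.meshgrid(*(np.arange(n) for n in tsdf.shape), indexing="ij")
    pts = np.stack([ix, iy, iz], axis=-1).reshape(-1, 3) * voxel_size + origin
    # Transform voxel centres into the camera frame (world-to-camera = inverse pose).
    w2c = np.linalg.inv(pose)
    pts_cam = pts @ w2c[:3, :3].T + w2c[:3, 3]
    z = pts_cam[:, 2]
    in_front = z > 1e-6
    # Project the in-front voxels into the image plane.
    u = np.zeros(z.shape, dtype=int)
    v = np.zeros(z.shape, dtype=int)
    u[in_front] = np.round(pts_cam[in_front, 0] / z[in_front] * K[0, 0] + K[0, 2]).astype(int)
    v[in_front] = np.round(pts_cam[in_front, 1] / z[in_front] * K[1, 1] + K[1, 2]).astype(int)
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0
    # Signed distance along the ray, clipped to [-1, 1]; drop voxels far behind the surface.
    sdf = d - z
    valid &= sdf > -trunc
    new_tsdf = np.clip(sdf / trunc, -1.0, 1.0)
    # Weighted running average per voxel -- the cheap, incremental fusion rule.
    t, w = tsdf.reshape(-1), weights.reshape(-1)
    t[valid] = (t[valid] * w[valid] + new_tsdf[valid]) / (w[valid] + 1.0)
    w[valid] += 1.0
```

After fusion, the zero-level set of the TSDF marks the reconstructed surface, which a per-frame mesh-extraction pass (e.g. marching cubes) can then turn into geometry; per-voxel weights let later frames filter out geometrically inconsistent noise.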