Microsoft Research always comes up with cool research projects and demos. The latest project is called SemanticPaint, which allows interactive 3D labeling and learning at your fingertips. It lets us scan a scene using Kinect and recognizes the objects present in the scene. In addition, a user can touch the objects and label them. The learning system uses these labels to automatically tag the same kinds of items when they appear in future 3D scans. This kind of 3D labeling and learning system will be useful in many scenarios: quickly gathering large numbers of labeled 3D environments for training large-scale visual recognition systems; generating personalized environment maps with semantic segmentations for the navigation of robots or partially sighted people; recognizing and segmenting objects for augmented reality games which 'understand' the player's environment; and planning the renovation of a building, automating inventory, and designing interiors.
We present a new interactive and online approach to 3D scene understanding. Our system, SemanticPaint, allows users to simultaneously scan their environment whilst interactively segmenting the scene simply by reaching out and touching any desired object or surface. Our system continuously learns from these segmentations, and labels new unseen parts of the environment. Unlike offline systems, where capture, labeling, and batch learning often take hours or even days to perform, our approach is fully online. This provides users with continuous live feedback of the recognition during capture, allowing them to immediately correct errors in the segmentation and/or learning – a feature that has so far been unavailable to batch and offline methods. This leads to models that are tailored or personalized specifically to the user's environments and object classes of interest, opening up the potential for new applications in augmented reality, interior design, and human/robot navigation. It also provides the ability to capture substantial labeled 3D datasets for training large-scale visual recognition systems.
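To get a feel for what "fully online" means here, consider a toy sketch of the touch-and-learn loop: each touch yields a feature vector for the touched surface plus a label, the model is updated immediately, and newly scanned geometry can be classified right away. This is not SemanticPaint's actual learner (the real system uses far richer 3D features and a streaming forest-style classifier); the feature names and class set below are invented for illustration.

```python
class OnlinePerceptron:
    """Minimal multi-class perceptron, updated one labeled sample at a time.

    Stands in for SemanticPaint's online learner purely for illustration.
    """
    def __init__(self, n_features, classes):
        self.classes = list(classes)
        self.w = {c: [0.0] * n_features for c in self.classes}

    def score(self, c, x):
        return sum(wi * xi for wi, xi in zip(self.w[c], x))

    def predict(self, x):
        # Highest-scoring class wins.
        return max(self.classes, key=lambda c: self.score(c, x))

    def update(self, x, label):
        # Mistake-driven online update: reinforce the true class,
        # penalize the wrongly predicted one.
        pred = self.predict(x)
        if pred != label:
            for i, xi in enumerate(x):
                self.w[label][i] += xi
                self.w[pred][i] -= xi

# Hypothetical per-surface features: [colour cue, height above floor, bias].
touches = [
    ("floor", [0.2, 0.0, 1.0]),
    ("chair", [0.8, 0.5, 1.0]),
]

model = OnlinePerceptron(n_features=3, classes=["floor", "chair"])
for _ in range(10):              # the user keeps touching and labeling...
    for label, x in touches:
        model.update(x, label)   # ...and the model learns live, no batch step

# Newly scanned, untouched geometry gets labeled immediately.
print(model.predict([0.21, 0.02, 1.0]))  # floor-like surface -> "floor"
print(model.predict([0.79, 0.48, 1.0]))  # chair-like surface -> "chair"
```

The live-feedback loop the abstract describes falls out naturally: if the user sees a wrong label, another touch is just one more `update` call, with no retraining pass over previously captured data.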