Microsoft revealed the Cortana digital assistant platform at the BUILD developer conference earlier this month. Behind the scenes, it’s Microsoft’s Bing platform that powers Cortana. Cortana currently responds based on your intent and behavior, but Microsoft Research (MSR) expects it to interact in an increasingly anticipatory, natural manner in the future. Many of us would simply file the technologies behind Cortana under the Artificial Intelligence label, but Microsoft Research has detailed the specific ones involved:
- Speech recognition
- Semantic/natural language processing
- Dialogue modeling between humans and machines
- Spoken-language generation
Microsoft Research has used all of the above base technologies to create a virtual personal assistant, building on the work it has done in different areas of personal-assistant technology.
Cortana’s design philosophy is therefore entrenched in state-of-the-art machine-learning and data-mining algorithms. Furthermore, both developers and researchers are able to use Microsoft’s broad assets across commercial and enterprise products, including strong ties to Bing web search and Microsoft speech algorithms and data.
If Larry Heck of Microsoft Research has set the bar high for Cortana’s future, it’s because of the deep, varied expertise within Microsoft Research.
“Microsoft Research has a long and broad history in AI,” he says. “There are leading scientists and pioneers in the AI field who work here. The underlying vision for this work and where it can go was derived from Eric Horvitz’s work on conversational interactions and understanding, which go as far back as the early ’90s. Speech and natural language processing are research areas of long standing, and so is machine learning. Plus, Microsoft Research is a leader in deep-learning and deep-neural-network research.”
What you saw with Cortana in Windows Phone 8.1 is just the beginning of a years-long journey.
“Microsoft has intentionally built Cortana to scale out to all the different domains,” Heck says. “Having a long-term vision means we have a long-term architecture. The goal is to support all types of human interaction—whether it’s speech, text, or gestures—across domains of information and function and make it as easy as a natural conversation.”