Microsoft currently tops the leaderboard in the race to develop machine reading – the ability of computers to read arbitrary text and extract enough meaning from it to answer questions about its content.
“We’re trying to develop what we call a literate machine: A machine that can read text, understand text and then learn how to communicate, whether it’s written or orally,” said Kaheer Suleman, the co-founder of Maluuba, a Quebec-based deep learning startup that Microsoft acquired earlier this year.
Microsoft’s teams are currently on top of the SQuAD leaderboard, a competition that pits teams against each other to build systems that read Wikipedia passages and that tests how well those AI systems can answer questions about the text.
Microsoft researchers and other industry and academic experts are also competing for the best results on another dataset, called MS MARCO, that uses real, anonymized data from Bing search queries to test a system’s ability to answer real questions from real users.
“We’re not just going to build a bunch of algorithms to solve theoretical problems. We’re using them to solve real problems and testing them on real data,” said Rangan Majumder, a partner group program manager within Microsoft’s Bing division. He’s working closely with the Redmond machine reading research team and led the development of the MS MARCO dataset.
An effective machine reading system could advance how search engines work. Instead of typing in a query and getting a list of blue links to sort through, an advanced machine reading system could respond in the same way a very knowledgeable person would when asked a question.
“There is a lot of information around the world, especially on the Internet,” Jianfeng Gao, a partner research manager in Microsoft’s Deep Learning Technology Center, said. “In order to make that useful, you need to turn information into knowledge. The technology that can bridge that gap is machine reading.”
“It delivers the information in a natural way,” said Gao.
Like many AI advances in the past few years, machine reading has benefited from the triad of better deep learning algorithms, a massive increase in cloud-based computing power to run those algorithms and huge amounts of data to learn and test on.
The researchers say those capabilities, along with advances in deep learning methods from work in areas like image and speech recognition, have brought them to a point where they feel confident that significant breakthroughs in machine reading are on the horizon.
“It’s a long-term dream for researchers in natural language processing and even for artificial intelligence,” said Furu Wei, a lead researcher in the Natural Language Processing Group at Microsoft Research Asia.
“This is a small step toward the huge challenge of natural language understanding,” said Ming Zhou, assistant managing director of Microsoft Research Asia in Beijing, who leads the Natural Language Research Group.