Microsoft has released a massive database of 100,000 human-written question-and-answer pairs to help AI researchers train their machines to better extract information from websites and respond more naturally to questions asked by users.
The Microsoft Machine Reading Comprehension (MS MARCO) dataset is being offered under an open source license. It features 100,000 questions culled from Bing queries and 200,000 answers written by humans who summarized real documents or web pages.
“The team chose the anonymized questions based on the queries they thought would be more interesting to researchers. In addition, the answers were written by humans, based on real web pages, and verified for accuracy,” Microsoft said.
The eventual aim is for digital assistants to give proper answers to even complex questions, rather than defaulting to a list of web links, as Cortana or Siri currently do when stumped.
MS MARCO is free to download for non-commercial use, and a commercial version is available for companies and researchers.
More information is available on Microsoft’s site.