Microsoft Research beats rivals with best-ever speech recognition benchmark score


On the Microsoft Blog, Microsoft Research has announced that its AI efforts have hit a new milestone: an industry-leading word error rate (WER) of 6.3% on the Switchboard speech recognition task, a standardized speech recognition benchmark.
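For readers unfamiliar with the metric: word error rate is conventionally defined as WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in the best alignment of the recognizer's output against a reference transcript, and N is the number of words in the reference. A 6.3% WER therefore means roughly 6 errors per 100 reference words. The sketch below computes WER with a standard word-level edit distance. The function name `word_error_rate` is hypothetical, and official NIST scoring tools also apply transcript normalization rules that this minimal sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Hypothetical helper: plain whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                      # deletion
                d[i][j - 1] + 1,                      # insertion
                d[i - 1][j - 1] + substitution_cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution ("the" -> "a") plus one insertion ("today")
# against a 6-word reference gives 2 / 6.
print(word_error_rate("the cat sat on the mat", "a cat sat on the mat today"))
```

Because the denominator is the reference length, insertions can push WER above 100%, which is why it is reported as an error rate rather than an accuracy.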

“Our best single system achieves an error rate of 6.9% on the NIST 2000 Switchboard set. We believe this is the best performance reported to date for a recognition system not based on system combination. An ensemble of acoustic models advances the state of the art to 6.3% on the Switchboard test data,” the researchers wrote in their paper.
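The gap between the 6.9% single system and the 6.3% ensemble comes from combining several acoustic models. The quote does not say which combination scheme was used, so the sketch below shows one generic technique only as an illustration: a weighted average of frame-level state posteriors from several models. The function names and toy numbers are hypothetical, and a real recognizer would feed the combined scores into a decoder with a language model rather than taking a per-frame argmax.

```python
from typing import List, Optional, Sequence

# posteriors[t][k] = P(acoustic state k | frame t) under one model
Posteriors = List[List[float]]


def combine_posteriors(
    model_outputs: Sequence[Posteriors],
    weights: Optional[Sequence[float]] = None,
) -> Posteriors:
    """Weighted average of frame-level posteriors (hypothetical helper)."""
    if not model_outputs:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(model_outputs)  # equal weighting by default
    if len(weights) != len(model_outputs):
        raise ValueError("need exactly one weight per model")

    total = sum(weights)
    normalized = [w / total for w in weights]  # weights sum to 1
    num_frames = len(model_outputs[0])
    num_states = len(model_outputs[0][0])

    combined = [[0.0] * num_states for _ in range(num_frames)]
    for w, output in zip(normalized, model_outputs):
        for t in range(num_frames):
            for k in range(num_states):
                combined[t][k] += w * output[t][k]
    return combined


def best_states(posteriors: Posteriors) -> List[int]:
    """Most likely state per frame (a simplification of real decoding)."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]


# Two toy models, 2 frames, 3 states: they disagree on frame 1,
# and averaging resolves the disagreement toward state 2.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]
print(best_states(combine_posteriors([model_a, model_b])))
```

The intuition is that models with different architectures tend to make partly independent mistakes, so averaging their scores cancels some of those errors.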

The ultimate goal is to recognize speech as well as a human listener can, which would make voice assistants such as Cortana even more useful.

“It’s a simple concept, yet it’s very powerful in its impact. It is about taking the power of human language and applying it more pervasively to all of our computing,” Microsoft CEO Satya Nadella said at an event earlier this year.

Geoffrey Zweig, principal researcher and manager of Microsoft’s Speech & Dialog research group, led the Switchboard speech recognition effort. He attributes the company’s industry-leading results to the skills of its researchers, which led to new training algorithms, highly optimized convolutional and recurrent neural network models, and tools such as the Computational Network Toolkit (CNTK). CNTK implements sophisticated optimizations that let deep learning algorithms run an order of magnitude faster than before. A key step forward was a breakthrough in parallel training on graphics processing units (GPUs).

“The research team we’ve assembled brings to bear a century of industrial speech R&D experience to push the state of the art in speech recognition technologies,” Zweig said.

“This new milestone benefited from a wide range of new technologies developed by the AI community from many different organizations over the past 20 years,” said Xuedong Huang, Microsoft’s chief speech scientist.

Earlier this year, Microsoft researchers won the ImageNet computer vision challenge. That technology has since found its way into a number of Microsoft products, including the viral app.