Microsoft's speech recognition tech is now more accurate than ever before


Mehedi Hassan

Tech Reporter



Updated on November 8, 2024

Microsoft’s speech recognition technology just hit human-level accuracy. The company’s research team announced that its speech recognition system now has a word error rate (WER) of only 5.1%, down from the system’s previous WER of 5.9%. Redmond has been steadily improving the system over the past year, bringing the error rate down to 5.1% from the 6.3% WER it achieved back in September of last year. The company has been able to reduce its error rate by a whopping 12% over the last year.
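For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition with a word-level edit distance. It is not Microsoft's scoring tool; the function name `word_error_rate` and the plain whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a 5.1% WER means roughly one word-level error for every 20 words in the reference transcript.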

Previous reports have shown that the human word error rate is currently at 5.1%, which means Microsoft’s speech recognition system is effectively as accurate as humans. That’s a pretty huge achievement on Microsoft’s part, as it’s been trying to reach human parity for the last 25 years.

Redmond detailed in a technical report how it achieved the lower error rate using a combination of a convolutional neural network and a bidirectional long short-term memory (LSTM) network. Engineers at Microsoft have also been working on improving the company’s neural net-based acoustic and language models, contributing to the improved word error rate. The company also claims its investment in the cloud business has enabled a faster training process for its acoustic and language models.

Microsoft’s speech recognition technology is used across Windows, Cortana, Office, and Cognitive Services, and the improved accuracy will likely benefit almost all of its customers in the coming months.