Back in July, Microsoft announced that the OpenAI Whisper model would be coming soon to Azure OpenAI Service. Last Friday, Microsoft announced that the Whisper model is now available to customers of both Azure OpenAI Service and Azure AI Speech.
The OpenAI Whisper model is a neural network that performs speech recognition and translation in 57 languages. It was trained on a large, diverse dataset of audio and text collected from the web. It uses a simple end-to-end approach based on a Transformer encoder-decoder architecture, and it produces transcripts with enhanced readability and phrase-level timestamps.
Enterprises can now build applications on the OpenAI Whisper model in two ways: through Azure OpenAI Service or through Azure AI Speech.
OpenAI already offers the Whisper API on its own platform. Azure OpenAI Service gives developers the same Whisper API features and functionality, including transcription and translation. The REST APIs for Whisper transcription and translation can be found in the Azure OpenAI Service portal.
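To make the REST API concrete, here is a minimal sketch that builds (but does not send) a multipart request to a Whisper deployment using only the Python standard library. Several details are assumptions rather than facts from the announcement: the endpoint shape `https://{resource}.openai.azure.com/openai/deployments/{deployment}/audio/transcriptions` (or `/audio/translations`), the `api-key` header, the `2023-09-01-preview` API version, and the `response_format` form field. The function name `build_whisper_request` and the resource and deployment names are hypothetical. Check all of these against the REST reference in the portal before use.

```python
import uuid
import urllib.request


def build_whisper_request(resource, deployment, api_key, audio, filename,
                          task="transcriptions", api_version="2023-09-01-preview",
                          response_format="json"):
    """Build (but do not send) a multipart request to a Whisper deployment.

    Hypothetical helper; the URL layout, header name and API version are
    assumptions to verify against the Azure OpenAI REST reference.
    """
    if task not in ("transcriptions", "translations"):
        raise ValueError("task must be 'transcriptions' or 'translations'")
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/audio/{task}?api-version={api_version}")
    boundary = uuid.uuid4().hex
    # Part 1: the raw audio file.
    head = (f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n").encode()
    # Part 2: the desired output format, then the closing boundary.
    tail = (f"\r\n--{boundary}\r\n"
            'Content-Disposition: form-data; name="response_format"\r\n\r\n'
            f"{response_format}\r\n--{boundary}--\r\n").encode()
    return urllib.request.Request(
        url,
        data=head + audio + tail,
        headers={"api-key": api_key,
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; with the default `json` format the response is expected to carry the transcript in a `text` field. Building the request separately from sending it keeps the sketch testable without network access or credentials.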
Azure AI Speech users can now use OpenAI's Whisper model with the existing Azure AI Speech batch transcription API. Whisper in Azure AI Speech benefits from existing features including asynchronous processing, speaker diarization, customization, and support for larger files. The details are listed below.
- Large file sizes: Azure AI Speech extends Whisper transcription to files up to 1 GB and lets you batch up to 1,000 files in a single request, so large volumes of audio can be processed at once.
- Timestamps: With Azure AI Speech, the recognition result includes word-level timestamps, which show where in the audio each word is spoken.
- Speaker diarization: Azure AI Speech identifies individual speakers in an audio file and labels their speech segments. This lets customers distinguish between speakers, accurately transcribe what each one says, and produce a more organized, structured transcript.
- Customization/fine-tuning (available soon): The Custom Speech capability in Azure AI Speech will allow customers to fine-tune Whisper on their own data to improve recognition accuracy and consistency.
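The batch features above can be illustrated with a rough sketch: one function builds (without sending) a batch transcription request that enables diarization and word-level timestamps and enforces the 1,000-file limit, and another flattens a result file into speaker-labelled, timestamped words. Much of this is assumed rather than taken from the announcement: the `v3.2-preview.1` endpoint path, the `Ocp-Apim-Subscription-Key` header, the property names `diarizationEnabled` and `displayFormWordLevelTimestampsEnabled`, selecting Whisper via a `model.self` URL, the result fields (`recognizedPhrases`, `speaker`, `nBest`, `displayWords`, `offsetInTicks`), and offsets measured in 100-nanosecond ticks. The helper names `build_batch_request` and `words_with_speakers` are hypothetical.

```python
import json
import urllib.request

MAX_FILES_PER_REQUEST = 1000      # batch limit stated in the article
TICKS_PER_SECOND = 10_000_000     # assumption: offsets are 100-ns ticks


def build_batch_request(region, api_key, content_urls, model_url,
                        locale="en-US", api_version="v3.2-preview.1"):
    """Build (but do not send) a batch transcription request using Whisper.

    Hypothetical helper; endpoint, header and property names are assumptions.
    """
    if not content_urls:
        raise ValueError("at least one audio URL is required")
    if len(content_urls) > MAX_FILES_PER_REQUEST:
        raise ValueError(f"at most {MAX_FILES_PER_REQUEST} files per request")
    body = {
        "displayName": "whisper-batch",
        "locale": locale,
        "contentUrls": list(content_urls),
        # Assumption: the Whisper base model is chosen by its model URL.
        "model": {"self": model_url},
        "properties": {
            "diarizationEnabled": True,
            # Assumption: Whisper reports word timestamps in display form.
            "displayFormWordLevelTimestampsEnabled": True,
        },
    }
    url = (f"https://{region}.api.cognitive.microsoft.com/"
           f"speechtotext/{api_version}/transcriptions")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )


def words_with_speakers(result):
    """Flatten a parsed result file into (speaker, word, start_seconds) tuples."""
    rows = []
    for phrase in result.get("recognizedPhrases", []):
        speaker = phrase.get("speaker")
        best = phrase["nBest"][0]  # top recognition hypothesis
        for word in best.get("displayWords", []):
            start = word["offsetInTicks"] / TICKS_PER_SECOND
            rows.append((speaker, word["displayText"], start))
    return rows
```

Because batch transcription is asynchronous, a real client would submit the request, poll the returned transcription resource until it completes, then download each result file and pass its parsed JSON to `words_with_speakers`; those polling steps are omitted here.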