Last month, Microsoft announced the rollout of a real-time, AI-based background noise suppression feature to Microsoft Teams Windows desktop users. This feature can suppress unwanted noise such as shuffling papers, slamming doors, and barking dogs during a Teams call. It works by analyzing an individual’s audio feed and using specially trained deep neural networks to filter out the noise and retain only the speech signal. Microsoft today announced that it is working to bring AI-based noise suppression to Microsoft Teams on Mac and mobile platforms.
Microsoft today also explained how it developed this feature without using actual customer data. The company optimized the deep learning model so that it runs efficiently in real time on the Teams desktop client without much overhead. Microsoft’s team explained:
To achieve this dataset diversity, we have created a large dataset with approximately 760 hours of clean speech data and 180 hours of noise data. To comply with Microsoft’s strict privacy standards, we ensured that no customer data is being collected for this dataset. Instead, we either used publicly available data or crowdsourcing to collect specific scenarios. For clean speech, we ensured that we had a balance of female and male speech, and we collected data from 10+ languages, which also include tonal languages, to ensure that our model will not change the meaning of a sentence by distorting the tone of the words. For the noise data, we included 150 noise types to ensure we cover diverse scenarios that our customers may run into, from keyboard typing to toilet flushing or snoring. Another important aspect was to include emotions in our clean speech so that expressions like laughter or crying are not suppressed. The characteristics of the environment from which our customers are joining their online Teams meetings have a strong impact on the speech signal as well. To capture that diversity, we trained our model with data from more than 3,000 real room environments and more than 115,000 synthetically created rooms.
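Microsoft does not describe how the clean speech and noise recordings are combined into training examples. A common approach, shown here purely as an illustrative sketch and not as Microsoft's pipeline, is to scale a noise clip so that it sits at a chosen signal-to-noise ratio (SNR) and add it to the clean speech. The function names, the synthetic tone and white noise, and the 10 dB target are all assumptions made for this example; room simulation is left out.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise RMS ratio equals `snr_db`, then add it."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent")
    # Noise RMS that yields the requested SNR in decibels.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    scaled_noise = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


# Hypothetical stand-ins for real recordings: a 220 Hz tone and white noise.
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(sample_rate)]

# The (noisy, clean) pair is what a suppression model would typically train on.
noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=10.0)
achieved_snr_db = 20 * math.log10(rms(clean) / rms(scaled_noise))
```

Repeating this over many speech clips, noise types, and SNR values is one way a varied training set could be built from separate speech and noise collections.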
Since we use deep learning, it is important to have a powerful model training infrastructure. We use Microsoft Azure to allow our team to develop improved versions of our ML model. Another challenge is that the extraction of original clean speech from the noise needs to be done in a way that the human ear perceives as natural and pleasant. Since there are no objective metrics which are highly correlated to human perception, we developed a framework which allowed us to send the processed audio samples to crowdsourcing vendors, where human listeners rated their audio quality on a one- to five-star scale to produce mean opinion scores (MOS). With these human ratings, we were able to develop a new perceptual metric which, together with the subjective human ratings, allowed us to make fast progress on improving the quality of our deep learning models.
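A mean opinion score is simply the average of the listener ratings a clip receives. The sketch below illustrates that arithmetic only. The clip names, the ratings, and the `mean_opinion_scores` helper are hypothetical, and it does not represent Microsoft's rating framework or its perceptual metric.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Average the 1-5 star ratings per clip.

    `ratings` is an iterable of (clip_id, stars) pairs.
    """
    by_clip = defaultdict(list)
    for clip_id, stars in ratings:
        if not 1 <= stars <= 5:
            raise ValueError(f"rating out of range: {stars}")
        by_clip[clip_id].append(stars)
    return {clip_id: mean(values) for clip_id, values in by_clip.items()}


# Hypothetical listener ratings for two processed audio clips.
ratings = [
    ("clip_a", 4), ("clip_a", 5), ("clip_a", 3),
    ("clip_b", 2), ("clip_b", 3),
]
mos = mean_opinion_scores(ratings)
```

With these made-up ratings, `clip_a` averages 4 stars and `clip_b` averages 2.5, so a model that produced `clip_a` would be judged the better-sounding one.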
Here’s how you can enable the noise suppression feature in Teams:
- From the main Teams window:
  - Select your profile picture at the top right of Teams and then select Settings.
  - Select Devices on the left and then, under Noise suppression, select an option.
- From a meeting window:
  - Select More options in your meeting controls and then select Device settings.
  - Under Noise suppression, select an option.