Microsoft Project Oxford’s Speaker and Video APIs are now available


Microsoft today launched the Speaker and Video APIs from Project Oxford in public preview. Both APIs use machine learning to let developers build new kinds of intelligent applications. Microsoft states that its Speaker Recognition API provides “state-of-the-art algorithms that enables recognition of a human’s voice from audio streams. It comprises two components: speaker verification and speaker identification.”

  • Speaker Verification can automatically verify and authenticate users from their voice or speech. It is closely tied to authentication scenarios and is often associated with a pass phrase. Hence, we opt for a text-dependent approach, which means speakers need to choose a specific pass phrase to use during both the enrollment and verification phases.
  • Speaker Identification can automatically identify the person speaking in an audio file, given a group of prospective speakers. The input audio is matched against the provided group of speakers and, if a match is found, the speaker’s identity is returned. It is text-independent, which means that there are no restrictions on what the speaker says during the enrollment and recognition phases.
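To make the difference between the two components concrete, here is a minimal Python sketch. It is not the Project Oxford API: the voiceprints are hypothetical toy vectors, the spoken text is assumed to already be available as a transcript, and the names `verify` and `identify`, the use of cosine similarity, and the 0.8 threshold are all illustrative assumptions rather than Microsoft's actual models or scoring. Verification is a 1:1 check against one enrolled speaker who must also say the enrolled pass phrase; identification is a 1:N search over a group that may return no match at all.

```python
import math
from typing import Dict, List, Optional

Voiceprint = List[float]  # hypothetical fixed-length voice embedding


def cosine_similarity(a: Voiceprint, b: Voiceprint) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify(enrolled: Voiceprint, pass_phrase: str,
           sample: Voiceprint, spoken_text: str,
           threshold: float = 0.8) -> bool:
    """Text-dependent 1:1 verification: the speaker must say the enrolled
    pass phrase AND sound like the enrolled voiceprint."""
    if spoken_text.strip().lower() != pass_phrase.strip().lower():
        return False
    return cosine_similarity(enrolled, sample) >= threshold


def identify(group: Dict[str, Voiceprint], sample: Voiceprint,
             threshold: float = 0.8) -> Optional[str]:
    """Text-independent 1:N identification: return the best-scoring speaker
    in the group, or None if no one reaches the threshold."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, voiceprint in group.items():
        score = cosine_similarity(voiceprint, sample)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    group = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.5]}
    sample = [0.85, 0.15, 0.25]  # closely resembles alice's voiceprint
    print(identify(group, sample))           # alice
    print(identify(group, [0.0, 0.0, 1.0]))  # None
    phrase = "my voice is my password"
    print(verify(group["alice"], phrase, sample, "My voice is my password"))  # True
    print(verify(group["alice"], phrase, sample, "open sesame"))  # False
```

The pass-phrase check in `verify` mirrors the text-dependent design described for verification, while `identify` ignores what was said, matching the text-independent description of identification.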

The company also stated:

The Speaker Recognition APIs help recognize users and customers (speakers) from their voice. While the Speaker Recognition APIs are not intended to replace stronger authentication tools, they do provide additional authentication measures that take security to the next level. The Speaker Recognition APIs can also enhance the customer service experience by automatically identifying the calling customer, without the need for an agent to conduct a question-and-answer process to verify the customer’s identity manually.

Our goal with Speaker Recognition is to help developers build intelligent authentication mechanisms capable of balancing convenience against fraud prevention. Achieving such a balance is no easy feat. Ideally, to establish identity, three pieces of information are needed:

  • Something you know (password or PIN).
  • Something you have (a secure key pad, mobile device or credit card).
  • Something you are (biometrics such as voice, fingerprint, face).
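The three-factor idea can also be sketched in a few lines. In this hypothetical Python example, a voice-similarity score (assumed to come from some speaker-verification step) is treated as the "something you are" factor alongside a PIN and a registered device ID. The `UserRecord`, `LoginAttempt`, and `authenticate` names, the 0.8 voice threshold, and the `required_factors` parameter are illustrative assumptions, not part of Project Oxford. The sketch reflects the point above that voice adds an extra measure rather than replacing the other factors.

```python
import hmac
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Hypothetical stored credentials for one user."""
    pin: str
    registered_device_id: str


@dataclass
class LoginAttempt:
    """Hypothetical evidence presented at login."""
    pin: str            # something you know
    device_id: str      # something you have
    voice_score: float  # something you are: score from a speaker-verification step


def factors_satisfied(user: UserRecord, attempt: LoginAttempt,
                      voice_threshold: float = 0.8) -> int:
    """Count how many of the three factors the attempt satisfies."""
    # compare_digest avoids timing leaks; with str arguments it expects ASCII text.
    knows = hmac.compare_digest(user.pin, attempt.pin)
    has = hmac.compare_digest(user.registered_device_id, attempt.device_id)
    is_speaker = attempt.voice_score >= voice_threshold
    return sum([knows, has, is_speaker])


def authenticate(user: UserRecord, attempt: LoginAttempt,
                 required_factors: int = 3) -> bool:
    """Accept the login only if enough independent factors check out."""
    return factors_satisfied(user, attempt) >= required_factors


if __name__ == "__main__":
    alice = UserRecord(pin="4821", registered_device_id="phone-123")
    print(authenticate(alice, LoginAttempt("4821", "phone-123", 0.91)))  # True
    print(authenticate(alice, LoginAttempt("4821", "phone-123", 0.40)))  # False
    print(authenticate(alice, LoginAttempt("4821", "phone-123", 0.40),
                       required_factors=2))  # True
```

Lowering `required_factors` trades security for convenience, which is the balance the quoted statement describes.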

If you want to find out more about Project Oxford’s Speaker and Video APIs, head over to this link.