Microsoft Garage’s New Video Breakdown Service Officially Announced




Last month, we reported on an upcoming Microsoft Garage service called Video Breakdown. Today, Microsoft officially announced the project, and it is now open for everyone to try.

When you upload a video to the Video Breakdown service, the video file’s content is analyzed by a series of Microsoft Cognitive Services and Azure Media Analytics components, along with other Azure services (Azure Websites, Azure Blob storage, Azure Search, Azure Media Services). The process produces an audio transcript; face tracking, grouping and identification; differentiation of speakers; optical character recognition; and extracted sentiments and topics. Today, a video search typically returns results based only on the file name or the metadata associated with the videos. With Video Breakdown, you will also get results when the search keyword is spoken by someone in the video or shown on a slide.
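To make the difference concrete, here is a minimal sketch of keyword search over text extracted from a video, as opposed to search over file names alone. It is not Video Breakdown's implementation or API: the `Segment` type, the `search` function, and the sample data are all hypothetical, and a real service would use a proper search index rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One piece of text extracted from a video (hypothetical structure)."""
    video: str    # file name of the video
    source: str   # where the text came from: "transcript" or "ocr"
    start: float  # offset into the video, in seconds
    text: str     # the spoken or on-screen text


def search(segments, keyword):
    """Return (video, source, start) for every segment containing keyword."""
    needle = keyword.lower()
    return [
        (s.video, s.source, s.start)
        for s in segments
        if needle in s.text.lower()
    ]


# Illustrative sample data, not real output from any service.
segments = [
    Segment("keynote.mp4", "transcript", 12.5, "Welcome to the Azure roadmap session"),
    Segment("keynote.mp4", "ocr", 95.0, "Slide: Cognitive Services pricing"),
    Segment("demo.mp4", "transcript", 3.0, "Let us look at the search feature"),
]

# The phrase appears only on a slide, never in a file name.
hits = search(segments, "cognitive services")
for video, source, start in hits:
    print(f"{video} at {start}s (found in {source})")
```

Because each match carries a source and a time offset, a result can point to the moment in the video where the keyword was spoken or shown, which a file-name search cannot do.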

Key features of Video Breakdown:

  • Linguistic Transcript – Convert audio to text based on acoustic and language models

  • Face Detection – Find when each face appears in the video

  • Speaker Diarization – Map and understand who spoke when

  • OCR – Extract text that appears in the video as an overlay, on slides or in the background

  • Face Identification – Identify the person each detected face belongs to

  • Voice Activity Detection – Separate background noise and voice activity

  • Contextual Search – Understand the context of search results

  • Sentiment Analysis – Gauge how positive or negative the spoken or written content is

You can try out Video Breakdown (Preview) here. Read more about this app here.

