Last month, we reported on an upcoming Microsoft Garage service called Video Breakdown. Today, Microsoft officially announced the project, and it is now open for everyone to try.
When you upload a video to the Video Breakdown service, the file's content is analyzed by a series of Microsoft Cognitive Services and Azure Media Analytics, along with other Azure services (Azure Websites, Azure Blob Storage, Azure Search, Azure Media Services). The process produces an audio transcript; face tracking, grouping, and identification; speaker differentiation; optical character recognition; and extracted sentiments and topics. Today, searching for a video returns results based only on the file name or the metadata associated with it. With Video Breakdown, a video will surface in search results even if the keyword was only mentioned by a speaker in the video or shown on a slide.
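To make the difference concrete, here is a minimal sketch (not the real Video Breakdown API; the `VideoIndex` class and its methods are hypothetical) of how indexing a video's transcript and OCR text alongside its file name lets a keyword search match spoken or on-screen words, not just metadata:

```python
class VideoIndex:
    """Hypothetical index over the text channels extracted from each video."""

    def __init__(self):
        self.videos = {}  # file name -> combined searchable text

    def add_video(self, name, transcript="", ocr_text=""):
        # Store every text channel the analysis pipeline produced.
        self.videos[name] = " ".join([name, transcript, ocr_text]).lower()

    def search(self, keyword):
        # Match against file name, spoken words, and on-screen text alike.
        keyword = keyword.lower()
        return [n for n, text in self.videos.items() if keyword in text]


index = VideoIndex()
index.add_video("keynote.mp4",
                transcript="Today we announce our new cloud platform",
                ocr_text="Q3 Roadmap")
index.add_video("demo.mp4", transcript="Let me show the dashboard")

# A filename-only search would miss both of these hits:
print(index.search("cloud"))    # matched in the transcript
print(index.search("roadmap"))  # matched in the OCR'd slide text
```

The real service builds this kind of index with Azure Search over the outputs of the analysis pipeline; the sketch above only illustrates the search behavior described in the announcement.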
Key features of Video Breakdown:
Linguistic Transcript – Convert audio to text based on acoustic language models
Face Detection – Find when each face appears in the video
Speaker Diarization – Map and understand who spoke when
OCR – Extract text that appears in the video as overlays, slides, or backgrounds
Face Identification – Identify the people whose faces appear in the video
Voice Activity Detection – Separate background noise from voice activity
Contextual Search – Understand the context of search results
Sentiment Analysis – Gauge how positive or negative the spoken or written content is