Microsoft's mission is to empower every person and organization on the planet to achieve more. The media industry is a great example of making this mission a reality. We live in an age where more content is being created and consumed, in more ways and on more devices. At IBC 2019, we shared the latest innovations we're working on and how they can help transform your media experience.
Video Indexer now supports animation and multilingual content
Last year at IBC, we made our award-winning Video Indexer publicly available.
Our latest offerings include previews of two highly requested and differentiated features, Animated Character Recognition and Multilingual Voice Transcription, as well as several additions to the existing models available today in Video Indexer.
Animated Character Recognition
Animated content such as cartoons is one of the most popular types of content, but standard machine-vision models built for human face recognition don't work well with it, especially when characters lack human features. The new preview combines Video Indexer with Microsoft's Azure Custom Vision service to provide a new set of models that automatically detect and group animated characters, making them easy to tag and recognize with integrated custom vision models.
Models are integrated into a single pipeline, allowing anyone to use this service without any knowledge of machine learning. The results are available through the no-code Video Indexer portal, or through the REST API for quick integration into your own applications.
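As a rough illustration of the REST integration path, the sketch below builds the GET-index URL used by the Video Indexer API and pulls character names out of an insights document. The shape of the insights JSON here (the `summarizedInsights.animatedCharacters` field) is an illustrative assumption, not the documented schema; check the actual API response for the real field names.

```python
# Hypothetical shape of a Video Indexer insights document; the field
# names below are illustrative assumptions, not the documented schema.
SAMPLE_INSIGHTS = {
    "summarizedInsights": {
        "animatedCharacters": [
            {"name": "Character A", "appearances": [{"start": "0:00:05", "end": "0:00:12"}]},
            {"name": "Character B", "appearances": [{"start": "0:01:03", "end": "0:01:20"}]},
        ]
    }
}

def build_index_url(location: str, account_id: str, video_id: str, token: str) -> str:
    """Build the GET-index URL for the Video Indexer REST API."""
    return (f"https://api.videoindexer.ai/{location}/Accounts/{account_id}"
            f"/Videos/{video_id}/Index?accessToken={token}")

def character_names(insights: dict) -> list:
    """Collect detected animated-character names from an insights document."""
    chars = insights.get("summarizedInsights", {}).get("animatedCharacters", [])
    return [c["name"] for c in chars]
```

In a real application you would issue an HTTP GET against the built URL (after obtaining an access token) and feed the parsed JSON into `character_names`.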
We created these animated character models with customers who provided real animated content for training and testing. The value of the new functionality was well described by Andy Gutteridge, senior director of studio technology and post-production at Viacom International Media Networks, one of the data providers: "Adding a robust AI-based animated content detection feature will allow us to quickly and efficiently find and catalog character metadata from our content library.
Most importantly, it will give our creative teams the ability to instantly find the content they need, minimize media management time, and allow them to focus on creativity."
You can get started with the recognition of animated characters with
Identification and transcription of content in multiple languages
Some media resources, such as news, timelines and interviews, contain recordings of people speaking different languages. Most existing speech-to-text translation capabilities require you to specify the audio recognition language first, which makes it difficult to transcribe multilingual videos.
Our new automatic language detection feature uses machine learning to identify the languages found in media assets. Once detected, each language segment automatically goes through transcription in the corresponding language, and all segments are then combined into a single transcription file spanning multiple languages.
The resulting transcript is available as part of the Video Indexer JSON output and as subtitle files. Output transcription is also integrated with Azure Search, allowing you to instantly search your videos for different language segments. In addition, multilingual transcription is available when working with the Video Indexer portal, so you can view the transcript and identified language by time, or jump to specific places in the video for each language and see the multilingual transcription as captions while the video is playing. You can also translate the received text into any of the 54 available languages through the portal and API.
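To show what working with a multilingual transcript might look like, the sketch below groups transcript segments by their detected language code. The segment layout (`language`, `text`, `start`, `end` keys) is an assumption for this example, not the exact Video Indexer JSON schema.

```python
from collections import defaultdict

# Illustrative transcript segments as they might appear in a multilingual
# transcription output; the field names are assumptions for this sketch.
SEGMENTS = [
    {"language": "en-US", "text": "Welcome to the evening news.", "start": 0.0, "end": 3.2},
    {"language": "es-ES", "text": "Gracias por la invitación.", "start": 3.2, "end": 6.0},
    {"language": "en-US", "text": "Back to you in the studio.", "start": 6.0, "end": 8.5},
]

def segments_by_language(segments):
    """Group transcript segments by their detected language code."""
    grouped = defaultdict(list)
    for seg in segments:
        grouped[seg["language"]].append(seg["text"])
    return dict(grouped)
```

Grouping like this is a natural first step before, say, exporting per-language subtitle files or feeding each language block into a translation service.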
Learn more about the new Multi-Language Content Recognition feature and how it's used in Video Indexer
Additional updated and improved models
We are also adding new models to the Video Indexer and improving existing models, including those described below.
Extracting entities related to people and places
We've expanded our brand detection capabilities to include well-known names and locations such as the Eiffel Tower in Paris and Big Ben in London. When they appear in the generated transcript or on the screen when using Optical Character Recognition (OCR), the appropriate information is added. With this new feature, you can search through all the people, places, and brands that appeared in the video and view details about them, including timeslots, descriptions, and links to the Bing search engine for more information.
Shot type detection model for editors
This new feature adds a set of "tags" to the metadata attached to individual shots in the insights JSON to represent their editorial type (e.g. wide shot, medium shot, close-up, extreme close-up, two shot, multiple people, outdoor, indoor, etc.). These shot type characteristics are useful when editing videos into clips and trailers, or when looking for a specific shot style for artistic purposes.
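A simple way to use such editorial tags is to filter the tagged frames for a particular shot style. The sketch below assumes a hypothetical per-frame metadata layout (an `id` plus a `tags` list); the real insights schema may differ.

```python
# Hypothetical per-frame metadata carrying editorial shot-type tags,
# as described above; keys are illustrative, not the exact schema.
FRAMES = [
    {"id": 1, "tags": ["wide shot", "outdoor"]},
    {"id": 2, "tags": ["close-up", "indoor"]},
    {"id": 3, "tags": ["two shot", "indoor"]},
    {"id": 4, "tags": ["close-up", "outdoor"]},
]

def frames_with_tag(frames, tag):
    """Return the ids of frames whose tag list contains the requested shot type."""
    return [f["id"] for f in frames if tag in f["tags"]]
```

An editor hunting for close-ups for a trailer could call `frames_with_tag(FRAMES, "close-up")` and jump straight to the matching timestamps.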
Advanced IPTC mapping granularity
Our topic detection model determines the topic of a video based on transcription, optical character recognition (OCR), and detected celebrities, even if the topic is not explicitly stated. We map these discovered topics to four classification taxonomies: Wikipedia, Bing, IPTC, and IAB. This enhancement enables second-level IPTC classification.
Taking advantage of these improvements is as easy as reindexing your current Video Indexer library.
New live streaming functionality
We're also introducing two new features for live streaming in Azure Media Services preview.
Real-time AI transcription takes live streaming to the next level
Using Azure Media Services for live streaming, you can now receive an output stream that includes an auto-generated text track in addition to audio and video content. The text is generated using real-time audio transcription based on artificial intelligence. Custom methods are applied before and after speech-to-text to improve results. The text track is packaged in IMSC1, TTML, or WebVTT, depending on whether it is shipped in DASH, HLS CMAF, or HLS TS.
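Since the transcription text track is delivered in formats such as WebVTT, it may help to see what a minimal WebVTT rendering of transcription segments looks like. The sketch below is a hand-rolled example of the WebVTT cue format, not part of the Azure Media Services output pipeline.

```python
def to_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h = int(seconds // 3600)
    m = int(seconds % 3600 // 60)
    s = seconds % 60
    return f"{h:02d}:{m:02d}:{s:06.3f}"

def to_webvtt(cues) -> str:
    """Render (start, end, text) tuples as a minimal WebVTT text track."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{to_timestamp(start)} --> {to_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)
```

For example, `to_webvtt([(0.0, 2.5, "Hello")])` produces a file body starting with the required `WEBVTT` header followed by one timed cue.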
Live linear encoding for 24/7 OTT channels
Using our v3 APIs, you can create, manage, and live stream over-the-top (OTT) channels, and combine them with all the other Azure Media Services capabilities, such as live-to-video-on-demand (VOD), packaging, and digital rights management (DRM).
For previews of these features, visit
New packaging options
Audio track description support
Broadcast content often has an audio track with verbal explanations of what is happening on the screen in addition to the regular audio. This makes programs more accessible to visually impaired viewers, especially if the content is primarily visual. New
Inserting ID3 metadata
To signal the insertion of ads or custom metadata events to the client's player, broadcasters often use time-based metadata embedded in videos. In addition to the SCTE-35 signaling modes, we now also support
Microsoft Azure partners demonstrate end-to-end solutions
Source: habr.com