Amazon publishes natural language understanding dataset for 51 languages

Amazon has published the MASSIVE (Multilingual Amazon SLURP for Slot Filling, Intent Classification, and Virtual-assistant Evaluation) dataset under the CC BY 4.0 license, along with machine learning models and a toolkit for training custom natural language understanding (NLU) models. The dataset contains over a million annotated and classified text utterances covering 51 languages.

The MASSIVE set is based on the SLURP collection, originally available only in English, which professional translators localized into 50 other languages. Alexa's natural language understanding (NLU) technology first converts speech into text, then applies several NLU models to that text, analyzing the presence of keywords to determine the essence of the user's request.
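As a rough illustration of the text-analysis step described above, the following toy sketch picks an intent by counting keyword matches and pulls out a simple time slot. It is not Amazon's implementation: production NLU models are trained on data such as MASSIVE rather than written by hand, and the intent names, keyword lists, and the functions `classify_intent` and `fill_time_slot` are all hypothetical.

```python
import re

# Hypothetical keyword lists for illustration only; real NLU models
# learn these associations from annotated data.
INTENT_KEYWORDS = {
    "alarm_set": {"wake", "alarm"},
    "weather_query": {"weather", "rain", "forecast"},
    "play_music": {"play", "song", "music"},
}


def classify_intent(text):
    """Return the intent whose keywords overlap most with the text, or None."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    scores = {intent: len(tokens & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def fill_time_slot(text):
    """Extract a simple time expression such as 'nine am' or '7 pm', if present."""
    match = re.search(r"\b(\w+)\s+(am|pm)\b", text.lower())
    return match.group(0) if match else None


utterance = "Wake me up at nine am on Friday"
print(classify_intent(utterance), fill_time_slot(utterance))
```

The sketch only handles English keywords, which mirrors the limitation the article describes: a hand-built, language-specific approach has to be redone for every new language.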

One of the goals of creating and publishing the set is to adapt voice assistants to process information in several languages at once, as well as to encourage third-party developers to create applications and services that expand the capabilities of voice assistants. To attract developers' attention, Amazon has launched a competition to create the best general-purpose model using the published dataset.

Currently, voice assistants support only a few languages and use language-specific machine learning models. The MASSIVE project aims to eliminate this shortcoming by creating universal models and machine learning systems that can parse and process information in different languages.

Source: opennet.ru
