The world's largest open-source translation model! Built by Meta, it covers speech and text in up to 100 languages!


On August 23, global social networking and technology giant Meta (the parent company of Facebook, Instagram, etc.) announced on its official website that it is open-sourcing SeamlessM4T, a large multilingual model for speech and text translation and transcription. (Open-source address: https://github.com/facebookresearch/seamless_communication)

According to Meta, SeamlessM4T is the first all-in-one large AI translation model that supports speech and text translation across roughly 100 languages and can perform multimodal translation tasks: speech-to-text, speech-to-speech, text-to-speech, and text-to-text. For example, English speech can be automatically translated into a local spoken language such as Hokkien.
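For readers who want to try the released model, below is a minimal sketch of how the seamless_communication repository's Translator interface was documented around the release. The model and vocoder card names (seamlessM4T_large, vocoder_36langs), task strings, and language codes are assumptions taken from the README of that time and may have changed, so verify against the current repository before running.

```python
import torch
from seamless_communication.models.inference import Translator

# NOTE: card names, task strings, and language codes below are assumptions
# based on the repo's README at release; check the repo for current values.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
translator = Translator(
    "seamlessM4T_large",        # multitask model card
    "vocoder_36langs",          # vocoder used when producing speech output
    device,
)

# Text-to-text translation (T2TT): English -> Mandarin Chinese
text_out, _, _ = translator.predict(
    "Hello, how are you?", "t2tt", "cmn", src_lang="eng"
)
print(text_out)

# Speech-to-text translation (S2TT): an English audio file -> Spanish text
text_out, _, _ = translator.predict("/path/to/english_audio.wav", "s2tt", "spa")
print(text_out)
```

The same predict call also covers speech-to-speech ("s2st") and text-to-speech ("t2st") tasks, returning a waveform and sample rate alongside the translated text.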

In addition, SeamlessM4T builds on translation models Meta released earlier, such as NLLB and MMS, and was trained on roughly 270,000 hours of aligned speech and text data. This makes it the largest and most comprehensive open-source translation model currently available.

Paper: https://ai.meta.com/research/publications/seamless-m4t/

Online demo: https://seamless.metademolab.com/

Hugging Face demo: https://huggingface.co/spaces/facebook/seamless_m4t

Translation demo

A brief introduction to SeamlessM4T

At present, most translation products handle only mainstream languages such as Chinese, French, German, and English, and offer poor support for low-resource languages.

SeamlessM4T makes a major technical breakthrough here, supporting speech and text in up to 100 languages. Compared with single-purpose translation products, it also delivers better translation quality and efficiency with lower latency, allowing people in different regions of the world to communicate more naturally.


Meta says SeamlessM4T achieves this multimodal translation capability by building on several of its earlier, powerful translation models:

No Language Left Behind (NLLB): a translation model released by Meta on July 6, 2022 that supports 200 languages. It offers much better support for low-resource languages, with translation accuracy improved by more than 70% for some of them, and it already provides translation services for Wikipedia.


Universal Speech Translator: a speech-to-speech translation system released by Meta on October 19, 2022 that can recognize and translate local spoken languages such as Hokkien, breaking down communication barriers between regions.

Massively Multilingual Speech (MMS): Meta's very large speech and language AI model released on May 22, 2023, which can identify more than 4,000 spoken languages and supports speech-to-text and text-to-speech for more than 1,100 languages.

It is not difficult to see from the descriptions above that Meta has combined its strongest single-domain AI translation models into one, making SeamlessM4T a kind of all-in-one "Transformer" of the translation field.

SeamlessM4T training data

SeamlessM4T can support so many languages and speech varieties mainly because of high-quality training datasets covering speech-to-text, speech-to-speech, text-to-text, and other tasks. However, human-translated and human-transcribed speech and text alone cannot cover nearly 100 languages.

Therefore, Meta built SONAR, a massively multilingual and multimodal text embedding space covering 200 languages, which substantially outperforms LASER3 and LaBSE on multilingual similarity search. The SONAR approach was then extended to the speech modality, currently covering 36 languages.
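The core idea behind mining aligned data with a shared embedding space like SONAR is to encode sentences (or speech segments) from different languages into the same vector space and treat high-similarity pairs as likely translations. The sketch below illustrates that idea with generic cosine-similarity matching; the function name, threshold, and random stand-in embeddings are made up for illustration and are not SONAR's actual mining pipeline.

```python
import numpy as np

def mine_parallel_pairs(src_embs: np.ndarray, tgt_embs: np.ndarray, threshold: float = 0.8):
    """Conceptual mining step: match source and target segments whose embeddings
    (one per row) are close in a shared multilingual space. This is a sketch of
    the general idea, not SONAR's actual implementation."""
    # Normalize rows so the dot product equals cosine similarity.
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    sims = src @ tgt.T                       # pairwise cosine similarities
    best = sims.argmax(axis=1)               # nearest target for each source
    return [(i, int(j)) for i, j in enumerate(best) if sims[i, j] >= threshold]

# Toy example with random vectors standing in for real sentence/speech embeddings.
rng = np.random.default_rng(0)
print(mine_parallel_pairs(rng.normal(size=(5, 16)), rng.normal(size=(8, 16))))
```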


In addition, by mining public web data (tens of billions of sentences) and speech repositories (4 million hours of audio), Meta obtained 443,000 hours of speech aligned with text and created approximately 29,000 hours of speech-to-speech alignments. SeamlessM4T was then pre-trained and fine-tuned on this data.

Evaluation results

SeamlessM4T achieves state-of-the-art translation results across its supported languages, with all tasks handled by a single model: automatic speech recognition, speech-to-text translation, speech-to-speech translation, text-to-speech translation, and text-to-text translation.


To evaluate quality more accurately without relying on text-based metrics, Meta extended its text-free metric BLASER to BLASER 2.0, which can evaluate both speech and text units with accuracy similar to its predecessor.
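Metrics in this family score a translation by comparing embeddings of the source and the system output (and optionally a reference) in a shared space, instead of comparing text strings. The function below is a conceptual, unsupervised sketch of that idea using cosine similarity; it is not the actual BLASER 2.0 model, which is a trained metric.

```python
import numpy as np

def embedding_quality_score(src_emb, hyp_emb, ref_emb=None):
    """Conceptual sketch of an embedding-based translation metric: higher cosine
    similarity between the source/reference and the system output in a shared
    multilingual space is used as a proxy for translation quality."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    score = cos(src_emb, hyp_emb)
    if ref_emb is not None:
        score = 0.5 * (score + cos(ref_emb, hyp_emb))
    return score

# Toy example with random vectors standing in for multilingual embeddings.
rng = np.random.default_rng(1)
print(embedding_quality_score(rng.normal(size=32), rng.normal(size=32)))
```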

In robustness tests on the speech-to-text task, SeamlessM4T coped better with background noise and speaker variation than current state-of-the-art models, with average improvements of 37% and 48%, respectively.

Meta also reports significantly better performance for the supported low- and mid-resource languages, while maintaining strong performance on high-resource languages.

The material in this article comes from Meta's official website. If there is any infringement, please contact us and we will delete it.

END
