Meta open-sources NLLB-200 AI translation model, now adopted by Wikipedia

Facebook is the world's largest social platform, connecting users around the globe. To help users communicate regardless of geography, language, and other barriers, Meta, Facebook's parent company, recently announced a breakthrough in its NLLB (No Language Left Behind) project: the ability to produce high-quality machine translation for most of the world's languages.

The AI model, called NLLB-200, can translate between more than 200 different languages. To evaluate the quality of the new model's output, Meta created a test dataset of 3,001 sentence pairs for each language the model covers, with each sentence translated from English into the target language by professional translators and native speakers.

The researchers ran these sentences through their model and compared the machine translations to the human reference translations using BLEU, a benchmark commonly used in machine translation. The tests show that the new NLLB-200 model improves BLEU scores by an average of 44% across the supported languages, and by as much as 70% for some African and Indian languages.
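The scoring step described above can be sketched in a few lines. The snippet below is a minimal illustration, assuming the commonly used sacrebleu package; the sentences are made-up placeholders, not actual data from Meta's test set.

```python
import sacrebleu

# Model outputs and the corresponding human reference translations
# (placeholder sentences for illustration only).
hypotheses = ["The cat sits on the mat."]
references = [["The cat is sitting on the mat."]]  # one reference stream

# Corpus-level BLEU: compares the machine output to the human references.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")
```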

There are currently thousands of languages in the world, but because of a shortage of language data, today's translation technology still falls short. Take the well-known Google Translate as an example: it currently supports only 133 languages, while Microsoft's Bing Translator supports even fewer.

Because more than half of the world's population speaks only a dozen or two of the most common languages, translation tools that cover just over 100 languages can still meet the needs of most users. They are far less friendly to speakers of low-resource languages (especially in Africa), leaving those users cut off from much of the content they wish to consume.

Mark Zuckerberg said:

We just open sourced an AI model we built that can translate 200 different languages, many of which are not currently supported by other translation systems. We call this project "No Language Left Behind" and we use AI modeling techniques to build high-quality translations for languages spoken by billions of people around the world.

Despite the technological breakthrough, Meta believes the goals of the NLLB project cannot be reached without broad collaboration. To enable other researchers to extend the model's language coverage and build more inclusive technologies, Meta has open sourced the NLLB-200 model and is also offering grants of up to $200,000 to nonprofits that apply NLLB-200 in their own work.
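With the weights publicly available, running a translation takes only a few lines. The example below is a minimal sketch, assuming the Hugging Face port of the distilled checkpoint (facebook/nllb-200-distilled-600M) and the transformers library; the official release lives in the fairseq repository linked below.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed checkpoint: the distilled 600M-parameter Hugging Face port of NLLB-200.
model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "No language left behind."
inputs = tokenizer(text, return_tensors="pt")

# NLLB identifies languages with script-tagged codes, e.g. "fra_Latn" for French;
# forcing it as the first generated token selects the target language.
output_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```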

The Wikimedia Foundation has now integrated the technology behind the NLLB-200 model into its Content Translation tool, which lets Wikipedia editors translate and edit articles in underrepresented languages more efficiently, helping Wikipedia readers around the world access more knowledge in more languages.

NLLB-200 technical demonstration address: https://nllb.metademolab.com/

Project address: https://github.com/facebookresearch/fairseq/tree/nllb/


Reprinted from: www.oschina.net/news/202251/meta-open-source-nllb-200