Microsoft shares its largest-ever Transformer-based language generation model

Lead: Nvidia's "Megatron" now drops to second place.


Microsoft AI & Research today shared Turing NLG (hereinafter T-NLG), the largest language generation model ever published, built on the Transformer architecture, along with an open-source deep learning library called DeepSpeed that simplifies distributed training of large models.

The Transformer-based architecture means the model can generate words to complete open-ended text tasks. In addition to completing unfinished sentences, it can also generate direct answers to questions about an input document, as well as summaries of that document.

In August last year, NVIDIA announced what was then the world's largest Transformer-based language model, with 8.3 billion parameters, 24 times the size of BERT-Large and five times the size of OpenAI's GPT-2.

The model Microsoft is now sharing, T-NLG, has 17 billion parameters, twice as many as Nvidia's Megatron (now the second-largest Transformer model) and more than ten times as many as OpenAI's GPT-2. Microsoft says T-NLG outperforms the state of the art on a variety of language modeling benchmarks and also excels at many practical tasks, including summarization and question answering.


However, like Google's Meena, and as OpenAI initially did with GPT-2, Microsoft is for now sharing T-NLG only in a private demonstration.

Microsoft applied AI research scientist Corby Rosset wrote in a blog post: "In addition to saving users time by summarizing documents and emails, T-NLG can enhance the Microsoft Office suite experience by offering writing assistance to authors and answering questions that readers may ask about a document."

Transformer-architecture language generation models can predict the next word. They can be used to write stories, generate full-sentence answers to questions, and summarize text.

Microsoft says its goal is to respond as directly, accurately, and fluently as a human would in any situation: in the past, question-answering and summarization systems relied on extracting content from a document to serve as a stand-in answer or summary, but the results often seemed unnatural or incoherent. With a natural language generation model like T-NLG, it becomes possible to summarize or answer questions about personal documents or email threads in a natural way.
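Since T-NLG itself is only available in a private demo, this kind of next-word generation can be sketched with the publicly released GPT-2 through the Hugging Face transformers library; the prompt and generation settings below are arbitrary examples, not anything from Microsoft's system:

# Minimal sketch of open-ended text generation with a Transformer language model.
# GPT-2 stands in for T-NLG, which is not publicly released.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The meeting notes can be summarized as follows:"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Extend the prompt one predicted token at a time, up to 60 tokens in total.
output_ids = model.generate(input_ids, max_length=60, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))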

Experts in the AI field told VentureBeat that 2019 was a seminal year for NLP models: the use of the Transformer architecture was undoubtedly one of the biggest machine learning trends of 2019, driving progress in language generation and among the leaders of the GLUE benchmark, where Facebook's RoBERTa, Google's XLNet, and Microsoft's MT-DNN have all competed for the top spot across a range of benchmarks.

Also today, Microsoft open-sourced a deep learning library called DeepSpeed. The library is optimized to give developers low-latency, high-throughput inference.

DeepSpeed includes the Zero Redundancy Optimizer (ZeRO) for training large models with 100 million or more parameters; Microsoft used it to train T-NLG.

Microsoft said that DeepSpeed and ZeRO allowed them to reduce the degree of model parallelism (from 16 to 4), quadruple the batch size per node, and cut training time by two-thirds; with DeepSpeed, large models can be trained more efficiently using fewer GPUs.

Developers and machine learning practitioners can benefit from DeepSpeed and ZeRO, because training large networks (such as those built on the Transformer architecture) can be expensive and can run into scaling problems.
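As an illustration of how a developer might wrap a PyTorch training step with DeepSpeed and a ZeRO configuration, here is a minimal sketch; the stand-in model, dummy data, and configuration values are assumptions for illustration only and not Microsoft's actual T-NLG setup:

# Illustrative DeepSpeed + ZeRO usage; not the T-NLG training recipe.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)          # small stand-in model

ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 1},       # ZeRO: partition optimizer states
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that handles the distributed details.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# One illustrative training step; real inputs would come from a dataloader.
x = torch.randn(8, 1024).to(engine.device)
loss = engine(x).pow(2).mean()               # dummy loss for the sketch
engine.backward(loss)                        # DeepSpeed handles gradient sync
engine.step()                                # optimizer step + ZeRO bookkeeping

In practice such a script is launched with the deepspeed command-line launcher so that the engine can coordinate work across multiple GPUs.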

In addition, Google's DeepMind today also released the Compressive Transformer, a new long-range memory model, and PG-19, a new benchmark for book-level language modeling.



Origin blog.csdn.net/weixin_42137700/article/details/104258595