Lead: Nvidia's "Megatron" now holds only second place.
Microsoft AI & Research today shared the largest language model ever trained, Turing-NLG (hereinafter T-NLG), a Transformer-based architecture, along with an open-source deep learning library called DeepSpeed that simplifies distributed training of large models.
The Transformer-based architecture means the model can generate words to complete open-ended text tasks. Beyond finishing unfinished sentences, it can also generate direct answers to questions about an input document, as well as summaries of it.
Last August, Nvidia announced what was then the world's largest Transformer-based language model, with 8.3 billion parameters: 24 times larger than BERT-Large and five times larger than OpenAI's GPT-2.
The model Microsoft shared, T-NLG, has 17 billion parameters, twice as many as Nvidia's Megatron (now the second-largest Transformer model) and ten times as many as OpenAI's GPT-2. Microsoft says T-NLG performs outstandingly, beating the state of the art on a variety of language-modeling benchmarks and excelling at many practical tasks, including summarization and question answering.
However, like Google's Meena, and as GPT-2 was at first, T-NLG is initially being shared only in a private demo.
Microsoft applied AI research scientist Corby Rosset wrote in a blog post: "Beyond saving users time by summarizing documents and emails, T-NLG can enhance the Microsoft Office suite experience by offering writing assistance to authors and answering questions that readers may ask of a document."
Transformer-architecture language generation models can predict the next word. They can be used to write stories, generate complete-sentence answers to questions, and summarize text.
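As a toy illustration of the next-word prediction loop described above (this is a minimal sketch, not Microsoft's model; the hard-coded bigram table stands in for a trained transformer's output distribution):

```python
# Toy autoregressive generation: at each step, the "model" predicts a
# distribution over the next word given the current last word, and greedy
# decoding appends the highest-probability candidate.

BIGRAM_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
}

def generate(prompt, max_new_tokens=3):
    """Greedily extend the prompt one word at a time."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        dist = BIGRAM_PROBS.get(tokens[-1])
        if dist is None:  # no known continuation: stop generating
            break
        tokens.append(max(dist, key=dist.get))  # pick most likely next word
    return " ".join(tokens)

print(generate("the"))  # greedy path: the -> cat -> sat -> down
```

A real transformer LM replaces the lookup table with a learned distribution conditioned on the whole preceding context, and production systems typically sample from that distribution rather than always taking the argmax.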
Microsoft says its goal is to respond as directly, accurately, and fluently as a human in any situation. In the past, question-answering and summarization systems relied on extracting content from a document to serve as a stand-in answer or summary, but the results often seemed unnatural or incoherent. With a natural-language generation model such as T-NLG, users can naturally summarize or answer questions about personal documents or email threads.
AI experts told VentureBeat that 2019 was a seminal year for NLP models. The Transformer architecture was undoubtedly one of 2019's biggest trends in machine learning, driving progress in language generation and on the GLUE benchmark leaderboard, where Facebook's RoBERTa, Google's XLNet, and Microsoft's MT-DNN competed for the top spot across a range of benchmarks.
Also today, Microsoft open-sourced a deep learning library called DeepSpeed. The library is optimized to give developers low-latency, high-throughput inference.
DeepSpeed includes the Zero Redundancy Optimizer (ZeRO) for large-scale training of models with 100 million or more parameters; Microsoft used it to train T-NLG.
Microsoft says DeepSpeed and ZeRO let it reduce the degree of model parallelism (from 16 to 4), quadruple the batch size per node, and cut training time by two-thirds; with DeepSpeed, large-scale models can be trained more efficiently on fewer GPUs.
DeepSpeed and ZeRO matter to developers and machine learning practitioners because training large networks, such as those built on the Transformer architecture, can be expensive and can run into problems at scale.
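To make the setup concrete, DeepSpeed is driven by a JSON configuration file in which ZeRO is switched on via a `zero_optimization` block. The fragment below is a minimal, hypothetical example; the exact keys and supported values depend on the DeepSpeed version, so consult the library's documentation before use:

```json
{
  "train_batch_size": 512,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 1
  }
}
```

Here `stage: 1` requests ZeRO's first level, which partitions optimizer state across data-parallel workers to cut per-GPU memory; higher stages (added in later releases) partition gradients and parameters as well.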
Separately, Google's DeepMind today also released the Compressive Transformer, a new long-range memory model, along with PG19, a new benchmark for book-level language modeling.