Event Registration | How to Train a 100-Billion-Parameter Language Model from Scratch on a Budget of ¥700,000


Wang Yequan

Head of the cognitive model team at the Beijing Academy of Artificial Intelligence (BAAI), Ph.D. from Tsinghua University, and member of the Affective Computing Committee of the Chinese Information Processing Society of China. In 2022 he was named one of the world's most influential AI scholars in the AI 2000 list (natural language processing). His research focuses on large language models and natural language processing; representative work includes FLM-101B, FreeLM, Mu-Scaling, MSG, and ATAE-LSTM.

He has published his research at top international conferences and has been cited more than 2,500 times on Google Scholar. His papers ATAE-LSTM and RNN-Capsule were rated most influential papers by Paper Digest and have been selected for the Google Scholar publication index lists multiple times.

How to Train a 100-Billion-Parameter Language Model from Scratch on a Budget of ¥700,000

Large language models, represented by the GPT series, have achieved remarkable success, but their high training cost limits the further rapid development of large models, while also bringing new opportunities and challenges to academia and industry. To reduce this cost, we adopted a growth strategy and successfully brought the training cost of a 100-billion-parameter dense model down to ¥700,000.
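To make the idea of a growth strategy more concrete, the sketch below illustrates function-preserving width growth for a single linear layer in PyTorch: the widened layer, together with zero-initialized new input columns in the layer that follows it, computes exactly the same function as the small network, so training can continue from the grown checkpoint without discarding what was already learned. The helper name `grow_linear_width` and the toy MLP are illustrative assumptions, not the actual FLM-101B/MSG implementation.

```python
import torch
import torch.nn as nn

def grow_linear_width(old: nn.Linear, new_out_features: int) -> nn.Linear:
    """Widen a linear layer's output dimension while preserving its function.

    Hypothetical helper for illustration: old weights are copied, the newly
    added output units are zero-initialized, and the next layer's new input
    columns (handled by the caller below) must also start at zero so the
    grown network produces the same activations as the original one.
    """
    assert new_out_features >= old.out_features
    new = nn.Linear(old.in_features, new_out_features, bias=old.bias is not None)
    with torch.no_grad():
        new.weight.zero_()
        new.weight[: old.out_features] = old.weight
        if old.bias is not None:
            new.bias.zero_()
            new.bias[: old.out_features] = old.bias
    return new

# Toy usage: grow a 2-layer MLP's hidden size from 8 to 16.
torch.manual_seed(0)
x = torch.randn(4, 32)
fc1, fc2 = nn.Linear(32, 8), nn.Linear(8, 10)
y_small = fc2(torch.relu(fc1(x)))

fc1_big = grow_linear_width(fc1, 16)
fc2_big = nn.Linear(16, 10)
with torch.no_grad():
    fc2_big.weight.zero_()
    fc2_big.weight[:, :8] = fc2.weight   # reuse old input columns; new ones stay zero
    fc2_big.bias.copy_(fc2.bias)
y_big = fc2_big(torch.relu(fc1_big(x)))

print(torch.allclose(y_small, y_big, atol=1e-6))  # True: growth preserved the function
```

In practice, the growth schedule and how the new parameters are gradually brought into training matter greatly for stability; the sketch only shows the function-preservation idea that lets most of the compute be spent while the model is still small.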

In addition, to evaluate large models more comprehensively and reasonably, we propose an IQ test for large models that complements existing knowledge-based assessments and borrows the concept of IQ testing from psychology. Experiments show that the 100-billion-parameter model trained for ¥700,000 exhibits very strong capabilities. We believe the growth strategy opens up new possibilities for breaking through to single dense trillion-parameter models.

Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks. However, their high costs constrain the further development of LLMs, which also brings both opportunities and challenges for academia and industry. To break down this barrier, FLM-101B employs a growth strategy and successfully lowers the cost of training a 100B-level dense model down to ¥700,000 CNY. Additionally, in order to evaluate LLMs systematically and more rationally, besides existing knowledge-based assessments, the IQ test for LLMs, whose concept is partially borrowed from psychology, is proposed. Experimental results show that the model trained with a budget of ¥700K achieves comparable performance to powerful and well-known models and demonstrates impressive capabilities. We believe that the growth strategy offers new possibilities for breakthroughs in training 1T+ dense models.
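The abstract does not spell out the IQ-test protocol, so the snippet below only illustrates one generic way such knowledge-decoupled evaluations are often built: the label words of a task are remapped to arbitrary symbols, forcing the model to infer the mapping from in-context examples rather than rely on memorized knowledge. The helper names (`symbolize_labels`, `build_prompt`) and the toy sentiment data are assumptions for illustration, not the benchmark used in the talk.

```python
import random
import string

def symbolize_labels(examples, labels, seed=0):
    """Map each label word to a random symbol so a model cannot answer from
    memorized world knowledge and must infer the mapping in-context.

    `examples` is a list of (text, label) pairs; returns remapped examples
    plus the label -> symbol dictionary. Illustrative only, not the exact
    FLM-101B IQ-test protocol.
    """
    rng = random.Random(seed)
    mapping = {
        lab: "<" + "".join(rng.choices(string.ascii_uppercase, k=4)) + ">"
        for lab in labels
    }
    return [(text, mapping[lab]) for text, lab in examples], mapping

def build_prompt(few_shot, query_text):
    """Assemble a few-shot prompt whose answers are the remapped symbols."""
    lines = [f"Review: {t}\nSentiment: {s}" for t, s in few_shot]
    lines.append(f"Review: {query_text}\nSentiment:")
    return "\n\n".join(lines)

# Toy usage with a tiny sentiment task.
data = [
    ("great movie, loved it", "positive"),
    ("terrible plot and acting", "negative"),
    ("an absolute delight", "positive"),
]
remapped, mapping = symbolize_labels(data, ["positive", "negative"])
print(mapping)                                    # e.g. {'positive': '<...>', 'negative': '<...>'}
print(build_prompt(remapped, "boring and too long"))
```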

Event time: September 21 (Thursday), 14:30-15:30

Event format: Online livestream; scan the QR code below to register

[Registration QR code]

Click "Read the original text" to communicate with the speaker online


Source: blog.csdn.net/BAAIBeijing/article/details/133054140