Wang Xiaochuan's large model debut: a 7-billion-parameter model tops the leaderboards, with Tsinghua and Peking University already using it | Exclusive interview



  Xinzhiyuan Report  

Editors: So Sleepy, Peach


[Xinzhiyuan Introduction] Today, Baichuan Intelligence officially released baichuan-7B, a 7-billion-parameter open-source Chinese-English large model that swept the top results on multiple evaluation leaderboards.

Two months after its founding, Baichuan Intelligence, the company established by Wang Xiaochuan, officially launched its first large model on June 15: baichuan-7B, a 7-billion-parameter Chinese-English pre-trained model.

baichuan-7B not only surpasses other large models such as ChatGLM-6B by a clear margin on the authoritative Chinese benchmarks C-Eval, AGIEval, and Gaokao, but also significantly leads LLaMA-7B on the authoritative English benchmark MMLU.

The baichuan-7B model is now available on Hugging Face, GitHub, and ModelScope.


Hugging Face: https://huggingface.co/baichuan-inc/baichuan-7B

GitHub: https://github.com/baichuan-inc/baichuan-7B

ModelScope: https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary
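For reference, a minimal usage sketch of loading the model from Hugging Face with the transformers library is shown below; the prompt and generation settings are illustrative only, and the repository README remains the authoritative reference.

```python
# Minimal sketch: loading baichuan-7B from Hugging Face with transformers.
# Details (prompt format, generation settings) are illustrative; see the repo README.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/baichuan-7B"

# trust_remote_code is needed because the repo ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", trust_remote_code=True
)

# baichuan-7B is a base (pre-trained) model, so it performs plain text continuation.
inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```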

Best 7B results across multiple Chinese evaluation benchmarks

To verify the model's capabilities, baichuan-7B was comprehensively evaluated on the three most influential Chinese benchmarks: C-Eval, AGIEval, and Gaokao. It achieved excellent results on all of them, making it the best-performing native Chinese pre-trained model at its parameter scale.


On AGIEval, baichuan-7B achieved an overall score of 34.4, far exceeding other open-source models such as LLaMA-7B, Falcon-7B, BLOOM-7B, and ChatGLM-6B.

On the Chinese C-Eval benchmark, baichuan-7B scored 42.8 overall, surpassing ChatGLM-6B's 38.9 and even beating some models with larger parameter counts.

On the Gaokao benchmark, baichuan-7B scored 36.2 overall, significantly ahead of other pre-trained models at the same parameter scale.

Source: https://cevalbenchmark.com/static/leaderboard_zh.html (2023-06-15)

The AGIEval benchmark was initiated by Microsoft Research to comprehensively evaluate a foundation model's abilities on tasks related to human cognition and problem solving. It draws on 20 public, rigorous official entrance and professional qualification examinations, including China's college entrance exam and judicial exam as well as the American SAT, LSAT, GRE, and GMAT.

The C-Eval benchmark was jointly created by Shanghai Jiao Tong University, Tsinghua University, and the University of Edinburgh. It is a comprehensive evaluation suite for Chinese language models, covering 52 subjects across different disciplines.

The Gaokao benchmark, created by a research team at Fudan University, uses Chinese college entrance examination questions as its dataset to test large models' Chinese language understanding and logical reasoning.

Outperforming LLaMA-7B by a wide margin on the MMLU benchmark

baichuan-7B excels not only in Chinese but also in English.

On MMLU, baichuan-7B scored 42.5 overall, far ahead of the English open-source pre-trained model LLaMA-7B (34.2) and the Chinese open-source model ChatGLM-6B (36.9).


MMLU was jointly created by researchers at well-known universities such as the University of California, Berkeley. It covers 57 subjects spanning science, engineering, mathematics, the humanities, and the social sciences, and is designed to probe a model's interdisciplinary knowledge in English, with difficulty ranging from elementary to advanced professional level.

1.2 trillion tokens of data, 4K context, efficient and stable training

The training corpus is crucial to a large model's quality. In constructing the pre-training corpus, Baichuan Intelligence started from a high-quality Chinese corpus and incorporated high-quality English data.

Specifically, the raw data includes a large amount of self-crawled Chinese and English web data, some open-source Chinese and English datasets, and a large amount of high-quality knowledge-intensive data.


For data quality, the data is scored by a quality model, and the raw dataset is precisely filtered at both the document level and the sentence level.

For content diversity, a self-developed ultra-large-scale locality-sensitive hashing (LSH) clustering system and a semantic clustering system were used to cluster the data at multiple levels and granularities, ultimately producing a 1.2-trillion-token pre-training dataset that balances quality and diversity.

Compared with other open-source Chinese pre-trained models at the same parameter scale, this is more than 50% more data.
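As a toy illustration of the two ideas above (this is not Baichuan's actual pipeline, and the quality heuristic and parameters are invented for the example), quality filtering and locality-sensitive-hash deduplication can be sketched as follows:

```python
# Toy sketch (hypothetical, not Baichuan's pipeline): (1) filter documents by a
# quality score, (2) find near-duplicate clusters with MinHash signatures bucketed
# into LSH bands.
import hashlib
from collections import defaultdict

def quality_score(doc: str) -> float:
    """Stand-in for a learned quality model: share of alphanumeric/whitespace characters."""
    return sum(c.isalnum() or c.isspace() for c in doc) / max(len(doc), 1)

def minhash_signature(doc: str, num_perm: int = 32, shingle: int = 5) -> tuple:
    """MinHash signature over character shingles; seeded hashes stand in for permutations."""
    shingles = {doc[i:i + shingle] for i in range(max(len(doc) - shingle + 1, 1))}
    return tuple(
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shingles)
        for seed in range(num_perm)
    )

def near_duplicate_groups(docs: list, bands: int = 8) -> set:
    """Bucket documents whose signature bands collide (LSH); collisions are dedup candidates."""
    buckets = defaultdict(list)
    for idx, doc in enumerate(docs):
        sig = minhash_signature(doc)
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)
    return {tuple(sorted(ids)) for ids in buckets.values() if len(ids) > 1}

corpus = [
    "The quick brown fox jumps over the lazy dog. " * 3,
    "The quick brown fox jumps over the lazy dog! " * 3,   # near-duplicate of the first
    "A completely different passage about pre-training data.",
]
kept = [d for d in corpus if quality_score(d) > 0.8]       # quality filtering step
print(near_duplicate_groups(kept))                          # e.g. {(0, 1)}
```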

Building on this trillion-token corpus of high-quality Chinese and English data, and to further improve training efficiency, baichuan-7B deeply fuses model operators to speed up computation, and adaptively optimizes the model-parallel strategy and the recomputation (activation checkpointing) strategy according to the task load and cluster configuration.
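As a rough illustration of the recomputation idea mentioned above (not Baichuan's training stack), activation checkpointing in PyTorch discards intermediate activations during the forward pass and recomputes them in the backward pass, trading extra compute for lower memory:

```python
# Rough illustration of recomputation (activation checkpointing) on a toy model.
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)

class TinyModel(nn.Module):
    def __init__(self, depth: int = 8, dim: int = 512, use_recompute: bool = True):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.use_recompute = use_recompute

    def forward(self, x):
        for blk in self.blocks:
            if self.use_recompute and self.training:
                # Activations inside blk are not stored; they are recomputed in backward.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x

model = TinyModel()
x = torch.randn(4, 128, 512, requires_grad=True)
model.train()
model(x).sum().backward()   # extra forward compute per block, but lower peak memory
```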

Through efficient scheduling of communication within the training process, baichuan-7B achieves effective overlap of computation and communication, yielding super-linear training acceleration; training throughput on a thousand-GPU cluster reaches an industry-leading 180+ TFLOPS.

Meanwhile, the context window of most existing open-source models is limited to 2K tokens. For long-text modeling tasks, such as retrieval-augmentation scenarios where external knowledge must be introduced, a longer processing length helps the model capture more contextual information during both training and inference, so a 2K limit is a significant constraint.

[Figure: optimized word segmentation (tokenization) algorithm]

Building on optimized, efficient attention operators, baichuan-7B supports extension to ultra-long dynamic windows on the order of tens of thousands of tokens. The open-source pre-trained model exposes a 4K context window, broadening its range of application scenarios.
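For readers who want a concrete picture, the sketch below shows the general idea of a fused, memory-efficient attention operator using PyTorch 2.x's scaled_dot_product_attention; this is not baichuan-7B's own kernel, just the same class of optimization that makes longer windows practical:

```python
# Sketch of a fused/memory-efficient attention operator (not baichuan-7B's kernel).
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 8, 4096, 64   # a 4K-token context window
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# is_causal=True applies the autoregressive mask; when a fused kernel is available,
# the full (seq_len x seq_len) attention matrix is never materialized in memory.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)   # torch.Size([1, 8, 4096, 64])
```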

In addition, the training process of baichuan-7B has been deeply optimized, with a more principled and stable training procedure and hyperparameter selection, which greatly improves the model's convergence speed.

Compared with models of the same parameter size, baichuan-7B performs better on key performance indicators such as perplexity (PPL) and training loss.
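For reference, perplexity is simply the exponential of the mean per-token cross-entropy loss; a minimal sketch is shown below (using GPT-2 as a stand-in model, since the same recipe applies to any causal language model):

```python
# Perplexity (PPL) = exp(mean per-token cross-entropy loss).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"   # stand-in model; the same recipe applies to any causal LM
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModelForCausalLM.from_pretrained(name)

text = "Perplexity measures how well a language model predicts the next token."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy
    # over next-token predictions (inputs are shifted internally).
    loss = lm(ids, labels=ids).loss

print(f"loss = {loss.item():.3f}, PPL = {math.exp(loss.item()):.1f}")
```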


Open source, free for commercial use, with Tsinghua and Peking University already on board

In the spirit of open source, the baichuan-7B code is released under the Apache-2.0 license, and the model weights are released under a free commercial license: after a simple registration, they can be used commercially free of charge.

The open-source release of baichuan-7B is comprehensive, including inference code, an INT4 quantization implementation, fine-tuning code, and the pre-trained model weights.

The fine-tuning code makes it easy for users to adapt and optimize the model; the inference code and INT4 quantization implementation help developers deploy and apply the model at low cost; and with the pre-trained weights open-sourced, users can directly use the pre-trained model for a wide range of experimental research.
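The repository ships its own INT4 implementation; as a stand-in example of the kind of low-cost deployment this enables, the generic 4-bit loading path in transformers with bitsandbytes looks like this (GPU required; the repo's own quantization API may differ):

```python
# Stand-in sketch of 4-bit quantized loading via transformers + bitsandbytes.
# The repo's own INT4 implementation may expose a different API; see its README.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/baichuan-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
    load_in_4bit=True,   # quantize linear weights to 4 bits at load time
)
print(model.get_memory_footprint() / 1e9, "GB")   # roughly a quarter of fp16
```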

It is understood that two top universities, Peking University and Tsinghua University, have taken the lead in using baichuan-7B to advance related research, and plan to cooperate with Baichuan Intelligence to jointly promote the application and development of the model.

Liu Yiqun, dean of Tsinghua University's Internet Judiciary Research Institute and a professor in the Department of Computer Science, believes that baichuan-7B performs very well in Chinese, and that releasing it as open source and free for commercial use reflects an open attitude that both contributes to the community and advances the technology. His team plans to carry out research on judicial artificial intelligence based on baichuan-7B.

Yang Yaodong, an assistant professor at the Institute for Artificial Intelligence of Peking University, believes that open-sourcing baichuan-7B will play an important role in building the ecosystem around Chinese foundation language models and in academic research, and his team plans to conduct further in-depth research on the model's safety and alignment based on it.

Wang Xiaochuan, CEO of Baichuan Intelligence, said: "The release of this open-source model is Baichuan Intelligence's first milestone, coming two months after the company's founding, and a good start. It also contributes new strength to the global open-source community for large models."

Interview with the technical team

Q: How does baichuan-7B deal with hallucinations, and how will the accuracy of its outputs be improved going forward?

A: Large models will not completely solve the hallucination problem in the foreseeable future. On one hand, reinforcement learning can be used to teach the model to know what it does not know, which effectively alleviates hallucinations. More importantly, we need to rely on retrieval augmentation, introducing external knowledge, to gradually solve the hallucination problem.
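As a schematic sketch of the retrieval-augmentation idea described in the answer (hypothetical; the retriever and prompt format below are invented for illustration, not Baichuan's system):

```python
# Schematic retrieval-augmentation sketch: fetch external passages first, then ground
# the prompt in them so the model answers from evidence rather than hallucinating.
from transformers import AutoModelForCausalLM, AutoTokenizer

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder retriever; a real system would query a search engine or vector index."""
    return ["(passage 1 about the query)", "(passage 2)", "(passage 3)"][:k]

def answer_with_retrieval(model, tokenizer, question: str) -> str:
    passages = "\n".join(f"- {p}" for p in retrieve(question))
    prompt = (
        "Answer the question using only the reference passages below. "
        "If they do not contain the answer, say you do not know.\n"
        f"References:\n{passages}\n\nQuestion: {question}\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

# Usage: with a model and tokenizer loaded as in the earlier sketch,
# print(answer_with_retrieval(model, tokenizer, "Who founded Baichuan Intelligence?"))
```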

Q: What commercial value can baichuan-7B bring?

A: As the best-performing commercially usable open-source 7B model on multiple evaluation leaderboards, baichuan-7B fills the market gap for high-quality 7B models optimized for Chinese and is an ideal 7B base model for developers. In terms of commercial value, it can deliver significant value in areas such as text generation, automated writing, data analysis, knowledge Q&A, Chinese-English translation, personalized interaction, and personal assistants in professional fields such as medicine.

Q: Did baichuan-7B's performance on the evaluation leaderboards meet the team's initial expectations for the model?

A: Rankings are not our goal. We believe that with strong data and algorithm capabilities, good evaluation results follow naturally. baichuan-7B's excellent performance on many of the most influential evaluations this time has validated that philosophy.

References:

https://github.com/baichuan-inc/baichuan-7B


