Baichuan Intelligence open-sources its latest commercially usable large model! Wang Xiaochuan: More popular than LLaMA, and the next shot is aimed at ChatGPT

Hengyu, reporting from Aofeisi
QbitAI | WeChat official account QbitAI

We now have access to open source models that are friendlier and more capable than LLaMA.

The one projecting a "far ahead of the pack" message at this press conference was Wang Xiaochuan, CEO of Baichuan Intelligence.

Keeping up its pace of releasing a new large model every month, Baichuan Intelligence has now open-sourced the fine-tuned Baichuan2-7B, free for commercial use.

Wang Xiaochuan said that the 7-billion-parameter Baichuan2-7B matches the 13-billion-parameter LLaMA2 on mainstream English benchmarks such as MMLU.


Also open-sourced are Baichuan2-13B, Baichuan2-13B-Chat and their 4-bit quantized versions, along with checkpoints covering the entire training process from 220B to 2640B tokens.
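For readers who want to try the weights, here is a minimal sketch of loading them with the Hugging Face transformers library. The repository id, the trust_remote_code flag, and the prompt are assumptions based on common Hub practice, not details confirmed in this article.

```python
# Minimal sketch: loading an open-sourced Baichuan2 model with transformers.
# The Hub repository id below is an assumption, not a name taken from the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan2-7B-Base"  # assumed Hub id; swap for 13B / Chat variants

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # spread layers across available GPUs
    torch_dtype="auto",      # keep the dtype stored in the checkpoint
    trust_remote_code=True,  # the repo ships custom modeling code
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Swapping model_id for the 13B or Chat variants (or the 4-bit quantized releases mentioned above) would follow the same pattern.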

At the same time, a Baichuan2 technical report was released that details the training process, aiming to let the outside world understand it and "better promote the technical development of large-model academic research and the community."


Baichuan2 series large models, now open source

The two open source large models of the Baichuan2 series are Baichuan2-7B with 7 billion parameters and Baichuan2-13B with 13 billion parameters.

The training data is drawn from trillions of tokens of Internet and vertical-industry data, for a total of 2.6 trillion training tokens.

It is reported that data processing for the Baichuan2 series draws heavily on experience from search.

On the one hand, ultra-large-scale content is run through a clustering system, enabling "cleaning and deduplication of hundreds of billions of items within hours"; on the other, multi-granularity quality scoring is applied during data cleaning to support fine-grained sampling, which improves model quality (especially for Chinese).
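The article describes the pipeline only at this high level. As a rough illustration of what "deduplication plus quality-scored sampling" can look like, here is a toy sketch assuming exact-hash dedup and a stand-in quality scorer; it is not Baichuan's actual system.

```python
# Illustrative only: toy hash-based deduplication plus quality-score sampling.
# NOT Baichuan's pipeline, which the report describes only at a high level
# (clustering-based cleaning/dedup and multi-granularity quality scoring).
import hashlib
import random


def fingerprint(doc: str) -> str:
    """Cheap exact-duplicate fingerprint; real systems use MinHash/SimHash."""
    return hashlib.md5(doc.strip().lower().encode("utf-8")).hexdigest()


def quality_score(doc: str) -> float:
    """Stand-in scorer: longer, punctuation-bearing documents score higher."""
    if not doc.strip():
        return 0.0
    punct = sum(doc.count(c) for c in "。，.!?、")
    return min(1.0, len(doc) / 2000) * 0.7 + min(1.0, punct / 20) * 0.3


def clean_corpus(docs, keep_prob_floor=0.1):
    seen, kept = set(), []
    for doc in docs:
        fp = fingerprint(doc)
        if fp in seen:  # drop exact duplicates
            continue
        seen.add(fp)
        # fine-grained sampling: higher-quality documents are kept more often
        if random.random() < max(keep_prob_floor, quality_score(doc)):
            kept.append(doc)
    return kept
```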

Both models in the series support dozens of languages, including Chinese, English, Spanish, and French, and are aimed mainly at academic research, the Internet, and finance.

Compared with the previous generation, Baichuan2 improves mathematical ability by 49%, coding by 46%, safety by 37%, logic by 25%, and semantic understanding by 15%, with gains across both liberal-arts and science tasks.

Baichuan has also optimized the infrastructure layer, achieving 180 TFLOPS of training performance on its thousand-card A800 cluster and pushing machine utilization above 50%.
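A quick back-of-the-envelope check on that utilization claim, assuming the 180 TFLOPS figure is per GPU and that an A800's peak dense BF16 throughput is roughly 312 TFLOPS (it shares the A100's compute spec, with reduced interconnect bandwidth); neither assumption comes from the article.

```python
# Sanity check of the ">50% utilization" claim under stated assumptions.
PEAK_TFLOPS_PER_GPU = 312.0      # assumed A800 peak dense BF16 throughput
ACHIEVED_TFLOPS_PER_GPU = 180.0  # figure quoted at the press conference

utilization = ACHIEVED_TFLOPS_PER_GPU / PEAK_TFLOPS_PER_GPU
print(f"Per-GPU compute utilization: {utilization:.1%}")  # ~57.7%, i.e. above 50%
```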


Wang Xiaochuan said at the scene that in terms of model parameters and structure settings, the Baichuan large model is as close as possible to the LLaMA series.

The greatest significance of this is that community users can switch directly from LLaMA to Baichuan's models, while staying compatible with as much of the existing community ecosystem as possible.

In addition to the larger models, Baichuan Intelligence also released the intermediate stages of the training process, covering 300 billion to 2.6 trillion tokens.

In other words, Baichuan slices out the model's capabilities at different token counts, "making it easier for everyone to understand pre-training, or to do fine-tuning and reinforcement on top of pre-training."
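If the intermediate checkpoints are published as revisions of a Hugging Face repository, probing a particular training slice could look like the sketch below; the repository id and revision tag are hypothetical placeholders, not names from the article.

```python
# Sketch: pulling a partially trained ("sliced") checkpoint by revision.
# Repo id and revision naming are hypothetical -- check Baichuan's Hub page.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "baichuan-inc/Baichuan2-7B-Intermediate-Checkpoints"  # assumed repo id
revision = "train_00440B"  # hypothetical tag for the ~440B-token checkpoint

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    revision=revision,
    torch_dtype="auto",
    trust_remote_code=True,
)
# From here one can probe how a capability emerges over training,
# or continue pre-training / fine-tuning from this slice.
```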

This is also the first time in China that a company has opened up such a training process.

It is worth mentioning that the Baichuan series of large models has a "green channel" for academic faculty and students, who can obtain additional resources when they apply, to support research.

"Super App" is expected to be launched in the first quarter of next year

Since its founding, Baichuan Intelligence has kept up a pace of releasing a large model every month, alternating between open-source and closed-source releases.

After Baichuan-7B and Baichuan-13B were open-sourced, their downloads on Hugging Face exceeded one million in the first week and have totaled five million, making them the most-downloaded open-source large models in the world; more than 200 companies have applied for trial deployment.

On the closed-source side, there is Baichuan-53B, released last month, which integrates large models with search to a "very high degree."

Why "open source + closed source" in parallel?

"In the last month of the second quarter, we believed that the demand at that time, and where we could contribute, was the open source model." Wang Xiaochuan explained on the spot, "So after establishing the company, we released the open source model while taking into account the development of large closed source models. train."


So far, the total number of large-scale models released in China exceeds one hundred.

Beyond training models, there is also the step of putting them into production: a week ago, the first batch of 11 domestic AI large models was opened to the public.

However, as Zhang Bo, academician of the Chinese Academy of Sciences and honorary dean of the Institute for Artificial Intelligence at Tsinghua University, noted in his speech at the press conference, the large models on the market are "mainly focused on applications in vertical fields" rather than on "academic research into large models themselves."

However, this work is both urgent and important.

To this day, the world remains in the dark about the theoretical working principles of large models and the phenomena they produce; all conclusions have been inferred after the fact, which is where the notion of "emergent phenomena" comes from.

So-called emergence is just a way of giving yourself an out when something cannot be explained clearly. I think this issue must be clarified.

The prosperity of the large-model field itself, and the boost that open source gives to innovation and R&D efficiency, all help move toward a thorough understanding of GPT.


Having settled on a parallel open-source and closed-source release model, delivered a dense stream of staged results, and now opened its app to the public, what is next for Baichuan Intelligence?

In the fourth quarter of this year, a large model with 100 billion parameters is expected to be released.

Around the first quarter of next year, a “super application” is expected to be launched.

These two milestones have also been flagged by many large-model vendors and startups. Counting down the days, on the user side all we can do is look forward to it, and to a good show~


Source: blog.csdn.net/QbitAI/article/details/132748626