Evaluation of Baichuan's open-source large model Baichuan-13B

Following the release of its 7B pre-trained base model in June, the Baichuan Intelligent team recently released its latest 13B models: the pre-trained base model Baichuan-13B-Base and the chat-aligned model Baichuan-13B-Chat. Both support commercial use.

So far, the Chinese community has released a large number of open-source models, most of them in the 6B to 13B parameter range.

So how does Baichuan's open-source model compare with other representative models at home and abroad? For example, how far is it from ChatGPT3.5, and how does it perform on tasks such as generation and creation, logical reasoning, and code generation?

CLUE, the open-source community for Chinese language understanding evaluation benchmarks, evaluated Baichuan-13B-Chat on the SuperCLUE-Open benchmark, using 1200 questions that combine open-ended questions with tests of multi-round dialogue ability.

The evaluation results are as follows:

Conclusions

1. Is it currently the best Chinese model at the ten-billion-parameter scale?

For now, we believe that among open-source models of the same scale, Baichuan-13B-Chat is the best in the SuperCLUE open multi-round evaluation.

2. Is it close to ChatGPT3.5?

On the common tasks in the SuperCLUE open multi-round evaluation, such as generation and creation, role-playing, contextual dialogue, and knowledge and encyclopedia, Baichuan-13B-Chat performs close to ChatGPT3.5 and the basic version of Claude (see Quantitative analysis). On complex tasks such as code generation, mathematical calculation, and logic and reasoning, there is still considerable room for improvement.

The following is an evaluation analysis of the model from a quantitative perspective.

Quantitative analysis

  • SuperCLUE-Open (open multi-round evaluation):

  • Ten major capabilities in SuperCLUE-Open (open multi-round evaluation), taking Baichuan-13B-Chat as an example:

As shown, across the ten capability evaluations in the SuperCLUE open multi-round benchmark, Baichuan-13B performs well on several capabilities (measured by win rate), while some tasks leave relatively large room for improvement.
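The article does not say how the win rate is calculated. As a rough sketch, assume each question yields a judged outcome of "win", "tie", or "loss" against a reference model, and that the win rate is the share of wins among all judged outcomes. The function name, the capability labels, and the sample outcomes below are hypothetical and are not SuperCLUE data.

```python
from collections import Counter


def win_rate(outcomes):
    """Return the fraction of "win" outcomes among all judged comparisons.

    Assumption: ties and losses both count against the win rate.
    The benchmark's actual formula is not given in the article.
    """
    counts = Counter(outcomes)
    total = counts["win"] + counts["tie"] + counts["loss"]
    if total == 0:
        raise ValueError("no judged outcomes")
    return counts["win"] / total


# Made-up outcomes for illustration only (not real evaluation results).
sample = {
    "role_playing": ["win", "win", "tie", "loss"],
    "code_generation": ["loss", "loss", "win", "tie"],
}

# Per-capability win rate, one value per capability label.
rates = {capability: win_rate(results) for capability, results in sample.items()}

for capability, rate in rates.items():
    print(f"{capability}: {rate:.0%}")
```

Computing the rate per capability, rather than one overall number, matches how the article reports results: strong on some capabilities and weaker on others.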

Origin www.oschina.net/news/249838