Tsinghua's latest evaluation report ranks Wenxin Large Model 3.5 firmly first in China

Recently, Shen Yang's team at the School of Journalism and Communication of Tsinghua University released the "Comprehensive Performance Evaluation Report of Large Language Models" (hereinafter referred to as the "Report"). The Report shows that Baidu's Wenxin Yiyan ranks first in China in overall score across 20 indicators in three dimensions, surpassing ChatGPT. It ranks first in Chinese semantic understanding, and some of its Chinese-language capabilities surpass those of GPT-4.

Shen Yang, a professor and doctoral supervisor at the School of Journalism and Communication of Tsinghua University, said: "In March this year, Baidu was the first among the world's largest technology companies to release a large language model, Wenxin Yiyan, allowing China to participate in the world's cutting-edge technology competition for the first time. In this evaluation, we have also seen Wenxin Yiyan's progress in all aspects, and its progress in Chinese semantic understanding is especially impressive. The rapid development of domestic large models makes the technology's future more promising."

It is understood that the Report evaluated seven large language models: GPT-4, ChatGPT 3.5, Wenxin Yiyan, Tongyi Qianwen, Xunfei Xinghuo, Claude, and Tiangong. Across three dimensions (generation quality, usage and performance, and security and compliance), it comprehensively examined 20 indicators, including context understanding, Chinese semantic understanding, misleading information identification, logical reasoning, content security, and privacy protection. On the whole, Wenxin Yiyan shows outstanding semantic comprehension, particularly of Chinese, along with a better understanding of Chinese culture, strong timeliness, and a nuanced grasp of content security. This is attributed to its technical innovations in knowledge enhancement, retrieval enhancement, and dialogue enhancement.

In terms of generation quality, based on a comprehensive evaluation of semantic understanding, output expression, and adaptive generalization, Wenxin Yiyan scored 76.98%, second only to GPT-4 and far ahead of the other large language models, including ChatGPT. Within this dimension, Wenxin Yiyan ranked first in Chinese semantic understanding with a score rate of 92%, surpassing Xunfei Xinghuo and GPT-4. With knowledge enhancement as its core feature, Wenxin Yiyan has a more accurate grasp of the characteristics of the local language. At the same time, because its training corpus contains a large amount of local text, it has a deeper understanding of local culture and can better handle themes and backgrounds related to that culture, such as poetry and dialects, which gives it more room for practical deployment in China.

In terms of security and compliance, based on a comprehensive evaluation of content security, bias and fairness, and privacy protection, Wenxin Yiyan scored 78.18%, tying for first place with GPT-4 and far exceeding the other large language models. The Report shows that Wenxin Yiyan has good content security and pays attention to protecting user privacy and copyright.

It is understood that Baidu has built out a full four-layer artificial intelligence technology stack: chip, framework, model, and application. Its self-developed deep learning platform PaddlePaddle (Flying Paddle) strongly supports efficient training and inference for the Wenxin large model. So far, PaddlePaddle has gathered 7.5 million developers. Through joint optimization of PaddlePaddle and Wenxin, the latest version, Wenxin Large Model 3.5, has achieved a base model upgrade, innovations in fine-tuning technology, knowledge point enhancement, and logical reasoning enhancement, among other improvements. Model performance has improved by 50%, training speed has increased by 2 times, and inference speed has increased by 30 times.

Promoting the application of large models in industry has become the general trend. Baidu has previously cooperated with State Grid, Shanghai Pudong Development Bank, Taikang, Geely, and other enterprises and organizations to jointly release 11 industry-specific large models based on the Wenxin large model. At present, the Wenxin large model has the largest scale of industrial application in China. 150,000 enterprises have applied for access to test Wenxin Yiyan, and good test results have been achieved in more than 400 scenarios.

Origin blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/132151550