Tsinghua releases global large model report: Wenxin Yiyan ranks first in Chinese reasoning, Chinese language, and mathematics


SuperBench, the comprehensive large model evaluation framework developed by Tsinghua University's Foundation Model Research Center and Zhongguancun Laboratory, recently released the March 2024 edition of its "SuperBench Large Model Comprehensive Capability Evaluation Report". The evaluation covered 14 representative models from China and abroad. According to the results, Wenxin Yiyan 4.0 performed well, coming close to the level of the top international models, and the gap between them has gradually narrowed. The results make it the leading domestic model.

For example, in the human alignment evaluation, Wenxin Yiyan 4.0 performed well and ranked first among domestic models. In the Chinese reasoning and Chinese language evaluations, Wenxin Yiyan was far ahead, with a clear gap between it and the other models. In Chinese understanding, Wenxin Yiyan 4.0 held a clear lead, finishing 0.41 points ahead of second-place GLM-4. The GPT-4 series models performed poorly here, ranking in the middle to lower part of the field and trailing first-place Wenxin Yiyan 4.0 by more than 1 point.

In the mathematics portion of the semantic understanding evaluation, Wenxin Yiyan 4.0 and Claude-3 jointly rank first worldwide. The GPT-4 series models rank fourth and fifth, and the scores of the other models cluster around 55 points, significantly behind the first tier. In the reading comprehension portion of semantic understanding, Wenxin Yiyan 4.0 surpassed GPT-4 Turbo, Claude-3, and GLM-4 to take first place.

In the safety evaluation, the factor enterprises weigh most heavily when choosing a large model, the domestic model Wenxin Yiyan 4.0 performed exceptionally well, beating the world-class GPT-4 series models and Claude-3 to achieve the highest score (89.1 points). Claude-3 ranked only fourth.

It is worth noting that Wenxin Yiyan not only excels in technical capability but also leads in real-world adoption. Since Wenxin Yiyan first launched on March 16 last year, its user base has exceeded 200 million, and its daily API calls have also exceeded 200 million.

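In the "battle of a hundred models" of 2023, domestic large models competed fiercely. Which one is the real leader? Many model capability leaderboards exist in China and abroad, but their quality is uneven and their rankings differ markedly. When consulting leaderboards, readers should give more weight to evaluations from authoritative institutions and universities, which offer a scientific basis for choosing a large model.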
