Tsinghua Global Large Model Report Released: Wenxin Ranks First in Chinese, Chinese Language and Mathematics

Recently, the SuperBench comprehensive large-model evaluation framework, developed by the Tsinghua University Foundation Model Research Center and Zhongguancun Laboratory, officially released the March 2024 edition of the "SuperBench Large Model Comprehensive Capability Evaluation Report". The evaluation covered 14 representative domestic and international models. The results showed that Wenxin Yiyan 4.0 performed well and came close to the level of top international models, with the gap steadily narrowing. It stands out as the leading domestic model.

For example, in the human alignment evaluation, Wenxin Yiyan 4.0 performed well and ranked first in China. In Chinese reasoning and Chinese language, Wenxin Yiyan was far ahead, with a clear gap between it and the other models. In Chinese understanding, Wenxin Yiyan 4.0 holds a clear lead, 0.41 points ahead of second-placed GLM-4. The GPT-4 series models performed poorly, ranking in the middle and lower tiers, more than 0.1 points behind first-placed Wenxin Yiyan 4.0.

In mathematics, within the semantic understanding category, Wenxin Yiyan 4.0 and Claude-3 tie for first place worldwide; the GPT-4 series models rank fourth and fifth, and the other models' scores cluster around 55 points, well behind the first tier. In reading comprehension, also within semantic understanding, Wenxin Yiyan 4.0 surpassed GPT-4 Turbo, Claude-3 and GLM-4 to take first place.

In the safety evaluation, which matters most to enterprises choosing a large model, the domestic model Wenxin Yiyan 4.0 performed brilliantly, beating the world-class GPT-4 series models and Claude-3 with the highest score (89.1 points). Claude-3 ranked only fourth.

It is worth noting that Wenxin Yiyan excels not only in technical capability but also in real-world application. Since Wenxin Yiyan first launched on March 16 last year, its user base has exceeded 200 million, and its daily API calls have also exceeded 200 million.

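In the 2023 "Battle of a Hundred Models", domestic large models competed fiercely, so who is the real leader? Although many model-capability leaderboards exist in China and abroad, their quality is uneven and their rankings differ significantly. When consulting leaderboards, we should give more weight to evaluations from authoritative institutions and universities, which provide a scientific basis for choosing a large model.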

Source: my.oschina.net/u/6852546/blog/11053975