A comparison test of Wenxin Yiyan and GPT-4!

After three weeks on the waitlist, the author finally received trial access to Baidu's Wenxin Yiyan this afternoon and started testing immediately.

a2c07a2d85a61264715f60d51107da96.png

According to Wenxin Yiyan's web page, the latest release is dated April 1, with version number v1.0.3. That suggests two iterations have shipped since the press conference on the 16th of last month. By Wenxin Yiyan's own account, it is a large language model built on Baidu's PaddlePaddle (Flying Paddle) and the Wenxin large model.

88af98bcf90a4d1aa670fa2fdf5e4ad2.png

Without further ado: the author ran about 50 tests on Wenxin Yiyan this afternoon and, for 10 of the questions, compared its answers with GPT-4's. Some screenshots follow for reference.

Chinese poetry

Prompt: write an acrostic poem (藏头诗) on the theme of "late spring and early summer".

baba17d9c3ec72573a8d8dee650daac4.png

5471ea69164fd040c2d42b8a1fff4b3c.png

Judged as poetry, Wenxin Yiyan's result is clearly the weaker of the two.

Chinese history and culture

To check whether Wenxin Yiyan really is better than GPT at Chinese, the author ran some tests in this area.

072ba12d0e55d461f7918daecf09e030.png

deec24dcf4b0bca6592f9f99c44099c1.png

On the question of why Zhuge Liang's Northern Expeditions failed, GPT-4 performed well and gave a very comprehensive answer, but Wenxin Yiyan did better still in the depth and quality of its answer. On obscure or specialist historical questions, however, both Wenxin Yiyan and GPT-4 make mistakes.

717fbc4da67cf53537a1e9c40607ad9f.png

d7ec9235323458f3a9e43e219fc5dab0.png

Asked to introduce the book "Eastern Jin Clan Politics" (written by Tian Yuqing), both GPT-4 and Wenxin Yiyan made factual mistakes. GPT-4 said the author was Mr. Fan Wenlan and also gave the wrong birth and death years for him; Wenxin Yiyan did even worse and gave the name of a literary writer. On classic, common-knowledge cultural questions, both give reasonable answers.

eb771ed5c65a46b325690ee0cfb40ffe.png

c752dff2b64930bd9fa636623e235d1b.png

Coding ability

The author also paid close attention to Wenxin Yiyan's performance on code. Code is a good test of an LLM's reasoning ability, so the author ran quite a few coding tests. Overall, Wenxin Yiyan's coding ability has improved a lot since last month's release. However, the author has not actually run the code either model generated, so a deeper evaluation of the two is not possible for now.

First, a simple bubble sort in JS:

7f2a8abb90d894c1555a21f5caa29d4a.png

96e334a99a9571390e5dd0d6dec0b689.png
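For readers who cannot see the screenshots, here is a minimal sketch of the kind of bubble sort this prompt asks for. It is illustrative only and is not the code either model produced; the function name `bubbleSort` and the sample inputs are assumptions.

```javascript
// Illustrative bubble sort; NOT the output of Wenxin Yiyan or GPT-4.
// The name `bubbleSort` and the sample inputs below are assumptions.
function bubbleSort(input) {
  const arr = [...input]; // copy so the caller's array is not mutated
  for (let i = 0; i < arr.length - 1; i++) {
    let swapped = false;
    // After pass i, the last i elements are already in their final place.
    for (let j = 0; j < arr.length - 1 - i; j++) {
      if (arr[j] > arr[j + 1]) {
        [arr[j], arr[j + 1]] = [arr[j + 1], arr[j]]; // swap neighbours
        swapped = true;
      }
    }
    if (!swapped) break; // a pass with no swaps means the array is sorted
  }
  return arr;
}

// A few sample test cases.
console.log(bubbleSort([5, 2, 9, 1, 5, 6])); // sorted: 1, 2, 5, 5, 6, 9
console.log(bubbleSort([]));                 // empty array stays empty
console.log(bubbleSort([3]));                // single element is unchanged
```

Returning a sorted copy rather than sorting in place is a design choice of this sketch; an in-place version would drop the spread copy on the first line of the function.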

The code from the two models is essentially the same. The difference is that GPT-4 also provided test cases, while Wenxin Yiyan ended its answer after only a brief explanation.

Next, their performance on a deep learning task:

7af3e0d7d6c52d973de7f95f951294ad.png

f6b6e62ce110c9d3f792199fe1bf87f1.png

The screenshots are incomplete, but after comparing the full answers the author found little difference in code structure between the two. GPT-4, however, explained its code more thoroughly.

Finally, the author asked both models to role-play a server with four 3090 graphics cards and to produce output in response to the commands given. Here Wenxin Yiyan's performance is far inferior to GPT-4's.

6eb759e2449ff9aa566fd95aecfb95ab.png

a139a14b258ffe9cdcaf2701913b7d6c.png

Content query

For content query, the author asked each model to look up the lyrics of Jay Chou's "Blue and White Porcelain". Both Wenxin Yiyan and GPT-4 gave the correct lyrics, whereas GPT-3.5 simply generated text of its own and did not actually perform a lookup.

2841f928e44676ca8abb49d0a2a387c8.png

Multimodal

Although GPT-4 has impressive multimodal image input and generation capabilities, they are not yet available to try. Fortunately, Wenxin Yiyan offers image generation and AI drawing directly, so let's finish with a look at its drawing ability.

ca20f1877d1a92d3ecbe2a94331a604b.png

b762137180958adde9edc017e002795f.png

0ec4662815874e4a552d571159dc8b5d.png

Overall, Wenxin Yiyan's performance exceeded the author's expectations. There is still a gap in reasoning ability compared with GPT-4, but Baidu dared to be the first in domestic AI to release a product benchmarked against ChatGPT, which is commendable. The author hopes domestic AI can catch up and produce AI products with worldwide impact as soon as possible.


In addition, to bring together more people interested in AI productivity tools, the author set up a Knowledge Planet group called [ChatGPT Lab] a few days ago. So far, 140+ readers have joined. The group focuses on:

1. How to use ChatGPT to improve work and study efficiency.

2. Tracking cutting-edge developments and the latest progress in NLP, LLM, AIGC and AGI.

3. Sharing the latest applications of ChatGPT and ways to use it.

bed6f4eac523c7e7ec113c178870d719.jpeg


Origin blog.csdn.net/weixin_37737254/article/details/129980335