After three weeks on the waitlist, the author finally got access to Baidu's Wenxin Yiyan (ERNIE Bot) this afternoon and started testing right away.
According to the Wenxin Yiyan web page, the latest release is v1.0.3, dated April 1, which suggests about two iterations since the launch event on the 16th of last month. Asked about itself, Wenxin Yiyan describes itself as a large language model built on Baidu's PaddlePaddle (Flying Paddle) framework and the Wenxin large model.
Without further ado: the author ran about 50 tests on Wenxin Yiyan over the afternoon and compared its answers on 10 of the questions with GPT-4's. Selected screenshots follow for reference.
Chinese poetry
Prompt: write an acrostic poem on the theme of "late spring and early summer".
On poetry, Wenxin Yiyan is clearly the weaker of the two.
Chinese history and culture
To check whether Wenxin Yiyan really is better than GPT at Chinese, the author ran several tests in this area.
On why Zhuge Liang's Northern Expeditions failed, GPT-4 did well and gave a very comprehensive answer, but Wenxin Yiyan's answer was better in depth and quality. On obscure or specialized historical questions, however, both Wenxin Yiyan and GPT-4 made mistakes.
Asked to introduce the book "Eastern Jin Clan Politics" (by Tian Yuqing), both models made factual errors. GPT-4 named Fan Wenlan as the author and also gave the wrong birth and death years for him; Wenxin Yiyan did even worse, naming a literary writer instead. On classics and common-knowledge cultural questions, both gave accurate answers.
Code ability
The author paid particular attention to Wenxin Yiyan's coding performance, since code is a good probe of an LLM's reasoning ability, and ran many coding tests. Overall, Wenxin Yiyan's coding ability has improved considerably since last month's release. However, because the author did not actually run the code either model generated, a deeper comparison of the two is not possible for now.
First, the author asked both to write a simple bubble sort in JavaScript:
The two produced essentially the same code. The difference is that GPT-4 also supplied test cases, while Wenxin Yiyan ended with only a brief explanation. Next, deep learning:
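The screenshots of both answers are not reproduced here. As a concrete reference point for the task, below is a minimal bubble sort in plain JavaScript, written for this rewrite as an illustration; it is not either model's actual output, and the function name `bubbleSort` and the early-exit `swapped` flag are choices made here rather than details taken from the screenshots.

```javascript
// Bubble sort: repeatedly swap adjacent out-of-order elements until sorted.
// Returns a new sorted array; the input array is left unchanged.
function bubbleSort(input) {
  const arr = [...input];
  // After each pass, the largest remaining element has bubbled to index `end`.
  for (let end = arr.length - 1; end > 0; end--) {
    let swapped = false;
    for (let i = 0; i < end; i++) {
      if (arr[i] > arr[i + 1]) {
        [arr[i], arr[i + 1]] = [arr[i + 1], arr[i]];
        swapped = true;
      }
    }
    // A full pass with no swaps means the array is already sorted.
    if (!swapped) break;
  }
  return arr;
}

console.log(bubbleSort([5, 3, 8, 1, 2])); // [ 1, 2, 3, 5, 8 ]
```

The `swapped` flag lets the loop stop after a single pass over an already sorted array; the worst case is still O(n²) comparisons.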
The screenshots are incomplete, but on comparison the author found little difference in code structure between the two; GPT-4, however, explained its code more thoroughly.
Finally, the author asked both to role-play a server with four RTX 3090 graphics cards and respond to commands as that machine would. Here, Wenxin Yiyan fell far short of GPT-4.
Content query
For content queries, the author asked both to look up the lyrics of Jay Chou's "Blue and White Porcelain". Both Wenxin Yiyan and GPT-4 returned the correct lyrics, whereas GPT-3.5 simply generated text of its own instead of retrieving the actual lyrics.
Multimodal
Although GPT-4 has impressive multimodal image-input capabilities, they are not yet available to try. Wenxin Yiyan, by contrast, already offers image generation and AI drawing, so to finish, let's look at its drawing ability.
Overall, Wenxin Yiyan performed beyond the author's expectations. Although its reasoning still lags behind GPT-4, Baidu deserves real credit for being the first in China to release a product benchmarked against ChatGPT. The author hopes domestic AI can catch up soon and build AI products with worldwide impact.
In addition, to bring more people into AI productivity tools, the author recently set up a Knowledge Planet (知识星球) community called [ ChatGPT Lab ]. More than 140 readers have joined so far. The community focuses on:
1. Using ChatGPT to improve work and study efficiency.
2. Tracking the latest developments in NLP, LLMs, AIGC and AGI.
3. Sharing the latest applications and creative uses of ChatGPT.