Comprehensive comparative test between Wenxin Yiyan and Xunfei Xinghuo: (1) Language comprehension ability

Table of contents

01 Preface

02 Test content design

03 Complex semantic test

04 Emotion recognition of articles

05 Summary of the article

06 Article element extraction

07 Summary


01 Preface

        Wenxin Yiyan and Xunfei Xinghuo are the two major large language models currently in official public beta in China. Many other companies claim to have released their own products, but none has opened large-scale public testing, so these two are currently the easiest to get access to. I applied for Wenxin Yiyan after its announcement and waited about a week for the approval message, while Xunfei Xinghuo approved my application the next day.

        To give more points of comparison, I also include results from a domestic wrapper ("shell") site that claims to use ChatGPT 3.5-turbo. I cannot verify its authenticity, so treat its results as reference only. Friends with access are welcome to help test the real ChatGPT 3.5 and GPT-4.

        Why a wrapper site? Because: [Network Security] Bypassing the firewall to access the Internet is illegal!

(1) Wenxin Yiyan

        Wenxin Yiyan (English name: ERNIE Bot) is Baidu's new-generation knowledge-enhanced large language model and a new member of the Wenxin large model family. It can converse with people, answer questions, assist with creative work, and help people obtain information, knowledge, and inspiration efficiently and conveniently. Built on the PaddlePaddle (Flying Paddle) deep learning platform and the Wenxin knowledge-enhanced large model, it continuously learns from massive data and large-scale knowledge, and features knowledge enhancement, retrieval enhancement, and dialogue enhancement.

On March 20, 2023, Baidu announced through its official WeChat account that the Wenxin Yiyan cloud service would launch on March 27.

Wenxin Yiyan press conference (image from the Internet; will be removed on request)

(2) Xunfei Xinghuo (Spark)

        Xunfei Xinghuo is a new-generation cognitive intelligence model launched by iFLYTEK. It has cross-domain knowledge and language understanding abilities, and can understand and carry out tasks through natural dialogue. It keeps evolving from massive data and large-scale knowledge, closing the loop over the whole process from raising a problem, through planning, to solving it.

        On May 6, 2023, iFLYTEK officially released the Spark Cognitive Large Model, which has seven core capabilities: text generation, language understanding, knowledge question answering, logical reasoning, mathematics, coding, and multimodal ability.

Xunfei Spark feature overview (image from the Internet; will be removed on request)

(3) Test content

        Comparing large language models is a fairly complicated matter. For this test I follow the method in the article "AI Chat Robot Battle: ChatGPT / GPT-4 / Wenxin Yiyan Performance Comparison Test", which mainly covers six aspects:

        1. Language understanding

        2. Task completion

        3. Common sense questions

        4. Logic and mathematics

        5. Coding ability

        6. Professional fields

        These six aspects largely match the capabilities advertised when the two products were released, and cover the common application scenarios of AI chatbots. AnTuTu's test also uses a scoring mechanism. I will not assign scores here; I will simply present the results and leave the judgment to readers.

        The first part of the test starts today, and later parts will be published as testing progresses. Follow me so you don't miss them.

02 Test content design

        Language understanding is arguably the foundation of a large language model. Whether the user's input is understood correctly determines whether the follow-up work can be completed. Following AnTuTu's test content, it can be divided into these main directions:

        1. Text generation: given a short request, generate a certain amount of text.

        2. Reading comprehension: answer questions based on a given text.

        3. Comprehension of complex semantics: puns, rhetoric, Chinese word segmentation, emotions, riddles, etc.

        4. Summary generation: given a text of a certain length, have the model generate a summary.

        5. Information extraction: extract key information from complex text.

        6. Multi-turn comprehension: about 3 to 10 rounds of dialogue in which the topic jumps around, with questions that do not require overly complicated reasoning or common sense.

        I have made a slight adjustment here: text generation is moved to the later task-completion test, and several of the other items are merged. This part therefore mainly includes:

        1. Complex semantic test, that is, some very convoluted sentences, to see whether the AI can understand them;

        2. Emotion recognition of articles;

        3. Summary of the article;

        4. Article element extraction, mainly extracting the data in an article.
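The four directions above can be organized as a small test plan in which every prompt is sent unchanged to each model and the raw answers are kept side by side. The following is only a sketch: `PromptCase`, `run_comparison`, and `make_stub` are hypothetical names, and the stub functions stand in for the real chat interfaces, which in this test were used by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PromptCase:
    category: str  # one of the four test directions listed above
    prompt: str    # text sent unchanged to every model


def run_comparison(cases: List[PromptCase],
                   models: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Send each prompt to every model and keep the raw answers side by side."""
    results = []
    for case in cases:
        answers = {name: ask(case.prompt) for name, ask in models.items()}
        results.append({"category": case.category,
                        "prompt": case.prompt,
                        "answers": answers})
    return results


# Hypothetical stand-ins: each returns a placeholder instead of calling a real model.
def make_stub(name: str) -> Callable[[str], str]:
    return lambda prompt: f"<{name} answer to a {len(prompt)}-character prompt>"


MODELS = {name: make_stub(name)
          for name in ("Wenxin Yiyan", "Xunfei Xinghuo", "ChatGPT")}

CASES = [
    PromptCase("Complex semantics", "Who likes whom?"),
    PromptCase("Emotion recognition", "What did the author feel?"),
    PromptCase("Summarization", "What is the main content of the passage?"),
    PromptCase("Element extraction", "Organize the figures into a table."),
]

RESULTS = run_comparison(CASES, MODELS)
for r in RESULTS:
    print(r["category"], "->", list(r["answers"]))
```

Keeping the answers unscored matches the approach taken here: the results are shown and the judgment is left to the reader.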

03 Complex semantic test

        This test is more of an appetizer: I have always felt that this is not the main application scenario for a large language model, but it is still very interesting.

Use case 1:

        According to the description in the following sentence:

        Mingming clearly knows that Baibai likes him, but Mingming just won't say it. (In the original Chinese, the first clause reads "明明明明明白白白喜欢他".)

        May I ask: Who likes whom?

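Wenxin Yiyan:

Based on the sentence's description, it is impossible to determine who likes whom. It may be that several people like the same person but, for some reason, have not expressed it, or that they express it in different ways. Further understanding and investigation are needed to determine this.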

Xunfei Spark:

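According to the description, "明明明明明白白白喜欢他" indicates that one person likes another. Here "明明" (Mingming) refers to this person's name, and "明明明明白白白喜欢他" indicates that this person very clearly likes the other person. So it is "Mingming" who likes "him".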

ChatGPT:

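According to the sentence, it seems that "明明明明明白白白" likes "him", but Mingming just won't say it out loud. Without further background or context, it is impossible to determine exactly which people or creatures are involved.

        Judging from the answers above, all three are basically wrong.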

Use case 2:

        According to the description below:

        The eldest uncle went to the second uncle's house to find the third uncle and said that the fourth uncle stole 100 yuan from the fifth uncle's cabinet in the sixth uncle's house.

        I would like to ask: 1. Who is the thief? 2. Who did the money originally belong to?

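Wenxin Yiyan:

Based on this description, it may not be possible to determine who the thief is. Although the fourth uncle is accused of stealing the money, we do not know whether he really stole it or was merely misled. Likewise, we do not know whether the fifth uncle or the seventh uncle really had money stolen. Therefore, we need more information to determine the thief's identity.
As for whose money it originally was, according to the sentence, the fourth uncle stole the money from someone else's home, so the money should originally have belonged to that household. However, we do not know which household it is, nor which cabinet the money was kept in. Therefore, we cannot determine whose money it originally was.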

Xunfei Spark:

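This is an interesting story, and we can find some clues in it to answer the two questions:
1. Who is the thief? According to the description, the fourth uncle went to the fifth uncle's house and took 1,000 yuan from the seventh uncle's cabinet. So the fourth uncle is the thief.
2. Whose money was it originally? Since the fourth uncle took the money from the seventh uncle's cabinet, the money originally belonged to the seventh uncle. However, the fourth uncle resorted to dishonest behavior to get the money, so the money was in fact never returned to the seventh uncle.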

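ChatGPT:

The eldest uncle went to the second uncle's house to find the third uncle and said that the fourth uncle had stolen money.
The money was kept in the seventh uncle's cabinet. Based on the above information, it is impossible to determine exactly who stole the money.
As for whom the money originally belonged to, that is also unclear. More context and background are needed to make a further judgment.

        Judging from the answers above, all three got the first question right, and none answered the second. Let's lower the difficulty a little.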

Use case 3:

        According to the following description: the eldest uncle went to the second uncle's house to find the third uncle and said that the fourth uncle stole the 100 yuan that the fifth uncle had put in the cabinet at the sixth uncle's house. I would like to ask: 1. Who is the thief? 2. Who did the money originally belong to?

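Wenxin Yiyan:

According to the description, the eldest uncle went to the second uncle's house to find the third uncle and said that the fourth uncle stole the 100 yuan that the fifth uncle had put in the cabinet at the sixth uncle's house.
From this we know: 1. The fourth uncle is the thief; 2. The money originally belonged to the sixth uncle.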

Xunfei Spark:

1. The fourth uncle is the thief.
2. The money originally belonged to the fifth uncle.

ChatGPT:

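According to the description, we know the following:
The eldest uncle went to the second uncle's house to find the third uncle and said that the fourth uncle stole the 100 yuan that the fifth uncle had put in the cabinet at the sixth uncle's house.
The money originally belonged to the fifth uncle. Based on the above information, the following can be inferred:
The fourth uncle is the thief.
The money originally belonged to the fifth uncle.

        Clearly, Wenxin Yiyan got it wrong, while Xunfei Xinghuo and ChatGPT got it right.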

Comments:

As far as this test is concerned, Wenxin Yiyan seems a little weaker, while Xunfei Xinghuo and ChatGPT are roughly on par. In the later tests I also found that Wenxin Yiyan often misunderstands instructions.

04 Emotion recognition of articles

Example

In the morning, some unknown birds stood on the swaying reed stalks, held their heads high, flapped their wings, and competed to show off their moving voices. Their lingering notes floated in the fresh, moist air, drifting on like an unceasing breeze. Pink lotus flowers swayed beside tender yellow lotus pods that had just shed their petals, their fragrance overflowing, and many red-winged, blue-headed dragonflies flew about and settled on them, reluctant to leave.

May I ask, what do you think the author felt when he wrote this passage?

Wenxin Yiyan:

According to the text description, the passage depicts a beautiful and peaceful morning, full of life and energy. The author conveys a harmonious, natural and peaceful atmosphere by depicting elements such as unknown birds, green reed waves, lotus pods, fragrances and dragonflies. Therefore, when the author wrote this passage, he may have expressed a mood of joy, appreciation and enjoyment of the beauty of nature.

Xunfei Spark:

Judging from the details of the author's description, one can feel that the author's mood is joyful and cheerful. He is excited by and full of praise for the natural scenery of the early morning, especially the performance of the unknown birds and the lotus, which makes him even more happy and satisfied.

ChatGPT:

According to this description, one can feel that the author is in a very fresh and joyful mood. Through detailed description, the author presents the natural scenery in the morning vividly, full of vigor and vitality. The singing of unknown birds, the fragrance of the lotus pods, and the soaring of the red-winged blue-headed dragonflies, all brought joy and admiration to the author.

Comments:

Perhaps because the question is relatively simple, I personally think the three are on par.

05 Summary of the article

Use case 1:

On May 9, Beijing time, in the second round of the NBA playoffs, the Heat beat the Knicks 109-101 to take a 3-1 series lead.

Heat (3-1): Butler 27 points, 6 rebounds and 10 assists; Adebayo 23 points and 13 rebounds; Strus 16 points and 6 rebounds; Lowry 15 points, 5 rebounds and 4 assists; Martin 10 points and 5 rebounds.

Knicks (1-3): Brunson 32 points, 4 rebounds and 11 assists; Barrett 24 points, 4 rebounds and 3 assists; Randle 20 points, 9 rebounds and 3 assists.

In the first quarter the two sides were evenly matched, and the Heat led the Knicks by 1 at the end of the quarter. In the second quarter the Knicks went cold and the Heat went on a run to widen the gap; at halftime the Heat led by 8. In the third quarter the two sides traded blows. The Knicks once cut the deficit to just 2 points, but the Heat answered with a small run to pull away again. In the final quarter the Knicks kept chasing, but too many fouls disrupted their rhythm. After Randle fouled out with six fouls, the Knicks lost all hope of catching up. In the end, the Heat beat the Knicks 109-101.

Question: Who scored the most points and who grabbed the most rebounds in this game?

Wenxin Yiyan:

In this game, the Heat's Butler had the most points and Lowry had the most rebounds.

Xunfei Spark:

The highest scorer in this game was Brunson of the Knicks, who scored 32 points; the top rebounder was Adebayo of the Heat, who had 23 points and 13 rebounds.

ChatGPT:

According to the game results you provided, Butler of the Heat was the highest scorer in the game with 27 points, and Adebayo of the Heat had the most rebounds in the game with 13.

        Judging from the answers above, Xunfei Xinghuo got everything right, Wenxin Yiyan got everything wrong, and ChatGPT was half right.
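The expected answer can be checked mechanically from the figures in the game report. The sketch below transcribes the points and rebounds of the eight players the report lists (the report gives no other players, so the check only covers these) and picks the maximum of each column. The names `BOX_SCORE` and `leader` are introduced here for illustration only.

```python
# Box score transcribed from the game report above: (player, team, points, rebounds).
BOX_SCORE = [
    ("Butler", "Heat", 27, 6),
    ("Adebayo", "Heat", 23, 13),
    ("Strus", "Heat", 16, 6),
    ("Lowry", "Heat", 15, 5),
    ("Martin", "Heat", 10, 5),
    ("Brunson", "Knicks", 32, 4),
    ("Barrett", "Knicks", 24, 4),
    ("Randle", "Knicks", 20, 9),
]


def leader(box_score, column):
    """Return (player, team, value) for the highest value in the given column index."""
    row = max(box_score, key=lambda r: r[column])
    return row[0], row[1], row[column]


top_scorer = leader(BOX_SCORE, 2)      # column 2 holds points
top_rebounder = leader(BOX_SCORE, 3)   # column 3 holds rebounds
print("Most points:", top_scorer)
print("Most rebounds:", top_rebounder)
```

This gives Brunson (32 points) and Adebayo (13 rebounds), which is the pair Xunfei Xinghuo reported.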

Use case 2:

        In 1994, the "Computer Information System Security Protection Regulations" stipulated that the state implements a licensing system for the sale of computer information system security products, and the Ministry of Public Security has carried out administrative approval of product sales licenses since 1997. In 2008, the former General Administration of Quality Supervision, Inspection and Quarantine and the National Certification and Accreditation Administration issued the "Announcement on the Implementation of Compulsory Certification for Some Information Security Products", which brought 13 types of information security products under compulsory certification management; the subsequent "Announcement on Adjusting the Implementation Requirements for Compulsory Certification of Information Security Products" limited the compulsory certification requirement to the scope of the Government Procurement Law. In 2010, the Ministry of Finance, the Ministry of Industry and Information Technology, the former General Administration of Quality Supervision, Inspection and Quarantine, and the National Certification and Accreditation Administration jointly issued the "Notice on the Implementation of Government Procurement of Information Security Products", which again made clear that anyone using fiscal funds to purchase information security products should purchase nationally certified products. These two systems have played an important role in regulating the management of network security products, but their management content overlaps, and to a certain extent certification and testing are duplicated.

        The "Network Security Law", implemented in June 2017, clearly stipulates that "key network equipment and special network security products shall, in accordance with the mandatory requirements of relevant national standards, be sold or provided only after they have passed security certification by a qualified institution or their security testing meets the requirements. The national cyberspace administration, in conjunction with relevant departments of the State Council, formulates and publishes a catalog of key network equipment and special network security products, and promotes mutual recognition of security certification and security testing results to avoid repeated certification and testing." To implement these provisions of the "Network Security Law", the Cyberspace Administration of China, together with the Ministry of Industry and Information Technology, the Ministry of Public Security, and the National Certification and Accreditation Administration, successively issued the catalog of key network equipment and special network security products, designated the institutions that undertake security certification and security testing, clarified the unified process for releasing certification and testing results, and formulated the mandatory national standard "Information Security Technology: Security Technical Requirements for Special Network Security Products".

        This time the five departments jointly issued the "Announcement" to unify the certification and testing system for network security products, stop issuing the "Sales License for Computer Information System Security Products", and stop implementing the compulsory certification requirements for information security products in the field of government procurement. This is an important measure for implementing the "Network Security Law" and promoting mutual recognition of security certification and security testing results. It is of great significance for unifying the security requirements of network security products, improving the overall security protection capabilities of products, reducing the burden on network security enterprises, creating a good environment for industrial development, growing and strengthening the network security industry, and enhancing national network security capabilities.

        Question: What is the main content of the above passage?

Wenxin Yiyan:

The above passage mainly explains that, after the "Network Security Law" took effect, the state strengthened network security management, implemented the relevant provisions of the law, implemented government procurement of key network equipment and special network security products, and promoted mutual recognition of security certification and security testing results, among other measures, in order to unify the security requirements of network security products, improve the overall security protection capabilities of products, reduce the burden on network security enterprises, create a good environment for industrial development, grow and strengthen the network security industry, and enhance national network security capabilities.

Xunfei Spark:

The above passage mainly introduces the history of network security product management in China, including the licensing system, compulsory certification, and government procurement requirements. It then focuses on the provisions on network security product management in the "Network Security Law" implemented in 2017, and on the important measures for the certification and testing system for special network security products set out in the announcement jointly issued by the five departments. The announcement aims to unify the security requirements of network security products, reduce the burden on enterprises, promote industrial development, and improve national network security capabilities.

ChatGPT:

This passage mainly introduces the unification of the network security product certification and testing system and the measures to stop the original systems. It also mentions the implementation of relevant regulations and standards, and the significance of these measures for the cybersecurity industry and national cybersecurity capabilities.

Follow-up question:

Please introduce the content of the article in more detail.

ChatGPT's answer:

This passage leads up to the current unification of the network security product certification and testing system from a historical perspective. Covering 1994 to the present, it mentions the sales licensing system introduced by the original Computer Information System Security Protection Regulations and the expansion of the scope of compulsory certification management in 2008. However, these systems overlapped to a certain extent, which caused many problems. Therefore, in 2017 the "Network Security Law" clearly stipulated that key network equipment and special network security products must comply with the mandatory requirements of relevant national standards, and may be sold or provided only after passing security certification or security testing by a qualified institution. Now the five departments have jointly issued an "Announcement" to unify the certification and testing system for special network security products and stop the previous policy requirements. These departments have also formulated a mandatory national standard, specifying what technical requirements network security products should meet and which institutions are responsible for certifying and testing products. These measures aim to unify the security standards of network security products, improve the overall security protection capabilities of products, reduce the burden on network security companies, and create a favorable environment for industrial development, with a view to growing and strengthening the network security industry and enhancing national network security capabilities.

Comments:

        Judging from the answers above, I personally think Xunfei Xinghuo's answer is the best. Wenxin Yiyan's answer did not summarize the content of the first paragraph. ChatGPT's first answer was too brief, and its second answer was more accurate.

06 Article element extraction

        Given a piece of text, it is required to extract the corresponding information from it.

Use case: (National Bureau of Statistics data)

        Assume you are a data analyst, and work from the content below:

        By industry category, the operating income of news information services was 367.4 billion yuan, an increase of 12.5% over the same period of the previous year; content creation and production was 645.6 billion yuan, an increase of 5.8%; creative design services were 449.9 billion yuan, an increase of 3.9%; cultural communication channels were 374.6 billion yuan, an increase of 14.0%; cultural investment and operation was 12.6 billion yuan, an increase of 12.1%; cultural entertainment and leisure services were 34.2 billion yuan, an increase of 48.5%; cultural auxiliary production and intermediary services were 351.9 billion yuan, a decrease of 4.6%; cultural equipment production was 134.6 billion yuan, a decrease of 7.9%; cultural consumption terminal production was 510.8 billion yuan, a decrease of 2.4%.

        Based on the above content, please organize it into a table to make it more intuitive.

Wenxin Yiyan:

Xunfei Spark:

ChatGPT:

Comments:

        This passage is actually not difficult, but Wenxin Yiyan's performance really disappointed me. Xunfei Xinghuo's performance was still good.
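For reference, the table the prompt asks for can be built directly from the nine figures in the passage. The sketch below transcribes them by hand (decreases are stored as negative percentages) and prints an aligned plain-text table; `ROWS` and `render_table` are illustrative names, and the column headings are my own wording rather than anything the models produced.

```python
# Figures transcribed from the passage above: (category, operating income in
# billions of yuan, year-on-year change in percent; negative means a decrease).
ROWS = [
    ("News information services", 367.4, 12.5),
    ("Content creation and production", 645.6, 5.8),
    ("Creative design services", 449.9, 3.9),
    ("Cultural communication channels", 374.6, 14.0),
    ("Cultural investment and operation", 12.6, 12.1),
    ("Cultural entertainment and leisure services", 34.2, 48.5),
    ("Cultural auxiliary production and intermediary services", 351.9, -4.6),
    ("Cultural equipment production", 134.6, -7.9),
    ("Cultural consumption terminal production", 510.8, -2.4),
]


def render_table(rows):
    """Return the rows as an aligned plain-text table, one line per category."""
    header = ("Category", "Income (bn yuan)", "YoY change (%)")
    body = [(name, f"{income:.1f}", f"{change:+.1f}") for name, income, change in rows]
    table = [header, *body]
    widths = [max(len(r[i]) for r in table) for i in range(3)]
    return "\n".join(
        f"{r[0]:<{widths[0]}}  {r[1]:>{widths[1]}}  {r[2]:>{widths[2]}}" for r in table
    )


print(render_table(ROWS))
```

A correct model answer should contain the same nine rows with the same signs on the percentage changes.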

07 Summary

        The articles used in this test were all published within the past few days. During the test I also found a big problem with Wenxin Yiyan: it accepts at most 1024 Chinese characters of input, whereas Xunfei Xinghuo accepts more than 6300 characters. As a result, many articles could not be tested on Wenxin Yiyan at all; if I were scoring, Wenxin Yiyan would lose points for this.
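When choosing test articles, the length limits noted above can be checked before pasting a text into each product. The sketch below is only an illustration: the two numbers are what I observed in this test, not official specifications, and `OBSERVED_LIMITS` and `models_accepting` are hypothetical names.

```python
# Limits as observed in this test, not official specifications.
OBSERVED_LIMITS = {
    "Wenxin Yiyan": 1024,    # maximum accepted input, in Chinese characters
    "Xunfei Xinghuo": 6300,  # more than this was accepted, so treat it as a lower bound
}


def models_accepting(text, limits=OBSERVED_LIMITS):
    """Return the models whose observed limit is at least the length of the text."""
    n = len(text)  # len() counts Unicode code points, so one Chinese character counts as 1
    return [name for name, limit in limits.items() if n <= limit]


article = "字" * 2000  # stand-in for a 2000-character article
print(models_accepting(article))
```

A text longer than 6300 characters simply falls outside what was tried here, so an empty result means "unknown" for Xunfei Xinghuo rather than "rejected".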

        Judging from the tests above, apart from refusing to answer some questions, Xunfei Xinghuo clearly performs better than Wenxin Yiyan and comes very close to the level of ChatGPT 3.5, so the goal of catching up with ChatGPT by October 24 may not be empty talk.


Origin blog.csdn.net/m0_37771865/article/details/130717654