When four large AI models meet real exam questions, which ones get stumped?

Artificial intelligence (AI) is developing at remarkable speed and appearing in many fields, bringing convenience to people's lives and work. Large AI models have been called humanity's "second brain" and have become "intelligent assistants" for study, daily life, and work.

The civil service examination holds a unique place in education in China and attracts the attention and interest of a great many candidates. As is well known, the quantitative relations questions are the hardest and most time-consuming part of the civil service exam, which puts considerable pressure on candidates preparing for it. Against this background, can large AI models solve real exam questions, and can they get them right?

Today, let's briefly evaluate how GPT-3.5, GPT-4, Wenxin Yiyan, and Tongyi Qianwen actually perform in this specific scenario.

We selected a real question from the Administrative Aptitude Test of the 2021 national civil service examination:

A certain locality dispatched 96 people, 62 of them public health professionals, to four densely populated locations (a station, an airport, a supermarket, and a school) for health and safety inspections. The airport received the most personnel of the four locations. Among the personnel sent to the station and the supermarket, professionals account for 64% and 65% respectively. Among those sent to the school, there are 30% fewer non-professionals than professionals. Where does the airport rank among the four locations in terms of the proportion of professionals?

Correct answer: first.
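As a sanity check on the reference answer, here is a minimal brute-force sketch in Python. It is our own illustration, not any contestant's output, and the names (`solve`, `whole_professionals`) are hypothetical. It assumes every head count is a whole number of people and reads "30% fewer non-professionals" as non-professionals = 0.7 × professionals at the school, so the school's professional share is 10/17.

```python
from fractions import Fraction

TOTAL_PEOPLE = 96
TOTAL_PROFESSIONALS = 62

# Known professional shares, kept as exact fractions to avoid float rounding.
STATION_SHARE = Fraction(64, 100)      # reduces to 16/25
SUPERMARKET_SHARE = Fraction(65, 100)  # reduces to 13/20
# Assumption: at the school, non-professionals = 0.7 * professionals,
# so the professional share is 1 / (1 + 0.7) = 10/17.
SCHOOL_SHARE = Fraction(10, 17)


def whole_professionals(count, share):
    """Return count * share if it is a whole number of people, else None."""
    value = count * share
    return int(value) if value.denominator == 1 else None


def solve():
    """Enumerate every head-count split that satisfies the problem's constraints."""
    solutions = []
    for station in range(1, TOTAL_PEOPLE):
        for supermarket in range(1, TOTAL_PEOPLE - station):
            for school in range(1, TOTAL_PEOPLE - station - supermarket):
                airport = TOTAL_PEOPLE - station - supermarket - school
                # Constraint: the airport receives strictly the most people.
                if airport <= max(station, supermarket, school):
                    continue
                pros = [
                    whole_professionals(station, STATION_SHARE),
                    whole_professionals(supermarket, SUPERMARKET_SHARE),
                    whole_professionals(school, SCHOOL_SHARE),
                ]
                # Every group must contain a whole number of professionals.
                if None in pros:
                    continue
                airport_pros = TOTAL_PROFESSIONALS - sum(pros)
                if not 0 <= airport_pros <= airport:
                    continue
                shares = {
                    "station": STATION_SHARE,
                    "supermarket": SUPERMARKET_SHARE,
                    "school": SCHOOL_SHARE,
                    "airport": Fraction(airport_pros, airport),
                }
                # Rank locations by professional share, highest first.
                ranking = sorted(shares, key=shares.get, reverse=True)
                solutions.append(
                    {
                        "counts": (station, supermarket, school, airport),
                        "airport_professionals": airport_pros,
                        "airport_rank": ranking.index("airport") + 1,
                    }
                )
    return solutions


if __name__ == "__main__":
    for solution in solve():
        print(solution)
```

Under these assumptions the search finds exactly one split: 25 people at the station, 20 at the supermarket, 17 at the school, and 34 at the airport. The airport then has 62 - 16 - 13 - 10 = 23 professionals, a share of 23/34 (about 67.6%), ahead of the supermarket (65%), the station (64%), and the school (10/17, about 58.8%), so the airport ranks first. Exact `Fraction` arithmetic is used so the whole-number checks are not thrown off by floating-point rounding.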

The first AI model contestant: GPT-4

Introduction: GPT-4 is the latest generation of OpenAI's language models. It was officially released on March 14, 2023, and is available to users through the API and ChatGPT Plus.

Answer: Among the four locations, the airport ranks first in the proportion of professionals.


The answer is correct, and the working is very detailed.

The second AI model contestant: GPT-3.5

Introduction: OpenAI officially released GPT-3.5-turbo on March 1, 2023. It is currently one of the largest pre-trained language models, containing more than 100 million parameters, and can be used for various natural language processing tasks.

Answer: second among the four locations.

Wrong.

The third AI model contestant: Tongyi Qianwen

Introduction: Tongyi Qianwen is a pre-trained language model launched by Alibaba, an ultra-large-scale language model developed in-house by Alibaba DAMO Academy. It can answer questions, write text, express opinions, and write code.

Answer: Therefore, the airport's proportion of professionals ranks 4th among the four locations.

Wrong.

The fourth AI model contestant: Wenxin Yiyan

Introduction: Wenxin Yiyan (English name: ERNIE Bot) is a generative dialogue product launched by Baidu based on Wenxin large-scale model technology.

Answer: second among the four locations.

Wrong.

Beezy Reviews

1. Accuracy

GPT-4's answer

By setting up several equations and working through the algebra, it concluded that the airport ranks first in the proportion of professionals. The derivation is clear and accounts for all the constraints; the working is thorough and the answer is spot on.
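For readers who want to follow the equation-based route, the block below is one way to formalize it (our own notation, not GPT-4's verbatim working): s, m, c, and a are the head counts at the station, supermarket, school, and airport, and p_c, p_a are the numbers of professionals at the school and the airport. Because head counts are whole numbers, 0.64s = 16s/25, 0.65m = 13m/20, and p_c = 10c/17 force s, m, and c to be multiples of 25, 20, and 17.

```latex
\begin{aligned}
s + m + c + a &= 96 \\
0.64\,s + 0.65\,m + p_c + p_a &= 62 \\
c - p_c = 0.7\,p_c &\;\Rightarrow\; p_c = \tfrac{10}{17}\,c \\
a &> \max(s,\, m,\, c) \\[4pt]
(s, m, c) = (25, 20, 17) &\;\Rightarrow\; a = 96 - (25 + 20 + 17) = 34 > 25 \\
p_a = 62 - 16 - 13 - 10 &= 23 \\
\tfrac{23}{34} \approx 67.6\% &> 65\% > 64\% > \tfrac{10}{17} \approx 58.8\%
\end{aligned}
```

The smallest multiples are the only valid choice: raising any one of them, to (50, 20, 17), (25, 40, 17), or (25, 20, 34), leaves the airport with 9, 14, or 17 people, so it would no longer receive the most personnel. The airport therefore ranks first.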

GPT-3.5's answer

Its derivation of the inequalities is unclear and incorrect, and it never gives a concrete solution for the actual head counts.

Tongyi Qianwen's answer

It calculated the proportions of professionals and non-professionals at the four locations and then ranked them, but this process contains obvious mistakes. When calculating the proportion of professionals at each location, Tongyi Qianwen ignored the fact that the total number of people is known and constrains the size of each group, and simply added the proportions together. The answer is wrong.
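The sketch below is a general illustration of why proportions from groups of different sizes cannot simply be added; it is not a reconstruction of Tongyi Qianwen's actual working. It uses the head counts that satisfy this problem's constraints (25, 20, 17, and 34 people), and the function names are hypothetical. The overall share of professionals is the size-weighted average of the group shares, whereas summing or plainly averaging the percentages ignores how many people each percentage refers to.

```python
from fractions import Fraction

# (head count, professional share) for each location, using the only split
# that satisfies this problem's constraints: 25, 20, 17 and 34 people.
GROUPS = {
    "station": (25, Fraction(16, 25)),      # 64%
    "supermarket": (20, Fraction(13, 20)),  # 65%
    "school": (17, Fraction(10, 17)),       # about 58.8%
    "airport": (34, Fraction(23, 34)),      # about 67.6%
}


def weighted_share(groups):
    """Overall share: total professionals divided by total people."""
    total_people = sum(size for size, _ in groups.values())
    total_pros = sum(size * share for size, share in groups.values())
    return total_pros / total_people


def summed_shares(groups):
    """Adds the percentages directly, ignoring group sizes (the flawed step)."""
    return sum(share for _, share in groups.values())


def unweighted_mean(groups):
    """Plain average of the shares; still ignores how large each group is."""
    return summed_shares(groups) / len(groups)


if __name__ == "__main__":
    print(f"weighted share:  {float(weighted_share(GROUPS)):.4f}")
    print(f"summed shares:   {float(summed_shares(GROUPS)):.4f}")
    print(f"unweighted mean: {float(unweighted_mean(GROUPS)):.4f}")
```

The size-weighted share comes out to exactly 62/96 (about 64.6%), matching 62 professionals among 96 people. Summing the four shares gives more than 250%, which is not a proportion of anything, and even the plain average (about 63.9%) misses the true figure because it treats a group of 17 the same as a group of 34.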

Wenxin Yiyan's answer

It sets up no equations and gives no detailed derivation, only a conclusion. In terms of accuracy, this answer is unreliable.

2. Practicality

From a practical standpoint, GPT-4's answer lays out a clear line of reasoning and reaches the answer by setting up and simplifying equations, which makes it more practical than the other answers. However, the civil service exam is strictly timed, and each question needs to be solved within about 1-2 minutes, so GPT-4's thorough, Mathematical-Olympiad-style derivation may not be an advantage under real exam conditions.

3. Mathematical and logical derivation

GPT-4's answer sets up equations clearly and in line with the problem statement, and solves them through substitution and simplification. Its derivation is more rigorous than the others.

GPT-3.5's answer is unclear and wrong because its inequality conditions are incorrect and do not match the conditions given in the problem.

Although Tongyi Qianwen's answer includes some derivation, it mistakenly added the proportions together without considering the actual constraints, so its calculation is wrong.

Wenxin Yiyan's answer sets up no equations and lacks a rigorous mathematical derivation.

On the whole, GPT-4's answer performs best in terms of accuracy, practicality, and mathematical derivation. GPT-3.5, Tongyi Qianwen, and Wenxin Yiyan each show a flaw: incorrect inequality conditions, a wrong calculation process, and a missing derivation, respectively. However, given the strict time limits of the real civil service exam, large AI models may not yet fully meet its standards.

END

Origin blog.csdn.net/BeezyShowcase/article/details/130810919