GPT-4 vs. ChatGPT: who wins?

Source: [Lingdian Youshu Technology], authors Xu Zhengjun and Yuan Yue

ChatGPT, one of the milestone events in the history of artificial intelligence, has been hotly discussed since its release on November 30, 2022. Before the ChatGPT craze had abated, OpenAI released a new-generation AI language model, GPT-4, on March 14, 2023, officially declaring that GPT-4 is "OpenAI's most advanced system" and "can produce safer and more useful responses".

Both are AI-driven natural language processing tools (AI language models) from the same company. So how much more advanced is GPT-4 than the earlier ChatGPT, and what is the underlying logic behind these advances?

ChatGPT is an application product, an AI chatbot, built by fine-tuning the GPT-3.5 large model, while GPT-4 is the next-generation large model after GPT-3.5. This article therefore compares them at two levels: first, at the base-model level, it compares GPT-4 with GPT-3.5 (the underlying logic); second, at the level of application capabilities, it compares GPT-4 with ChatGPT (the advances).

1. GPT-4 and GPT-3.5

GPT-4 is the latest generation of OpenAI's large natural language models, following GPT-1 (released in June 2018), GPT-2, GPT-3, and GPT-3.5 (see Table 1 below).

Historically, the underlying principle has remained basically the same. First, all of these models use a statistical method called "autoregressive generation". Second, they first pre-train a general base model through unsupervised learning, then fine-tune it through supervised learning to adapt it to various tasks, and finally apply "reinforcement learning from human feedback" so that the model can communicate like a human being. Third, they are all built on an algorithmic framework called the "Transformer". In other words, they are all "GPT" models (Generative Pre-trained Transformer, that is, generative pre-trained large models).

What has changed from one generation to the next is mainly the scale of the model (the number of parameters), the amount of pre-training data, the kinds of input supported (whether multimodal, whether long inputs are accepted), the model's functions (whether it has multiple capabilities), and the model's performance, safety, and reliability.

The specific comparison is as follows:

1. Model scale. GPT-3.5 has 175 billion parameters. OpenAI has not disclosed the figure for GPT-4, but it is reported to have reached 500 billion (other reports say 1 trillion), which would make GPT-4 considerably larger than GPT-3.5. A larger scale generally means better performance and the ability to generate more complex and accurate language.

2. Training data. GPT-3.5 was trained on a large amount of Internet text, such as Wikipedia, news reports, and website articles, with a size of about 45TB. GPT-4 was trained on a larger amount of text, including web pages, books, papers, and program code, as well as a large amount of visual data. Although the specific figures are not available, there is little doubt that GPT-4's training data is richer than that of GPT-3.5. This gives GPT-4 broader knowledge and lets it give more specific answers.

Table 1 Comparison of OpenAI GPT model parameters and pre-training data volume

3. Modality and information. GPT-3.5 is a text-only, single-modal model: whatever the original material is (images, text, or audio), users can enter it only as text. GPT-4, on the other hand, is a multimodal model that accepts both text and image prompts (including documents containing text and photos, diagrams, or screenshots). This allows GPT-4 to combine both types of information to generate more accurate descriptions. As for input length, GPT-4 raises the text input limit from GPT-3.5's roughly 3,000 words to 25,000 words. The longer input limit greatly expands the usefulness of GPT-4. For example, you can feed nearly 50 pages of a book into GPT-4 to generate a summary, or feed in a 10,000-word program document and have it fix the bugs directly.

4. Model function. GPT-3.5 is mainly used for answering questions in text and writing scripts. In addition to these, GPT-4 offers further functions such as answering questions about pictures, reasoning over data, analyzing charts, writing summaries, and role-playing.

5. Model performance. While GPT-3.5 has shown strong performance, GPT-4 handles more complex problems better. For example, GPT-4 shows human-level performance on various professional and academic benchmarks. On a simulated bar exam, GPT-4 scores in the top 10% of test takers, while GPT-3.5 scores in the bottom 10%. On the USABO Semifinal Exam 2020 (USA Biology Olympiad), GRE Verbal, and many other tests, GPT-4 comes close to full marks. See Figure 1 below.

Figure 1 GPT-4 test results (sorted by GPT-3.5 performance)

(Data source: https://openai.com/research/gpt-4)

6. Safety and reliability. GPT-4 improves its strategies against generating toxic or untruthful content, reducing the risk of misleading information and malicious use and improving its safety and reliability. In particular, GPT-4 achieves OpenAI's best results to date (although far from perfect) on factuality, steerability, and refusing requests that go outside its guardrails. Compared with GPT-3.5, GPT-4 scores 40% higher on factuality tests of generated content, responds to sensitive requests (such as medical advice and self-harm) in line with policy 29% more often, and is 82% less likely to respond to requests for disallowed content.

Overall, GPT-4 is more reliable, more creative, and able to handle finer-grained instructions than GPT-3.5. See Table 2.

Table 2 New changes from GPT-3.5 to GPT-4

2. GPT-4 and ChatGPT

ChatGPT is an AI chatbot based on GPT-3.5. In dialogue, however, GPT-4 shows better coherence and contextual understanding: it can not only generate fluent, accurate, and logical text, but also understand and answer many types of questions, and even work with users on creative and technical writing tasks. Its most prominent application capabilities are as follows.

1. Added image recognition and analysis capabilities. In addition to the text input that ChatGPT supports, GPT-4 adds image recognition and analysis: it can recognize images (output a description of an image), analyze charts (similar to chart analysis in Excel), spot unusual things in a picture (identify what is abnormal about it), and read documents and summarize them (for example, summarizing the content of a PDF file). You can even sketch a draft of a website on paper, take a photo, and upload it to GPT-4, and the model can generate the website code.

2. More advanced reasoning ability. ChatGPT can perform only simple, direct reasoning to a certain extent, whereas GPT-4 can think in more complex and abstract ways and solve more complex problems. As mentioned earlier, GPT-4 has demonstrated human-level performance in many professional and academic fields. For example, it scores in the top 10% on the US bar exam, at about the 88th percentile on the law school admission test (LSAT), and at about the 90th percentile on the SAT college admission test. In particular, GPT-4 is greatly improved at solving mathematical problems, which ChatGPT is not good at, scoring 700 out of 800 on the SAT math section.

3. Higher levels of creativity and collaboration. ChatGPT can create and collaborate only within a limited range. GPT-4 can carry out creative and technical writing tasks together with users, such as composing songs, writing scripts, or learning a user's style and preferences. It can also generate, edit, and iterate on text of various types and styles, and improve its output based on user feedback and suggestions.

4. Broader application prospects. With near-human-level language understanding and generation, among other advantages, GPT-4 can play an important role in many fields and settings. For example, GPT-4 can serve as an intelligent assistant, educational tool, entertainment partner, or research assistant, empowering office software, search engines, virtual tutor applications, and more. According to public reports, Microsoft has integrated GPT-4 into the Office suite to launch a new AI feature, Copilot, and has also integrated GPT-4 into Bing to provide customized search services; Morgan Stanley is applying GPT-4 in its wealth management division to classify and retrieve market information; Duolingo will use GPT-4 for role-playing to enhance language learning; Be My Eyes is using GPT-4 to convert images into text to help blind people understand them; and Khan Academy is using GPT-4 as its virtual tutor, Khanmigo.

It is foreseeable that GPT-4 will be integrated into more and more industries, raising social productivity and creativity and bringing convenience and value to people. At the same time, as GPT-4 is applied more widely and deeply, it will learn more, and faster, from human feedback, and its models will be iterated and upgraded more quickly, bringing more functions and stronger performance.

3. Common problems

As mentioned earlier, GPT-4 and ChatGPT are both generative AI natural language models. "Generative", in short, means predicting the most likely next word from the input words, feeding that word back into the model, predicting the most likely next word again, and so on, much like a game of "word chain". By "training" on a large existing corpus of human text, the model's parameters are adjusted continuously so that its "word chain" output comes ever closer to real human text; that is, the model learns the patterns in the corpus. As a result, both GPT-4 and ChatGPT share a series of problems that stem from the shortcomings of the generative approach itself.
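The predict, append, and repeat loop described above can be illustrated with a minimal sketch. It is only a toy: a simple bigram word counter stands in for the model, which is an assumption made for illustration and not how GPT-4 or ChatGPT work internally (they use a Transformer over tokens). The names `train_bigram`, `generate`, and `toy_corpus` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """'Training': count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_words=5):
    """Autoregressive loop: predict the most likely next word, append it, repeat."""
    words = prompt.split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        next_word = followers.most_common(1)[0][0]
        words.append(next_word)  # the prediction becomes part of the next input
    return " ".join(words)


# Hypothetical toy corpus, chosen so that every prediction has a unique winner.
toy_corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat ate",
]

counts = train_bigram(toy_corpus)
print(generate(counts, "the", max_new_words=5))  # prints "the cat sat on the cat"
```

Note that the generated sentence never appears in the toy corpus, even though every step follows a pattern learned from it. This is a small-scale picture of how a generative model can produce fluent text that matches learned patterns without corresponding to anything in its training data.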

For example, if the real corpus itself contains a lot of false information or toxic information (such as prejudice or malice concerning race, gender, religion, or politics), and the model happens to learn it, the model is at risk of generating harmful content. If different pieces of content happen to follow the same pattern, the model may be unable to tell which of them is true. The most direct result is that when something that does not exist in reality happens to match the patterns the model has learned from its training material, the model may produce a "pattern-conforming fabrication" of it, that is, generate false information. Because the model lacks interpretability, we cannot directly inspect what it has memorized and learned; we can only ask many questions to evaluate and guess what it has learned, which brings a risk of privacy leakage (according to a BBC report on March 23, some ChatGPT users said on social media that they had seen the titles of other people's conversation histories). Models based on "reinforcement learning from human feedback" will inevitably learn patterns they should not learn from malicious prompting, which raises concerns about ideological infiltration and network security. In short, as they are applied more widely and deeply, both GPT-4 and ChatGPT will face more security and risk challenges.

As Sam Altman, co-founder and CEO of OpenAI, told ABC News in a recent interview, he is "scared" about AI technology and how it could affect the workforce, elections, and the spread of disinformation. He also warned that the widespread use of artificial intelligence may have negative impacts, and that regulating it requires the joint participation of government and society. He said that feedback and rules are critical to curbing the negative impacts of artificial intelligence.


Origin blog.csdn.net/Dataway_Dataway/article/details/130947782