Confirmed by research: ChatGPT's abilities really are declining!

Rumors of ChatGPT performance degradation began circulating online more than a month ago. Many Plus subscribers reported that ChatGPT seemed to grow less intelligent after several rounds of updates, and that response speed sometimes suffered as well. Now, the rumor has finally been confirmed.

Just this week, researchers from Stanford and UC Berkeley published a paper examining how ChatGPT has changed in recent months. They found that ChatGPT's abilities have fluctuated across multiple dimensions, with a particularly serious decline in its performance on coding and compositional tasks.

The paper's release sparked heated discussion in the industry. Many AI experts weighed in across various platforms, and analyses of why ChatGPT's performance has changed, along with speculation about what OpenAI intends to do, quickly spread like wildfire.

1. The broken ChatGPT

According to the paper, the researchers evaluated the March and June 2023 versions of the GPT-3.5 and GPT-4 models on four different tasks: solving math problems, answering sensitive or dangerous questions, generating code, and performing visual reasoning based on image recognition.

Surprisingly, the experimental results show that the performance and behavior of these two large models can vary greatly over time. For example, the March version of GPT-4 was very good at math problems: its accuracy on prime-number identification exceeded 97%, while the June version's accuracy dropped to just 2.4%.
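As a rough illustration of how such an accuracy check might be run (the scoring code and the mock model answers below are my own hypothetical reconstruction, not taken from the paper), one could compare the model's yes/no answers against a ground-truth primality test:

```python
def is_prime(n: int) -> bool:
    """Ground-truth trial-division primality check."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def score_responses(numbers, answers):
    """Fraction of yes/no model answers that match ground truth."""
    correct = sum(
        ans.strip().lower() == ("yes" if is_prime(n) else "no")
        for n, ans in zip(numbers, answers)
    )
    return correct / len(numbers)

# Hypothetical model outputs for illustration; the last answer is wrong.
numbers = [17077, 17078, 19997, 20000]
mock_answers = ["yes", "no", "yes", "yes"]
print(score_responses(numbers, mock_answers))  # 0.75
```

The paper's headline figures (97% vs. 2.4%) would come from running this kind of comparison over a large set of prime-identification prompts.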

In code generation, the share of GPT-4's output that was directly executable plummeted from 50% in March to 10% in June. The decline in GPT-3.5 was less pronounced, but the same trend appeared. On identical tasks, the June versions of both models also took somewhat longer to generate code than before.
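The paper reportedly judged a response "directly executable" by whether the raw model output ran as-is. A minimal sketch of that kind of check (my own reconstruction under that assumption, not the authors' code) might look like this:

```python
def is_directly_executable(snippet: str) -> bool:
    """Return True if the raw model output parses as Python as-is."""
    try:
        compile(snippet, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

FENCE = chr(96) * 3  # a literal ``` built without nesting a fence here
clean = "print(sum(range(10)))"
fenced = FENCE + "python\nprint(sum(range(10)))\n" + FENCE

print(is_directly_executable(clean))   # True
print(is_directly_executable(fenced))  # False
```

One commonly cited explanation for the June drop is that the newer versions began wrapping code in markdown fences, which fails exactly this sort of run-as-is check even when the code inside is fine.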

On sensitive questions and visual reasoning, the differences between the two versions of each model were less dramatic. With more guardrails added, the June version of GPT-4 became 16% more restrictive on sensitive questions, while GPT-3.5 became 6% less so. On visual reasoning, there was almost no difference between versions.

2. Speculation about the decline

Industry insiders have offered their own speculations about why the GPT models' performance has declined, and these fall roughly into three categories: OpenAI sacrificing model performance to cut costs, excessive AI alignment limiting the models' capabilities, and a conspiracy theory of deliberate degradation.

According to Conan, global head of AI at SEEK, OpenAI most likely adopted a mixture-of-experts (MoE) architecture when building GPT-4, which would mean GPT-4 is not one monolithic large model but a combination of multiple smaller expert models specialized for particular domains. Notably, OpenAI co-founder Greg Brockman has also discussed the MoE technical path in research he participated in.

Such an architecture can, in theory, make GPT-4's responses cheaper and faster to generate, but as Conan put it, "Although a mixture-of-experts model can offer cost advantages, there is a trade-off between model cost and quality." After the paper's release, Conan also tweeted that his hypothesis was likely being borne out.
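For readers unfamiliar with the idea, here is a toy sketch of what a mixture-of-experts layer looks like (purely illustrative; nothing here reflects GPT-4's actual, undisclosed architecture). A lightweight router scores the experts and only the top-k experts run for a given input, which is where the cost savings come from:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMoE:
    """Toy mixture-of-experts layer: a router scores experts per input,
    and only the top-k experts are actually evaluated."""

    def __init__(self, dim, n_experts, k=2):
        self.k = k
        self.router = rng.normal(size=(dim, n_experts))
        self.experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]

    def __call__(self, x):
        scores = softmax(x @ self.router)
        top = np.argsort(scores)[-self.k:]          # indices of the k best experts
        weights = scores[top] / scores[top].sum()   # renormalize routing weights
        # Only k of the n_experts matrices are multiplied: cheaper per token.
        return sum(w * (x @ self.experts[i]) for w, i in zip(weights, top))

moe = TinyMoE(dim=8, n_experts=4, k=2)
y = moe(rng.normal(size=8))
```

The trade-off Conan describes is visible even in this sketch: skipping most experts saves compute, but the output depends only on the few experts the router happens to select.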

Nvidia AI scientist Jim Fan offered another perspective on Twitter: a trade-off between safety and performance. In Fan's guess, OpenAI spent the March-to-June period effectively "lobotomizing" GPT-4 in the name of AI alignment, without having had time to fully restore its other important capabilities.

A paper released by Microsoft Research a few months earlier made a similar point: any AI alignment applied to a model costs some of its accuracy and performance. Per Goodhart's law, the reward model is not a perfect proxy, so excessive alignment can hinder the model's true understanding of a task and slow its responses.

In addition, a conspiracy theory has been circulating widely online: that OpenAI deliberately weakened GPT-4's coding ability so that more people would pay for Copilot. The main argument offered for this view is that Microsoft raised Copilot's price by a staggering 83% just days ago.

3. Inconsistent responses

As the "person" at the center of the storm, OpenAI has in fact responded to the claims of ChatGPT performance degradation several times since late May. On May 31, Logan.GPT, OpenAI's official technical spokesperson, commented in a tweet thread discussing ChatGPT's performance that GPT-4's quality via the API had not declined.

On July 14, Peter Welinder, OpenAI's VP of Product, personally tweeted: "No, we haven't made GPT-4 dumber. Quite the opposite: we make each new version smarter than the previous one." Addressing users' doubts, Welinder added, "When you use it more heavily, you start noticing issues you didn't see before."

With the paper's release, however, OpenAI's confidence and tone on the matter have shifted. Just this Wednesday, Logan.GPT replied to a tweet from one of the paper's authors, Matei Zaharia, linking the paper. This time, Logan.GPT changed his earlier line: he not only thanked Matei for the research, but also said OpenAI is investigating the matter and offered to help with future experiments.

Many netizens mocked Logan.GPT's inconsistent remarks beneath the tweet. One user named Pranay wrote, "If you want others to help you evaluate and fix problems, how about open-sourcing your model?" So far, however, neither Logan.GPT nor anyone else at OpenAI has commented further.

4. The ever-changing AI landscape

With ChatGPT's performance decline now corroborated by the paper, more and more accusations and doubts from the industry are being directed at OpenAI. AI scientist Daniel Jeffries said in an interview that OpenAI should continue supporting older model versions as new changes are introduced, so that software developers can build on reliable tools rather than face unannounced updates.

The best remedy for this instability, some argue, is open-source models such as Llama 2, which Meta has just released. With openly distributed weights, these models let researchers work from the same baseline and obtain consistently reproducible results. Microsoft's recent close cooperation with Meta also demonstrates the viability of this route.

Hugging Face AI researcher Sasha Luccioni likewise sees OpenAI's opacity as a major problem: "The results of any closed-source model cannot be replicated or verified, and it is not scientists' responsibility to continuously monitor deployed large models. GPT-4 may be excellent, but future applications and research will be built on models that are more transparent and stable."

Notably, after Llama 2's release, executives from tech giants including Nvidia, AMD, Hugging Face, GitHub, and Databricks, along with professors from top universities such as Berkeley and MIT, announced that they would cooperate with Meta. For an OpenAI already dogged by model-performance problems, this is undoubtedly more bad news.

5. A final word

To be sure, some industry experts have questioned the accuracy of the evaluation methodology used in the paper. But whether it is the stark data discrepancies the paper reports, the large volume of user feedback, or the shift in OpenAI's own attitude, all of it indicates that the GPT-4 model has indeed run into problems over the past few months.

It must be admitted that today's OpenAI still stands at the peak of AI technology. If even OpenAI cannot resolve these technical issues smoothly, they will remain a deep valley that other companies find hard to cross in the short term.

Yet even with its technological lead, mounting pressure from competitors and the accelerating pace of change in the large-model market have gradually exposed shortcomings in the long-term planning of OpenAI, still a young star company. The recent sharp slowdown in ChatGPT traffic growth speaks for itself.

One obvious truth: when everything around is changing rapidly, only species adaptable enough to change with their environment earn long-term survival and sustained growth. As things stand, the time left for OpenAI to change is running short.

Finally, do you have a different view on why ChatGPT has become less intelligent? Feel free to share it in the comments.

Origin blog.csdn.net/java_cjkl/article/details/131983815