Behind the ChatGPT frenzy: the shortcomings are still there, but so is plenty of inspiration. Here is what you can do in 2023...

Are ChatGPT’s powerful capabilities inherent to the model? What are its shortcomings? Will it replace search engines in the future? What does its emergence mean for AI research? Several AI researchers discussed these questions in depth.

In the last month of 2022, OpenAI answered a year of anticipation with a wildly popular conversational bot, ChatGPT, even though it was not the long-awaited GPT-4.

Anyone who has used ChatGPT can see that it is a true all-rounder (a "hexagon warrior," in Chinese internet slang): it can chat, search, and translate, and it can also write stories, write and debug code, even build small games or attempt the U.S. college entrance exam. Some have joked that from now on there will be only two kinds of AI models: ChatGPT, and everything else.

Thanks to these striking capabilities, ChatGPT attracted one million users within five days of launch. Many boldly predict that, if this trend continues, ChatGPT will soon displace search engines such as Google and programming Q&A communities such as Stack Overflow.

However, many of the answers ChatGPT generates are wrong in ways that are easy to miss without a careful read, which makes them potentially misleading. This "very powerful but error-prone" character has given the outside world plenty to debate. Everyone wants to know:

  • Where do ChatGPT’s powerful capabilities come from?

  • What are the shortcomings of ChatGPT?

  • Will it replace search engines in the future?

  • What inspiration does its emergence bring to AI research?

In the sixth episode of "REDtech is coming," a technical livestream series run by Xiaohongshu's technology team, Li Lei, an NLP expert and assistant professor at the University of California, Santa Barbara, joined Zhang Lei, vice president of technology at Xiaohongshu, and Zhang Debing, head of the multimedia intelligence algorithm team in Xiaohongshu's community department, for a conversation addressing the hot questions surrounding ChatGPT.

Li Lei previously served as a Young Scientist at Baidu's U.S. Deep Learning Lab and as a senior director at ByteDance's AI Lab. He has published more than 100 papers at top international conferences in machine learning, data mining, and natural language processing, and won the ACL 2021 Best Paper Award. In 2017, he won second prize in the Wu Wenjun Artificial Intelligence Technology Invention Award for his work on the AI writing robot Xiaomingbot, which has strong content-understanding and text-generation capabilities and can fluently report on sports events and write financial news.

Zhang Lei, vice president of technology at Xiaohongshu, was previously the technical lead of IBM's Deep Question Answering (DeepQA) project in China, with extensive experience in question-answering bots and CTR machine-learning algorithms for search advertising. Zhang Debing was formerly chief scientist at Green Eye, where he led teams to multiple academic competition wins, including first place in the internationally authoritative FRVT face recognition test.

The three guests' discussion covered not only ChatGPT's current capabilities and problems but also future trends and prospects. Below, we organize and summarize the exchange.

OpenAI co-founder Greg Brockman recently tweeted that 2023 will make 2022 look like a dull year for AI advancement and adoption.

Where do ChatGPT’s powerful capabilities come from?

Like many who have tried it, the three guests were impressed by ChatGPT's capabilities.

Zhang Debing gave the example of having ChatGPT act as a Linux terminal: you tell ChatGPT the approximate machine configuration, then have it "execute" instructions on that basis. It turns out that ChatGPT can remember a long operation history and keep it logically consistent: for example, if you write a few lines of characters into a file and then ask it to display the file's contents, it shows exactly what was written.

DeepMind researcher Jonas Degrave's example of using ChatGPT as a Linux terminal.

This made Zhang Debing and his colleagues wonder: was ChatGPT secretly opening a real terminal in the background to fool users? So they ran a test: ask ChatGPT to execute a very expensive command, such as two nested for loops of one billion iterations each. If ChatGPT really had a terminal open, it would hang for a long while. The result was unexpected: ChatGPT skipped right past the computation and displayed the output of the next command. This convinced them that ChatGPT had roughly understood the logic of the whole demo and possessed a certain ability to "think." A minimal sketch of such a probe is shown below.
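For readers who want to try this kind of probe themselves, here is a minimal sketch, assuming the OpenAI Python SDK (v1+) and an illustrative model name; the system prompt is adapted from Jonas Degrave's terminal demo, and the busy-loop command follows the probe Zhang Debing describes.

```python
# A minimal sketch of the "fake terminal" probe, assuming the OpenAI
# Python SDK (pip install openai, v1+) and an API key in OPENAI_API_KEY.
# The model name is illustrative.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "I want you to act as a Linux terminal. I will type commands and you "
    "will reply with what the terminal should show, inside a single code "
    "block, and nothing else."
)

# Two nested loops of one billion iterations each: a real shell would
# hang for a long time, but a model that merely predicts terminal output
# answers almost instantly.
probe = (
    'python3 -c "s = 0\n'
    "for i in range(10**9):\n"
    "    for j in range(10**9):\n"
    "        s += 1\n"
    'print(s)"'
)

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative choice of chat model
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": probe},
    ],
)
print(resp.choices[0].message.content)
```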

So where does this powerful ability come from? Zhang Lei offered two hypotheses. One is that the ability is already built into the large model, and we simply had not found the right way to elicit it before; the other is that the built-in ability of the large model is not actually that strong, and human effort is needed to adjust it.

Both Zhang Debing and Li Lei lean toward the first hypothesis, because the amount of data needed to pre-train a large model and the amount needed to fine-tune it visibly differ by several orders of magnitude. In the "pre-training + prompting" paradigm used by GPT-3 and its successors, this gap is even more pronounced. Moreover, the in-context learning these models use does not even update model parameters: placing a handful of labeled examples in the input context is enough to induce the model to produce answers. This suggests that ChatGPT's powerful capabilities are indeed endogenous.

Comparison between traditional fine-tuning and GPT-3's in-context learning.
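To make the contrast concrete, here is a minimal sketch of in-context learning on a made-up sentiment task: the "training data" lives entirely in the prompt, and no parameter is updated.

```python
# In-context learning in its simplest form: the "training data" is a few
# labeled examples placed directly in the prompt, and no model parameter
# is updated. The sentiment task and reviews here are made up.
few_shot_prompt = """\
Review: The battery dies within an hour. Sentiment: negative
Review: Shipping was fast and the fit is perfect. Sentiment: positive
Review: The screen cracked on day two. Sentiment: negative
Review: Works exactly as advertised. Sentiment:"""

# Sending this prompt to a GPT-3-class model typically yields "positive"
# as the continuation. Contrast with fine-tuning, which would update the
# model's weights on thousands of labeled examples.
print(few_shot_prompt)
```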

In addition, ChatGPT's power relies on a secret weapon: a training method called RLHF (Reinforcement Learning from Human Feedback).

According to official information released by OpenAI, this training method can be divided into three stages:

  1. Cold-start supervised policy model: randomly sample a batch of prompts submitted by test users, have professional annotators write high-quality answers for them, and then use these manually labeled <prompt, answer> pairs to fine-tune the GPT-3.5 model, giving GPT-3.5 an initial ability to understand the intent behind instructions;

  2. Train a reward model (RM): randomly sample a batch of user-submitted prompts, use the fine-tuned cold-start model from stage 1 to generate K different answers per prompt, have annotators rank those K answers, and use the rankings as training data for the reward model under a pairwise learning-to-rank scheme (sketched below);

  3. Use reinforcement learning to strengthen the pre-trained model: score its outputs with the RM learned in stage 2 and update the pre-trained model's parameters based on those scores.

Two of these three stages rely on manual annotation, which is the "human feedback" in RLHF.
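As a concrete illustration of stage 2, here is a minimal PyTorch sketch of the pairwise ranking loss typically used for reward models (the form described in OpenAI's InstructGPT paper); the scores below are dummies, not OpenAI's actual implementation.

```python
# Annotator rankings of the K answers are decomposed into
# (chosen, rejected) pairs, and the reward model is trained so that
# the chosen answer scores higher.
import torch
import torch.nn.functional as F

def pairwise_rm_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """r_chosen, r_rejected: reward scores per pair, shape (batch,)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

r_good = torch.tensor([1.2, 0.7, 0.3])   # rewards for preferred answers
r_bad = torch.tensor([0.4, 0.9, -0.1])   # rewards for dispreferred answers
print(pairwise_rm_loss(r_good, r_bad))   # shrinks as chosen outscores rejected
```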

Li Lei said the results of this method were unexpected. In earlier machine translation research, his group typically used the BLEU score (a fast, cheap, language-independent automatic evaluation metric that correlates strongly with human judgment) to guide models. The method works sometimes, but its effect keeps weakening as models grow larger.
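For reference, BLEU can be computed in a couple of lines with the sacrebleu library; the sentences below are invented for illustration.

```python
# Computing BLEU with sacrebleu (pip install sacrebleu).
import sacrebleu

hypotheses = ["the cat sits on the mat"]
references = [["the cat sat on the mat"]]  # one reference stream

score = sacrebleu.corpus_bleu(hypotheses, references)
print(score.score)  # corpus-level BLEU on a 0-100 scale
```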

Their experience had therefore been that training a very large model like GPT-3 with this kind of feedback should not, in theory, help much. ChatGPT's stunning results overturn that experience. Li Lei believes this is what shocks everyone about ChatGPT, and it is a reminder to update our research assumptions.

What are the shortcomings of ChatGPT?

However, despite being shocked, the three guests also pointed out some of ChatGPT’s current shortcomings.

First, as noted above, some of its answers are not accurate enough; it produces "earnest nonsense" from time to time, and it is not very good at logical reasoning.

Second, the deployment cost of putting a model as large as ChatGPT into practical use is very high, and there is currently no clear evidence that a model can retain such capabilities after shrinking by one or two orders of magnitude. "If such amazing capabilities can only be maintained at a very large scale, it is still far from application," Zhang Debing said.

Finally, ChatGPT may not reach SOTA on certain specific tasks, such as translation. The ChatGPT API has not yet been released, so its performance on benchmarks cannot be measured, but Li Lei's students found while testing GPT-3 that although GPT-3 handles translation well, it still trails dedicated bilingual models by 5 to 10 BLEU points. From this, Li Lei speculated that ChatGPT may not reach SOTA on certain benchmarks, and may even be some distance away from it.

Can ChatGPT replace search engines such as Google? What inspiration does it have for AI research?

Among the many discussions around ChatGPT, "can it replace search engines" may be the hottest topic. The New York Times recently reported that ChatGPT's popularity has put Google on alert: the company worries that if everyone uses chatbots like ChatGPT, no one will click on Google's ad-bearing links (in 2021, advertising accounted for 81.4% of Google's total revenue). According to a memo and recording obtained by The New York Times, Google CEO Sundar Pichai has been holding meetings to "define Google's AI strategy" and has upended the work of numerous teams inside the company to respond to the threat ChatGPT poses.

On this question, Li Lei believes it may be a bit early to talk about replacement. First, there is often a wide gulf between a new technology's popularity and its commercial success: Google Glass was once billed as the next generation of interaction, and it has yet to deliver on that promise. Second, ChatGPT does outperform search engines on some question-answering tasks, but the needs search engines serve go well beyond those tasks. He therefore argues that we should build products around ChatGPT's own strengths rather than setting out to replace existing mature products, which is extremely difficult.

Many AI researchers believe ChatGPT and search engines can work together rather than replace one another, as shown by the recently popular YouChat.

Zhang Debing holds a similar view: it is unrealistic for ChatGPT to replace search engines in the short term. It still has many problems, such as being unable to access Internet resources and producing misleading information, and it remains unclear whether its abilities generalize to multi-modal search scenarios.

But it is undeniable that the emergence of ChatGPT has indeed given AI researchers a lot of inspiration.

Li Lei pointed out that the first noteworthy point is in-context learning. Much prior work overlooked ways to tap the latent potential of existing models (a machine translation model, for example, was only ever used for translation, without anyone trying to prompt it toward better translations), but GPT-3 and ChatGPT did exactly that. Li Lei therefore wonders whether all previous models could be recast in this in-context learning form, given text, image, or other prompts so that their capabilities are fully exercised. This is a very promising research direction.

The second noteworthy point is the important role human feedback plays in ChatGPT. Li Lei noted that Google search's success owes much to how easily it gathers human feedback (whether a user clicks a result). ChatGPT gathers human feedback by having people write answers and rank model outputs, but this is relatively expensive (some recent work has pointed out this problem). Li Lei therefore believes a key future question is how to collect large amounts of human feedback cheaply and efficiently.
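One low-cost source, in the spirit of Li Lei's point about Google, is implicit feedback from logs. Below is a hypothetical sketch that mines preference pairs from search clicks using the classic "skip-above" heuristic, in which a clicked result is preferred over higher-ranked results the user skipped; the log format is invented for illustration.

```python
# Hypothetical sketch: turning one click into (chosen, rejected)
# preference pairs via the "skip-above" heuristic, a cheap alternative
# to paid annotation.
def preference_pairs(ranked_results, clicked_index):
    """Return (chosen, rejected) pairs implied by a single click."""
    chosen = ranked_results[clicked_index]
    return [(chosen, skipped) for skipped in ranked_results[:clicked_index]]

log = ["doc_a", "doc_b", "doc_c", "doc_d"]  # results as shown to the user
print(preference_pairs(log, 2))  # [('doc_c', 'doc_a'), ('doc_c', 'doc_b')]
```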

Xiaohongshu’s new technology for "planting grass" (product recommendation)

For Zhang Debing, who is engaged in multi-modal intelligent creation research at Xiaohongshu, ChatGPT also provides a lot of inspiration.

First, the model vividly demonstrates how much large NLP models improve over small ones in scenarios such as complex multi-turn dialogue, generalization across diverse queries, and chain-of-thought reasoning; small models simply do not have these capabilities today.

Zhang Debing believes these capabilities of large NLP models may also be tried and verified in cross-modal generation. Current cross-modal models still lag GPT-3 and ChatGPT significantly in scale, and a good deal of work in cross-modal settings has shown that strengthening the NLP branch's expressive power greatly improves the precision of visual generation. If cross-modal models can be scaled up further, an "emergence" of capabilities may be worth looking forward to.

Second, as with the first-generation GPT-3, today's multi-modal generation can produce stunning results when cherry-picked, but controllability still has a long way to go. ChatGPT appears to have improved on this problem to some extent, producing outputs more aligned with human intent. Zhang Debing therefore suggested that cross-modal generation could borrow many of ChatGPT's ideas, such as fine-tuning on high-quality data and reinforcement learning.

These research results will be applied across many of Xiaohongshu's businesses, including intelligent customer service in e-commerce and other scenarios; more accurate understanding of user queries and user notes in search; and, in intelligent creation, automatic soundtracking of user material, copywriting generation, cross-modal conversion, and generative creation. In each scenario, the depth and breadth of these applications will keep expanding as models are compressed and their accuracy continues to improve.

As a UGC community with 200 million monthly active users, Xiaohongshu has amassed a huge, rich, and diverse collection of multi-modal data. It has accumulated large amounts of real data in information retrieval, recommendation, and understanding, especially in technologies related to intelligent creation, as well as in underlying multi-modal learning and unified representation learning, and it offers a vast, distinctive proving ground for innovation in these fields.

Xiaohongshu remains one of the few Internet products still growing strongly. Thanks to a product form that emphasizes both image-text and video content, it faces, and generates, many cutting-edge application problems in multi-modality, audio and video, search, and advertising. This has attracted a large number of technical talents; many members of the Xiaohongshu technical team have experience at leading companies at home and abroad such as Google, Facebook, and BAT.

These technical challenges also give engineers the opportunity to participate fully in new fields and even play leading roles. Going forward, the room for growth that Xiaohongshu's technical team can offer will be broader than ever, and it awaits more outstanding AI talent.

Xiaohongshu also values exchange with the industry. "REDtech is coming" is a technology livestream series created by Xiaohongshu's technical team for the industry's front lines. Since the beginning of this year, the team has held in-depth exchanges with leaders, experts, and scholars in multi-modality, NLP, machine learning, recommendation algorithms, and more, striving to explore valuable technical questions from the dual perspectives of academic research and Xiaohongshu's practical experience.
