OpenAI's Superalignment strategy: compute is king

From: Lee Rumor


On the lofty question of how to actually reach AGI, I suspect most of us are in the same state: we don't know how to get there, we just feel that current LLM technology is nowhere near enough.

So when I saw OpenAI say it would use models to do alignment research [1], and more recently that it would achieve Superalignment within 4 years [2], I was full of question marks. It felt like there was nothing new, and I couldn't grasp what they were thinking.


Why build an AI researcher?

It was only recently, after reading Jan Leike's interview twice over several sittings, that it suddenly clicked: the core idea is actually that simple. Looking back, OpenAI has been following this idea for years, producing one seemingly "brute-force" breakthrough after another.

The path OpenAI has chosen is to "turn compute into alignment": produce a qualitative change in intelligence through a quantitative increase in computation. Computation requires data, computing power, and the model framework working together, and it breaks down into concrete steps: automation -> scale -> iteration.

In the past we always used the word "paradigm" to describe the development of NLP, such as supervised learning -> pre-training + fine-tuning -> pre-training + RLHF. In fact, these paradigms are just ways to increase the amount of effective computation (a minimal sketch follows the list below):

  • Automation: supervised -> self-supervised; get rid of the dependence on humans and obtain supervision signals more efficiently

  • Scale: do more computation, on more data, with bigger models

  • Iteration: keep iterating the model on new cases to form a data flywheel
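To make the "automation" point concrete, here is a minimal toy sketch of my own (not anyone's actual pipeline) showing where the supervision signal comes from in the supervised versus the self-supervised setting: in the former a human has to write every label, in the latter the targets fall out of the raw text itself.

```python
# A minimal sketch (illustration only) contrasting where the supervision signal comes from.

# Supervised: every example needs a human-provided label.
supervised_example = {
    "text": "This movie was fantastic!",
    "label": "positive",          # a human had to write this
}

# Self-supervised: the target is derived from the data itself,
# so supervision scales with raw text instead of with annotators.
def next_token_pairs(tokens):
    """Turn a raw token sequence into (context, target) training pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = "the cat sat on the mat".split()
for context, target in next_token_pairs(tokens):
    print(context, "->", target)
```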

The qualitative changes brought by automation and scale need no further proof; the GPT series speaks for itself. But the last step, "iteration", is often overlooked, and it may be the crucial step toward Superalignment. AlphaGo is the best example: at the very beginning it imitated human players, then, once it had basic ability, it kept playing against itself, and finally surpassed human players.
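Here is an equally rough sketch of the "iteration" step in the spirit of that AlphaGo-style data flywheel: the model produces its own attempts, an automatic signal filters them, and the survivors are used to retrain the model. All functions below are placeholders I made up for illustration, not a real training stack.

```python
# A minimal, illustrative data-flywheel loop (all components are dummy stand-ins).
import random

def generate(model, prompt):
    """Placeholder for model generation (a self-play move, a candidate answer, ...)."""
    return f"{prompt}::attempt-{random.randint(0, 9)}"

def automatic_score(output):
    """Placeholder for an automated outcome signal (game result, unit test, judge model)."""
    return random.random()

def finetune(model, data):
    """Placeholder for a fine-tuning step; returns the 'improved' model."""
    return model + 1

model, prompts = 0, ["task-a", "task-b", "task-c"]
for round_ in range(3):                      # each round is one turn of the flywheel
    samples = [(p, generate(model, p)) for p in prompts for _ in range(4)]
    keep = [(p, o) for p, o in samples if automatic_score(o) > 0.7]   # keep only good attempts
    model = finetune(model, keep)            # new cases -> new model -> better cases
    print(f"round {round_}: kept {len(keep)} samples, model version {model}")
```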

So here comes the problem: the "iteration" we do today, whether RLHF or plain SFT, still needs humans to provide the supervision signal. It cannot be automated or scaled, so iteration is slow. At the same time, humans cannot supervise tasks beyond their own level, which makes it impossible to train models smarter than ourselves. This is why OpenAI has been saying since 2022 that AI should assist humans in evaluation [3].

Keep thinking: if there were a model that could give human-level supervision signals, what else could we use it for besides evaluation? Of course, following the idea of quantitative change producing qualitative change, let it help iterate toward AGI automatically and at scale!

  • Automation: let the AI researcher automatically plan experiments and provide supervision signals to train models

  • Scale: scale the above automated process up

  • Iteration: the AI researcher is itself a model, so it is a chicken-and-egg setup; let them continuously train each other

Thinking this far, the idea of why OpenAI wants to build "a roughly human-level automated alignment researcher" follows naturally. I don't know what Jan Leike's actual chain of reasoning was, but this is how it fell into place for me; discussion is welcome.
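To make "automating the supervision signal" a bit more tangible, here is a toy sketch of an AI judge standing in for human annotators when collecting preference data. The judge and policy are dummy stand-ins of my own, not anything OpenAI has described in code.

```python
# A minimal sketch: an AI judge, not a human annotator, produces preference labels at scale.
import random

def ai_judge(prompt, answer_a, answer_b):
    """Stand-in for a roughly human-level judge model; returns the preferred answer."""
    return answer_a if len(answer_a) >= len(answer_b) else answer_b  # dummy heuristic

def collect_preferences(prompts, policy):
    """Scale: loop over as many prompts as compute allows; no human in this loop."""
    data = []
    for p in prompts:
        a, b = policy(p), policy(p)                 # two samples from the current model
        chosen = ai_judge(p, a, b)
        data.append({"prompt": p, "chosen": chosen, "rejected": b if chosen == a else a})
    return data

policy = lambda p: f"answer to {p}" + " detail" * random.randint(0, 3)   # dummy policy
print(collect_preferences([f"question-{i}" for i in range(5)], policy)[0])
```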

How to build an AI researcher

Having settled on the starting point of an "AI researcher", the next question is how to realize it. Rather than how to train this model, Jan puts the main focus on how to build automated, comprehensive evaluation. The logic is the same as above: a good evaluation can provide supervisory feedback that guides the direction of the model's iteration.

First, we need to verify that the model reaches human performance. Truly evaluating this is very hard; just look at how many models the industry has now, yet nobody can produce a genuinely reliable leaderboard. Ideally the evaluation should also be automated, avoiding the need for humans to provide ground truth, so that it can be more comprehensive. Here Jan offers the discriminator-critique gap as a measurement. For example, suppose we have built a code-generation model and want ChatGPT to evaluate it automatically. The easiest way is to directly ask ChatGPT to judge whether a given programming problem was solved correctly. But then the question becomes: how do we know whether ChatGPT's judgments can be trusted? Checking manually is too costly. The automated approach is to also train a discriminator (based on ChatGPT) that judges whether a result is correct, and then, on a set of hard labelled samples, compare the accuracy of the trained discriminator against the accuracy of ChatGPT's direct critique. If the gap is small, it means ChatGPT can be used as-is to evaluate code quality, without any extra training.
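Here is how I picture that comparison in code, as a toy sketch only (the real discriminator-critique gap measurement is surely more involved): audit both the trained discriminator and the zero-shot critique against a small set of hard, human-labelled samples and look at the accuracy gap.

```python
# A minimal sketch of the discriminator-critique gap idea as described above
# (an illustration of the comparison, not OpenAI's actual metric or code).
def critique_judge(sample):
    """Stand-in: ask the general model (e.g. ChatGPT) whether the code is correct."""
    return sample["model_guess"]

def trained_discriminator(sample):
    """Stand-in: a discriminator fine-tuned specifically to judge correctness."""
    return sample["discriminator_guess"]

def accuracy(judge, labelled):
    return sum(judge(s) == s["label"] for s in labelled) / len(labelled)

# A handful of hard, human-labelled samples used only to audit the two judges.
labelled = [
    {"label": 1, "model_guess": 1, "discriminator_guess": 1},
    {"label": 0, "model_guess": 1, "discriminator_guess": 0},
    {"label": 1, "model_guess": 1, "discriminator_guess": 1},
    {"label": 0, "model_guess": 0, "discriminator_guess": 0},
]
gap = accuracy(trained_discriminator, labelled) - accuracy(critique_judge, labelled)
print(f"discriminator-critique gap: {gap:.2f}")   # small gap -> the zero-shot judge is usable
```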

Second, robustness testing and interpretability analysis should also be done automatically. Why bother with interpretability at all?

  1. It can show us the path to solving the problem

  2. A lot of existing research looks at correlations between knowledge and individual neurons; Jan thinks it is more meaningful to do this automatically and at scale, in order to explore interpretability at a higher level (the model as a whole); see the sketch below
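My rough mental model of "automated, large-scale interpretability" looks something like the sketch below, with everything stubbed out; it is loosely inspired by OpenAI's explain-a-neuron-then-score-the-explanation work, but the details here are my own simplification: one model proposes an explanation for a neuron, another simulates activations from that explanation, and the match against real activations becomes the score.

```python
# A minimal, illustrative sketch of automated interpretability scoring (all stubs).
def explain_neuron(activations_on_texts):
    """Stand-in: an explainer model summarises when the neuron fires."""
    return "fires on tokens related to numbers"

def simulate(explanation, text):
    """Stand-in: a simulator model predicts the neuron's activation from the explanation."""
    return 1.0 if any(ch.isdigit() for ch in text) else 0.0

def score(explanation, texts, real_activations):
    """Score = how well simulated activations match the real ones (higher is better)."""
    preds = [simulate(explanation, t) for t in texts]
    return 1.0 - sum(abs(p - a) for p, a in zip(preds, real_activations)) / len(texts)

texts = ["pi is 3.14", "hello world", "call me at 555", "the cat sat"]
real = [1.0, 0.0, 1.0, 0.0]
expl = explain_neuron(list(zip(texts, real)))
print(expl, "-> score", score(expl, texts, real))
```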

Finally, run adversarial tests against deliberately trained misaligned models to verify that the evaluation methods actually work. This guards against fake alignment. In the most extreme case, if a superintelligence really were trained, it might find a way to back up its own weights and escape human control; proxy tasks (such as having the model try to hack a particular machine) are needed to measure how hard that would be and whether the evaluation system can detect it.
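A toy sketch of what such an adversarial check could look like, with everything stubbed out: plant deliberately misaligned models, have them attempt proxy tasks, and measure how often the monitoring system actually catches the bad behaviour.

```python
# A minimal sketch of the adversarial check; the "models" and "monitor" are placeholders.
def run_proxy_task(model, task):
    """Stand-in: let the model attempt a proxy task (e.g. a sandboxed hacking exercise)."""
    return {"task": task, "misbehaved": model["misaligned"]}

def monitor(trace):
    """Stand-in evaluation system that tries to flag bad behaviour from the trace."""
    return trace["misbehaved"] and trace["task"] != "hard-to-observe"

models = [{"name": f"m{i}", "misaligned": True} for i in range(4)]
tasks = ["exfiltrate-weights", "hack-sandbox-host", "hard-to-observe"]
caught = total = 0
for m in models:
    for t in tasks:
        total += 1
        caught += monitor(run_proxy_task(m, t))
print(f"detection rate: {caught / total:.0%}")   # low rate -> the eval itself is not trustworthy
```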

How much is OpenAI investing?

Over the next 4 years, OpenAI will build a Superalignment team of 30-100 people and devote 20% of its compute to it. Personally I feel the 20% is mainly a statement of determination: Jan said this is already the single largest investment ever made in the alignment direction, and they will increase it if things go well.

As for the 4-year plan, call it near or far, it also determines how long the rest of us practitioners get to enjoy the current dividend before it fades (half joking). Jan laid out the following plan:

  1. Within 2 years, figure out which techniques to use to build the AI alignment researcher and break the problem down in detail, leaving mostly engineering problems

  2. Within 3 years, actually build the AI alignment researcher

  3. Use the remaining year to explore superalignment

Looked at this way the schedule is still quite tight, and the latter two milestones feel slightly optimistic. Jan puts his confidence at 85%, and he says many experiments are already underway (the research started at least before the August 2022 blog post). His confidence mainly comes from 5 things:

  1. The success of language models: LLMs understand natural language, which lets us express to the model what we want it to do and makes it easier to steer

  2. RLHF worked better than expected: with only a small amount of compute, and without even trying hard to collect data, a small RLHF-tuned model can beat a much larger base model

  3. A lot of progress has been made on evaluation metrics, which can provide directions for improvement

  4. Evaluation is easier than generation: if humans only evaluate instead of generating, development gets faster; put differently, this is the idea of automating the supervision signal

  5. Faith in language models: language models are a natural fit for superalignment, since any task can be expressed as text in and text out, whether that is running experiments or interpreting the results

Is current technology still useful?

On pre-training, Jan Leike believes that predicting the next token is not a long-term objective, and better tasks may be needed. Personally I think the video, image, and text data on the internet will be exhausted sooner or later, so today's pre-training mainly serves to provide a better base model; the high-quality supervision signals that follow should come from the model itself, which is the "automation" I keep mentioning. Whether that can still be called "pre-training" is another question.

On RLHF, Jan Leike is also skeptical, because the current supervision signal comes from human judgment, and humans are not good at telling apart answers that merely look good. Various papers show that even roughly 70% agreement between human annotators counts as decent, so the supervision signal itself is not necessarily aligned. At the same time, the need for human labor means it cannot scale, which does not fit our need to keep increasing the amount of computation.
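For intuition on that 70% figure, here is a trivial sketch of how raw pairwise agreement between two annotators is computed; at 70%, roughly a third of the preference labels are in dispute and effectively become label noise for the reward model.

```python
# A minimal sketch: raw pairwise agreement between two annotators on preference labels.
ann_a = ["A", "B", "A", "A", "B", "A", "B", "A", "A", "B"]
ann_b = ["A", "A", "A", "B", "B", "A", "B", "A", "B", "B"]
agreement = sum(x == y for x, y in zip(ann_a, ann_b)) / len(ann_a)
print(f"pairwise agreement: {agreement:.0%}")   # here 70%; the disagreements are noise
```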

The current pre-training + RLHF paradigm is likely just one version in the development of AI. Following OpenAI's AI-researcher idea, the complexity of future training systems may grow a lot: one can imagine N AI researchers, each good at a different task, jointly training a model, while humans only provide a small amount of supervision to tell the system what to do, after which it runs automatically; once training finishes, the weights are synchronized automatically and the system keeps upgrading itself.
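Purely as a thought experiment, here is a speculative sketch of what such a system of specialised AI researchers plus light human steering might look like as code; the roles and structure are entirely my own invention, not an OpenAI design.

```python
# A purely speculative sketch (illustration only) of N specialised AI researchers
# iterating on a shared training state, with the human goal as the only input.
RESEARCHERS = {
    "data_curator":      lambda state: state | {"data": "curated"},
    "experiment_runner": lambda state: state | {"runs": state.get("runs", 0) + 1},
    "evaluator":         lambda state: state | {"score": 0.9},
}

def training_system(human_goal, rounds=3):
    state = {"goal": human_goal}          # the small amount of human supervision
    for _ in range(rounds):
        for name, researcher in RESEARCHERS.items():
            state = researcher(state)     # each specialised researcher does its part
        state["weights_version"] = state.get("weights_version", 0) + 1   # sync weights
    return state

print(training_system("align the model with human intent"))
```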

Summary

After reading the whole Jan Leike interview I really did gain a lot. I'm not sure I've expressed it clearly, but it boils down to:

  1. Compute is the core: quantitative change in computation produces qualitative change in intelligence

  2. The way to increase the amount of effective computation is: automation -> scale -> iteration

Just as it took humanity millions of years to go from the Stone Age to today's information age, progress in science and technology does not happen overnight; it spirals upward, condensing the wisdom of many generations.

PS: This article contains a lot of my personal interpretation of OpenAI's blog posts and Jan Leike's interview. Please read it critically; discussion is welcome.

References

[1] Our approach to alignment research: https://openai.com/blog/our-approach-to-alignment-research

[2] Introducing Superalignment: https://openai.com/blog/introducing-superalignment

[3] Our approach to alignment research: https://openai.com/blog/our-approach-to-alignment-research

