just! OpenAI proposes Superalignment super alignment 20230706

Alpha rabbit research notes included in the collection

#International trends63

#OpenAI6

#AI36

#artificial intelligence45

The author's WeChat Alphatue.

Phenomenon

* This article is about 2000 words

OpenAI requires scientific and technological breakthroughs to guide and control AI systems smarter than us. In order to solve this problem in four years, we are forming a new team, co-led by Ilya Sutskever and Jan Leike, and we will devote 20% of the computing power acquired so far to this work. We are looking for outstanding ML researchers and engineers to join us.

Safety & Alignment

Superintelligence will be the most impactful technology humanity has ever seen, helping us solve many of the world's most important problems. However, the enormous power of a superintelligence could also be very dangerous and could lead to the incapacity or even extinction of humanity.

Here we focus on superintelligence rather than AGI to emphasize a higher level of capability. We have a lot of uncertainty about how quickly the technology will evolve over the next few years, so we choose to tune a more capable system with a more difficult goal.

 It seems so far away now, we believe it may arrive within this decade.

Managing these risks will require, inter alia, new institutions of governance and addressing the alignment of superintelligence:

How can we ensure that smarter-than-human AI systems follow human intent?

Currently, we don't have a solution to guide or control a potentially superintelligent AI and prevent it from going rogue. Our current techniques for tuning AI, such as reinforcement learning from human feedback, rely on human oversight of the AI. But humans will not be able to reliably supervise AI systems that are much smarter than we are.

Other assumptions may also be broken in the future, such as favorable generalization properties during deployment, or our model's inability to successfully detect and break supervision during training.

 Therefore, our current permutation and combination techniques will not scale to superintelligence. We need new scientific and technological breakthroughs.

method

Our goal is to build a roughly human-level automated alignment researcher. We can then scale our efforts through massive computations and iteratively align superintelligence.

To align the first automated alignment researcher, we will need to 1) develop a scalable training method, 2) validate the resulting model, and 3) stress test our entire alignment pipeline:

In order to align the first auto-alignment researcher, one needs

1) Develop a scalable training method

2) Verify the resulting model

and 3) stress test the entire alignment pipeline :

To provide a training signal on tasks that are difficult for humans to evaluate, AI systems can be leveraged to assist in evaluating other AI systems (scalable supervision). Additionally, we want to understand and control how our model generalizes our supervision to tasks that we cannot supervise (generalization).

To verify the consistency of our system, we automatically search for problematic behavior (robustness) and problematic internal structure (automated interpretability).

Finally, we can test our entire pipeline by training models with intentional misalignments and confirm that our technique can detect the worst types of misalignments (adversarial testing).

We anticipate that as we learn more about this problem, our research focus will change substantially, and we may add entirely new areas of research. We are planning to share more about our roadmap in the future.

new team formation

We are assembling a team of top machine learning researchers and engineers to study this problem.

Over the next four years, we will devote 20% of the computing power acquired so far to solving superintelligence docking problems. Our main fundamental research bet is our new superalignment team, but getting this right is critical to our mission, and we expect many teams to contribute, from developing new methods to scaling them up for deployment.

Our goal is to solve the core technical challenges of superintelligence alignment within four years .

While this is an incredibly ambitious goal and we cannot guarantee success, we are optimistic that a focused, coordinated effort can solve this problem:

Addressing the problem involves providing evidence and arguments to convince the machine learning and security community that the problem has been solved. If we fail to have very high confidence in our solution, we hope our findings allow us and the community to plan appropriately.

 There are many ideas that show promise in preliminary experiments, we have more and more useful indicators of progress, and we can investigate many of these problems empirically using today's models.

Ilya Sutskever ( co-founder and chief scientist at OpenAI ) has made this a core research focus of his research and will co-lead the team with Jan Leike (head of Alignment). Joining this team are researchers and engineers from our previous alignment team, as well as researchers from other groups in the company.

We are also looking for excellent new researchers and engineers to join this effort. Superintelligent alignment is fundamentally a machine learning problem, and we think good machine learning experts -- even if they haven't worked on alignment yet -- will be key to solving this problem.

We plan to share the results of this work widely, and see contributing to the alignment and safety of non-open AI models as an important part of our work.

This new team's work complements OpenAI's existing work to improve the safety of current models such as ChatGPT, as well as to understand and mitigate other risks of AI, such as misuse, economic disruption, disinformation, bias and discrimination, Addiction and overdependence etc. While this new team will focus on the machine learning challenge of aligning superintelligent AI systems with human intent, there are related sociotechnical issues that we are actively engaging with interdisciplinary experts to ensure our technology addresses Programs consider wider human and social issues.

References

1.https://openai.com/blog/introducing-superalignment

【read more】

Guess you like

Origin blog.csdn.net/sinat_37574187/article/details/131752038