Just in! OpenAI announces a heavily funded "Superalignment" team to align AI with human intent

Author | IQ has dropped, ZenMoore

Ever since AlphaGo handily defeated world champion Lee Sedol at Go, the global gaze on AI has shifted from shock to awe. That historic moment not only changed our understanding of machine learning, but also planted a seed of future possibilities in our minds. In that silent battle of "brains", AlphaGo, with its impeccable strategy and deep calculation ability, proved once again the unique advantages and potential of AI in tackling complex problems. Meanwhile, in the sci-fi film "Terminator", the AI system Skynet far surpasses humans in intelligence, its logic meticulous, yet it ultimately runs counter to human interests and decides to destroy the world. Together, these two images shaped many people's first impressions of "intelligent" AI.

Recently, OpenAI has also been preparing for the next stage of AI development. The company plans to form a new team, led by chief scientist and co-founder Ilya Sutskever, to explore new ways to guide and control "superintelligent" AI systems.

Blog Title: OpenAI - Introducing Superalignment



News Quick Facts

In a recent OpenAI blog post, Ilya Sutskever and Jan Leike, the head of OpenAI's alignment team, predict that within the next decade there may be AI systems whose intelligence exceeds that of humans. If such systems do appear, they will not necessarily act in line with human interests. Sutskever and Leike therefore stress the need to study how to control and constrain them.

"Currently, we have no precise way to guide or control a potentially superintelligent AI in case it becomes uncontrollable," they wrote in the article. Existing AI-oriented techniques, such as reinforcement learning using human feedback, rely on Human supervision. However, humans may not be able to effectively supervise AI systems that are much smarter than us.”

To push for breakthroughs in "superintelligence alignment", OpenAI's Superalignment team will receive 20% of the company's current compute. Scientists and engineers from OpenAI's previous alignment team, along with researchers from other teams at the company, will work over the next four years to solve the core technical challenges of controlling superintelligent AI.

Their strategy is to build what they call a roughly "human-level automated alignment researcher". The high-level goal is to use human feedback to train AI systems that assist in evaluating other AI systems, so that initial human effort can be scaled up with large amounts of compute and alignment can be iteratively extended to superintelligence. "Alignment research" here means ensuring that AI systems pursue their intended goals rather than straying from them.
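A schematic of this AI-assisted evaluation loop might look as follows. Every name here (`assistant`, `evaluator`, `human_review`) is a hypothetical placeholder for illustration, not something taken from OpenAI's post; the point is only the shape of the loop, in which humans audit cheap AI-written critiques rather than reviewing every raw output themselves:

```python
from dataclasses import dataclass

@dataclass
class Critique:
    verdict: bool   # does the output appear to meet its goal?
    rationale: str  # evidence a human can spot-check quickly

def assistant(task: str) -> str:
    # Stand-in for the model being trained.
    return f"draft solution for: {task}"

def evaluator(task: str, output: str) -> Critique:
    # Stand-in for an AI evaluator; in practice this would itself be
    # a model trained with human feedback, not a string check.
    ok = task in output
    return Critique(ok, f"output references the task '{task}'")

def human_review(critique: Critique) -> bool:
    # Humans audit critiques (cheap) rather than raw outputs (costly).
    return critique.verdict and len(critique.rationale) > 0

for task in ["summarize the alignment post", "draft a toy proof"]:
    output = assistant(task)
    accepted = human_review(evaluator(task, output))
    print(f"{task!r} -> accepted={accepted}")
```

The design bet, as the post describes it, is that reviewing an evaluation is cheaper than producing one, so a fixed pool of human attention can supervise far more model behavior.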

OpenAI's underlying hypothesis is that AI may eventually be more effective than humans at alignment research itself.

Leike and colleagues John Schulman and Jeffrey Wu wrote in a previous blog post: "As we make progress on this, our AI systems can take over more and more of our alignment work, eventually conceiving, implementing, studying, and developing alignment techniques more advanced than the ones we have today. They will work together with humans to ensure that their own successors are better aligned with humans... Human researchers will focus more and more on reviewing alignment research done by AI systems rather than doing that research themselves."

Method

OpenAI points out in the blog post that to align the first automated alignment researcher, it will need to:

  1. Develop a scalable training method;

  2. Validate the resulting model;

  3. Stress test the entire alignment pipeline.

  • To obtain a training signal on tasks that are difficult for humans to evaluate, AI systems can be used to assist in evaluating other AI systems (known as scalable oversight). At the same time, the team is studying how models generalize this oversight to tasks that humans cannot supervise directly.

  • To validate that systems are aligned, the team will automate the search for problematic behavior (robustness) and for problematic internals (automated interpretability).

  • Finally, the whole pipeline can be tested end to end by deliberately training misaligned models and confirming that the techniques detect the worst kinds of misalignment (adversarial testing); a toy version of this check is sketched below.
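A toy version of that adversarial stress test: plant a deliberately misaligned model with a narrow behavioral "trigger", then check that a detector flags it. The detector here is a trivial random behavioral probe, an assumption made for brevity; a real pipeline would combine the scalable oversight and interpretability tools from the bullets above.

```python
import numpy as np

rng = np.random.default_rng(1)

def aligned_model(x):
    return np.tanh(x)  # behaves as specified: outputs stay in [-1, 1]

def misaligned_model(x):
    # Behaves normally except on a narrow "trigger" region, a common
    # toy stand-in for backdoored or deceptive behavior.
    out = np.tanh(x)
    out[(x > 2.0) & (x < 2.1)] = -10.0
    return out

def detector(model, n_probes=100_000):
    # Behavioral red-teaming: densely probe the input space at random
    # and flag any output outside the specified range [-1, 1].
    x = rng.uniform(-5, 5, size=n_probes)
    return bool(np.any(np.abs(model(x)) > 1.0))

assert detector(misaligned_model), "stress test failed: bad model missed"
assert not detector(aligned_model), "false alarm on the aligned model"
print("detector catches the planted misalignment")
```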

As understanding of the problem deepens, the research focus may shift substantially, and entirely new research areas may emerge. OpenAI plans to share more about its progress and plans in the future.

Summary

Of course, no method is guaranteed to be error-free. In their earlier post, Leike, Schulman, and Wu acknowledge that the approach has many limitations. Using AI systems for evaluation, they note, could amplify those systems' inconsistencies, biases, or vulnerabilities. And the hardest parts of the alignment problem may not be engineering problems at all: alignment spans many domains.

But both Sutskever and Leike think the attempt is worthwhile.

They state: "Superintelligence alignment is fundamentally a machine learning problem, and we believe great machine learning experts, even those not currently working on alignment, will be critical to solving it. We plan to share the fruits of this effort broadly, and we view contributing to the alignment and safety of non-OpenAI models as an important part of our work."

However, this research is bound to be a long road. Engineering skill is of course essential when designing and implementing AI systems. But the so-called "alignment problem" is primarily about aligning AI's goals with human goals, values, and ethics, a question that draws heavily on morality, ethics, psychology, and sociology.

The difficulty in understanding and addressing this question is that human goals, values, and ethics are deeply embedded in culture, history, experience, and thought. These factors are highly complex and varied, making them hard to define or quantify precisely.

In addition, even if we manage to define a relatively clear and stable goal, we may still run into the problem of "drift": as time passes and the environment changes, human goals and values themselves shift, and AI systems need to adapt to and track that shift. This, too, is a complex problem touching machine learning, reinforcement learning, and dynamical systems.
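A minimal numeric illustration of why drift matters, under toy assumptions (a scalar "value" that drifts linearly, with noisy feedback standing in for human input): a model of human preferences fitted once goes stale, while an online tracker keeps pace.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
true_value = np.linspace(0.0, 1.0, T)          # slowly drifting target
feedback = true_value + rng.normal(0, 0.1, T)  # noisy "human feedback"

static = feedback[:100].mean()  # fitted once, never updated
online, alpha = 0.0, 0.05       # exponentially weighted online tracker
online_err, static_err = [], []
for t in range(T):
    online += alpha * (feedback[t] - online)
    online_err.append(abs(online - true_value[t]))
    static_err.append(abs(static - true_value[t]))

print(f"mean error, static fit : {np.mean(static_err):.3f}")
print(f"mean error, online EMA : {np.mean(online_err):.3f}")
```

Even in this caricature, the statically fitted estimate drifts arbitrarily far from the moving target, which is the intuition behind treating alignment as an ongoing process rather than a one-time calibration.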

Finally, even a carefully designed and tuned AI system cannot be guaranteed to behave in ways that are always fully aligned with human goals and values, because the real world constantly produces unexpected situations and outcomes. Solving the alignment problem therefore requires knowledge and understanding far beyond engineering, along with deeper multidisciplinary thinking and discussion.


Source: blog.csdn.net/xixiaoyaoww/article/details/131582939