AI, come and feel the fear of being dominated by "Overcooked", the notorious "break-up kitchen"!

Yuyang, from Aofei Temple
Qubit Report | Official account QbitAI

Friends, have you ever felt the fear of being dominated by the break-up kitchen, ahem, by "Overcooked"?

Actually, never mind you: even AIs get crushed in no time once the kitchen layout turns complicated and the task demands multi-player cooperation: chopping vegetables, working the pots, and delivering the dishes.

Indeed, researchers from the University of Nottingham, UC Berkeley, and Microsoft Research have now argued, in effect, that a deep reinforcement learning model that can't play "Overcooked" is not a good collaborative AI.

They also found that most current deep RL models cannot score above 65% in "Overcooked".

They even wrote a dedicated paper about it.

Benchmarking with a simplified version of "Overcooked"

If deep reinforcement learning models are to be deployed in the real world for human-AI collaboration, a major current challenge is whether such systems can stay robust when they encounter situations, and partner behaviors, never seen during development.

How to evaluate a model's robustness is itself a hard problem that has long troubled the research community.

Perhaps inspired by the friendship-wrecking mischief of the break-up kitchen, the researchers believe "Overcooked" can surface the potential edge cases a system needs to be able to handle.

For example, in the game the system must deal with scenarios like these: a plate accidentally left on the counter, or a partner frozen in place because they are thinking or have briefly stepped away...

Therefore, they designed a suite of unit tests based on a simplified version of the "Overcooked" environment.

These fall into three main categories:

State robustness unit tests: success does not depend on the partner's state. As shown in (a) above, the green-hat chef has already picked up a plate, so no matter what the green-hat chef decides to do next, the blue-hat chef only needs to take an onion to the left.

Agent robustness unit tests: here the partner's state does affect the outcome, so robustness toward the partner agent must be measured. As shown in (b) above, there is only one passage; the green-hat chef wants to deliver the soup, so the blue-hat chef has to get out of the way.

Agent & memory robustness unit tests: as shown in (c) above, the green-hat chef is standing still, apparently having stepped away, so the blue-hat chef should pick up a plate and deliver the soup alone. Detecting this situation requires reasoning over the interaction history, not just the current state.
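To make the three categories concrete, here is a minimal, self-contained sketch of what such unit tests might look like. Everything here (the toy `State`, the scripted idle partner, the heuristic `agent_policy`, and the action names) is hypothetical and invented for illustration; the real test suite lives in the human_ai_robustness repository linked below.

```python
# A toy sketch of the three robustness-test categories described above.
# All names and behaviours here are illustrative, not from the paper's code.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class State:
    """Toy stand-in for an Overcooked game state."""
    partner_blocking: bool = False          # partner stands in the only passage
    history: List[str] = field(default_factory=list)

def scripted_partner_idle(state: State) -> None:
    """Scripted partner that stands still (thinking, or briefly away)."""
    state.history.append("partner_idle")

def agent_policy(state: State) -> str:
    """Hypothetical agent under test: a hand-written heuristic."""
    if state.partner_blocking:
        return "step_aside"                 # let the partner deliver the soup
    if state.history.count("partner_idle") >= 3:
        return "fetch_plate_self"           # partner seems inactive: act alone
    return "fetch_onion"                    # default cooperative behaviour

def run_unit_test(setup: Callable[[], State], expected: str,
                  partner_steps: int = 0) -> bool:
    """Roll the scripted partner forward, then check the agent's action."""
    state = setup()
    for _ in range(partner_steps):
        scripted_partner_idle(state)
    return agent_policy(state) == expected

# 1) State robustness: the right action ignores whatever the partner does.
state_test = run_unit_test(lambda: State(), expected="fetch_onion")

# 2) Agent robustness: the partner's position changes the right action.
agent_test = run_unit_test(lambda: State(partner_blocking=True),
                           expected="step_aside")

# 3) Agent & memory robustness: only the history reveals an idle partner.
memory_test = run_unit_test(lambda: State(), expected="fetch_plate_self",
                            partner_steps=3)

print(state_test, agent_test, memory_test)  # → True True True
```

The key design point mirrored from the article: the first test's expected action is independent of the partner, the second depends on the partner's current state, and the third can only be decided by looking at the accumulated history.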

The researchers say that this "Overcooked"-based test suite provides information that cannot be obtained from validation reward alone, so it can serve as a benchmark for judging AI collaboration ability in the future.

Portal

Paper address:
https://arxiv.org/abs/2101.05507

Code address:
https://github.com/HumanCompatibleAI/human_ai_robustness

Reference link:
https://venturebeat.com/2021/01/15/researchers-propose-using-the-game-overcooked-to-benchmark-collaborative-ai-systems/

- End -

This article is original content from the [QbitAI] account under the NetEase News NetEase Featured Content Incentive Program. Unauthorized reproduction is prohibited.
