Nvidia connects GPT-4 to Minecraft: no human intervention, and it progresses through the game up to 15 times faster!

Xi Xiaoyao's Tech Sharing
Source | Heart of the Machine

Could the game industry be changing?

The general-purpose AI model GPT-4 has entered the open world of Minecraft, and it plays at a high level.

Yesterday, VOYAGER, released by Nvidia, sent a shock through the AI community.

VOYAGER is the first large-model-driven game agent capable of lifelong learning. Andrej Karpathy, the well-known AI researcher who recently returned to OpenAI, read the paper and remarked that he still remembers how hopeless it felt, around 2016, to develop an AI agent in an environment like Minecraft.

Now the picture has changed completely: the right approach is to forget all of that, first train a large language model (LLM) on internet-scale data so that it learns world knowledge, reasoning, and tool use (coding), and then let it loose to solve the problem.

Karpathy concluded: if I had read about this "gradient-free" agent approach in 2016, I would have been stunned.

With the experts weighing in, everyone else was more direct: it looks like we are one step closer to artificial general intelligence (AGI).

Others are already imagining future games in which NPCs are driven by large models, making the game world come vividly to life.

We know that ChatGPT, the model leading the current technology wave, is a text-based chatbot, and because GPT-4 adds multimodal capabilities, people often predict that the next step for general AI is to put such a large model inside a robot so that it can interact with the real world.

For agents interacting with the real or virtual world, advanced large models like GPT-4 unlock a new paradigm: "training" is code execution rather than gradient descent, and the "trained model" is the skill code library that VOYAGER iteratively assembles, not matrices of floating-point numbers. This pushes the gradient-free approach to its limits.

In Minecraft, VOYAGER quickly becomes a seasoned explorer, obtaining 3.3x more unique items, traveling 2.3x longer distances, and unlocking key tech tree milestones up to 15.3x faster than previous methods.

Nvidia has fully open-sourced the VOYAGER research:

Paper link:
https://arxiv.org/pdf/2305.16291.pdf

Project homepage:
https://voyager.minedojo.org/

GitHub:
https://github.com/MineDojo/Voyager


Research Background

Building generally capable embodied agents that continuously explore, plan, and develop new skills in an open world is a grand challenge in artificial intelligence. Traditional approaches based on reinforcement learning and imitation learning operate on primitive actions and struggle with systematic exploration, interpretability, and generalization.

Recently, agents based on large language models (LLMs) have achieved breakthroughs in these areas by exploiting the world knowledge encapsulated in pre-trained LLMs to generate consistent action plans or executable policies. They have been applied to embodied tasks such as games and robotics, as well as to NLP tasks that do not require embodiment. However, these agents are not lifelong learners: they cannot gradually acquire, update, accumulate, and transfer knowledge over long time spans.

Unlike most other games studied in AI, Minecraft does not impose a predetermined end goal or a fixed storyline, but instead offers a unique playground with endless possibilities. An effective lifelong learning agent should have abilities similar to those of human players:

(1) Propose suitable tasks according to its current skill level and world state, for example, learning to gather sand and cactus before going after iron if it finds itself in a desert rather than a forest;

(2) Improve skills based on environmental feedback, and store mastered skills in memory for reuse in similar situations in the future (for example, fighting zombies is similar to fighting spiders);

(3) Constantly explore the world and find new tasks in a self-driven manner.

VOYAGER is the first LLM-powered embodied lifelong learning agent: it drives exploration in Minecraft, masters a wide range of skills, and continuously makes new discoveries without human intervention.

The researchers used code as the action space, rather than low-level motor commands, because programs can naturally represent temporally extended and compositional actions, which is crucial for many long-horizon tasks in Minecraft.

VOYAGER interacts with a black-box LLM (GPT-4) through prompting and in-context learning. Notably, this approach circumvents the need for access to model parameters and for explicit gradient-based training or fine-tuning.
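To make the "no gradients" point concrete, the sketch below shows what such a black-box interaction can look like in Python, assuming the pre-1.0 openai SDK; the prompt wording is illustrative rather than the paper's actual prompt, and only the model name comes from the paper's experimental setup.

```python
# Minimal sketch of black-box prompting: the only "training" signal is the
# prompt itself; GPT-4's parameters are never touched.
# Assumes the pre-1.0 openai Python SDK and an OPENAI_API_KEY in the environment.
import openai

SYSTEM_PROMPT = (
    "You are a Minecraft agent. Write a JavaScript function that uses the "
    "Mineflayer API to accomplish the given task. Return only code."
)  # illustrative wording, not the paper's actual prompt

def generate_skill_code(task: str, context: str) -> str:
    """Ask GPT-4 (a black box behind an API) to write executable skill code."""
    response = openai.ChatCompletion.create(
        model="gpt-4-0314",   # model used in the paper's experiments
        temperature=0,        # deterministic code generation
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Task: {task}\nContext: {context}"},
        ],
    )
    return response["choices"][0]["message"]["content"]
```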

Specifically, VOYAGER attempts to solve progressively harder tasks proposed by an automatic curriculum. The curriculum is generated by GPT-4 with the overall goal of "discovering as many diverse things as possible", and the approach can be viewed as a form of in-context novelty search. VOYAGER gradually builds a skill library by storing the action programs that led to the successful completion of a task. Each program is indexed by the embedding of its description, so that it can be retrieved in similar situations in the future. Complex skills can be synthesized by composing simpler programs, which allows VOYAGER's abilities to compound rapidly over time and alleviates the catastrophic forgetting seen in other continual learning methods.

Method

VOYAGER consists of three novel components: (1) an automatic curriculum that proposes open-ended exploration goals; (2) a skill library for developing increasingly complex behaviors; and (3) an iterative prompting mechanism for generating executable code and improving it from feedback.
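Before looking at each component in turn, here is a rough Python sketch of how the three pieces could fit together in a single loop. The helper names (propose_task, generate_and_refine, and the env and skill_library interfaces) are hypothetical stand-ins for illustration, not the released implementation.

```python
def voyager_loop(env, skill_library, propose_task, generate_and_refine,
                 max_iterations: int = 160):
    """One lifelong-learning run: curriculum -> skill retrieval -> code -> store."""
    agent_state = env.reset()
    completed, failed = [], []
    for _ in range(max_iterations):
        # (1) Automatic curriculum: GPT-4 proposes the next exploration task.
        task = propose_task(agent_state, completed, failed)
        # (2) Skill library: retrieve the most relevant previously learned skills.
        relevant_skills = skill_library.retrieve(task, k=5)
        # (3) Iterative prompting: generate code, run it, refine on feedback.
        code, success = generate_and_refine(task, relevant_skills, env)
        if success:
            skill_library.add(task, code)   # store the verified skill for reuse
            completed.append(task)
        else:
            failed.append(task)
        agent_state = env.observe()
    return skill_library
```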

Automatic Curriculum

In open-ended exploration, the embodied agent encounters a variety of objectives with different levels of complexity. The automatic curriculum provides many benefits for this setting: it enables a challenging yet manageable learning process, fosters curiosity-driven intrinsic motivation for the agent to learn and explore, and encourages the development of general, flexible problem-solving strategies.

The automatic curriculum leverages the internet-scale knowledge inside GPT-4, prompting it for a steady stream of new tasks and challenges, which makes it highly adaptive and responsive. The curriculum maximizes exploration based on the agent's exploration progress and current state, and is generated by GPT-4 under the overall goal of "discovering as many diverse things as possible".
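A minimal sketch of such a curriculum query is shown below, assuming the pre-1.0 openai SDK. The prompt wording and state fields are illustrative; only the overall goal, the model name, and the temperature of 0.1 are taken from the paper's described setup.

```python
import openai

def propose_next_task(agent_state: dict, completed_tasks: list,
                      failed_tasks: list) -> str:
    """Query GPT-4 for the next exploration task, given progress and state."""
    prompt = (
        "Overall goal: discover as many diverse things as possible.\n"
        f"Agent state: {agent_state}\n"
        f"Completed tasks: {completed_tasks}\n"
        f"Failed tasks: {failed_tasks}\n"
        "Propose one next task suited to the agent's current skill level."
    )
    response = openai.ChatCompletion.create(
        model="gpt-4-0314",
        temperature=0.1,   # the paper uses 0.1 here to encourage task diversity
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"].strip()
```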

Skill Library

As the automatic curriculum keeps proposing increasingly complex tasks, VOYAGER needs a skill library on which to build its learning and growth. Inspired by the generality, interpretability, and universality of programs, the research team represents each skill with executable code that composes temporally extended actions to accomplish the specific tasks proposed by the curriculum.

Specifically, the top part of the skill library (in the figure) handles adding new skills: each skill is indexed by the embedding of its description, so that it can be retrieved in similar situations in the future.

The bottom part handles skill retrieval: when the automatic curriculum proposes a new task, the skill library is queried for the 5 most relevant skills. Complex skills can be synthesized by composing simpler programs, which allows VOYAGER's capabilities to grow rapidly over time and alleviates the "catastrophic forgetting" problem.
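The following is a minimal sketch of an embedding-indexed skill library along these lines, using text-embedding-ada-002 for the description embeddings as mentioned in the paper's setup; the class layout itself is an assumption for illustration, not the released code.

```python
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    """Embed a skill description with the model named in the paper's setup."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

class SkillLibrary:
    def __init__(self):
        self.descriptions, self.programs, self.vectors = [], [], []

    def add(self, description: str, program: str) -> None:
        """Store a new skill, indexed by the embedding of its description."""
        self.descriptions.append(description)
        self.programs.append(program)
        self.vectors.append(embed(description))

    def retrieve(self, query: str, k: int = 5) -> list:
        """Return the k most relevant stored programs by cosine similarity."""
        if not self.programs:
            return []
        q = embed(query)
        sims = [
            float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in self.vectors
        ]
        top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
        return [self.programs[i] for i in top]
```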

Iterative Prompting Mechanism

The research team introduces an iterative prompting mechanism that enables self-improvement through three types of feedback: environment feedback, execution errors, and self-verification of task success.

The image below (left) is an example of environment feedback: GPT-4 realizes that it needs 2 more planks before it can make sticks. An example of an execution error is shown in the image below (right), where GPT-4 realizes that it should craft a wooden axe instead of an acacia axe, since there is no acacia axe in Minecraft.

The image below is an example of self-verification. Given the agent's current state and the task, GPT-4 acts as a "critic" and judges whether the program completed the task. If the task failed, it also criticizes the attempt and offers suggestions on how to complete it.
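A rough sketch of one such feedback loop is shown below, again assuming the pre-1.0 openai SDK. Here execute_in_game and describe_state are hypothetical callables standing in for the Minecraft environment, and the prompts and verdict parsing are illustrative rather than the paper's actual mechanism.

```python
import openai

def refine_skill(task: str, execute_in_game, describe_state,
                 max_rounds: int = 4):
    """Generate code for a task and refine it using three feedback signals."""
    feedback, code = "", ""
    for _ in range(max_rounds):
        # Generate (or regenerate) skill code, conditioned on prior feedback.
        code = openai.ChatCompletion.create(
            model="gpt-4-0314",
            temperature=0,
            messages=[{
                "role": "user",
                "content": f"Task: {task}\nPrevious feedback:\n{feedback}\n"
                           "Write Mineflayer JavaScript code for this task.",
            }],
        )["choices"][0]["message"]["content"]

        # Run the code in the game to collect environment feedback and errors.
        env_feedback, error = execute_in_game(code)

        # Self-verification: GPT-4 acts as a critic and judges task success.
        verdict = openai.ChatCompletion.create(
            model="gpt-4-0314",
            temperature=0,
            messages=[{
                "role": "user",
                "content": f"Task: {task}\nAgent state: {describe_state()}\n"
                           "Did the agent complete the task? Answer yes or no, "
                           "and give suggestions if not.",
            }],
        )["choices"][0]["message"]["content"]

        if verdict.lower().startswith("yes"):
            return code, True
        feedback = f"Environment: {env_feedback}\nError: {error}\nCritic: {verdict}"
    return code, False
```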

Experiments

In experiments, the researchers systematically compared VOYAGER against baselines in terms of exploration performance, tech tree mastery, map coverage, and zero-shot generalization to unseen tasks in a new world.

They use OpenAI's gpt-4-0314 and gpt-3.5-turbo-0301 APIs for text, and text-embedding-ada-002 API for text embedding. All temperatures are set to 0, except the automatic curriculum which needs to use temperature = 0.1 to encourage task diversity. The simulation environment is built on top of MineDojo and utilizes Mineflayer's JavaScript APIs for motor control.

The evaluation results are as follows:

Significantly greater ability to explore

VOYAGER's advantage lies in its ability to keep making new progress (as shown in Figure 1). For example, it discovers 63 unique items within 160 prompting iterations, 3.3 times as many as its counterparts. AutoGPT, by contrast, lags significantly in discovering new items, while ReAct and Reflexion struggle to make meaningful progress.

Tech Tree Mastery

The tech tree in Minecraft tests the agent's ability to craft and use a hierarchy of tools. Progressing through this tree (wooden tools → stone tools → iron tools → diamond tools) requires the agent to master systematic and compositional skills.

In Table 1, the fractions indicate the number of successful trials out of three runs, and the numbers are the average prompting iterations over those trials; fewer iterations means a more efficient method. Compared with the baselines, VOYAGER unlocks the wooden tool level 15.3 times faster (in terms of prompting iterations), the stone level 8.5 times faster, and the iron level 6.4 times faster, and it is the only method that unlocks the diamond level of the tech tree.

Extensive map traversal

Compared with the baselines, VOYAGER covers 2.3 times the distance and traverses a variety of terrains, while the baseline agents often remain stuck in local areas, which greatly hinders their ability to discover new things (Figure 7).

Zero-shot generalization to unseen tasks

To evaluate zero-shot generalization, the researchers cleared the agent's inventory, reset it to a freshly instantiated world, and tested it on unseen tasks. For VOYAGER and AutoGPT, they used GPT-4 to decompose each task into a series of sub-goals.

As shown in Table 2 and Figure 8, VOYAGER consistently solves all tasks, while the baselines cannot solve any task within 50 prompting iterations. It is worth noting that the skill library built through lifelong learning not only boosts VOYAGER's performance but also improves AutoGPT. This demonstrates that the skill library is a versatile, plug-and-play asset that other approaches can readily adopt to improve performance.

Ablation study

The researchers ablated six design choices in VOYAGER (the automatic curriculum, the skill library, environment feedback, execution errors, self-verification, and GPT-4 for code generation) and studied their impact on exploration performance; the results are shown in Figure 9.

VOYAGER outperformed all alternatives, demonstrating the critical role of each component. Furthermore, GPT-4 significantly outperforms GPT-3.5 in terms of code generation.

Finally, Nvidia researchers also pointed out some limitations and future work directions.

The first is cost: the GPT-4 API incurs significant expense, being 15 times more expensive than GPT-3.5. However, VOYAGER requires GPT-4 for a leap in code-generation quality that neither GPT-3.5 nor open-source LLMs can provide.

Second, despite the iterative prompting mechanism, there are still cases where the agent gets stuck and fails to generate the correct skill; the automatic curriculum can flexibly retry such a task at a later time. Occasionally the self-verification module also fails, for example by not recognizing string (dropped by spiders) as a signal that a spider has been defeated.

Then there is the hallucination problem of large models. Occasionally, the automatic curriculum proposes tasks that cannot be completed, such as asking the agent to craft a "copper sword" or "copper chestplate", items that do not exist in the game. Hallucinations also occur during code generation; for example, GPT-4 tends to use cobblestone as a fuel input, even though it is not a valid fuel source in the game. It may also call functions that do not exist in the provided control-primitive APIs, leading to code execution errors. The researchers believe that improvements to the GPT API models and new techniques for fine-tuning open-source LLMs will overcome these limitations in the future.

For more details of the research, please refer to the original paper.
