Agent: Understanding the LLM agent architecture in one article, with a detailed look at the Profile, Memory, Planning, and Action modules.

AI Quick View · Original · 2023-10-07 13:30


In the field of artificial intelligence, expectations for agents are growing by the day. Whenever a new open-source tool or product built on agents appears, such as AutoGPT a while back, it sparks heated discussion. For anyone interested in agents, I recommend a paper that comprehensively surveys the agent architecture and is very helpful for understanding the field as a whole.

https://browse.arxiv.org/pdf/2308.11432.pdf

The paper explains in detail the concept, development history, and recent research hotspots of agents. Beyond that background, the most valuable part in my view is that it summarizes the architecture of agents based on large language models (LLMs), letting us design our own agents against a standard paradigm.

This article explains the construction strategy of LLM-based agents from two key aspects: how to design the agent architecture to make better use of the LLM's capabilities, and how to give the agent the ability to complete different tasks.

For the architecture design, the paper proposes a unified framework consisting of a Profile module, a Memory module, a Planning module, and an Action module.

Profile module:

This module defines and manages the characteristics and behaviors of agent roles. It contains a set of parameters and rules describing the agent's attributes, such as role, goals, abilities, knowledge, and behavior. These attributes determine how the agent interacts with the environment, how it understands and responds to tasks, and how it makes decisions and plans. The paper summarizes three ways of generating agent profiles: LLM generation, dataset alignment, and a combination of the two.

1. LLM generation method: use a large language model to automatically generate the agent's personal characteristics, such as age, gender, personal preferences, and other background information. Concretely: first set the composition rules for agents and specify which attributes agents in the target population should have; then provide several manually written seed profiles as examples; finally, let the language model generate a large number of agent profiles. This approach can produce profiles quickly and in batches, but the resulting agents may lack detail because of the limited control over generation (a minimal sketch of this method follows the list below).

2. Dataset alignment method: obtain agent profile information from real-world population datasets, for example by extracting census data and turning it into natural-language descriptions. This makes agent behavior more realistic and credible and accurately reflects the attribute distribution of a real population, but it requires a reliable large-scale dataset.

3. Combination method: use real datasets to generate a subset of key agents so that they reflect real-world regularities, then use the LLM generation method to supplement a large number of additional agents. This preserves the authenticity of the agents while reaching a sufficient number of them, allowing the system to simulate more complex social interactions. Careful profile design is the foundation of an effective agent system.
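
To make the LLM generation method concrete, here is a minimal Python sketch. The `chat` helper is a placeholder for whatever LLM API you actually use, and the seed profiles and attribute rules are invented for illustration; this is a sketch of the idea, not the paper's implementation.

```python
import json

def chat(prompt: str) -> str:
    """Placeholder for an LLM call; plug in whichever API or local model you use."""
    raise NotImplementedError("connect this to your LLM")

# Hand-written seed profiles act as few-shot examples (the "seed configuration files").
SEED_PROFILES = [
    {"name": "Alice", "age": 34, "role": "data analyst", "traits": ["curious", "precise"]},
    {"name": "Bob", "age": 58, "role": "retired teacher", "traits": ["patient", "talkative"]},
]

def generate_profiles(n: int) -> list[dict]:
    """LLM generation method: batch-produce agent profiles from rules plus seed examples."""
    prompt = (
        "You generate agent profiles as a JSON list.\n"
        "Rules: every profile must contain name, age, role, and traits.\n"
        f"Examples: {json.dumps(SEED_PROFILES)}\n"
        f"Now generate {n} new, diverse profiles. Return only JSON."
    )
    return json.loads(chat(prompt))
```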

Memory module:

The memory module plays an important role in the agent system: it stores and organizes information obtained from the environment to guide future actions.

Structurally, a memory module usually contains two parts: short-term memory and long-term memory. Short-term memory temporarily holds recent perceptions, while long-term memory stores important information that can be retrieved at any time.

In terms of format, memories can be expressed in natural language or encoded as vector embeddings to improve retrieval efficiency. They can also be kept in a database or organized into structured lists that capture their semantics.

In operation, the module interacts with the environment mainly through three mechanisms: memory reading, memory writing, and reflection. Reading extracts relevant information to guide actions, writing stores important information, and reflection summarizes accumulated memories into higher-level insights.
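
Here is a minimal sketch of such a memory module in Python. The `embed` function and the `summarize` callable are placeholders for whatever embedding model and LLM you use; the class only illustrates the read/write/reflect mechanisms described above, it is not the paper's reference implementation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding function; replace with your embedding model."""
    raise NotImplementedError("connect this to an embedding model")

class Memory:
    def __init__(self, short_term_size: int = 10):
        self.short_term: list[str] = []                      # recent perceptions
        self.long_term: list[tuple[str, np.ndarray]] = []    # (text, embedding) pairs
        self.short_term_size = short_term_size

    def write(self, text: str, important: bool = False) -> None:
        """Writing: keep recent items in short-term memory; persist important ones long-term."""
        self.short_term = (self.short_term + [text])[-self.short_term_size:]
        if important:
            self.long_term.append((text, embed(text)))

    def read(self, query: str, k: int = 3) -> list[str]:
        """Reading: retrieve the k long-term memories most similar to the query (by dot product)."""
        q = embed(query)
        scored = sorted(self.long_term, key=lambda m: -float(np.dot(m[1], q)))
        return [text for text, _ in scored[:k]]

    def reflect(self, summarize) -> None:
        """Reflection: summarize short-term memories into a higher-level insight (e.g. via an LLM)."""
        if self.short_term:
            self.write(summarize(self.short_term), important=True)
```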

Planning module:

The main task of the planning module is to help the agent decompose complex tasks into more manageable sub-tasks and formulate effective strategies. Plans fall roughly into two types: those made without feedback and those made with feedback.

Planning without feedback does not consult execution results while the plan is being formed, and has several common strategies. Single-path reasoning generates the plan step by step in a cascading manner. Multi-path reasoning generates multiple alternative planning paths that form a tree- or graph-like structure. An external planner can also be used to search quickly for an optimal plan.

Planning with feedback adjusts the plan according to feedback received after execution, and is better suited to situations that require long-horizon planning. The feedback may come from objective task-execution results, from subjective human judgment, or from an auxiliary model.
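
As an illustration of feedback-based planning, here is a minimal Python sketch. `decompose` and `revise` stand in for LLM calls, and `execute` for whatever actually carries out a sub-task; the names and the loop are assumptions made for illustration, and the structure of the loop is the point, not the specific functions.

```python
def decompose(task: str) -> list[str]:
    """Single-path reasoning: ask the LLM for an ordered list of sub-tasks."""
    raise NotImplementedError("plug in an LLM call here")

def revise(plan: list[str], feedback: str) -> list[str]:
    """Rewrite the remaining plan in light of execution feedback."""
    raise NotImplementedError("plug in an LLM call here")

def run_with_feedback(task: str, execute) -> list[str]:
    """Execute a plan step by step, revising it whenever a step fails."""
    done: list[str] = []
    plan = decompose(task)
    while plan:
        step = plan.pop(0)
        ok, feedback = execute(step)                 # objective feedback from the task environment
        if ok:
            done.append(step)
        else:
            plan = revise([step] + plan, feedback)   # adjust the rest of the plan and retry
    return done
```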

Action module:

The action module's responsibility is to turn abstract decisions into concrete actions; it is the bridge between the agent's internal world and the external environment. When carrying out a task, one should consider the goal of an action, how it was produced, its scope of application, and its likely impact.

Ideally, actions are purposeful, such as completing a specific task, communicating with other agents, or exploring the environment. Actions can be produced by consulting past memories or by following a preset plan. Their scope can be expanded with external tools such as APIs and knowledge bases, but it also draws on the inherent capabilities of the LLM itself, such as planning, dialogue, and common-sense understanding.
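
A minimal sketch of such an action layer in Python follows. The tool names and the JSON decision format are assumptions made for illustration, not something the paper specifies; the point is the dispatch from an abstract decision to either an external tool call or the LLM's own reply.

```python
import json

# Stand-ins for real external tools (a search API, a knowledge base, ...).
TOOLS = {
    "search": lambda query: f"(search results for {query!r})",
    "knowledge_base": lambda key: {"python": "a programming language"}.get(key, "not found"),
}

def act(decision: str) -> str:
    """Turn the LLM's abstract decision (a small JSON string) into a concrete action."""
    call = json.loads(decision)             # e.g. '{"tool": "search", "input": "LLM agents"}'
    tool = TOOLS.get(call.get("tool", ""))
    if tool is not None:
        return tool(call["input"])          # external action via a tool
    return call.get("reply", "")            # otherwise fall back to the LLM's own dialogue ability

# Example: act('{"tool": "search", "input": "LLM agent survey"}')
```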

The architecture is like a PC's hardware, but architecture alone is not enough; we also need to give the agent the ability to complete different tasks. The paper treats these capabilities as "software" resources and proposes several ways of acquiring them, including model fine-tuning, prompt engineering, and mechanism engineering. Prompt engineering is probably the most common form; the "prompt engineer" role we hear about so often belongs to this context.

  1. Model fine-tuning. Fine-tune the model with task-specific data to improve the relevant capabilities. The data can come from human annotation, LLM generation, or real-world applications. This can also make agent behavior more consistent with human values.

  2. Prompt engineering. Describe the required capability in natural language and use that description as a prompt to guide the agent's behavior. This lets the agent acquire the specified "software" capability quickly (see the sketch after this list).

  3. Mechanism engineering. This mainly covers:

  • Trial-and-error method: the agent acts first and then adjusts its actions according to the results, optimizing step by step.

  • Crowdsourcing method: integrate the insights of multiple agents to form an updated collective answer.

  • Experience accumulation method: the agent accumulates experience through continuous exploration and gradually improves its capabilities.

  • Self-driven method: the agent sets goals on its own, continuously explores the environment, and eventually acquires the capability.
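
As an illustration of the prompt-engineering route, here is a minimal Python sketch. The meeting-notes capability, the prompt text, and the `chat` helper are all invented for illustration; the point is that the capability lives entirely in the natural-language description passed to the LLM.

```python
def chat(system: str, user: str) -> str:
    """Placeholder for an LLM call with a system prompt; plug in your own API."""
    raise NotImplementedError("connect this to your LLM")

# The capability is described in natural language and injected as a system prompt.
CAPABILITY_PROMPT = (
    "You are a meeting-notes agent. Given a raw transcript, you must:\n"
    "1. Extract decisions and action items with their owners.\n"
    "2. Keep the summary under 150 words.\n"
    "3. Answer follow-up questions only from the transcript."
)

def summarize_meeting(transcript: str) -> str:
    """The agent gains the meeting-notes capability purely through the prompt above."""
    return chat(CAPABILITY_PROMPT, transcript)
```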

Designing and building an LLM-based agent is a complex and challenging task. As the technology advances, I believe more excellent AI applications will be built on agents, and ordinary users will also be able to create their own agents through open-source projects and become "super individuals" in the AI era. I hope everyone starts early, builds up knowledge, and is ready to use or build their own agent as soon as the technology matures.


Related Reading:

Hands-on: how to use an AI Agent to implement ChatGPT pipeline writing and double your output

PHP prompt skills: fine-tuning ChatGPT's results on complex generation tasks from every angle

The much-discussed LLM "Reversal Curse" paper: what useful takeaways does it hold for ChatGPT users?

Use ChatGPT and these 10 frameworks to write any long-form self-media article
