Agent Application Development Based on Large Language Models (LLMs)

At present, the industry generally agrees that applications based on large models are concentrated in two directions: RAG and Agent. Either way, designing, implementing, and optimizing applications that fully exploit the potential of large language models (LLMs) requires considerable effort and expertise. As developers create increasingly complex LLM applications, the development process inevitably becomes more involved, and its potential design space can be huge. The article "How to Build an App Based on Large Models" gives an exploratory approach, and its basic framework for large-model application development applies to both RAG and Agent. But is there anything unique about developing large-model applications for Agents? Is there a large-model application development framework that focuses on the Agent?

So, what is an Agent?

1. What is an Agent?

The Agent here refers to an intelligent agent, a concept that can be traced back to Minsky's "The Society of Mind". In that book, Minsky's definition of an Agent is somewhat abstract: an individual in society who can arrive at a solution to a problem after negotiation is an Agent. In the computer field, an agent is an entity that senses its environment through sensors and acts on the environment through actuators; an agent can therefore be defined as a mapping from a percept sequence to an action. It is generally accepted that an Agent is a computing entity that resides in some environment, can function autonomously and continuously, and has characteristics such as autonomy, reactivity, sociality, and proactivity.

Intelligence is an emergent property of the interaction between an Agent and its environment.

1.1 Structure and characteristics of Agent

The general structure of Agent is shown in the figure below:

[Figure: the general structure of an Agent]

The main features of Agent are:

● Autonomy: It operates without the direct intervention of humans or other agents, and exercises some control over its own behavior and internal state.

● Social Ability: The ability to interact with other Agents (or humans) through some kind of communication. There are three main types of interactions: Cooperation, Coordination and Negotiation.

● Reactivity: Ability to perceive the environment (which can be the physical world, a user connected via a graphical user interface, a series of other Agents, the Internet, or a combination of all these) and respond to changes in the environment in a timely manner.

● Proactivity (Pro-activeness): it can not only respond to the environment, but also proactively take actions to achieve its goals.

A rough formal expression of an Agent might look like this:

Agent = platform + agent program
platform = computing device + sensors + actuators
agent program = a concrete realization (a proper subset) of the agent function
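The formalization above can be sketched in a few lines of plain Python. The names here (`TableDrivenAgent`, `step`) are purely illustrative and not from any particular framework: the agent function maps the full percept sequence to an action, and the agent program is a concrete, necessarily partial realization of that mapping.

```python
class TableDrivenAgent:
    """An agent program: chooses an action from the percept history seen so far."""

    def __init__(self, table, default_action="noop"):
        self.table = table            # maps percept sequences to actions
        self.percepts = []            # the sensing sequence accumulated so far
        self.default_action = default_action

    def step(self, percept):
        """Sense one percept, then act: the percept-sequence-to-action mapping."""
        self.percepts.append(percept)
        return self.table.get(tuple(self.percepts), self.default_action)


agent = TableDrivenAgent({
    ("dirty",): "clean",
    ("dirty", "clean"): "move",
})
print(agent.step("dirty"))   # -> clean
print(agent.step("clean"))   # -> move
```

The lookup table covers only finitely many percept sequences, which is exactly why the agent program is a proper subset of the (in general infinite) agent function.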

1.2 Agent in the field of large models

In the field of large models, the large model replaces the rule engine and knowledge base of a traditional agent, while the Agent provides dialogue channels for reasoning, observation, critique, and verification. In particular, when configured with the right prompting and inference settings, a single LLM can display a wide range of capabilities, and dialogue between differently configured Agents can combine these broad capabilities in a modular and complementary manner.

Developers can easily and quickly create Agents with different roles, for example, Agents that write code, execute code, incorporate human feedback, or verify outputs. The Agent's backend can also be extended to allow more customized behavior by selecting and configuring a subset of the built-in functionality.

2. What is a Multi-Agent System?

Multi-Agent (a multi-agent system) refers to a group system composed of multiple autonomous individuals whose goal is to accomplish tasks through communication and interaction among those individuals.

Generally, a Multi-Agent system consists of a set of interacting Agents together with their organizational rules and information exchange protocols. Through communication, cooperation, and competition, the internal Agents can complete a large number of complex tasks that no single Agent could complete alone: a "system of systems."

2.1 System classification and characteristics of Multi-Agent

Multi-Agent Systems (MAS) can be mainly divided into the following categories:

[Figure: classification of Multi-Agent systems]

The main features of the Multi-Agent system are as follows:

  1.  Autonomy. In the Multi-Agent system, each Agent can manage its own behavior and cooperate or compete autonomously.

  2. Fault tolerance. Agents can work together to form a cooperative system to achieve independent or common goals. If some agents fail, other agents will autonomously adapt to the new environment and continue to work, without causing the entire system to fall into a fault state.

  3. Flexibility and scalability. The Multi-Agent system itself adopts a distributed design. The Agent has the characteristics of high cohesion and low coupling, making the system highly scalable.

  4. Collaboration. A Multi-Agent system is a distributed system, and its agents can cooperate with each other through appropriate strategies to achieve global goals.

2.2 Multi-Agent in the field of large models

Specifically, in application fields based on large models, LLMs have been shown to be able to solve complex tasks once those tasks are decomposed into simpler subtasks. Multi-Agent communication and collaboration realize this decomposition and integration of subtasks through the intuitive mechanism of "dialogue".

To make agents based on large models suitable for multi-agent conversations, each agent is made conversable: it can send, receive, and respond to messages. When configured correctly, an Agent can automatically engage in multiple conversations with other agents, or request human input during certain conversation rounds, forming a human-in-the-loop workflow. This conversational Agent design leverages the LLM's powerful ability to take feedback and make progress through chat, and also allows LLM capabilities to be combined in a modular fashion.
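The receive/respond loop described above can be illustrated framework-free. In this sketch the reply functions merely stand in for LLM inference calls, and all names (`ConversableAgent`, `run_chat`) are illustrative assumptions, not AutoGen's API:

```python
class ConversableAgent:
    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn      # stands in for an LLM inference call

    def receive(self, message):
        """Receive a message and generate a reply (None means: stop talking)."""
        return self.reply_fn(message)


def run_chat(sender, receiver, message, max_turns=6):
    """Alternate messages between two agents until a reply is None."""
    transcript = [(sender.name, message)]
    for _ in range(max_turns):
        reply = receiver.receive(message)
        if reply is None:
            break
        transcript.append((receiver.name, reply))
        sender, receiver, message = receiver, sender, reply
    return transcript


# Toy replies: the "assistant" refines each message; the "user" stops
# once it has seen the message refined twice.
assistant = ConversableAgent("assistant", lambda m: "draft of " + m)
user = ConversableAgent("user", lambda m: None if m.count("draft") > 1 else m)

print(run_chat(user, assistant, "task"))
```

Swapping the toy lambdas for real model calls, tool invocations, or prompts for human input is what turns this alternating loop into the automated multi-agent chats described in the text.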

3. Common Agent and Multi-Agent systems based on large models

3.1 Single Agent System

Common single-agent systems based on large models include:

  • AutoGPT: AutoGPT is an open source implementation of an AI agent that attempts to automatically achieve a given goal. It follows the single-Agent paradigm, uses many useful tools to enhance AI models, and does not support Multi-Agent collaboration.

  • ChatGPT+ (code interpreter or plugins): ChatGPT is a conversational AI Agent that can now be used with a code interpreter or plugins. The code interpreter enables ChatGPT to execute code, while plugins extend ChatGPT with external tools.

  • LangChain Agent: LangChain is a general framework for developing LLM-based applications. LangChain has various types of agents, ReAct Agent is one of the famous examples. All agents in LangChain follow the single-Agent paradigm and are not inherently designed for communication and collaboration modes.

  • Transformers Agent: Transformers Agent is an experimental natural language API built on the Transformers repository. It consists of a curated set of tools and an Agent for interpreting natural language and using those tools. Like AutoGPT, it follows the single-Agent paradigm and does not support collaboration between agents.

3.2 Multi-Agent system

Common Multi-Agent systems based on large models include:

  • BabyAGI: BabyAGI is an example of an artificial intelligence task management system implemented in Python scripts. In this implemented system, multiple LLM-based agents are used. For example, there is an Agent for creating new tasks based on the goals and results of the previous task, an Agent for prioritizing the task list, and an Agent for completing tasks/subtasks. As a Multi-Agent system, BabyAGI adopts a static Agent dialogue mode and a predefined Agent communication sequence.

  • CAMEL: CAMEL is an agent communication framework. It demonstrates how role-playing can let chat agents communicate with each other to complete tasks. It also records Agent conversations for behavior analysis and capability understanding, and adopts inception prompting to achieve autonomous cooperation between agents. However, CAMEL itself does not support the use of tools, such as code execution. Although it is proposed as an infrastructure for multi-agent conversations, it only supports static conversation patterns.

  • Multi-Agent Debate: Multi-Agent Debate attempts to build LLM applications through multi-agent dialogue; debate has proven an effective way to encourage divergent thinking and to improve the factuality and reasoning of LLMs. In these works, multiple LLM inference instances are constructed as Agents that solve problems by debating with one another. Each Agent is an LLM inference instance, no tools or humans are involved, and the dialogue between Agents must follow a predefined sequence.

  • MetaGPT: MetaGPT is an LLM-based automated software development application built on a Multi-Agent dialogue framework. It assigns different roles to various GPTs that collaborate on software development, developing specialized solutions for specific scenarios.

Having covered the basic concepts of Agent and Multi-Agent and the common systems, how does one develop an Agent application based on a large model? Last month (September 2023), Microsoft released AutoGen, an open source framework that provides a valuable reference for developing LLM Agent applications.

4. A Multi-Agent-based LLM application development framework: AutoGen

AutoGen is a development framework for simplifying the orchestration, optimization, and automation of LLM workflows. It provides customizable, conversable agents that leverage the strongest capabilities of LLMs such as GPT-4, while addressing their limitations by integrating with humans and tools and by enabling conversations between multiple agents through automated chat.

4.1 Typical examples of AutoGen

AutoGen uses Multi-Agent conversations to enable complex LLM-based workflows. Typical examples are as follows:

[Figure: typical AutoGen examples]

The left of the figure represents customizable Agents built with AutoGen, which can be based on LLMs, tools, humans, or even combinations of them. The upper right shows that Agents can solve tasks through dialogue, and the lower right shows that AutoGen supports many additional, more complex dialogue patterns.

4.2 General usage of AutoGen

Using AutoGen, building a complex Multi-Agent conversation system boils down to two steps:

  • Define a set of Agents with specialized functions and roles.

  • Define the interaction behavior between agents, for example, what should one agent reply when it receives a message from another agent.

Both steps are modular, making these agents reusable and composable. For example, to build a code-based question-answering system, you can design the agents and their interactions so that the system reduces the number of manual interactions the application requires. A workflow for resolving issues in code is shown below:

[Figure: a commander/writer/safeguard workflow for resolving code issues]

The commander receives questions from users and coordinates with the writer and the safeguard. The writer writes the code and interprets it, the safeguard checks it for safety, and the commander executes it. If a problem occurs, the process repeats until the problem is resolved.
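The repeat-until-resolved loop among the three roles can be sketched framework-free. The functions below are toy stand-ins for LLM-backed agents (the names `writer`, `safeguard`, `commander` mirror the figure, but the logic is an illustrative assumption, not AutoGen's implementation):

```python
def writer(question, feedback=None):
    """Stands in for an LLM that writes (or revises) code for the question."""
    code = f"result = len({question!r})"
    if feedback:                       # revise the code using failure feedback
        code = f"result = len({question!r})  # revised after: {feedback}"
    return code

def safeguard(code):
    """Rejects obviously unsafe code before it is executed."""
    return "import os" not in code and "subprocess" not in code

def commander(question, max_rounds=3):
    """Coordinates writer and safeguard, executes code, retries on failure."""
    feedback = None
    for _ in range(max_rounds):
        code = writer(question, feedback)
        if not safeguard(code):
            feedback = "code rejected as unsafe"
            continue
        env = {}
        try:
            exec(code, env)            # the commander executes the code
            return env["result"]
        except Exception as exc:
            feedback = str(exc)        # loop again with the error as feedback
    raise RuntimeError("unresolved after max_rounds")

print(commander("hello"))  # -> 5
```

The key design point is that the error message flows back to the writer as conversational feedback, which is exactly the repair loop the workflow describes.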

5. AutoGen's core feature: customizable Agents

Agents in AutoGen have capabilities enabled by LLM, humans, tools, or a mixture of these elements. For example:

  • The use and role of the LLM within an Agent can be easily configured through advanced inference features, for example automated solving of complex tasks through group chat.

  • Human participation and oversight can be achieved through agents with different participation levels and modes, for example, GPT-4 plus automated task solving for multiple human users.

  • Agents have native support for LLM-driven code/function execution, i.e., automating task solving through code generation, execution, and debugging, using provided tools as functions.

5.1 Assistant Agent

A simple way to use the AutoGen AssistantAgent is to launch an automated chat between the AssistantAgent and a UserProxyAgent. This makes it easy to build an enhanced version of ChatGPT + code interpreter + plugins (as shown in the figure below), with customizable automation capabilities that can be used in customized environments and embedded into larger systems.

[Figure: an AssistantAgent and a UserProxyAgent in automated chat]

In the figure above, the AssistantAgent plays the role of an AI assistant, such as Bing Chat. The UserProxyAgent plays the role of the user and simulates user actions, such as code execution. AutoGen automates the chat between the two agents while allowing manual feedback or intervention. The UserProxyAgent interacts seamlessly with humans and uses tools when appropriate.

5.2 Multi-Agent session

Agent session-centric design has many benefits, including:

  • Handle ambiguity, feedback, progress, and collaboration naturally.

  • Enable effective coding-related tasks, such as tool use, through back-and-forth troubleshooting.

  • Allow users to seamlessly opt in or out via the chat agent.

  • Achieve collective goals through the collaboration of multiple experts.

AutoGen supports automated chat and diverse communication patterns, making it easy to orchestrate complex, dynamic workflows and to experiment with different designs. In the figure below, a special Agent called GroupChatManager is used to support group chat among multiple Agents.

[Figure: GroupChatManager coordinating a group chat among multiple Agents]

The GroupChatManager is a special agent that repeats three steps: select a speaker (Bob in this case), ask the speaker to respond, and broadcast the selected speaker's response to all other agents.
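The three-step loop can be sketched in plain Python. This is an illustrative simulation of the pattern, not AutoGen's `GroupChatManager` implementation; the round-robin speaker selection and the `respond` stub are assumptions made for the sketch:

```python
class Agent:
    def __init__(self, name):
        self.name = name
        self.inbox = []                # messages broadcast by the manager

    def respond(self):
        """Stands in for an LLM reply based on the messages seen so far."""
        return f"{self.name} replies to {len(self.inbox)} message(s)"


class GroupChatManager:
    def __init__(self, agents):
        self.agents = agents
        self.turn = 0

    def step(self):
        speaker = self.agents[self.turn % len(self.agents)]   # 1. select a speaker
        message = speaker.respond()                           # 2. ask for a response
        for agent in self.agents:                             # 3. broadcast to the rest
            if agent is not speaker:
                agent.inbox.append((speaker.name, message))
        self.turn += 1
        return speaker.name, message


manager = GroupChatManager([Agent("Alice"), Agent("Bob"), Agent("Carol")])
for _ in range(3):
    print(manager.step())
```

Replacing the round-robin selection with an LLM-driven choice of the next speaker is what makes the conversation pattern dynamic rather than static.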

In summary, AutoGen is designed as general-purpose infrastructure for building LLM applications. Its conversation patterns cover almost all the pattern types of existing LLM systems: in a "static" pattern, the Agent topology remains unchanged regardless of input, while AutoGen also allows flexible, dynamic patterns that can be customized for different application needs. Its Multi-Agent system can execute LLM-generated code and allows humans to participate in the execution process.

6. AutoGen usage example

AutoGen provides many interesting examples on GitHub. Taking https://github.com/microsoft/autogen/blob/main/notebook/agentchathumanfeedback.ipynb as an example, here is a brief introduction to using AutoGen to build an application based on Multi-Agent conversations: code generation, execution, debugging, and task resolution with human feedback.

6.1 Environment settings

AutoGen requires Python 3.8 or higher and is installed as follows:

pip install pyautogen

With just a few lines of code, you can get started:

import autogen
config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")

The OAI_CONFIG_LIST file referenced above contains a list of endpoint configurations like the following:

config_list = [
    {
        'model': 'gpt-4',
        'api_key': '<your OpenAI API key here>',
    },  # OpenAI API endpoint for gpt-4
    {
        'model': 'gpt-4',
        'api_key': '<your first Azure OpenAI API key here>',
        'api_base': '<your first Azure OpenAI API base here>',
        'api_type': 'azure',
        'api_version': '2023-06-01-preview',
    },  # Azure OpenAI API endpoint for gpt-4
    {
        'model': 'gpt-4',
        'api_key': '<your second Azure OpenAI API key here>',
        'api_base': '<your second Azure OpenAI API base here>',
        'api_type': 'azure',
        'api_version': '2023-06-01-preview',
    },  # another Azure OpenAI API endpoint for gpt-4
    {
        'model': 'gpt-3.5-turbo',
        'api_key': '<your OpenAI API key here>',
    },  # OpenAI API endpoint for gpt-3.5-turbo
    {
        'model': 'gpt-3.5-turbo',
        'api_key': '<your first Azure OpenAI API key here>',
        'api_base': '<your first Azure OpenAI API base here>',
        'api_type': 'azure',
        'api_version': '2023-06-01-preview',
    },  # Azure OpenAI API endpoint for gpt-3.5-turbo
    {
        'model': 'gpt-3.5-turbo',
        'api_key': '<your second Azure OpenAI API key here>',
        'api_base': '<your second Azure OpenAI API base here>',
        'api_type': 'azure',
        'api_version': '2023-06-01-preview',
    },  # another Azure OpenAI API endpoint for gpt-3.5-turbo
]

6.2 Creating the AssistantAgent and UserProxyAgent

# create an AssistantAgent instance named "assistant"
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={
        "seed": 41,
        "config_list": config_list,
    }
)
# create a UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="ALWAYS",
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
)

# the purpose of the following line is to log the conversation history
autogen.ChatCompletion.start_logging()

6.3 Performing a task

Call the UserProxyAgent's initiate_chat() method to start the conversation. When the code below runs, the user is prompted for feedback after each message received from the AssistantAgent. If the user provides no feedback (pressing Enter directly), the UserProxyAgent tries to execute the code suggested by the AssistantAgent on the user's behalf, and terminates when the AssistantAgent sends the "TERMINATE" signal at the end of a message.

math_problem_to_solve = """
Find $a + b + c$, given that $x+y \\neq -1$ and 
\\begin{align}
    ax + by + c & = x + 7,\\
    a + bx + cy & = 2x + 6y,\\
    ay + b + cx & = 4x + y.
\\end{align}.
"""

# the assistant receives a message from the user, which contains the task description
user_proxy.initiate_chat(assistant, message=math_problem_to_solve)

The user can provide feedback at every step. Execution results and error messages are returned to the AssistantAgent, which can modify the code based on that feedback. Finally, the task is completed, the AssistantAgent sends the "TERMINATE" signal, the user skips the feedback, and the conversation ends.

After the conversation ends, the conversation log between the two Agents can be saved through autogen.ChatCompletion.logged_history:

import json

json.dump(autogen.ChatCompletion.logged_history, open("conversations.json", "w"), indent=2)

This example demonstrates how to use an AssistantAgent and a UserProxyAgent to solve a challenging mathematical problem. The AssistantAgent here is an LLM-based Agent that can write Python code to perform the task the user gives it. The UserProxyAgent acts as a proxy for the user to execute the code written by the AssistantAgent. By setting human_input_mode appropriately, the UserProxyAgent can also prompt the user to give feedback to the AssistantAgent. For example, when human_input_mode is set to "ALWAYS", the UserProxyAgent always prompts the user for feedback. When user feedback is provided, the UserProxyAgent passes it directly to the AssistantAgent; when none is provided, the UserProxyAgent executes the code written by the AssistantAgent and returns the execution result (success or failure, with the corresponding output) to the AssistantAgent.
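The effect of the different human_input_mode values ("ALWAYS", "TERMINATE", "NEVER") can be summarized in a small decision helper. This is a simplified illustration of the logic just described, not AutoGen's actual implementation; the function name is a hypothetical one introduced for the sketch:

```python
def should_prompt_human(mode, message, is_termination_msg):
    """Return True when the user proxy should ask the human for input."""
    if mode == "ALWAYS":
        return True                          # prompt on every received message
    if mode == "TERMINATE":
        return is_termination_msg(message)   # prompt only at termination
    return False                             # "NEVER": run fully automatically

# Same termination check as in the UserProxyAgent above.
is_term = lambda m: m.rstrip().endswith("TERMINATE")

print(should_prompt_human("ALWAYS", "run this code", is_term))       # True
print(should_prompt_human("TERMINATE", "done. TERMINATE", is_term))  # True
print(should_prompt_human("NEVER", "done. TERMINATE", is_term))      # False
```

In the example above, "ALWAYS" is what gives the user a chance to steer (or skip) every round; switching to "NEVER" would make the whole code-and-debug loop fully automatic.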

7. Summary

The Agent is an important program form for actively interacting with large models, while Multi-Agent is a system mechanism through which multiple Agents use large models to complete complex tasks. Microsoft's AutoGen is an open source, community-driven project oriented around Multi-Agent conversations that is still under active development. AutoGen aims to give developers an efficient and easy-to-use framework for building next-generation applications, and it has already demonstrated excellent opportunities for building creative applications, providing broad scope for innovation.


Origin blog.csdn.net/wireless_com/article/details/133849992