AutoGPT Architecture and Workflow

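Lately I have found myself completely hooked on experimenting with AutoGPT, as have many others. After using AutoGPT as a black box, I became curious about how it works under the hood. Thankfully, the code is open source, so I decided to take a look.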

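Below are my notes on the AutoGPT architecture. I hope they help anyone curious about how AutoGPT works. AutoGPT can also serve as a reference design for those building their own agentic AI systems.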

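Note: I analyzed the code of AutoGPT v0.2.1, which I downloaded a week ago, so the information below reflects AutoGPT 0.2.1. At the time of writing, AutoGPT v0.2.2 has already been released. Thanks to the community for the incredible progress it is making!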

Architecture

Workflow

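1. The user (human) defines the AI agent's name and specifies up to 5 goals. For example, an AutoGPT user will see the following in their terminal (the full example is in the appendix under "Example terminal messages for initial user input"):
```
Welcome to Auto-GPT!  Enter the name of your AI and its role below.
...
Enter up to 5 goals for your AI:
...
Goal 1: ...
Goal 2: ...
```
2. Based on the user's settings, an initial prompt is generated and sent to the ChatGPT API. The prompt contains the user's settings and general instructions for ChatGPT, including the list of all available commands and the instruction to output results in JSON format. For an example, see the appendix "Example initial prompt".
3. ChatGPT returns a JSON string (ideally) containing its thoughts, reasoning, plan, and criticism. The JSON also contains the next command to execute and its arguments. For an example, see the appendix "Example JSON string returned by ChatGPT".
4. The command is extracted and parsed from ChatGPT's response. If the shutdown/task_complete command is issued, the system shuts down. Otherwise, the appropriate command executor runs the command with the given arguments.
5. The executed command returns a string value. For example, the Google search command returns search results, browse_website returns a summary of the scraped website content, write_to_file returns the status of the file write, and so on.
6. The ChatGPT output (3) and the command return string (5) are combined and added to memory.
7. The context from (6) is added to short-term memory, stored as plain text only. This can be implemented with a queue/FIFO data structure. In AutoGPT 0.2.1, the full message history is stored, but only the most recent 9 ChatGPT messages/command return strings are selected as short-term memory.
8. The context from (6) is also added to long-term memory. The general idea is to keep a set of (vector, text) pairs and be able to run a KNN/approximate-KNN search that finds the top-K most similar items for a given query. To obtain the text embeddings/vectors, we use OpenAI's ada-002 embedding API. To store the (vector, text) pairs, we can use local memory (e.g., FAISS) or even a scalable vector database such as Pinecone. AutoGPT 0.2.1 supports Pinecone, local data storage, and other backends. I used the local storage option, which writes the embedding vectors to disk as plain text.
9. Given the latest context from short-term memory (7), long-term memory (8) is queried for the top-K most relevant memory fragments (K=10 in AutoGPT 0.2.1). These top-K memories are added to the prompt, shown as {relevant memory} in the diagram. For an example prompt containing memories, see the appendix "Example prompt with memory"; the memories appear under "This reminds you of these events from your past".
10. A new prompt is built from the same instructions as the initial prompt (2), the relevant memories from (9), and the instruction "GENERATE NEXT COMMAND JSON" at the end (see the appendix "Example prompt with memory"). This new prompt is used to call ChatGPT, and steps (3) to (10) repeat until the task is complete, i.e., until ChatGPT issues the task_complete/shutdown command.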
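The memory part of the loop, steps (6) to (9), can be sketched as follows. This is a minimal illustration under stated assumptions, not AutoGPT's code: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the ada-002 embedding API, `LongTermMemory` does exact brute-force KNN by cosine similarity instead of using FAISS or Pinecone, and all class and function names are made up. The 9-entry window and K=10 follow the values described above.

```python
import math
import zlib
from typing import List, Tuple

DIM = 256  # hypothetical toy size; ada-002 vectors have 1536 dimensions


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embedding API: an L2-normalized hashed bag of words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ShortTermMemory:
    """Keeps the full history but exposes only the most recent n entries (step 7)."""

    def __init__(self, n: int = 9) -> None:
        self.history: List[str] = []
        self.n = n

    def add(self, text: str) -> None:
        self.history.append(text)

    def recent(self) -> List[str]:
        return self.history[max(0, len(self.history) - self.n):]


class LongTermMemory:
    """Stores (vector, text) pairs and answers exact brute-force KNN queries (step 8)."""

    def __init__(self) -> None:
        self.items: List[Tuple[List[float], str]] = []

    def add(self, text: str) -> None:
        self.items.append((toy_embed(text), text))

    def get_relevant(self, query: str, k: int = 10) -> List[str]:
        q = toy_embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), t) for v, t in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for _, t in scored[:k]]


def remember(stm: ShortTermMemory, ltm: LongTermMemory, reply: str, result: str) -> None:
    # Step 6: combine the model output and the command result into one entry,
    # using the "Assistant Reply: ... Result: ..." layout seen in the appendix.
    entry = f"Assistant Reply: {reply}\nResult: {result}"
    stm.add(entry)  # step 7
    ltm.add(entry)  # step 8


def relevant_memory(stm: ShortTermMemory, ltm: LongTermMemory, k: int = 10) -> List[str]:
    # Step 9: query long-term memory with the latest short-term context.
    return ltm.get_relevant("\n".join(stm.recent()), k)
```

Swapping the brute-force search for FAISS or a Pinecone index would change only how `get_relevant` finds neighbours; the surrounding loop stays the same.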

Commands

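A fascinating and very powerful aspect of agentic AI is its ability to issue and execute commands. In AutoGPT, the LLM (ChatGPT) learns which commands are available and what they do through the following text in the prompt: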

Commands:
1. Google Search: "google", args: "input": "<search>"
2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
5. List GPT Agents: "list_agents", args: 
6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
9. Read file: "read_file", args: "file": "<file>"
10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
11. Delete file: "delete_file", args: "file": "<file>"
12. Search Files: "search_files", args: "directory": "<directory>"
13. Evaluate Code: "evaluate_code", args: "code": "<full_code_string>"
14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"
15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"
16. Execute Python File: "execute_python_file", args: "file": "<file>"
17. Generate Image: "generate_image", args: "prompt": "<prompt>"
18. Send Tweet: "send_tweet", args: "text": "<text>"
19. Convert Audio to text: "read_audio_from_file", args: "file": "<file>"
20. Do Nothing: "do_nothing", args: 
21. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"

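Each command has a short description (e.g., "Google Search", "Execute Python File"), so ChatGPT knows which command to choose given the current context. In addition, each command has its own executor in AutoGPT.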
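One way to picture "one executor per command" is a dictionary that maps each command name from the prompt to a function. The sketch below is a hypothetical illustration, not AutoGPT's code: `google` and `write_to_file` are placeholders (the first fakes search results, the second writes to an in-memory `FILES` dict instead of disk), and `execute_command` is a made-up name. The "File written to successfully." string mirrors the result shown in the appendix.

```python
from typing import Callable, Dict

Args = Dict[str, str]

# Hypothetical in-memory workspace so the sketch has no side effects.
FILES: Dict[str, str] = {}


def google(args: Args) -> str:
    # Placeholder: a real executor would query a search backend.
    return f"(search results for {args['input']!r})"


def write_to_file(args: Args) -> str:
    FILES[args["file"]] = args["text"]
    return "File written to successfully."


# Registry: command name as listed in the prompt -> executor.
COMMANDS: Dict[str, Callable[[Args], str]] = {
    "google": google,
    "write_to_file": write_to_file,
}


def execute_command(name: str, args: Args) -> str:
    """Dispatch to the matching executor and always return a string result."""
    executor = COMMANDS.get(name)
    if executor is None:
        return f"Unknown command: {name}"
    try:
        return executor(args)
    except KeyError as exc:
        # Report bad arguments back to the model instead of crashing the loop.
        return f"Error: missing argument {exc}"
```

With this shape, adding a capability means registering one more function in `COMMANDS` and listing it in the prompt's Commands section. Returning errors as strings rather than raising is a design choice of this sketch: the result is fed back to the model, which then gets a chance to correct itself on the next cycle.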

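I find this a very powerful concept, because the suite of available commands can be extended, which opens up many possibilities. For example, given a command that adds a product to an online retailer's shopping cart, we could set the goals to (1) find the most suitable tennis string for a topspin-heavy baseline player and (2) add that string to the user's cart. Commands could also extend into the physical world, for example smart home control. Of course, safety must come first, since these LLM-based autonomous agents are still at an early stage of development!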

Appendix

Example terminal messages for initial user input

Welcome to Auto-GPT!  Enter the name of your AI and its role below. Entering nothing will load defaults.
Name your AI:  For example, 'Entrepreneur-GPT'
AI Name: Foo
Foo here!  I am at your service.
Describe your AI's role:  For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
Foo is: an AI that recommends tennis equipment for a specific player
Enter up to 5 goals for your AI:  For example: Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously'
Enter nothing to load defaults, enter nothing when finished.
Goal 1: Find the top 3 most suitable tennis strings for a hard hitting baseline player who hits with a lot of topspin
Goal 2: Write the tennis strings to output
Goal 3: Shut down when you are done
Goal 4:

How ChatCompletion messages are printed

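AutoGPT uses OpenAI's ChatCompletion API, which takes a list of dictionaries representing the chat history. For visual clarity, I print the prompt that goes into ChatGPT as a string. For example: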

```python
messages = [
    {"role": "system", "content": "foo"},
    {"role": "user", "content": "bar1"},
    {"role": "assistant", "content": "bar2"}
]
```

is printed as:

system: foo

user: bar1

assistant: bar2
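A minimal sketch of such a printer, assuming each message renders as "role: content" with a blank line between messages, matching the output above. `format_messages` is a hypothetical name, not an AutoGPT function.

```python
from typing import Dict, List


def format_messages(messages: List[Dict[str, str]]) -> str:
    """Render a ChatCompletion-style message list as 'role: content' blocks."""
    # One "role: content" block per message, separated by a blank line.
    return "\n\n".join(f"{m['role']}: {m['content']}" for m in messages)


messages = [
    {"role": "system", "content": "foo"},
    {"role": "user", "content": "bar1"},
    {"role": "assistant", "content": "bar2"},
]
print(format_messages(messages))
```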

Example initial prompt

system: You are Foo, an AI that recommends tennis equipment for a specific player
Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.

GOALS:

1. Find the top 3 most suitable tennis strings for a hard hitting baseline player who hits with a lot of topspin
2. Write the tennis strings to output
3. Shut down when you are done


Constraints:
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance
4. Exclusively use the commands listed in double quotes e.g. "command name"

Commands:
1. Google Search: "google", args: "input": "<search>"
2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
5. List GPT Agents: "list_agents", args: 
6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
9. Read file: "read_file", args: "file": "<file>"
10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
11. Delete file: "delete_file", args: "file": "<file>"
12. Search Files: "search_files", args: "directory": "<directory>"
13. Evaluate Code: "evaluate_code", args: "code": "<full_code_string>"
14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"
15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"
16. Execute Python File: "execute_python_file", args: "file": "<file>"
17. Generate Image: "generate_image", args: "prompt": "<prompt>"
18. Send Tweet: "send_tweet", args: "text": "<text>"
19. Convert Audio to text: "read_audio_from_file", args: "file": "<file>"
20. Do Nothing: "do_nothing", args: 
21. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"

Resources:
1. Internet access for searches and information gathering.
2. Long Term memory management.
3. GPT-3.5 powered Agents for delegation of simple tasks.
4. File output.

Performance Evaluation:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.

You should only respond in JSON format as described below 
Response Format: 
{
    "thoughts": {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted\n- list that conveys\n- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    },
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    }
} 
Ensure the response can be parsed by Python json.loads
system: The current time and date is Sat Apr 22 01:43:22 2023
system: This reminds you of these events from your past:



user: Determine which next command to use, and respond using the format specified above:

Example JSON string returned by ChatGPT

{
    "thoughts": {
        "text": "I need to find the top 3 most suitable tennis strings for a hard hitting baseline player who hits with a lot of topspin. I should start by doing some research on the topic.",
        "reasoning": "I need to gather information on the characteristics of tennis strings that are suitable for a hard hitting baseline player who hits with a lot of topspin. This will help me narrow down my search and find the top 3 most suitable options.",
        "plan": "- Conduct a Google search on the topic\n- Browse websites that specialize in tennis equipment\n- Consult with a GPT agent if necessary",
        "criticism": "I need to make sure that I am gathering information from reliable sources and that I am considering all relevant factors when making my recommendations.",
        "speak": "I will conduct a Google search on the topic and browse websites that specialize in tennis equipment to find the top 3 most suitable tennis strings for a hard hitting baseline player who hits with a lot of topspin."
    },
    "command": {
        "name": "google",
        "args": {
            "input": "best tennis strings for hard hitting baseline player with topspin"
        }
    }
}
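Because the prompt asks for output that Python's `json.loads` can parse, the command-extraction step of the workflow can be sketched as below. `parse_reply` is a hypothetical helper, not AutoGPT's parser; it assumes the reply has the shape of the example above and does not attempt to repair malformed JSON.

```python
import json
from typing import Any, Dict, Tuple


def parse_reply(reply: str) -> Tuple[str, Dict[str, Any], Dict[str, Any]]:
    """Return (command name, command args, thoughts) from a JSON reply.

    Raises ValueError if the reply is not a JSON object with a command name.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("command"), dict):
        raise ValueError("reply has no 'command' object")
    command = data["command"]
    name = command.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("reply has no command name")
    args = command.get("args") or {}
    thoughts = data.get("thoughts") or {}
    return name, args, thoughts


# Usage: check for shutdown before dispatching to an executor.
reply = '{"thoughts": {"speak": "searching"}, "command": {"name": "google", "args": {"input": "best tennis strings for hard hitting baseline player with topspin"}}}'
name, args, thoughts = parse_reply(reply)
if name == "task_complete":
    print("shutting down:", args.get("reason"))
else:
    print(name, args)
```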

Example prompt with memory

system: You are Foo, an AI that recommends tennis equipment for a specific player
Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.

GOALS:

1. Find the top 3 most suitable tennis strings for a hard hitting baseline player who hits with a lot of topspin
2. Write the tennis strings to output
3. Shut down when you are done


Constraints:
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance
4. Exclusively use the commands listed in double quotes e.g. "command name"

Commands:
1. Google Search: "google", args: "input": "<search>"
2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
5. List GPT Agents: "list_agents", args: 
6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
9. Read file: "read_file", args: "file": "<file>"
10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
11. Delete file: "delete_file", args: "file": "<file>"
12. Search Files: "search_files", args: "directory": "<directory>"
13. Evaluate Code: "evaluate_code", args: "code": "<full_code_string>"
14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"
15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"
16. Execute Python File: "execute_python_file", args: "file": "<file>"
17. Generate Image: "generate_image", args: "prompt": "<prompt>"
18. Send Tweet: "send_tweet", args: "text": "<text>"
19. Convert Audio to text: "read_audio_from_file", args: "file": "<file>"
20. Do Nothing: "do_nothing", args: 
21. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"

Resources:
1. Internet access for searches and information gathering.
2. Long Term memory management.
3. GPT-3.5 powered Agents for delegation of simple tasks.
4. File output.

Performance Evaluation:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.

You should only respond in JSON format as described below 
Response Format: 
{
    "thoughts": {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted\n- list that conveys\n- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    },
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    }
} 
Ensure the response can be parsed by Python json.loads
system: The current time and date is Sat Apr 22 13:47:07 2023
system: This reminds you of these events from your past:
['Assistant Reply: {\n    "thoughts": {\n        "text": "I need to find the top 3 most suitable tennis strings for a hard hitting baseline player who hits with a lot of topspin. I should start by doing some research on the topic.",\n        "reasoning": "I need to gather information on the characteristics of tennis strings that are suitable for a hard hitting baseline player who hits with a lot of topspin. This will help me narrow down my search and find the top 3 most suitable options.",\n        "plan": "- Conduct a Google search on the topic\\n- Browse websites that specialize in tennis equipment\\n- Consult with a GPT agent if necessary",\n        "criticism": "I need to make sure that I am gathering information from reliable sources and that I am considering all relevant factors when making my recommendations.",\n        "speak": "I will conduct a Google search on the topic and browse websites that specialize in tennis equipment to find the top 3 most suitable tennis strings for a hard hitting baseline player who hits with a lot of topspin."\n    },\n    "command": {\n        "name": "google",\n        "args": {\n            "input": "best tennis strings for hard hitting baseline player with topspin"\n        }\n    }\n} \nResult: Command google returned: b\'[\\n    {\\n        "title": "Best Tennis Strings in 2023 - For Spin, Power, Control - Athlete Path",\\n        "href": "https://www.athletepath.com/best-tennis-strings/",\\n        "body": "Wilson Champions Choice Duo Tennis String Babolat RPM Blast Black 17g Strings Solinco Hyper-G Heaven High Spin Poly String Head Rip Control Tennis String Wilson NXT String Tourna Big Hitter Black7 Luxilion ALU Power 125 Tennis Racquet String Set How to Choose Tennis Strings Types of Tennis Strings Important Features to Consider Conclusion"\\n    },\\n    {\\n        "title": "Best tennis strings of 2022 | TW gear guide - Tennis Warehouse",\\n        "href": 
"https://www.tennis-warehouse.com/learning_center/gear_guides/tennis_string/best_tennis_strings.html",\\n        "body": "Wilson Champion\\\'s Choice Hybrid 16 String 5.0 3 Reviews $ 41.95 Quantity: 1 Increment Add To Cart Wish list TW Reviews Price Icon Lowest Price Guarantee Arrow Up We will match or beat any posted overall price advertised in-store or online on in stock items. Shop Hybrids Best strings by playing feature (benefit)"\\n    },\\n    {\\n        "title": "11 Best Tennis Strings For Spin - A Complete Guide",\\n        "href": "https://tennispredict.com/11-best-tennis-strings-for-spin/",\\n        "body": "These are the 11 best tennis strings for spin. Babolat RPM Blast Luxilon ALU Power Spin Solinco Tour Bite 19 Technifiber Black Code 4S 16 Volkl Cyclone 16 Kirschbaum Xplosive Speed 16 Wilson Revolve Spin 16 Turna Poly Big Hitter Black 7 Gamma AMP Moto 16 Head Sonic Pro Edge 16 Yonex Poly Tour Spin"\\n    },\\n    {\\n        "title": "10+ Best Tennis Strings for 2023 | Playtested & Reviewed",\\n        "href": "https://tenniscompanion.org/best-tennis-strings/",\\n        "body": "My pick for the best synthetic gut tennis string, which I cover in greater detail in this guide, is Prince Synthetic Gut. It\\\'s an excellent string with a long-standing positive reputation in the tennis community. Here are a few additional options to consider for beginners and children. Head Synthetic Gut PPS Gamma Synthetic Gut"\\n    },\\n    {\\n        "title": "Best Tennis Strings for Topspin",\\n        "href": "https://primotennis.com/best-tennis-strings-for-topspin/",\\n        "body": "Finding the sweet spot is key! The best string tension for topspin is around 50-60 pounds (23-27 kg). This provides enough power and control while still allowing the ball to bite into the string bed for maximum spin potential. 
If you find that your strings are breaking too frequently, you may want to increase the tension slightly."\\n    },\\n    {\\n        "title": "12 Best Tennis Strings With Buying Guide 2023 - Tennisscan",\\n        "href": "https://tennisscan.com/best-tennis-strings/",\\n        "body": "Wilson Champions Choice is a cross between Wilson Natural Gut and Luxilon ALU Power Rough, two of the most popular strings on tour. The player gets the best of both worlds by combining the optimum playability of a high-quality. Use the Luxilon ALU Power Rough for durability and control in the mains."\\n    },\\n    {\\n        "title": "Most Spin-Friendly Strings - Tennis Warehouse",\\n        "href": "https://www.tennis-warehouse.com/learning_center/gear_guides/tennis_string/best_spin_strings.html",\\n        "body": "It also provides a low-powered response, which allows big hitters to play with confidence. Read the full review. Weiss CANNON Ultra Cable 17 Yellow/1.23 String. $12.90. 3. Weiss CANNON Ultra Cable 17/1.23 String White. $12.90. 3. Weiss CANNON Ultra Cable 17 Ye/1.23 String Reel - 660\\\'."\\n    },\\n    {\\n        "title": "The Best Tennis Strings - The Tennis Bros",\\n        "href": "https://thetennisbros.com/reviews/strings/the-best-tennis-strings/",\\n        "body": "Natural gut is the most powerful, most elastic string on the planet and is loved by professional and amateur players alike. It\\\'s created from cow intestine that is twisted into a string format in a factory, coated and then sold in a packet. Its touch on the tennis court is heavenly and it feels like velvet on the arm."\\n    }\\n]\' \nHuman Feedback: GENERATE NEXT COMMAND JSON ']


user: GENERATE NEXT COMMAND JSON
assistant: {
    "thoughts": {
        "text": "Based on my research, I have found some tennis strings that are suitable for a hard hitting baseline player with topspin. I will now write these strings to output and shut down.",
        "reasoning": "I have completed my research and found the information I need to make my recommendations. Writing the strings to output will allow the user to see my recommendations and shut down will end this task.",
        "plan": "- Write the top 3 most suitable tennis strings to output\n- Shut down",
        "criticism": "I need to make sure that I am providing accurate and helpful information to the user. I should also consider providing additional information on why these strings are suitable for a hard hitting baseline player with topspin.",
        "speak": "Based on my research, I have found some tennis strings that are suitable for a hard hitting baseline player with topspin. I will now write these strings to output and shut down."
    },
    "command": {
        "name": "write_to_file",
        "args": {
            "file": "recommended_strings.txt",
            "text": "1. Babolat RPM Blast\n2. Solinco Tour Bite\n3. Luxilon ALU Power Spin"
        }
    }
}
system: Command write_to_file returned: File written to successfully.
user: GENERATE NEXT COMMAND JSON
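Comparing the two example prompts, each cycle's message list appears to follow a fixed order: the main system prompt, the current time, the relevant memories, the earlier conversation, and finally the user trigger. The sketch below assembles that order. `build_messages` is a hypothetical helper, and unlike a real implementation it does not trim the history to fit the model's context window.

```python
import time
from typing import Dict, List

Message = Dict[str, str]


def build_messages(
    system_prompt: str,
    memories: List[str],
    history: List[Message],
    trigger: str = "GENERATE NEXT COMMAND JSON",
) -> List[Message]:
    """Assemble one cycle's messages in the order seen in the example prompts."""
    # Memories appear as a Python list repr when present, and as nothing when empty.
    memory_text = str(memories) if memories else ""
    messages: List[Message] = [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": f"The current time and date is {time.strftime('%c')}"},
        {"role": "system", "content": f"This reminds you of these events from your past:\n{memory_text}\n\n"},
    ]
    messages.extend(history)  # earlier user/assistant/system turns, oldest first
    messages.append({"role": "user", "content": trigger})
    return messages
```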