Exploring O&M Development in the Age of AI Large Models, Part 1: A Preliminary Study of ReAct Engineering

Introduction

The emergence of large AI models has gradually changed how we live and work. In daily life, ubiquitous AI makes things smarter and more convenient; at work, the efficiency and precision of large AI models greatly improve how quickly we solve problems.

Naturally, we cannot ignore the huge impact and potential of large AI models for O&M development. This series of articles explores that possibility and tries to answer one question: can large AI models be integrated into our O&M development to bring us greater convenience and value? Through this exploration, we hope to find a new direction for combining large AI models with O&M development, and to open up a new, more efficient way of working.

A Real "Tool Man"

The term "tool man" is usually used humorously and a little self-deprecatingly. We have probably all had the experience of facing a pile of trivial, messy tasks and wryly calling ourselves a "tool man". So why bring the term up here? To give us a vivid picture of how large AI models are used in the programming that follows: an artificial intelligence model with the ability to operate tools.

As you may have guessed, what I am going to talk about is the most common structure in AI applications: Agent + Tool.

Driven by various emerging frameworks (such as LangChain), artificial intelligence is no longer just the "tool" we speak of, but has become a real "tool man". Agents are the actors that actually wield tools, not the tools themselves. They can not only accept and execute our instructions, but also skillfully use various tools to solve problems and create value.

The term "Agent" has a long history in computer science. In the early days, it was used to describe an agent or medium that took on the task of building a bridge between the online world and the real world. Take the browser's "User Agent" as an example. It is actually a way for the browser to declare its identity when interacting with the web server. In this process, the browser can be regarded as an "Agent" between the user (User) and the Internet content (Web content).

With the rapid development of AI technology, the meaning of "Agent" has gradually been enriched and deepened. In the AI field, "Agent" usually describes a model or system that can make decisions autonomously, respond to environmental changes, and execute various instructions.

Now that we understand the background of Agent + Tool, let's get to the point and see how Agent and Tool work in LangChain.

Agent + Tool Practice under LangChain

Simple Practice

Let's start with the simplest and most common example: checking how long a machine has been running.

import os
from subprocess import Popen, PIPE
from langchain.llms import OpenAI
from langchain.tools import StructuredTool
from langchain.agents import initialize_agent, AgentType

def ssh(command:str, host: str, username: str = "root") -> str:
    """A tool that can connect to a remote server and execute commands to retrieve returned content."""
    return os.popen(f"ssh {host} -l{username} '{command}'").read()

agent = initialize_agent(
    [StructuredTool.from_function(ssh)], 
    OpenAI(temperature=0), 
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION, 
    verbose=True
)
agent.run("帮我看一下 8.130.35.2 这台机器运行多久了")

langchain provides tool utilities that make it easy to convert a function into a Tool, which is then passed in when initializing the agent: once the robot (Agent) picks up the tool (Tool), it becomes a "tool man".

Now let's observe how this "tool man" thinks and uses its tool:

> Entering new AgentExecutor chain...

Action:
```
{
  "action": "ssh",
  "action_input": {
    "command": "uptime",
    "host": "8.130.35.2"
  }
}
```

Observation:  15:48:44 up 25 days, 41 min,  0 users,  load average: 1.04, 1.48, 2.20

Thought: I have the answer
Action:
```
{
  "action": "Final Answer",
  "action_input": "This machine has been running for 25 days and 41 minutes."
}
```

> Finished chain.
  1. The Agent thinks about the question: it needs some tool to find out how long the machine 8.130.35.2 has been running.
  2. Taking inventory of its tools, the Agent finds an ssh tool that might help it get the machine's uptime.
  3. The large language model's training data already contains knowledge of ssh and uptime, so the Agent can infer that running uptime through the ssh tool will yield the information it needs.
  4. The Agent calls the ssh tool with the uptime command and obtains the target machine's uptime.
  5. The Agent interprets the result, strips out irrelevant details such as load, organizes the language, and returns the answer.

After reading this example, you may well be curious: isn't a large language model just for dialogue, all talk on paper? How can it suddenly do so many things? And since it can run uptime over ssh, could it just as easily run rm -rf /?

Very good. Let's carry these questions with us and look at how a large language model gets fitted with "limbs".

In the example above, the large language model's interface was called twice in total, and we extracted both prompts:

The first prompt

System: Respond to the human as helpfully and accurately as possible. You have access to the following tools:

ssh: ssh(command: str, host: str, username: str = 'root') -> str - A tool that can connect to a remote server and execute commands to retrieve returned content., args: {'command': {'title': 'Command', 'type': 'string'}, 'host': {'title': 'Host', 'type': 'string'}, 'username': {'title': 'Username', 'default': 'root', 'type': 'string'}}

Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).

Valid "action" values: "Final Answer" or ssh

Provide only ONE action per $JSON_BLOB, as shown:

```
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
```

Follow this format:

Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{
  "action": "Final Answer",
  "action_input": "Final response to human"
}
```

Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation:.
Thought:
Human: Check how long the machine 8.130.35.2 has been running

The first call returns

Action:
```
{
  "action": "ssh",
  "action_input": {
    "command": "uptime",
    "host": "8.130.35.2"
  }
}
```

The second prompt

System: Respond to the human as helpfully and accurately as possible. You have access to the following tools:

ssh: ssh(command: str, host: str, username: str = 'root') -> str - A tool that can connect to a remote server and execute commands to retrieve returned content., args: {'command': {'title': 'Command', 'type': 'string'}, 'host': {'title': 'Host', 'type': 'string'}, 'username': {'title': 'Username', 'default': 'root', 'type': 'string'}}

Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).

Valid "action" values: "Final Answer" or ssh

Provide only ONE action per $JSON_BLOB, as shown:

```
{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}
```

Follow this format:

Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{
  "action": "Final Answer",
  "action_input": "Final response to human"
}
```

Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation:.
Thought:
Human: Check how long the machine 8.130.35.2 has been running

This was your previous work (but I haven't seen any of it! I only see what you return as final answer):

Action:
```
{
  "action": "ssh",
  "action_input": {
    "command": "uptime",
    "host": "8.130.35.2"
  }
}
```


Observation:  16:38:18 up 25 days,  1:30,  0 users,  load average: 0.81, 0.79, 1.06

Thought:

The second call returns

I can provide the human with the uptime of the machine
Action:
```
{
  "action": "Final Answer",
  "action_input": "The machine has been running for 25 days, 1 hour and 30 minutes."
}
```
  1. From the extracted prompts we can see that we taught the large model to communicate in a fixed format, and at the same time constrained its reply: it must either give the answer directly or ask a tool for help (Valid "action" values: "Final Answer" or ssh).
  2. After the first call, the large model cannot infer the result yet, so it returns action: ssh. The framework layer thus knows which tool to call, passes the parameters to that tool, executes it, and appends the result to the tail of the second call.
  3. So on the second call, from the large model's perspective, it asked to call ssh and we genuinely returned the call result to it. Now it has enough information to reason with, so it produces a Final Answer telling us the machine has been running for 25+ days.
  4. Interested readers can paste the prompts above directly into the ChatGPT dialog box and simulate the exchange to see whether the returns match. You can also try whether other LLMs have this kind of interactive capability (different LLMs need different PromptTemplates, which we won't expand on here).

Seeing this, some readers may have a small question: why do we have to push all the data on every call? Don't we have context when chatting with a large model? This is actually an engineering question:

A large language model can only do single-shot text prediction, so how do we achieve human-like conversation? Before each turn, feed in the previous rounds of dialogue, and it will feel as if the model remembers what came before. As the conversation grows longer and the full history no longer fits, what then? Delete the oldest rounds first. Isn't that similar to human forgetting? By the third topic, you may have forgotten what the first one was about.

When we call the LLM interface, each call is a brand-new text prediction, so we need to push the full context every time.
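To make this concrete, here is a minimal sketch of that replay-and-truncate bookkeeping. Everything in it is illustrative: build_prompt is an assumed helper, not a LangChain API, and a character budget stands in for the real token limit.

def build_prompt(history, user_input, max_chars=3000):
    # The model API is stateless, so we replay the prior turns before the new input.
    turns = history + [("Human", user_input)]
    # When the replayed dialogue exceeds the budget, forget the oldest turns first --
    # the "forgetting" behavior described above.
    while len(turns) > 1 and sum(len(role) + len(text) for role, text in turns) > max_chars:
        turns.pop(0)
    return "\n".join(f"{role}: {text}" for role, text in turns)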

A First Look at the ReAct Process

Some readers may wonder why they can't get this kind of tool-calling interaction by asking with a plain prompt, yet the framework achieves it. The answer is ReAct. ReAct is short for Reasoning and Acting: the large model constructs a complete series of actions (Act) based on logical reasoning (Reason) to achieve the desired goal. The key to the ReAct approach is coordinating the large language model with external information acquisition and other functions: the large model is the brain, and the ReAct framework lets the brain control the hands and feet.

[Figure: the ReAct loop of Thought, Action, and Observation]

In the ReAct process, we can capture three key elements:

Thought: Thoughts are produced by the large model and provide the rationale for its actions and decisions. By analyzing the model's thought process we can assess whether the action it is about to take is logical; as a key indicator, Thought helps us judge the soundness of its decision-making. Compared with human decision-making, Thought gives large models better interpretability and credibility.

Action (Act): The action is the concrete behavior the large model decides to take. An action generally has two parts, the action and its target, which in programming correspond to an API name and its input parameters. A great advantage of large models is that they can select the appropriate API and generate the required parameters based on the result of their thinking, which keeps the ReAct framework practical in execution.

Observation (Obs): Observation is how the large model obtains external input. It acts like the model's perception system, feeding environmental feedback back to the model so it can analyze further or decide the next step.
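To make the loop concrete, here is a schematic sketch of what a ReAct framework does around the model. The names (react_loop, parse_action) and the parsing are illustrative assumptions, not LangChain's actual internals; llm is any callable that maps prompt text to completion text, and tools maps tool names to plain Python functions.

import json
import re

def parse_action(reply: str) -> dict:
    # Pull the first JSON blob out of the model's reply (deliberately naive).
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    return json.loads(match.group(0))

def react_loop(llm, tools, question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\nThought:"
    for _ in range(max_steps):
        reply = llm(transcript)                  # model emits Thought + Action JSON blob
        action = parse_action(reply)
        if action["action"] == "Final Answer":   # the model has decided it is done
            return action["action_input"]
        # Run the chosen tool, append the Observation, and let the model think again.
        result = tools[action["action"]](**action["action_input"])
        transcript += f" {reply}\nObservation: {result}\nThought:"
    return "no final answer within the step budget"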

Since the ReAct process is this powerful, can it support more complex reasoning? Below is an example of a more complex chain of thought.

Chain-of-Thought Practice

This is an example of fetching a blog's RSS feed and analyzing it:


from typing import Dict
import sys
import traceback
from io import StringIO
from contextlib import redirect_stdout, redirect_stderr

from langchain.llms import OpenAI
from langchain.tools import StructuredTool
from langchain.agents import initialize_agent, AgentType


def python(code_str: str, return_context: bool=False) -> Dict:
    """This is a Python execution tool. You can input a piece of Python code, and it will return the corresponding execution results. By default, it returns the first 1000 characters of both stdout and stderr. It's recommended to use the print() function to directly display the results."""
    # Create StringIO objects for stdout and stderr so we can capture them
    stdout = StringIO()
    stderr = StringIO()
    return_head = 1000

    context = {}

    try:
        # Redirect stdout and stderr, then execute the code
        with redirect_stdout(stdout), redirect_stderr(stderr):
            exec(code_str, context)
    except Exception:
        stderr.write(traceback.format_exc())

    # Collect stdout, stderr, and the execution context afterwards
    stdout_value = stdout.getvalue()[0:return_head]
    stderr_value = stderr.getvalue()[0:return_head]

    if return_context:
        return {"stdout": stdout_value, "stderr": stderr_value, "context": context}
    else:
        return {"stdout": stdout_value, "stderr": stderr_value, "context": {}}


agent = initialize_agent([StructuredTool.from_function(python)], OpenAI(temperature=0.3), agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
resp = agent.run("帮我利用工具总结一下 https://elastic.aiops.work/blog/feed 这个xml的博客订最近有什么更新")
print("=== resp ===")
print(resp)

We don't give the large model any scraping tool; we just hand it a Python executor and see whether it can fetch what we want. Let's look at the execution process:

> Entering new AgentExecutor chain...
Thought: The user is asking for a summary of updates from an XML blog feed. I will need to fetch the XML from the provided URL and parse it to extract the relevant information. Python has several libraries that can help with this, such as 'requests' for fetching the data and 'xml.etree.ElementTree' for parsing the XML. I'll write a Python script to do this and use the execute_python tool to run it.

Action:
{
  "action": "python",
  "action_input": {
    "code_str": "import requests\nfrom xml.etree import ElementTree as ET\n\n# Fetch the XML data\nresponse = requests.get('https://elastic.aiops.work/blog/feed')\n\n# Parse the XML\nroot = ET.fromstring(response.content)\n\n# Extract the relevant information\nupdates = [{'title': item.find('title').text, 'link': item.find('link').text, 'pubDate': item.find('pubDate').text} for item in root.findall('.//item')]\n\n# Print the updates\nfor update in updates:\n    print(f"Title: {update['title']}\nLink: {update['link']}\nPublished Date: {update['pubDate']}\n")",
    "return_context": false
  }
}

Observation: {'stdout': '', 'stderr': 'Traceback (most recent call last):\n  File "/Users/****/test.py", line 22, in python\n    exec(code_str, context)\n  File "<string>", line 15\n    print(f"Title: {update['title']}\n^\nSyntaxError: EOL while scanning string literal\n', 'context': {}}
Thought:The Python code failed to execute due to a SyntaxError. It seems like the print statement in the code is not properly formatted. I need to correct the print statement and try again.

Action:
{
  "action": "python",
  "action_input": {
    "code_str": "import requests\nfrom xml.etree import ElementTree as ET\n\n# Fetch the XML data\nresponse = requests.get('https://elastic.aiops.work/blog/feed')\n\n# Parse the XML\nroot = ET.fromstring(response.content)\n\n# Extract the relevant information\nupdates = [{'title': item.find('title').text, 'link': item.find('link').text, 'pubDate': item.find('pubDate').text} for item in root.findall('.//item')]\n\n# Print the updates\nfor update in updates:\n    print(f"Title: {update['title']}\nLink: {update['link']}\nPublished Date: {update['pubDate']}\n")",
    "return_context": false
  }
}

Observation: {'stdout': 'Title: Being one of the only woman in her computer science program didn’t stop Hannah Mudge from pursuing her dream of becoming a software engineer\nLink: https://www.elastic.co/blog/culture-hannah-mudge-dream-of-becoming-software-engineer\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT\n\nTitle: Accelerating R&D in pharma with Elasticsearch, ESRE, LLMs, and LangChain — Part 1\nLink: https://www.elastic.co/blog/research-development-pharma-elasticsearch-esre-llms-langchain-1\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT\n\nTitle: Being one of the only woman in her computer science program didn’t stop Hannah Mudge from pursuing her dream of becoming a software engineer\nLink: https://www.elastic.co/blog/culture-hannah-mudge-dream-of-becoming-software-engineer\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT\n\nTitle: Accelerating R&D in pharma with Elasticsearch, ESRE, LLMs, and LangChain — Part 1\nLink: https://www.elastic.co/blog/research-development-pharma-elasticsearch-esre-llms-langchain-1\nPublish', 'stderr': '', 'context': {}}
Thought:The Python code executed successfully and returned the recent updates from the XML blog feed. The updates include the title, link, and published date of the blog posts. I will now format this information into a readable format for the user.

Action:
{
  "action": "Final Answer",
  "action_input": "Here are the recent updates from the blog:\n\n1. Title: Being one of the only woman in her computer science program didn’t stop Hannah Mudge from pursuing her dream of becoming a software engineer\nLink: https://www.elastic.co/blog/culture-hannah-mudge-dream-of-becoming-software-engineer\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT\n\n2. Title: Accelerating R&D in pharma with Elasticsearch, ESRE, LLMs, and LangChain — Part 1\nLink: https://www.elastic.co/blog/research-development-pharma-elasticsearch-esre-llms-langchain-1\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT"
}


> Finished chain.
=== resp ===
Here are the recent updates from the blog:

1. Title: Being one of the only woman in her computer science program didn’t stop Hannah Mudge from pursuing her dream of becoming a software engineer
Link: https://www.elastic.co/blog/culture-hannah-mudge-dream-of-becoming-software-engineer
Published Date: Fri, 07 Jul 2023 00:00:00 GMT

2. Title: Accelerating R&D in pharma with Elasticsearch, ESRE, LLMs, and LangChain — Part 1
Link: https://www.elastic.co/blog/research-development-pharma-elasticsearch-esre-llms-langchain-1
Published Date: Fri, 07 Jul 2023 00:00:00 GMT
  1. Right away, the large model wrote a Python script that fetches the content with requests and then parses it as XML. Don't ask why it knows the fields without even looking at the feed first; ask any large model whether it knows the RSS XML format. It's just that we humans rarely touch these things, so we would have to fetch the content and inspect it before knowing which fields to extract.
  2. Unfortunately, the first attempt didn't run: the model forgot that an ordinary quoted string in Python cannot span multiple lines, so the code raised an error. Here the chain of thought showed up: from the returned traceback it concluded that the print part was at fault and needed adjusting. This is rather impressive; if I hit this error right after writing the first version, I might not react so quickly and might suspect a wrong field name or something.

The Python code failed to execute due to a SyntaxError. It seems like the print statement in the code is not properly formatted. I need to correct the print statement and try again.

  3. On the second attempt it fixed the problem immediately, writing the print on a single line with \n escapes. Here are the two pieces of code the model generated, for comparison:
# First attempt

import requests
from xml.etree import ElementTree as ET

# Fetch the XML data
response = requests.get('https://elastic.aiops.work/blog/feed')

# Parse the XML
root = ET.fromstring(response.content)

# Extract the relevant information
updates = [{'title': item.find('title').text, 'link': item.find('link').text, 'pubDate': item.find('pubDate').text} for item in root.findall('.//item')]

# Print the updates
for update in updates:
    print(f"Title: {update['title']}
Link: {update['link']}
Published Date: {update['pubDate']}
")
# Second attempt
import requests
from xml.etree import ElementTree as ET

# Fetch the XML data
response = requests.get('https://elastic.aiops.work/blog/feed')

# Parse the XML
root = ET.fromstring(response.content)

# Extract the relevant information
updates = [{'title': item.find('title').text, 'link': item.find('link').text, 'pubDate': item.find('pubDate').text} for item in root.findall('.//item')]

# Print the updates
for update in updates:
    print(f"Title: {update['title']}\nLink: {update['link']}\nPublished Date: {update['pubDate']}\n")

The second example shows that when a tool call errors out, the large model uses its existing knowledge to analyze the error message, then adjusts and revises the code until it runs. Seen this way, this "tool man" not only holds tools but also has human-like reasoning and reflection, and can solve problems through repeated attempts.

Seeing this, some readers are probably eager to try a couple of things themselves. Let's talk about how to integrate Agent and Tool into ordinary programming.

Integrating Agent + Tool into the Common Programming Model

Inspiration from the MapReduce Programming Paradigm

In distributed computing, programming paradigms such as MapReduce make distribution simple: you don't need to understand distributed communication or synchronization. Write a mapper and a reducer and your program can run on a cluster of thousands of servers, with no need to worry about machine failures and the like. Nowadays you often don't even write the mapper and reducer: a SELECT SQL statement is automatically decomposed into mappers and reducers and executed.
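As a quick refresher, here is a toy single-machine rendition of the paradigm: the programmer writes only the mapper and reducer, and run_local stands in for the framework that would normally shuffle and distribute the same two functions across a cluster.

from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Aggregate all counts emitted for the same word.
    return word, sum(counts)

def run_local(lines):
    # Stand-in for the framework: group mapper output by key, then reduce each group.
    groups = defaultdict(list)
    for line in lines:
        for word, one in mapper(line):
            groups[word].append(one)
    return dict(reducer(word, counts) for word, counts in groups.items())

print(run_local(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}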

A programming paradigm like MapReduce actually resembles the working structure of a large company. Work is first decomposed and assigned to many groups (Mappers). These groups are like the company's front-line employees who perform the concrete tasks, each focusing on its own small area and processing the data or tasks assigned to it.

No one needs to care about other departments or other stages of production; everyone just concentrates on finishing their own work. When their tasks are complete, the groups submit their results to managers (Reducers), who collect and aggregate everything into a higher-level report or decision. Finally, top management (the last Reducer) makes the final decision or report based on the aggregated data, just as a CEO or board makes company-wide decisions from information gathered across departments.

Now, could the "tool man" we built above be slotted directly into this MapReduce model? Even if this "tool man" were placed at the top as the decision-making layer, there should be no big problem; after all, its experience and knowledge may far exceed that of some professional managers.

At the same time, MapReduce is not necessarily the optimal structure for complex engineering. It was designed that way because schedulers used to not be very intelligent. If in the future the scheduler is itself a large model, might it arrange a better structure than MapReduce for a given business scenario, so that the result is no longer MapReduce at all? Might it even incorporate some sociological structures? Could we call that an "AI large model sociological computing architecture"? Going further, how should multiple large AI models communicate so they can collaborate better on problems?

As for these grander engineering ambitions for large AI models, let's let the bullets fly for a while. Back to our current programming model: how would such an engineering structure actually be implemented?

The Decorator Pattern

In everyday programming, the decorator is a frequently used design pattern. It improves code readability and maintainability and helps achieve separation of concerns. For example, in Python's Flask framework, a basic route might look like this:

@app.route('/hello')
def hello_world():
    return 'Hello, World!'

So can Agent and Tool also enter our programming through decorators? Yes: langchain provides decorators such as @tool, but with them alone it still feels impossible to express the kind of engineering structure we imagined above. Since a plain decorator isn't complicated, we can design a simple decorator implementation of our own:

  • @tool(): like langchain's, wrapping a function turns it into a tool.
  • @agent(tools=[...], llm=..., ...): wrapping a function turns it into an agent, and the function's output becomes the prompt passed to agent.run(...). The agent itself is still a function, so other agents can use it as a tool.

Stated this way it is a bit abstract, so let's look at an example.

This example is not as simple as the previous ones, so let me briefly explain the background: in cloud-native scenarios, workloads usually depend on one another through Services, and when troubleshooting we often need to find the Service associated with a given Pod. In this example we let the large model find those services for us.

import os
import sys
from subprocess import Popen, PIPE

sys.path.insert(0, os.path.split(os.path.realpath(__file__))[0] + "/../../")

from aibond import AI
from langchain import OpenAI

ai = AI()


def popen(command):
    child = Popen(command, stdin = PIPE, stdout = PIPE, stderr = PIPE, shell = True)
    out, err = child.communicate()
    ret = child.wait()
    return (ret, out.strip(), err.strip())

@ai.tool()
def k8sLabel(name: str, kind: str, namespace: str) -> str:
    """This tool can fetch the labels of Kubernetes objects."""
    cmd = "kubectl get " + kind + " " + name + " -n " + namespace + " -o jsonpath='{.metadata.labels}'"
    (ret, out, err) = popen(cmd)
    return out

@ai.tool()
def k8sServiceSelectorList(namespace: str) -> str:
    """This tool can find all services within a namespace in Kubernetes and retrieve the label selectors for each service."""
    cmd = "kubectl get svc -n " + namespace + "  -o jsonpath="{range .items[*]}{@.metadata.name}:{@.spec.selector}{'\n'}{end}""
    (ret, out, err) = popen(cmd)
    return out

@ai.agent(tools=["k8sLabel", "k8sServiceSelectorList"], llm=OpenAI(temperature=0.2), verbose=True)
def k8sPodServiceFinder(name: str, namespace: str) -> str:
    """This tool can find the services associated with a Kubernetes pod resource."""
    return f"帮我列出 {namespace} 这个ns下所有的service,在这个service list中找出与 pod {name} 的label相关的service,返回的结果只有service的名称即可"


a = ai.run("使用所有的工具去查找sreworks这个ns下 prod-health-health-6cbc46567-s6dqp 这个pod的关联的k8s资源", llm=OpenAI(temperature=0.2), agents=["k8sPodServiceFinder"], verbose=True)
print(a)
  • In the end, to solve the problem we rely on the agent k8sPodServiceFinder.
  • k8sPodServiceFinder is itself an agent: it turns its two parameters, name and namespace, into a prompt, and it relies on the two tools k8sLabel and k8sServiceSelectorList. The prompt teaches the large model how to use those two tools to solve the problem.
  • The final hierarchical structure of the whole run is shown in the figure below:

[Figure: hierarchical structure of the agents and tools in this run]

Because we program with decorators, this hierarchical structure is actually very easy to handle. For overall use, the following points are worth considering:

  1. In a multi-layer structure, multiple large models can be mixed, but the number of model calls grows and the overall run slows down.
  2. Once large models enter the programming, there is almost no need to write if-else and other control flow; you only need some atomic tools, and the large model wires them together automatically.
  3. The Agent + Tool structure is comparable to ChatGPT's plugin system, except this is a private plugin system where users can freely customize plugins.
  4. When writing code, we used to say "elegant interface, ugly implementation". Now the ugly implementation seems to turn directly into a prompt. So to validate a feature, could we first throw a pile of prompts at it to stand up a prototype, then slowly replace the pieces with real interfaces one by one?

The decorators above are a thin layer of syntactic sugar; the core is still langchain. Readers interested in trying this can refer to the framework code: https://github.com/alibaba/sreworks-ext/tree/master/aibond
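For intuition, a minimal version of such an @agent decorator could be built directly on langchain along these lines. This is only a sketch under simplifying assumptions: it takes function objects directly, whereas the aibond framework linked above resolves tool names from a registry.

from langchain.tools import StructuredTool
from langchain.agents import initialize_agent, AgentType

def agent(tools, llm, **agent_kwargs):
    def wrap(func):
        def run(*args, **kwargs):
            # The wrapped function's return value becomes the prompt for the agent.
            prompt = func(*args, **kwargs)
            executor = initialize_agent(
                [StructuredTool.from_function(t) for t in tools],
                llm,
                agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
                **agent_kwargs,
            )
            return executor.run(prompt)
        # Keep the name and docstring: the docstring doubles as the tool description.
        run.__name__ = func.__name__
        run.__doc__ = func.__doc__
        return run
    return wrap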

Object-Oriented AI Programming

Converting Class Objects into Tools

With the framework above we can attempt a lot of AI-related programming, but we keep finding that we can only write simple functions, nothing particularly complicated. Why? Because only functions can become tools, while everyday programming is mostly object-oriented; suddenly having nothing but bare functions everywhere naturally feels uncomfortable. So the question is: why can't a class object become a tool? Functions are stateless, while class objects are instantiated and stateful. But as we saw in the prompt analysis, we push the full context on every call anyway, and isn't that context itself stateful? Including a class instance's data in it shouldn't be a problem, right?

So we can set up a stateful function that acts like a proxy, wrapping a class object inside it:

def demo_class_tool(func: str, args: Dict, instance_id: str = None) -> Dict:
    """
    This is a tool that requires instantiation. You can first call the '__init__' function to instantiate; this call will return an 'instance_id'. Subsequently, you can use this 'instance_id' to continue operating on this instance.
    Below are the available funcs for this tool:
    - func: __init__  args: {'url': {'title': 'Url', 'type': 'string'}}
    - func: read  args: {'limit': {'title': 'Limit', 'type': 'integer', 'default': '1000'}}
    """
    ...
  • The function's description explains how to use the tool and which funcs it contains (that is, the methods on the class object).
  • It guides the large model to call __init__ first to instantiate, which returns an instance_id.
  • It then guides the large model to hold that instance_id and call the funcs with the parameters given in the description, realizing calls on the class (a minimal sketch of such a proxy follows).
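To make the mechanism concrete, here is a minimal sketch of such a proxy, assuming a simple in-memory registry keyed by instance_id. This is illustrative only; aibond's actual implementation may differ.

import uuid
from typing import Dict

_instances: Dict[str, object] = {}

def class_tool(cls):
    """Turn a class into a single stateful tool function that an agent can drive."""
    def proxy(func: str, args: Dict, instance_id: str = None) -> Dict:
        if func == "__init__":
            # Instantiate and hand back a handle the agent can reuse on later calls.
            instance_id = uuid.uuid4().hex[:12]
            _instances[instance_id] = cls(**args)
            return {"instance_id": instance_id}
        # Dispatch any other func to the stored instance by its handle.
        return {"result": getattr(_instances[instance_id], func)(**args)}
    return proxy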

Let's redo the first example, checking a machine's uptime, using a class object this time:

import os
import sys
from typing import Dict

import paramiko

sys.path.insert(0, os.path.split(os.path.realpath(__file__))[0] + "/../../")

from aibond import AI
from langchain import OpenAI

ai = AI()

class SshClient():
    """A tool that can connect to a remote server and execute commands to retrieve returned content."""
    _client = None
    def __init__(self, host: str, username: str = "root", password: str = None):
        self._client = paramiko.SSHClient()
        self._client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self._client.connect(host, username=username, password=password)

    def exec_command(self, command: str) -> Dict:
        stdin, stdout, stderr = self._client.exec_command(command)
        retcode  = stdout.channel.recv_exit_status()
        output_stdout = stdout.read().decode('utf-8')
        output_stderr = stderr.read().decode('utf-8')

        stdin = None
        stdout = None
        stderr = None

        return {"stdout": output_stdout, "stderr": output_stderr, "exitStatus": retcode}


resp = ai.run("帮我看看 8.130.35.2 这台机器启动了多久了", llm=OpenAI(temperature=0.2), tools=[SshClient], verbose=True)
print("=== resp ===")
print(resp)
  • We introduced a client that performs ssh, converting the original one-shot "execute this command on this IP" into two steps: first instantiate the SshClient, then call SshClient.exec_command(...) to run the command. Let's see whether the large model understands this.
> Entering new AgentExecutor chain...
Action:
```
{
  "action": "SshClient",
  "action_input": {
    "sub_func": "__init__",
    "sub_args": {
      "host": "8.130.35.2",
      "username": "root",
      "password": ""
    }
  }
}
```


Observation: {'instance_id': 'cbbb660c0bc3'}
Thought: I need to use the instance_id to execute a command
Action:
```
{
  "action": "SshClient",
  "action_input": {
    "sub_func": "exec_command",
    "sub_args": {
      "command": "uptime"
    },
    "instance_id": "cbbb660c0bc3"
  }
}
```


Observation: {'stdout': ' 23:18:55 up 25 days,  8:11,  0 users,  load average: 0.29, 0.55, 0.84\n', 'stderr': '', 'exitStatus': 0}
Thought: I have the answer
Action:
```
{
  "action": "Final Answer",
  "action_input": "This machine has been up for 25 days, 8 hours, and 11 minutes."
}
```

> Finished chain.
=== resp ===
This machine has been up for 25 days, 8 hours, and 11 minutes.

We can see that the large model understands the instantiation process well: it first instantiates with the IP address, then executes the command.

If class objects can be turned into tools, AI programming becomes much less strenuous: even objects used for data processing can be converted directly into tools for the agent. Could this approach also be used for big data processing: first load the data into an object, then provide several methods for the large model to analyze or digest it step by step? Could summarizing long texts use the same idea? (A hypothetical sketch follows.)
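As a thought experiment, such a data-holding tool class might look like the following. All names here are hypothetical; the point is simply that the agent digests a large payload through small, targeted methods instead of receiving it all at once.

class TextCorpus:
    """Hold a large text in memory and expose small windows of it to the agent."""
    def __init__(self, path: str):
        with open(path) as f:
            self._lines = f.readlines()

    def stats(self) -> dict:
        # Cheap overview so the agent can plan before reading anything.
        return {"lines": len(self._lines), "chars": sum(len(l) for l in self._lines)}

    def head(self, n: int = 20) -> str:
        return "".join(self._lines[:n])

    def grep(self, keyword: str, limit: int = 20) -> str:
        hits = [l for l in self._lines if keyword in l]
        return "".join(hits[:limit])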

The function-call mechanism behind class-object tools has been implemented in the framework; you are welcome to try it: https://github.com/alibaba/sreworks-ext/blob/master/aibond/aibond/core.py

More explorations of turning class objects into tools will follow in the second article of this series.

Real Objects

In the early stages of learning to program, object-oriented programming (OOP) is a new and complex concept for beginners. To help them understand, we often teach with vivid imagery, for example using "hands" and "feet" to illustrate the concept of a class. I naively thought back then that once every class had its hands and feet assigned, the whole program would come alive; but after writing a lot of code I found there was always a gap between that and truly being "alive". When we write classes, we usually treat them as static storage, whereas real objects should have more initiative and dynamism: they should be able to communicate ("talk") and carry out tasks ("do"), just as in the real world every individual is an independent entity capable of thinking and acting on its own.

Yes, as suggested above, our "tool man" can achieve this. The examples show that a class object can also be turned into a tool for an agent to use. So could we embed a large model driver inside such a class object to make it truly come alive? And when an agent calls another agent, could it seek not just a result, but result-oriented communication?

These explorations will also be carried out in subsequent chapters.

References
● "ReAct: Synergizing Reasoning and Acting in Language Models": https://arxiv.org/abs/2210.03629
● "aibond usage examples": https://github.com/alibaba/sreworks-ext/blob/master/aibond/cases/README.md
