Editor's note: This year, large language models (LLMs) have been widely used in natural language processing tasks and are increasingly used to build complex language applications. However, building multi-task LLM applications still poses challenges, such as how to combine tasks and how to control them.

This article introduces the main aspects of building a multi-task LLM application, including how to design and use control flows and how to test agents. It is valuable learning material for readers who want to design useful and powerful LLM applications, and it offers a first overview of the field.

The following is the translation. Enjoy!

Author | Chip Huyen

Compiled by | Yue Yang

This article focuses on how to use control flows (e.g. if statements, for loops) to combine multiple tasks, and how to incorporate tools (e.g. SQL executor, bash, web browser, third-party APIs), to create more complex and powerful LLM applications.

01 Applications composed of multiple tasks

Many LLM applications are more complex than a single task. For example, in a "talk to data" scenario, we need to connect to a database and query it in natural language. Given a table of credit card transactions, you could ask: "How many different merchants are there in Phoenix, and what are their names?" The database will return: "There are 9 different merchants in Phoenix, and they are...".

Applications that implement this "talk to data" scenario typically proceed by executing the following sequence of tasks:

Task 1: Convert natural language from user input into SQL queries [LLM]

Task 2: Execute SQL query statements in the database [SQL Executor]

Task 3: Convert the query results of SQL statements into natural language and display them to users [LLM]
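The three tasks above can be sketched as a small sequential pipeline. This is only an illustration under stated assumptions: the two LLM steps are replaced by hypothetical stand-in functions (text_to_sql and rows_to_text) that return canned output, and the table name, columns, and demo rows are invented for the example. Only the SQL executor step uses a real library (Python's built-in sqlite3).

```python
import sqlite3


def text_to_sql(question: str) -> str:
    """Task 1 [LLM]: hypothetical stand-in that returns a canned query.

    A real application would send the question and the table schema to an LLM.
    """
    return (
        "SELECT DISTINCT merchant FROM transactions "
        "WHERE city = 'Phoenix' ORDER BY merchant"
    )


def run_sql(conn: sqlite3.Connection, query: str) -> list:
    """Task 2 [SQL executor]: run the query and return all rows."""
    return conn.execute(query).fetchall()


def rows_to_text(question: str, rows: list) -> str:
    """Task 3 [LLM]: hypothetical stand-in that phrases the result for the user."""
    names = ", ".join(row[0] for row in rows)
    return f"There are {len(rows)} different merchants in Phoenix: {names}."


def answer(conn: sqlite3.Connection, question: str) -> str:
    """Run the three tasks one after another (sequential control flow)."""
    query = text_to_sql(question)
    rows = run_sql(conn, query)
    return rows_to_text(question, rows)


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with invented demo data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (merchant TEXT, city TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [
            ("Cactus Cafe", "Phoenix", 12.5),
            ("Desert Books", "Phoenix", 30.0),
            ("Cactus Cafe", "Phoenix", 8.0),
            ("Bay Bikes", "Oakland", 120.0),
        ],
    )
    return conn
```

Calling answer(make_demo_db(), "How many different merchants are there in Phoenix, and what are their names?") runs all three tasks in order. Keeping each task in its own function makes it possible to test the tasks individually before combining them.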

02 Agents, tools, and control flows

I did a little research online, and there seems to be no consensus on the definitions of these terms.

The term Agent has been used extensively to refer to an application that can perform multiple tasks according to a given control flow (see section 2.2 Control Flow). Each task can utilize one or more tools. In the above example, the SQL Executor is a tool.

Note: There is some resistance to using the term agent in this context, because the term is already heavily used in other domains (e.g. agent is used to refer to a policy in reinforcement learning [1]).

2.1 Tools and Plugins

In addition to the SQL executor, there are many tools for accomplishing tasks, such as:

Search tools (e.g. search using the Google Search API or the Bing API)
Web browser (e.g. given a URL, fetch its content)
Bash executor
Calculator
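One possible way to represent such tools in code is a small registry that maps a tool name to a description and a callable. The sketch below is an assumption for illustration, not an API from any particular framework: the Tool class, call_tool, and fake_search are hypothetical names, and the search tool is a stub that performs no real search. The calculator evaluates only basic arithmetic, using Python's ast module rather than eval.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable, Dict

# Arithmetic operators the calculator tool is allowed to apply.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node: ast.AST):
    """Evaluate a parsed expression made only of numbers and + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def calculator(expression: str) -> str:
    """Calculator tool: takes text in, returns text out."""
    return str(_eval(ast.parse(expression, mode="eval").body))


def fake_search(query: str) -> str:
    """Hypothetical search stub; a real tool would call a search API here."""
    return f"(no real search performed for: {query})"


@dataclass
class Tool:
    name: str
    description: str  # shown to the LLM so it can decide when to use the tool
    run: Callable[[str], str]


TOOLS: Dict[str, Tool] = {
    tool.name: tool
    for tool in [
        Tool("calculator", "Evaluates basic arithmetic expressions.", calculator),
        Tool("search", "Looks up current events or products.", fake_search),
    ]
}


def call_tool(name: str, tool_input: str) -> str:
    """Look up a tool by name and run it on the given input."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name].run(tool_input)
```

Giving every tool the same text-in, text-out signature is a design choice made here so that an agent can call any tool the same way.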

Tools and plugins are basically two names for the same thing. You can think of plugins as tools contributed to the OpenAI plugin store. As of this writing (Translator's Note: the article was published on April 11, 2023), OpenAI plugins are not yet open to the public, but anyone can create and use tools.

2.2 Control flows: sequential, parallel, if, for loop

In the example above, the control flow is sequential: one task executes after another. There are other types of control flow, such as parallel, conditional (if statement), and loop (for loop).

Sequential: Execute task B after task A completes, likely because task B depends on task A. For example, the SQL query can only be executed after it has been translated from the user's input.
Parallel: Execute task A and task B at the same time.
Conditional (if statement): Choose whether to execute task A or task B according to the user's input.
Loop (for loop): Execute task A repeatedly until a certain condition is met. For example, use the browser to get the content of a web page, and keep using the browser to get the content of links found in that page until the agent believes it has enough information to answer the user's question.
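The four control flows listed above can be sketched as four small helper functions. The function names and signatures are assumptions chosen for this illustration; tasks are modeled as plain Python callables, and the parallel case uses a thread pool from the standard library, which is only one of several ways to run tasks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(task_a, task_b, user_input):
    """Run task B on the output of task A."""
    return task_b(task_a(user_input))


def parallel(task_a, task_b, user_input):
    """Run task A and task B at the same time on the same input."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(task_a, user_input)
        future_b = pool.submit(task_b, user_input)
        return future_a.result(), future_b.result()


def conditional(condition, task_a, task_b, user_input):
    """Run task A if the condition holds for the input, otherwise task B."""
    return task_a(user_input) if condition(user_input) else task_b(user_input)


def loop(task, state, done, max_steps=10):
    """Repeat the task until done(state) is true or max_steps is reached."""
    for _ in range(max_steps):
        if done(state):
            break
        state = task(state)
    return state
```

For example, loop(lambda n: n * 2, 1, lambda n: n >= 10) keeps doubling the state until it reaches at least 10. The max_steps cap is an added safeguard so that a condition that is never met cannot keep the loop running forever.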

Note: While parallelism is certainly useful, I haven't seen many applications that use this kind of control flow.

2.3 Control flow using LLM agents

In traditional software engineering, the conditions of control flows are exact. In LLM applications (also known as agents), conditions can also be determined by prompting. (Translator's Note: "conditions" here refers to the situations or rules that determine which branch or operation of the control flow is executed.)

For example, if you want an LLM agent to choose between three actions, Search, SQL executor, and Chat, you can explain how it should choose among them in a prompt like the one below (very roughly). In other words, you can use the LLM to decide the condition of the control flow!

You have access to three tools: Search, SQL executor, and Chat.

Search is useful when users want information about current events or products.

SQL executor is useful when users want information that can be queried from a database.

Chat is useful when users want general information.

Provide your response in the following format:

Input: { input }

Thought: { thought }

Action: { action }

Action Input: { action_input }

Observation: { action_output }

Thought: { thought }
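Once the model replies in the format above, the application still has to read the chosen action out of the text and dispatch to the matching tool. The following is a minimal sketch of that step, assuming the reply contains one "Action:" line and one "Action Input:" line. The names parse_action and route, the handlers dictionary, and the sample reply are all hypothetical.

```python
import re

# The three actions named in the prompt.
ALLOWED_ACTIONS = {"Search", "SQL executor", "Chat"}


def parse_action(llm_output: str):
    """Extract (action, action_input) from a reply in the prompt's format."""
    action = re.search(r"^Action:[ \t]*(.+)$", llm_output, re.MULTILINE)
    action_input = re.search(r"^Action Input:[ \t]*(.+)$", llm_output, re.MULTILINE)
    if not action or not action_input:
        raise ValueError("response does not follow the expected format")
    name = action.group(1).strip()
    if name not in ALLOWED_ACTIONS:
        # The model picked something outside the listed tools.
        raise ValueError(f"unknown action: {name!r}")
    return name, action_input.group(1).strip()


def route(llm_output: str, handlers: dict) -> str:
    """Dispatch to the handler for the action the LLM selected."""
    name, action_input = parse_action(llm_output)
    return handlers[name](action_input)


# Invented example of a model reply, for illustration only.
SAMPLE_REPLY = """Thought: The user is asking about data stored in a table.
Action: SQL executor
Action Input: SELECT COUNT(*) FROM transactions"""
```

Rejecting any action that is not in the allowed set is one way to catch a control flow error early, before a tool is called.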

2.4 Testing an agent

For an agent to be reliable, we need to build and test all tasks individually before combining them. There are two main failure modes:

1) One or more tasks fail. Possible causes include:

Control flow error: non-optional action selected
One or more tasks produced incorrect results

2) All tasks produce correct results, but the overall solution is incorrect. Press et al. (2022) call this the "compositionality gap" [2]: the proportion of compositional questions that the model answers incorrectly, out of all compositional questions for which it answers every sub-question correctly.
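The definition above is a simple ratio, which the short sketch below computes. The record layout (a dictionary with the keys sub_correct and final_correct) is an assumption made for this example and is not taken from the paper.

```python
def compositionality_gap(records):
    """Share of questions answered wrongly among those whose sub-questions
    were all answered correctly. Returns None if no question qualifies."""
    eligible = [r for r in records if all(r["sub_correct"])]
    if not eligible:
        return None
    wrong = sum(1 for r in eligible if not r["final_correct"])
    return wrong / len(eligible)


# Invented evaluation records, for illustration only.
DEMO_RECORDS = [
    {"sub_correct": [True, True], "final_correct": True},
    {"sub_correct": [True, True], "final_correct": False},
    {"sub_correct": [True, True], "final_correct": True},
    {"sub_correct": [True, False], "final_correct": False},  # excluded: a sub-question failed
]
```

With these demo records, three questions have all sub-questions correct and one of those three has a wrong final answer, so the gap is one third. The fourth record does not count toward the gap because one of its sub-questions was answered incorrectly.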

As in software engineering, we can and should unit test every component and control flow. For each component, we can define pairs of (input, expected output) as evaluation examples, which can be used to evaluate the application every time a prompt or control flow is updated. It is also possible to run integration tests on the entire application.
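A minimal sketch of such evaluation examples follows, using the tool-selection step as the component under test. Everything here is assumed for illustration: choose_tool is a hypothetical keyword-based stand-in for the LLM's routing decision, the example pairs are invented, and outputs are compared by exact match, which may be too strict for free-form LLM output.

```python
def evaluate(component, examples):
    """Run a component on (input, expected output) pairs and collect failures."""
    failures = []
    for given, expected in examples:
        actual = component(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


def choose_tool(user_input: str) -> str:
    """Hypothetical rule-based stand-in for the LLM's choice of tool."""
    text = user_input.lower()
    if "table" in text or "database" in text:
        return "SQL executor"
    if "news" in text or "price" in text:
        return "Search"
    return "Chat"


# Evaluation examples in the form (input, expected output).
ROUTING_EXAMPLES = [
    ("How many merchants are in the transactions table?", "SQL executor"),
    ("What is the latest news about LLMs?", "Search"),
    ("Tell me a joke.", "Chat"),
]
```

Running evaluate(choose_tool, ROUTING_EXAMPLES) after every change to a prompt or control flow returns the list of examples that no longer pass, and an empty list means that all of them still pass.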

END

References

[1] https://en.wikipedia.org/wiki/Reinforcement_learning

[2] https://ofir.io/self-ask.pdf

This article is authorized by the original author and compiled by Baihai IDP. If you need to reprint the translation, please contact us for authorization.

Original link:

https://huyenchip.com/2023/04/11/llm-engineering.html
