Tao Yu's group at the University of Hong Kong launches the open-source XLANG Agent, with support for three agent modes!

Author|Xiaoxi, ZenMoore

A new future is gradually moving from theory to reality and arriving at our doorstep.

The meaning of language lies in its use, and the significance of the large language models that have followed ChatGPT is surely not limited to chat. Four months ago, we covered Tsinghua University's survey of tool learning in "Tsinghua Publishes Tool Learning Framework, Let ChatGPT Manipulate Maps and Stock Inquiries: Has Jarvis Arrived?", which explored how to better combine large models such as GPT-4 with existing professional tools (professional equipment, program interfaces, commercial software, and so on), turning the large model into a Jarvis-style personal butler rather than a mere chat machine.

Following that demo-style vision from four months ago, XLANG Lab at the University of Hong Kong has now launched XLANG Agent, an open-source large-model agent, after five months of full-time development by 15 researchers!


In the introductory blog post, the authors of XLANG Agent describe the work a large-model agent can accomplish as the following process: "Imagine this process of converting human instructions or questions in everyday language into actions and code that machines can understand. The machine then performs these actions in a given environment, changing the state of that environment. These changes are observed, analyzed, and in turn initiate the next cycle of interaction with humans."

In fact, this concept of a large-model agent is a rudimentary version of the intelligent agents in science fiction that follow human instructions to perform specific tasks. XLANG acts as a bridge between natural language and specific instructions (such as executable code or specific action sequences), and the environments it interacts with include, but are not limited to, databases, web applications, and even the real physical world. Over successive rounds of interaction with the environment and with humans, the large-model agent can continuously integrate people's feedback into its context, so that it completes tasks accurately and effectively and better captures and extends the user's true intent.

Specifically, the author team summarizes the large model Agent as:

  • The goal of a large-model agent is to solve problems that humans face in specific environments, such as data analysis or real estate services, rather than to act as a general-purpose chatbot;

  • A large-model agent lets users give feedback in natural language to guide it toward better exploring and completing tasks. In other words, it handles multi-round tasks rather than a single round of input and output;

  • A large-model agent is equipped with tools such as code, plug-ins, and browsers that extend its capabilities beyond those of the large model itself.
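The multi-round point in the list above can be illustrated with a small sketch of a session whose context grows with each round of feedback. This is only an illustration: the `Session` class, the role names, and the example messages are all hypothetical and are not taken from XLANG Agent's code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Session:
    """Hypothetical multi-round context: every turn is kept for the next round."""
    messages: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        # Append one turn (user feedback or agent reply) to the running context.
        self.messages.append((role, text))

    def prompt(self) -> str:
        # Flatten the whole history into the text a model would see next.
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


session = Session()
session.add("user", "Plot the closing price of the stock dataset.")
session.add("agent", "Here is a line chart of the closing price.")
session.add("user", "Use a logarithmic y-axis instead.")  # natural-language feedback
prompt = session.prompt()
print(prompt)
```

Because the feedback turn stays in the context, the next model call sees both the earlier result and the requested correction, which is what distinguishes this from a single round of input and output.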

XLANG Agent is built entirely on LangChain, a framework for building applications driven by large models. Building on ReAct in LangChain, the author team designed XLANG Agent to complete a task through three stages:

  • Thinking stage: generate reasoning traces that support the next action;

  • Action stage: interact with the environment;

  • Observation stage: observe the state of the environment and prepare for the next decision.
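The three stages above can be sketched as a plain loop. This is a minimal toy under stated assumptions, not XLANG Agent's or LangChain's implementation: `toy_policy` is a hard-coded stand-in for the language model, and `calculator`, `TOOLS`, and `run_agent` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    thought: str
    action: str
    action_input: str
    observation: str


def toy_policy(task: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the model: returns (thought, action, action_input)."""
    if not history:
        return ("I need the numeric result first.", "calculator", task)
    return ("I have the result, so I can answer.", "finish", history[-1].observation)


def calculator(expression: str) -> str:
    # Toy tool (hypothetical): only understands "a + b" with integers.
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def run_agent(task: str, max_steps: int = 5) -> Tuple[str, List[Step]]:
    history: List[Step] = []
    for _ in range(max_steps):
        # Thinking stage: produce a reasoning trace and choose the next action.
        thought, action, action_input = toy_policy(task, history)
        if action == "finish":
            return action_input, history
        # Action stage: interact with the environment through a tool.
        observation = TOOLS[action](action_input)
        # Observation stage: record the result so it informs the next decision.
        history.append(Step(thought, action, action_input, observation))
    return "", history  # step budget exhausted without a final answer


answer, trace = run_agent("2 + 3")
print(answer)
```

In a real agent the policy would be a model call that reads the accumulated history, and the step budget guards against loops that never reach a final answer.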

On top of LangChain, XLANG Agent adds a comprehensive set of tools, a complete user interface, and reworked information presentation and prompting. Unlike the code interpreter and plug-ins released by OpenAI, XLANG Agent aims to be an open-source, general-purpose large-model agent system and framework, so that people can iteratively extend and improve the agent's design and working logic, integrate more tools, and advance the development of large-model agents and, more broadly, of Executable Language Grounding.

Currently, XLANG Agent supports three agent scenarios: data processing, plug-in use, and Web Agent (a Robot Agent will be launched soon). Among them, the Data Agent, once the user has selected specific tools, takes actions on its own initiative to meet the user's needs. For example, you can let the agent first find a stock dataset by itself:

With one click, the dataset the agent found can be loaded into Files in the interface:

And with simple instructions, the agent can draw interactive charts by itself:

You can also have the agent fit the data with a model such as ARIMA. Here you can see that the agent may fail to fit:

But with "Try it again", you can have the agent retry and successfully build a good model.

Similarly, the plug-in agent can draw on the hundreds of APIs provided and intelligently determine which plug-ins to use in the current context. For example, when I go to Toronto, the plug-in agent recommends scenic spots, handles currency conversion, and provides weather updates, clothing suggestions, and so on.
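The article does not say how the plug-in agent decides which plug-ins to call. Purely as an illustration of the idea of matching a request to a subset of available plug-ins, here is a toy keyword-overlap heuristic; the `PLUGINS` table, its keywords, and `select_plugins` are hypothetical, and a real agent would most likely let the model itself make this choice.

```python
from typing import Dict, List, Set

# Hypothetical plug-in registry: name -> keywords that suggest the plug-in.
PLUGINS: Dict[str, Set[str]] = {
    "weather": {"weather", "forecast", "rain", "temperature"},
    "currency": {"currency", "convert", "exchange", "dollars"},
    "attractions": {"attractions", "sights", "visit", "scenic"},
}


def select_plugins(request: str, plugins: Dict[str, Set[str]] = PLUGINS) -> List[str]:
    # Normalize the request into a set of lowercase words.
    cleaned = request.lower().replace("?", " ").replace(",", " ")
    words = set(cleaned.split())
    # Score each plug-in by how many of its keywords appear in the request.
    scores = {name: len(words & keywords) for name, keywords in plugins.items()}
    # Keep plug-ins with at least one hit, best score first, ties by name.
    matched = [name for name, score in scores.items() if score > 0]
    return sorted(matched, key=lambda name: (-scores[name], name))


chosen = select_plugins(
    "What is the weather forecast in Toronto, and how do I convert my dollars?"
)
print(chosen)
```

Here the request mentions weather and currency terms but nothing about sightseeing, so only the first two plug-ins are selected.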

The Web Agent uses a Chrome extension to automate website navigation, simplifying browsing and improving information retrieval, for example by extracting movie reviews from IMDb.

At present, all three agents are already online. As the start of XLANG's open-source journey, the author team says that over the next few months and beyond it will release all frameworks, models, demos, code, and benchmarks. XLANG Agent's homepage, code and documentation are as follows:

Blog title:
Introducing XLang: An Open-Source Framework for Building Language Model Agents via Executable Language Grounding


Origin blog.csdn.net/xixiaoyaoww/article/details/132277921