Liu Zhiyuan and collaborators from many institutions propose ToolLLM: Facilitating large language models to master 16,000+ real-world APIs





Paper: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Address: https://arxiv.org/abs/2307.16789
Project: https://github.com/OpenBMB/ToolBench
Affiliations: Tsinghua University, Renmin University of China, Yale, WeChat, Tencent, Zhihu

Despite the progress of open-source large language models (LLMs) such as LLaMA and their variants such as Vicuna, they remain very limited in higher-level tasks such as following human instructions to use external tools (APIs).

This is because current instruction tuning focuses mainly on basic language tasks rather than tool use.

This is in stark contrast to state-of-the-art (SOTA) LLMs such as ChatGPT, which demonstrate excellent tool usage, but are unfortunately closed-source.


To facilitate the tool-use capabilities of open-source LLMs, we introduce ToolLLM, a general tool-use framework spanning data construction, model training, and evaluation.

We first introduce ToolBench, an instruction-tuning dataset for tool use, constructed automatically with ChatGPT.

Specifically, we collect 16,464 real-world RESTful APIs from RapidAPI Hub, covering 49 categories, and then prompt ChatGPT to generate various human instructions involving these APIs, covering both single-tool and multi-tool scenarios.
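The instruction-generation step can be sketched as follows. This is a minimal illustration only: the API pool, field names, and prompt template below are hypothetical, not the paper's actual prompt.

```python
import random

def build_instruction_prompt(api_pool, n_apis=3, seed=0):
    """Sample a few APIs from the pool and build a prompt asking the model
    to write user instructions that would require calling them
    (hypothetical template for illustration)."""
    rng = random.Random(seed)
    sampled = rng.sample(api_pool, n_apis)
    api_lines = "\n".join(
        f"- {a['name']} ({a['category']}): {a['description']}" for a in sampled
    )
    return (
        "You are given the following real-world APIs:\n"
        f"{api_lines}\n"
        "Write 3 diverse user instructions that could only be fulfilled "
        "by calling one or more of these APIs."
    )

# Toy API pool standing in for RapidAPI entries.
apis = [
    {"name": "GetWeather", "category": "Weather", "description": "current weather by city"},
    {"name": "SearchFlights", "category": "Travel", "description": "flight search"},
    {"name": "CurrencyConvert", "category": "Finance", "description": "convert currencies"},
    {"name": "NewsHeadlines", "category": "News", "description": "latest headlines"},
]
prompt = build_instruction_prompt(apis)
print(prompt)
```

In the paper's pipeline, a prompt like this would be sent to ChatGPT; sampling API combinations is what yields both single-tool and multi-tool instructions.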


Finally, we search for valid solution paths (API call chains) for each instruction using ChatGPT.

To make the search process more efficient, we develop a novel depth-first search-based decision tree (DFSDT), which enables LLMs to evaluate multiple inference trajectories and expand the search space. We demonstrate that DFSDT significantly enhances the planning and reasoning capabilities of LLMs.
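The core idea of DFSDT can be sketched on a toy problem: instead of committing to a single chain of calls, the search explores a candidate, and on failure backtracks and tries an alternative branch. This is a simplified sketch assuming a generic expand/success interface; in the real system, candidate actions are LLM-proposed API calls and success is judged from API responses.

```python
def dfsdt(state, expand, is_success, max_depth=6):
    """Depth-first search over reasoning trajectories: `expand` proposes
    candidate next actions; on a dead end the search backtracks rather
    than committing to one chain (simplified sketch of DFSDT)."""
    if is_success(state):
        return state
    if len(state) >= max_depth:
        return None  # budget exhausted on this branch; backtrack
    for action in expand(state):
        result = dfsdt(state + [action], expand, is_success, max_depth)
        if result is not None:
            return result
    return None

# Toy stand-in: find a chain of "API calls" whose costs sum to exactly 10.
def expand(state):
    return [1, 2, 5]  # candidate actions available at every step

def is_success(state):
    return sum(state) == 10

path = dfsdt([], expand, is_success)
print(path)  # a valid chain summing to 10, found via backtracking
```

The backtracking is what expands the search space relative to greedy chain-of-thought-style decoding: failed branches are abandoned cheaply instead of derailing the whole trajectory.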


To evaluate tool use efficiently, we develop an automatic evaluator, ToolEval.
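ToolEval reports aggregate metrics such as pass rate (did the model produce a valid solution path?) and win rate (head-to-head preference against a reference model). The judging itself is done by an LLM; the sketch below only shows how such judgments could be aggregated, with made-up outcome data.

```python
def pass_rate(outcomes):
    """Fraction of instructions for which a valid solution path was
    produced within budget (aggregation sketch of a pass-rate metric)."""
    return sum(outcomes) / len(outcomes)

def win_rate(preferences):
    """Fraction of head-to-head comparisons won by the candidate model
    over a reference model, as judged per-instruction."""
    wins = sum(1 for p in preferences if p == "candidate")
    return wins / len(preferences)

# Toy judgments for four instructions.
print(pass_rate([True, True, False, True]))                            # 0.75
print(win_rate(["candidate", "reference", "candidate", "candidate"]))  # 0.75
```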


We fine-tune LLaMA on ToolBench and obtain ToolLLaMA.


Our ToolEval results show that ToolLLaMA executes complex instructions, generalizes to unseen APIs, and performs comparably to ChatGPT.


To make the pipeline more practical, we design a neural API retriever that recommends appropriate APIs for each instruction, eliminating the need to select APIs manually.
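The retriever's job can be illustrated with a dense-retrieval sketch: embed the instruction and every API description, then rank APIs by cosine similarity. The paper trains a dedicated neural retriever for this; the three-dimensional vectors and API names below are toy values for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def recommend_apis(query_vec, api_vecs, top_k=2):
    """Rank APIs by embedding similarity to the instruction and return
    the top-k names (dense-retrieval sketch of an API retriever)."""
    scored = sorted(api_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

# Toy precomputed embeddings of API descriptions.
api_vecs = {
    "GetWeather": [0.9, 0.1, 0.0],
    "SearchFlights": [0.1, 0.9, 0.1],
    "CurrencyConvert": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # toy embedding of "what's the weather in Paris?"
print(recommend_apis(query, api_vecs))  # ['GetWeather', 'SearchFlights']
```

In a real deployment, the toy vectors would come from a trained sentence encoder, and the recommended APIs would be handed to the model as candidate tools for the instruction.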




Origin blog.csdn.net/qq_27590277/article/details/132061610