To improve the tool-use capabilities of open-source LLMs, the authors introduce ToolLLM, a general tool-use framework covering data construction, model training, and evaluation.
Paper: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Link: https://arxiv.org/abs/2307.16789
Project: https://github.com/OpenBMB/ToolBench
Affiliations: Tsinghua University, Renmin University of China, Yale, WeChat, Tencent, Zhihu
Despite the progress of open-source large language models (LLMs) such as LLaMA and Vicuna, they remain very limited in higher-level tasks such as following human instructions to use external tools (APIs).
This is because current instruction tuning mainly focuses on basic language tasks rather than the domain of tool usage.
This is in stark contrast to state-of-the-art (SOTA) LLMs such as ChatGPT, which demonstrate excellent tool usage, but are unfortunately closed-source.
To improve the tool-use capabilities of open-source LLMs, we introduce ToolLLM, a general tool-use framework covering data construction, model training, and evaluation.
We first introduce ToolBench, an instruction-tuning dataset for tool use, which is created automatically using ChatGPT.
Specifically, we collect 16,464 real-world RESTful APIs from RapidAPI Hub, covering 49 categories, and then prompt ChatGPT to generate various human instructions involving these APIs, covering both single-tool and multi-tool scenarios.
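The instruction-generation step described above can be pictured with a small sketch. This is only an illustration under assumptions, not the authors' pipeline: the catalog contents, the `sample_apis` and `build_prompt` helpers, and the prompt wording are all hypothetical, and no call to ChatGPT is actually made. It only shows how one might sample either a single API or several APIs from one category and turn them into a generation prompt.

```python
import random

# Hypothetical miniature catalog: category -> API names (not real RapidAPI data).
API_CATALOG = {
    "Weather": ["current_weather", "forecast_5day"],
    "Travel": ["flight_search", "hotel_search", "car_rental"],
}

def sample_apis(catalog, multi_tool, rng, max_apis=3):
    """Pick one category, then one API (single-tool) or several (multi-tool)."""
    category = rng.choice(sorted(catalog))
    apis = catalog[category]
    upper = min(max_apis, len(apis))
    # Multi-tool samples at least two APIs when the category has that many.
    n = rng.randint(min(2, upper), upper) if multi_tool else 1
    return category, rng.sample(apis, n)

def build_prompt(category, apis):
    """Format an assumed prompt asking an LLM to write a matching instruction."""
    api_list = ", ".join(apis)
    return (
        f"Write a realistic user request in the {category} category "
        f"that requires calling these APIs: {api_list}."
    )

rng = random.Random(0)  # fixed seed so the sketch is reproducible
category, apis = sample_apis(API_CATALOG, multi_tool=True, rng=rng)
prompt = build_prompt(category, apis)
print(prompt)
```

In a real pipeline the prompt would be sent to the LLM and the returned instruction stored together with the sampled APIs, so that each instruction is paired with the tools it is meant to exercise.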
Finally, we use ChatGPT to search for a valid solution path (a chain of API calls) for each instruction.
To make the search process more efficient, we develop a novel depth-first search-based decision tree (DFSDT), which enables LLMs to evaluate multiple reasoning traces and expand the search space. We demonstrate that DFSDT significantly enhances the planning and reasoning capabilities of LLMs.
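The core idea of a depth-first search over candidate call chains can be sketched as follows. This is a minimal sketch under assumptions, not the DFSDT implementation from the paper: in place of an LLM proposing next actions, a fixed lookup table (`CANDIDATES`) supplies them, and the function names (`dfs_solution_path`, `toy_expand`, `toy_is_solution`) and API names are hypothetical. What it shows is the backtracking behaviour: when one branch dead-ends, the search returns to an earlier node and tries a different call.

```python
from typing import Callable, List, Optional, Sequence

def dfs_solution_path(
    path: List[str],
    expand: Callable[[Sequence[str]], List[str]],
    is_solution: Callable[[Sequence[str]], bool],
    max_depth: int,
) -> Optional[List[str]]:
    """Return the first valid call chain found by depth-first search, or None."""
    if is_solution(path):
        return list(path)
    if len(path) >= max_depth:
        return None  # abandon this branch and backtrack
    for action in expand(path):
        found = dfs_solution_path(path + [action], expand, is_solution, max_depth)
        if found is not None:
            return found
    return None  # every child failed, so the caller tries its next candidate

# Toy stand-in for the LLM: candidate next calls for each partial chain.
CANDIDATES = {
    (): ["search_flights", "get_weather"],
    ("search_flights",): ["book_hotel"],  # leads to a dead end in this toy setup
    ("get_weather",): ["search_flights"],
    ("get_weather", "search_flights"): ["finish"],
}

def toy_expand(path: Sequence[str]) -> List[str]:
    return CANDIDATES.get(tuple(path), [])

def toy_is_solution(path: Sequence[str]) -> bool:
    return len(path) > 0 and path[-1] == "finish"

result = dfs_solution_path([], toy_expand, toy_is_solution, max_depth=4)
print(result)  # ['get_weather', 'search_flights', 'finish']
```

Here the first branch (`search_flights` then `book_hotel`) has no further candidates, so the search backs up to the root and succeeds on the second branch. A single linear chain of calls would have stopped at the dead end.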
To evaluate tool use efficiently, we develop an automatic evaluator, ToolEval.
We fine-tune LLaMA on ToolBench and obtain ToolLLaMA.
ToolEval shows that ToolLLaMA has a remarkable ability to execute complex instructions and generalize to unseen APIs, with performance comparable to ChatGPT.
To make the pipeline more practical, we design a neural API retriever that recommends appropriate APIs for each instruction, eliminating the need to select APIs manually.
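The recommendation step described above amounts to ranking API descriptions by their relevance to an instruction. The sketch below illustrates that ranking with a deliberately simple stand-in: bag-of-words cosine similarity instead of a trained neural encoder. The API names, descriptions, and helper functions (`embed`, `cosine`, `retrieve`) are hypothetical and only serve to show the retrieve-top-k pattern.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in 'embedding': token counts (a neural encoder would go here)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(instruction, api_docs, k=2):
    """Return the names of the k APIs whose descriptions best match the instruction."""
    query = embed(instruction)
    ranked = sorted(
        api_docs.items(),
        key=lambda item: cosine(query, embed(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical API descriptions.
API_DOCS = {
    "weather_forecast": "get the weather forecast for a city",
    "flight_search": "search flights between two airports",
    "currency_convert": "convert an amount between two currencies",
}

top = retrieve("What is the weather forecast in Paris?", API_DOCS, k=1)
print(top)  # ['weather_forecast']
```

Swapping `embed` for a learned sentence encoder would keep the rest of the ranking logic unchanged, which is why the retrieval step can sit in front of the model without altering how instructions are executed.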