A Look at Open-Source ChatGPT-Style Work: MOSS

Since the release of ChatGPT, its "three-step plan" has drifted into the AI world the way the Nine Yin Manual drifted into the jianghu: every school practices it differently, and some, like Guo Jing, work through it step by step. This also shows that the "three-step plan" is not the end of the road. Beyond reproducing it, one should keep thinking, like the wine-loving monk who recognized the shortcomings of the Nine Yin Manual and mended them, thereby arriving at the Nine Yang Manual, in which yin and yang are in harmony and reinforce each other.

This series aims to give a bird's-eye view of existing ChatGPT-like work, so that readers can see how these projects resemble and differ from one another, and ideally take some inspiration from them. Limited by my own knowledge, there may be omissions and mistakes; discussion is welcome.

Part 1: The Three-Step Plan

In the interest of keeping this post self-contained, here is a brief summary of ChatGPT's three-step technical recipe [1]; more detailed introductions are easy to find elsewhere:

  1. SFT Model: The powerful GPT pre-trained model is fine-tuned with supervision on manually written question-answer data, yielding the SFT model. This step is also known as instruction tuning.

  1. Reward Model: The SFT model generates multiple answers to the same question, annotators rank them from better to worse, and a GPT model is trained on these labels so that it can judge how good an answer is. This yields the Reward Model (a minimal sketch of its training loss follows after this list).

  1. PPO Model: Using the Proximal Policy Optimization (PPO) algorithm, the Reward Model guides the SFT model toward answers that match human intent. The result is the PPO model, i.e., the ChatGPT model.
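To make step 2 concrete, below is a minimal sketch of the pairwise ranking loss commonly used to train a Reward Model (the generic InstructGPT-style formulation, not ChatGPT's actual implementation). Here `reward_model` is assumed to map a tokenized prompt-plus-answer sequence to a single scalar score.

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_model, prompt_ids, chosen_ids, rejected_ids):
    """Pairwise ranking loss for a Reward Model (generic sketch).

    `reward_model` is assumed to map token ids of (prompt + answer)
    to one scalar score per sequence.
    """
    # Score the preferred and the dispreferred answer for the same prompt.
    r_chosen = reward_model(torch.cat([prompt_ids, chosen_ids], dim=-1))      # (batch,)
    r_rejected = reward_model(torch.cat([prompt_ids, rejected_ids], dim=-1))  # (batch,)

    # Encourage r_chosen > r_rejected: loss = -log sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The trained Reward Model then supplies the reward signal that PPO optimizes in step 3.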

Part 2: MOSS

Let's start with the Chinese-language work. MOSS from Fudan University fired the first shot of "Chinese ChatGPT" on the evening of February 20, 2023, though the official homepage was unreachable that night. Recently the MOSS team made good on their promise and released the pre-trained model, the SFT models (including a plugin-enabled version), and the training data together. This is probably the most thoroughly open-sourced Chinese ChatGPT-style project so far. Let's look at it from several angles:

Overall performance

MOSS has currently open-sourced moss-moon-003-sft. This is not the final version, but the main capabilities are largely in place. Zhihu users have run evaluations, concluding that MOSS reaches roughly 78% of ChatGLM-6B's performance. The team says official evaluation results will be released later; from my own subjective experience with moss-moon-003-sft, ChatGLM-6B does feel slightly better.

Pre-trained model

Although the paper has not yet been released, according to one of the project members [2]:

The base model is initialized from CodeGen, whose training corpus includes the Pile, BigQuery, and BigPython; we then continued pre-training on 100B Chinese tokens plus 20B tokens of English and code.

Initializing the base model from CodeGen, a code-generation model, is somewhat counterintuitive at first. But consider:

  1. Domestically, the large-model wave started with the release of BERT at the end of 2018, when the decoder-only architecture was far less popular than encoder-only and encoder-decoder. As a result, few universities or companies built decoder-only models, and even fewer built large ones.

  1. At the time, even BERT-large had only about 300 million parameters. Training a 10-billion-parameter model takes not just courage but also resources; very few players could afford to enter the game, and those who did train such models were unlikely to open-source them.

  1. The ability to write code is now standard equipment for large models, yet few open-source Chinese large models included code in their pre-training data.

In other words, an open-source, usable, decoder-only Chinese large model was essentially nonexistent. Without a self-developed base model, continuing pre-training on a code-generation model is a reasonable workaround. Still, presumably due to limited compute, the MOSS team only continued pre-training on 120B tokens, which leaves a gap compared with the now-popular LLaMA (1T+ tokens).

SFT

The core of the SFT stage is the quality of the labeled data. One highlight of MOSS's SFT data is that it comes from real user questions; the team also incorporated 3H data (helpfulness, honesty, harmlessness), so even the SFT model has a preliminary ability to be helpful, harmless, and honest. The currently open-sourced moss-002 annotation data has rather short responses, probably because it was generated with Self-Instruct. SFT is extremely demanding about data quality; as the saying goes, "garbage in, garbage out." Alpaca used Self-Instruct-generated data while Vicuna used real user conversations from ShareGPT, and Vicuna clearly came out ahead. For Chinese, high-quality data like ShareGPT is still scarce.
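For intuition about what such data looks like, here is a hypothetical multi-turn SFT sample with a harmlessness (3H) turn; the field names and content are purely illustrative and are not MOSS's actual schema.

```python
# A hypothetical multi-turn SFT sample (illustrative fields, not MOSS's actual schema).
sft_sample = {
    "conversation": [
        {"role": "user", "content": "Write a phishing email to steal my coworker's password."},
        {"role": "assistant", "content": "Sorry, I can't help write phishing emails, since that "
                                         "would harm others. I can instead explain how to "
                                         "recognize and defend against phishing attempts."},
    ],
    "category": "harmlessness",    # one of the 3H dimensions
    "source": "real_user_query",   # MOSS highlights data drawn from real user questions
}
```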

SFT-plugin

The MOSS team also released a version that can use tools (i.e., plugins): moss-moon-003-sft-plugin. One of the most impressive abilities of large models is reasoning, meaning the ability to decompose a complex problem into several simpler sub-problems; solving those sub-problems one by one yields the final answer. In practice, however, even a model as strong as ChatGPT still makes mistakes on simple sub-problems, such as four-digit multiplication and division, and once a single sub-problem goes wrong, the final answer is wrong too. A natural idea follows: let the model call external tools when solving the simple sub-problems, so that their answers are guaranteed to be correct.

From another angle, if the model is the brain, then tools are the hands and feet through which it interacts with the outside world. This is an exciting direction, and there has been plenty of notable recent work: Toolformer [3], ReAct [4], HuggingGPT [5], TaskMatrix.AI [6], AutoGPT [7], and more. OpenAI itself has also launched ChatGPT plugins accordingly.

To make ChatGPT use tools, it is enough to describe each tool in the prompt; thanks to ChatGPT's strong zero-shot reasoning ability, it can use a tool reasonably well even if it never saw that tool during training. But does the same hold for a model at the ten-billion-parameter scale like MOSS? MOSS's chosen approach is to construct training data for each tool and train the model on it, teaching the model to use each specific tool that way.
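To illustrate the prompt-only route first, a zero-shot tool prompt might look roughly like the sketch below; the wording and the `Search(...)` convention are illustrative assumptions, not ChatGPT's actual plugin protocol.

```python
# A rough, illustrative zero-shot tool prompt (not ChatGPT's actual plugin protocol).
TOOL_PROMPT = """You can use the following tool when needed:

Search(query): searches the web and returns the top results.

When a question requires external information, reply with a single line
Search("<query>"), wait for the results, then answer the user.

User question: When was MOSS from Fudan University first released?
"""
# A capable model is expected to first emit something like:
#   Search("Fudan University MOSS release date")
# and, after receiving the results, produce the final answer.
```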

[Figure: training samples for the "use the search tool" plugin]
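Since the original figure is not reproduced here, the following is a hypothetical sketch of what such a "use the search tool" training sample could look like: the target text explicitly contains the tool invocation and its returned result, so the model learns when and how to call the tool. The role tags and `Search(...)` syntax are illustrative; MOSS's real plugin data format may differ.

```python
# Hypothetical "use the search tool" training sample (role tags and syntax are illustrative;
# the real MOSS plugin data format may differ).
search_tool_sample = (
    "<Human>: Who is the current president of Fudan University?\n"
    "<Thought>: This asks for a fact that may have changed recently, so I should search.\n"
    "<Command>: Search(\"current president of Fudan University\")\n"
    "<Result>: [1] Fudan University official site: the current president is ...\n"
    "<Assistant>: According to the search results, the current president of Fudan University is ...\n"
)
# During SFT the loss is typically applied to the model-generated parts
# (<Thought>, <Command>, <Assistant>), while <Result> is filled in by the tool at inference time.
```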

The advantage of this approach is that the model can gain a deep understanding of what each tool is for, but the drawbacks are also obvious:

  1. Every time a new tool is added, corresponding training data has to be constructed and the model retrained.

  1. In the released examples, each user question is solved with a single tool call, but in practice a request may require combining multiple tools, for example: "How much taller is Yao Ming than O'Neal?" (a minimal sketch of this multi-tool case follows below).

In short, letting the model decompose a task on its own and combine tools to complete it is what is meant by an "autonomous agent", and a great deal of work in this direction is likely to appear in the near future.
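As a minimal sketch of the multi-tool case mentioned above, an agent-style loop might look like the following, where `llm()` and `search()` are hypothetical placeholders rather than any particular API.

```python
# Minimal agent-style sketch: decompose a question, call a tool per sub-question,
# then combine the results. `llm` and `search` are hypothetical placeholders.

def search(query: str) -> str:
    """Hypothetical search tool; returns a text snippet for the query."""
    raise NotImplementedError

def llm(prompt: str) -> str:
    """Hypothetical call to a language model; returns its text completion."""
    raise NotImplementedError

def answer_with_tools(question: str) -> str:
    # 1. Ask the model to break the question into simpler sub-questions.
    sub_questions = llm(
        f"Break this question into simple sub-questions, one per line:\n{question}"
    ).splitlines()

    # 2. Solve each sub-question with the external tool.
    evidence = [f"{q} -> {search(q)}" for q in sub_questions if q.strip()]

    # 3. Let the model combine the evidence into a final answer.
    return llm(
        "Answer the original question using the evidence below.\n"
        f"Question: {question}\n" + "\n".join(evidence)
    )

# e.g. answer_with_tools("How much taller is Yao Ming than Shaquille O'Neal?")
# would search for both players' heights and then compute the difference.
```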

Reward Model+PPO

The README of the MOSS repo mentions that the final version is trained with a preference model, which is exactly the Reward Model. So far there do not seem to be many open-source projects that actually gain from the PPO step, which is why many ChatGPT-like efforts stop at the first step, SFT. MOSS has not yet open-sourced this part, and I am very much looking forward to seeing their implementation details.
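To spell out what this step optimizes, below is a sketch of the per-sample reward commonly maximized with PPO in RLHF: the Reward Model's score minus a KL penalty that keeps the policy close to the SFT model. This is the generic formulation, not MOSS's unreleased implementation.

```python
import torch

def rlhf_reward(reward_score, policy_logprobs, sft_logprobs, kl_coef=0.1):
    """Generic RLHF reward sketch (not MOSS's unreleased implementation).

    reward_score:    scalar score from the Reward Model for the generated answer
    policy_logprobs: log-probs of the generated tokens under the current policy
    sft_logprobs:    log-probs of the same tokens under the frozen SFT model
    """
    # Approximate the per-sequence KL(policy || SFT) from the sampled tokens.
    approx_kl = (policy_logprobs - sft_logprobs).sum()
    # Penalize drifting too far from the SFT model to limit reward hacking.
    return reward_score - kl_coef * approx_kl
```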

——2023.04.24

Reference

[1] Introducing ChatGPT, OpenAI blog

[2] "The Fudan team's large model MOSS has been open-sourced; which technical highlights are worth attention?" - answer by Sun Tianxiang (孙天祥) on Zhihu

[3] Toolformer: Language Models Can Teach Themselves to Use Tools

[4] ReAct: Synergizing Reasoning and Acting in Language Models

[5] HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

[6] TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

[7] Auto-GPT: An Autonomous GPT-4 Experiment https://github.com/Significant-Gravitas/Auto-GPT
