Lion: Putting the Adversarial Distillation Framework for Closed-Source Large Language Models into Practice

Overview

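Overview of the adversarial distillation framework: starting from an advanced closed-source LLM, we distill a student LLM. The closed-source LLM plays three roles: the Teacher, the Referee, and the Generator. Each iteration has three stages:

Imitation stage: for a set of instructions, align the student's responses with the teacher's responses;
Discrimination stage: identify hard instructions;
Generation stage: based on the identified hard instructions, produce new hard instructions to escalate the challenge posed to the student model.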
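The three stages can be read as one loop iteration. The sketch below is a minimal illustration under stated assumptions. The teacher, referee, fine-tuning step, and generators are passed in as plain Python callables. These are hypothetical stand-ins for the ChatGPT calls and training scripts. An instruction counts as hard when the teacher's score exceeds the student's by at least `threshold`, and the default of 1.0 is an assumption. The sketch does not model how the Train Pool and Cache Pool are updated between iterations.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the scripts and API calls used by the real pipeline.
Responder = Callable[[str], str]                          # instruction -> response
Referee = Callable[[str, str, str], Tuple[float, float]]  # -> (teacher_score, student_score)
Generator = Callable[[List[str]], List[str]]              # seed instructions -> new instructions


@dataclass
class IterationResult:
    student: Responder
    hard: List[str]
    easy: List[str]
    new_hard: List[str]
    new_easy: List[str]


def run_iteration(
    train_pool: List[str],
    cache_pool: List[str],
    teacher: Responder,
    fine_tune: Callable[[List[Tuple[str, str]]], Responder],
    referee: Referee,
    generate_hard: Generator,
    generate_easy: Generator,
    threshold: float = 1.0,  # assumed score margin that marks an instruction as "hard"
) -> IterationResult:
    # 1. Imitation: collect teacher responses on the Train Pool and tune the student on them.
    student = fine_tune([(x, teacher(x)) for x in train_pool])

    # 2. Discrimination: the referee scores teacher vs. student responses on the Cache Pool.
    hard: List[str] = []
    easy: List[str] = []
    for x in cache_pool:
        teacher_score, student_score = referee(x, teacher(x), student(x))
        if teacher_score - student_score >= threshold:
            hard.append(x)
        else:
            easy.append(x)

    # 3. Generation: produce new instructions seeded by the hard and the easy ones.
    return IterationResult(student, hard, easy, generate_hard(hard), generate_easy(easy))


if __name__ == "__main__":
    # Toy run: the tuned "student" only fails on instructions that contain "hard".
    result = run_iteration(
        train_pool=["q1", "q2"],
        cache_pool=["easy q", "hard q"],
        teacher=lambda x: "good answer",
        fine_tune=lambda pairs: (lambda x: "weak answer" if "hard" in x else "good answer"),
        referee=lambda x, t, s: (9.0, 9.0 if s == t else 5.0),
        generate_hard=lambda seeds: [s + " (harder)" for s in seeds],
        generate_easy=lambda seeds: [s + " (variant)" for s in seeds],
    )
    print(result.hard, result.new_hard)  # ['hard q'] ['hard q (harder)']
```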

Weight Release

We release the Lion weights as delta weights to comply with the LLaMA model license.

Lion-7B (delta weights) https://huggingface.co/YuxinJiang/Lion

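You can add our delta to the original LLaMA weights to obtain the Lion weights. Instructions:

Get the original LLaMA weights in the Hugging Face format: https://huggingface.co/docs/transformers/main/model_doc/llama
Download our delta model: https://huggingface.co/YuxinJiang/Lion
Use the following script to obtain the Lion weights by applying our delta: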

python src/weight_diff.py recover --path_raw huggyllama/llama-7b --path_diff YuxinJiang/Lion --path_tuned <path_to_store_recovered_weights>
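Conceptually, recovery is element-wise addition: each tuned parameter equals the raw LLaMA parameter plus the released delta. The following sketch makes several assumptions. The toy checkpoints are dictionaries of flat float lists, standing in for PyTorch state dicts. The helpers `make_diff` and `recover` are hypothetical. The sketch omits checkpoint loading, tokenizer handling, and any integrity checks that `src/weight_diff.py` may perform.

```python
from typing import Dict, List

# Toy stand-in for a PyTorch state dict: parameter name -> flattened values.
StateDict = Dict[str, List[float]]


def _check_compatible(a: StateDict, b: StateDict) -> None:
    """Both checkpoints must have identical parameter names and sizes."""
    if a.keys() != b.keys():
        raise KeyError("checkpoints must contain the same parameter names")
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")


def make_diff(raw: StateDict, tuned: StateDict) -> StateDict:
    """What a publisher releases: delta = tuned - raw, per parameter."""
    _check_compatible(raw, tuned)
    return {name: [t - r for r, t in zip(raw[name], tuned[name])] for name in raw}


def recover(raw: StateDict, diff: StateDict) -> StateDict:
    """What a user runs: tuned = raw + delta, per parameter."""
    _check_compatible(raw, diff)
    return {name: [r + d for r, d in zip(raw[name], diff[name])] for name in raw}


if __name__ == "__main__":
    raw = {"layer.weight": [1.0, 2.0], "layer.bias": [0.25]}
    tuned = {"layer.weight": [1.5, 1.0], "layer.bias": [0.75]}
    delta = make_diff(raw, tuned)
    print(delta)                         # {'layer.weight': [0.5, -1.0], 'layer.bias': [0.5]}
    print(recover(raw, delta) == tuned)  # True
```

Releasing only the delta means the published files are not usable on their own. Anyone who wants the tuned weights must already have the raw LLaMA checkpoint.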

Inference

For Lion inference and training, first install the following dependencies:

pip install -r requirements.txt

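We provide a decoding script for Lion. It reads an input file, generates a response for each sample, and finally merges the responses into an output file. It can run on a single machine with one 16GB GPU.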

python src/lion_inference.py \
    --model_dir <path_to_hf_converted_lion_ckpt_and_tokenizer> \
    --data_dir <path_to_input_json_file> \
    --output_dir <path_to_output_json_file> \
    --num_gpus 1
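One plausible way to spread decoding over several GPUs, as the `--num_gpus` flag suggests, has three steps. First, split the input samples into contiguous chunks. Then decode each chunk on its own device. Finally, concatenate the results in input order. The sketch below shows only that split-and-merge bookkeeping. The helpers `shard` and `decode_all` are hypothetical and not taken from `src/lion_inference.py`, and the chunks here run sequentially.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def shard(samples: List[T], num_gpus: int) -> List[List[T]]:
    """Split samples into num_gpus contiguous chunks whose sizes differ by at most one."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    size, extra = divmod(len(samples), num_gpus)
    chunks, start = [], 0
    for i in range(num_gpus):
        end = start + size + (1 if i < extra else 0)
        chunks.append(samples[start:end])
        start = end
    return chunks


def decode_all(samples: List[T], num_gpus: int, generate: Callable[[T], R]) -> List[R]:
    """Decode every chunk (sequentially here) and merge the outputs in input order."""
    merged: List[R] = []
    for chunk in shard(samples, num_gpus):
        # In a real multi-GPU run, each chunk would be handled by a worker on its own device.
        merged.extend(generate(sample) for sample in chunk)
    return merged


if __name__ == "__main__":
    print(shard(list(range(10)), 3))                  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(decode_all(["a", "b", "c"], 2, str.upper))  # ['A', 'B', 'C']
```

Contiguous chunks keep the merge trivial: concatenating chunk outputs in order restores the original sample order.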

Training

Below is one iteration of our adversarial distillation framework:

1. Imitation Stage

1.1 Obtain the teacher LLM's responses on the Train Pool

python src/chatgpt_inference.py \
    -q <path_to_json_file_for_the_Train_Pool> \
    -o <path_to_chatgpt_inference_for_the_Train_Pool> \
    --api_key <your_openai_api_key>

1.2 Instruction-tune the student on the teacher's responses to the Train Pool

Fine-tuning is performed on a machine with 8 A100 80GB GPUs.

torchrun --nproc_per_node=8 --master_port=<your_random_port> src/train.py \
    --model_name_or_path <path_to_hf_converted_ckpt_and_tokenizer> \
    --data_path <path_to_chatgpt_inference_for_the_Train_Pool> \
    --bf16 True \
    --output_dir result \
    --num_train_epochs 3 \
    --model_max_length 1024 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True
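The batch-related flags combine multiplicatively. Each optimizer step sees `nproc_per_node × per_device_train_batch_size × gradient_accumulation_steps` sequences, assuming a single node. A tiny helper, whose name is hypothetical, makes the arithmetic explicit:

```python
def global_batch_size(num_gpus: int, per_device_batch: int, grad_accum_steps: int) -> int:
    """Sequences consumed per optimizer step; FSDP shards the model, each rank still gets its own micro-batch."""
    return num_gpus * per_device_batch * grad_accum_steps


if __name__ == "__main__":
    # torchrun --nproc_per_node=8, --per_device_train_batch_size 2, --gradient_accumulation_steps 8
    print(global_batch_size(8, 2, 8))  # 128
```

For the FSDP command this gives 8 × 2 × 8 = 128. The DeepSpeed example below uses 8 GPUs, a per-device batch size of 16, and no accumulation, which also gives 8 × 16 × 1 = 128. The two configurations therefore appear to target the same global batch size.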

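Addressing OOM

Naively, fine-tuning a 7B model requires about 7 x 8 x 2 = 112 GB of VRAM. The command above enables parameter sharding, so no redundant copy of the model is stored on any GPU. If you want to reduce the memory footprint further, here are some options:

Turn on FSDP's CPU offload with `--fsdp "full_shard auto_wrap offload"`. This saves VRAM at the cost of a longer run time.

In our experience, DeepSpeed stage-3 (with offload) can at times be more memory efficient than FSDP with offload. Below is an example that uses DeepSpeed stage-3 on 8 GPUs with both parameter and optimizer offload: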

deepspeed src/train_deepspeed.py \
      --model_name_or_path <path_to_hf_converted_ckpt_and_tokenizer> \
      --data_path <path_to_chatgpt_inference_for_the_Train_Pool> \
      --output_dir result \
      --num_train_epochs 3 \
      --model_max_length 1024 \
      --per_device_train_batch_size 16 \
      --per_device_eval_batch_size 1 \
      --gradient_accumulation_steps 1 \
      --evaluation_strategy "no" \
      --save_strategy "steps" \
      --save_steps 600 \
      --save_total_limit 1 \
      --learning_rate 2e-5 \
      --warmup_ratio 0.03 \
      --logging_steps 1 \
      --lr_scheduler_type "cosine" \
      --report_to "tensorboard" \
      --gradient_checkpointing True \
      --deepspeed src/configs/deepspeed_config.json \
      --fp16 True
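The command reads its ZeRO settings from the JSON file passed to `--deepspeed`, whose contents are not shown here. The sketch below builds an illustrative ZeRO stage-3 config with CPU offload for both parameters and optimizer states. It uses keys documented by DeepSpeed and by the Hugging Face Trainer integration, but it is an assumption about what such a file could contain, not a copy of the repository's file. The Trainer fills in values set to `"auto"` from its own command-line arguments.

```python
import json


def zero3_offload_config() -> dict:
    """Illustrative ZeRO stage-3 config with CPU offload of parameters and optimizer states."""
    return {
        # "auto" values are filled in by the Hugging Face Trainer from its CLI arguments
        # (e.g. --fp16 True, --learning_rate, --per_device_train_batch_size).
        "fp16": {"enabled": "auto"},
        "bf16": {"enabled": "auto"},
        "optimizer": {
            "type": "AdamW",
            "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
        },
        "zero_optimization": {
            "stage": 3,  # shard parameters, gradients, and optimizer states
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {"device": "cpu", "pin_memory": True},
            "overlap_comm": True,
            "contiguous_gradients": True,
            "stage3_gather_16bit_weights_on_model_save": True,  # save a consolidated checkpoint
        },
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }


if __name__ == "__main__":
    print(json.dumps(zero3_offload_config(), indent=2))
```

This config has no `scheduler` entry. With that entry left out, the Trainer should build the learning-rate schedule from `--lr_scheduler_type` and `--warmup_ratio` itself.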

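The DeepSpeed library also provides some [helpful functions](https://deepspeed.readthedocs.io/en/latest/memory.html) for estimating memory usage.

[LoRA](https://arxiv.org/abs/2106.09685) fine-tunes low-rank slices of the query, key, and value embedding heads. This can reduce the total memory footprint from 112GB to about 7x4=28GB. We may release our re-implementation of this in the future, but for now the [peft](https://github.com/huggingface/peft) codebase can be a useful resource.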
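The memory figures above follow from per-parameter byte counts. The sketch below rests on several assumptions:

- Full fine-tuning uses the common mixed-precision Adam accounting of 16 bytes per parameter: 2 for bf16/fp16 weights, 2 for gradients, and 12 for fp32 master weights plus the two Adam moments. This matches the 7 x 8 x 2 = 112 GB estimate, though not necessarily the authors' own breakdown.
- The LoRA case keeps 4 bytes per frozen parameter, matching 7x4=28GB.
- LLaMA-7B has a hidden size of 4096 and 32 decoder layers.

Activations, temporary buffers, and framework overhead are ignored. The function names and the LoRA rank of 8 are hypothetical.

```python
GB = 10**9

# Mixed-precision Adam: 2 B weights + 2 B grads + 4 B fp32 master copy
# + 4 B Adam momentum + 4 B Adam variance = 16 B per trainable parameter.
BYTES_PER_PARAM_FULL_FT = 16
# LoRA: the frozen base weights need no gradients or optimizer states.
BYTES_PER_FROZEN_PARAM = 4


def full_finetune_gb(num_params: int, num_gpus: int = 1) -> float:
    """Model-state memory per GPU when weights, grads and optimizer states are fully sharded."""
    return num_params * BYTES_PER_PARAM_FULL_FT / num_gpus / GB


def frozen_base_gb(num_params: int) -> float:
    """Memory for the frozen base weights in the LoRA setting (adapter states ignored)."""
    return num_params * BYTES_PER_FROZEN_PARAM / GB


def lora_qkv_params(hidden: int, layers: int, rank: int) -> int:
    """Trainable parameters for rank-r LoRA adapters on the q, k and v projections."""
    # Each hidden x hidden projection gets A (rank x hidden) and B (hidden x rank).
    per_projection = rank * (hidden + hidden)
    return 3 * layers * per_projection


if __name__ == "__main__":
    n = 7 * 10**9
    print(full_finetune_gb(n))             # 112.0
    print(full_finetune_gb(n, num_gpus=8))  # 14.0
    print(frozen_base_gb(n))               # 28.0
    print(lora_qkv_params(4096, 32, 8))    # 6291456
```

Under these assumptions, full sharding across 8 GPUs, as with `full_shard`, leaves about 14 GB of model state per GPU before activations. Rank-8 adapters on q, k, and v add 6,291,456 trainable parameters, roughly 0.09% of 7B.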

2. Discrimination Stage

2.1 Obtain the teacher's responses on the Cache Pool

python src/chatgpt_inference.py \
    -q <path_to_json_file_for_the_Cache_Pool> \
    -o <path_to_chatgpt_inference_for_the_Cache_Pool> \
    --api_key <your_openai_api_key>

2.2 Obtain the student's responses on the Cache Pool

python src/lion_inference.py \
    --model_dir <path_to_hf_converted_lion_ckpt_and_tokenizer> \
    --data_dir <path_to_json_file_for_the_Cache_Pool> \
    --output_dir <path_to_lion_inference_for_the_Cache_Pool> \
    --num_gpus 8

2.3 Ask the referee to output two scores based on the quality of the teacher's and the student's responses

python src/chatgpt_referee.py \
    -a <path_to_chatgpt_inference_for_the_Cache_Pool> <path_to_lion_inference_for_the_Cache_Pool> \
    -o <path_to_output_review_file> \
    --api_key <your_openai_api_key>
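The referee's free-text reviews must be turned into numbers before discrimination. This sketch assumes Vicuna-style review prompts, where the first line of each review holds the two scores. It also assumes the first score belongs to the answer file passed first to `-a`, i.e. the teacher. A parser could then look like the following. `parse_scores` is a hypothetical helper, and the actual review format produced by `src/chatgpt_referee.py` may differ.

```python
import re
from typing import Tuple

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_scores(review: str) -> Tuple[float, float]:
    """Read (teacher_score, student_score) from the first line of a referee review."""
    lines = review.strip().splitlines()
    if not lines:
        raise ValueError("empty review")
    numbers = _NUMBER.findall(lines[0])
    if len(numbers) != 2:
        raise ValueError(f"expected two scores on the first line, got {lines[0]!r}")
    return float(numbers[0]), float(numbers[1])


if __name__ == "__main__":
    print(parse_scores("8 6\nAssistant 1 is more detailed..."))  # (8.0, 6.0)
    print(parse_scores("9.5, 7"))                                 # (9.5, 7.0)
```

Raising on malformed first lines, rather than guessing, keeps unparseable reviews from silently landing in either the hard or the easy set.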

2.4 Discriminate between hard and easy instructions

python src/discrimination.py \
    --review_path <path_to_output_review_file> \
    --chatgpt_inference_path <path_to_chatgpt_inference_for_the_Cache_Pool> \
    --lion_inference_path <path_to_lion_inference_for_the_Cache_Pool> \
    --hard_save_path <path_to_identified_hard_instructions> \
    --easy_save_path <path_to_identified_easy_instructions>
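The discrimination script combines the review scores with the two inference files to split the Cache Pool. A minimal sketch of such a split, assuming one `(teacher_score, student_score)` pair per instruction in the same order, could look like this. An instruction is treated as hard when the teacher outscores the student by at least a margin. The function name and the default margin of 1.0 are assumptions, not values read from `src/discrimination.py`.

```python
from typing import List, Tuple


def discriminate(
    instructions: List[str],
    scores: List[Tuple[float, float]],  # (teacher_score, student_score) per instruction
    threshold: float = 1.0,             # hypothetical margin that marks an instruction as hard
) -> Tuple[List[str], List[str]]:
    """Split instructions into (hard, easy) by the teacher-minus-student score gap."""
    if len(instructions) != len(scores):
        raise ValueError("each instruction needs exactly one (teacher, student) score pair")
    hard: List[str] = []
    easy: List[str] = []
    for instruction, (teacher_score, student_score) in zip(instructions, scores):
        if teacher_score - student_score >= threshold:
            hard.append(instruction)
        else:
            easy.append(instruction)
    return hard, easy


if __name__ == "__main__":
    hard, easy = discriminate(["a", "b", "c"], [(9, 5), (8, 8), (7, 6)])
    print(hard, easy)  # ['a', 'c'] ['b']
```

Raising the threshold makes the hard set smaller and stricter. Lowering it sends more borderline instructions to the generator as hard seeds.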

3. Generation Stage

3.1 Generate new hard instructions

python -m src.generate_hard_instruction generate_instruction_following_data \
    --seed_tasks_path <path_to_identified_hard_instructions> \
    --output_dir <path_to_generated_hard_instructions> \
    --num_instructions_to_generate 3000 \
    --api_key <your_openai_api_key>

3.2 Generate new easy instructions

python -m src.generate_easy_instruction generate_instruction_following_data \
    --seed_tasks_path <path_to_identified_easy_instructions> \
    --output_dir <path_to_generated_easy_instructions> \
    --num_instructions_to_generate 3000 \
    --api_key <your_openai_api_key>
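Both generation commands expand a small seed set into many new instructions. Self-Instruct-style pipelines such as Alpaca's typically do this in two steps. First, they sample a few seed instructions into a few-shot prompt for the generator. Then they discard candidates that are too similar to existing instructions; Alpaca uses a ROUGE-L threshold for this. The sketch below illustrates those two steps with hypothetical helpers. `difflib.SequenceMatcher` stands in for ROUGE-L. The prompt wording and the 0.7 threshold are assumptions, not the prompts used by `src/generate_hard_instruction.py` or `src/generate_easy_instruction.py`.

```python
import difflib
import random
from typing import List, Optional


def build_prompt(seeds: List[str], k: int = 3, rng: Optional[random.Random] = None) -> str:
    """Few-shot prompt: k sampled seed instructions followed by an open slot for a new one."""
    if rng is None:
        rng = random.Random()
    examples = rng.sample(seeds, min(k, len(seeds)))
    lines = ["Write a new instruction in the same style and difficulty as these examples:"]
    lines += [f"{i}. {text}" for i, text in enumerate(examples, start=1)]
    lines.append(f"{len(examples) + 1}.")
    return "\n".join(lines)


def keep_novel(candidates: List[str], pool: List[str], max_similarity: float = 0.7) -> List[str]:
    """Keep candidates whose similarity to every kept instruction is at most max_similarity."""
    kept = list(pool)
    novel: List[str] = []
    for candidate in candidates:
        if all(difflib.SequenceMatcher(None, candidate, p).ratio() <= max_similarity for p in kept):
            kept.append(candidate)
            novel.append(candidate)
    return novel


if __name__ == "__main__":
    print(build_prompt(["a", "b", "c", "d"], rng=random.Random(0)))
    print(keep_novel(["Write a poem about cats", "Write a poem about cats!", "Explain TCP handshakes"], []))
    # ['Write a poem about cats', 'Explain TCP handshakes']
```

Comparing each candidate against instructions accepted earlier in the same batch, not only the seed pool, keeps near-duplicates generated together from both slipping through.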

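Experimental Results

To verify the effectiveness of the method, the authors applied the proposed adversarial distillation framework to ChatGPT, a well-known closed-source large language model. They transferred its knowledge to LLaMA, an open-source pre-trained foundation model with 7 billion parameters. Alpaca's training data, generated from only 175 manually selected seed instructions, served as the initial training instructions. The authors ran 3 iterations of adversarial knowledge distillation (AKD), using 70K instruction-following examples in total for training. The resulting model is named Lion.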