ChatGLM ptuning-v2 training parameter analysis

 

PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path THUDM/chatglm-6b \
    --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

This is a shell script that launches the Python program "main.py" with a set of command-line arguments. Here is a line-by-line explanation:

  1. PRE_SEQ_LEN=128: defines a shell variable called PRE_SEQ_LEN and sets it to 128. This is the length of the trainable prefix (soft prompt) used by P-Tuning v2 and is referenced later in the command.

  2. LR=2e-2: defines a shell variable called LR and sets it to 2e-2, i.e. 0.02. This is the learning rate and is referenced later in the command.

  3. CUDA_VISIBLE_DEVICES=0 python3 main.py \: Sets the environment variable CUDA_VISIBLE_DEVICES to 0 so that only the first GPU is visible to the process, then runs the program "main.py" with the Python 3 interpreter (a short device-selection example follows the list). The trailing \ indicates that the command continues on the next line.

  4. --do_train \: A command-line parameter that instructs the program to perform the training task.

  5. --train_file AdvertiseGen/train.json \: Specifies the training data file, "AdvertiseGen/train.json".

  6. --validation_file AdvertiseGen/dev.json \: Specifies the validation data file, "AdvertiseGen/dev.json".

  7. --prompt_column content \: Uses the column named "content" in the input data as the prompt (the model's input).

  8. --response_column summary \: Uses the column named "summary" in the input data as the response (the training target). See the example record after the list.

  9. --overwrite_cache \: Overwrites the cached, preprocessed dataset if one exists, forcing the data to be re-tokenized.

  10. --model_name_or_path THUDM/chatglm-6b \: Specifies the name or path of the model to use as "THUDM/chatglm-6b".

  11. --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \: Specifies the output directory; the shell expands $PRE_SEQ_LEN and $LR to the values defined at the top of the script (the expanded path is shown after the list). Checkpoints and logs are saved here.

  12. --overwrite_output_dir \: Overwrites the output directory if it already exists.

  13. --max_source_length 64 \: Specifies that the maximum length of the input (source) sequence is 64 tokens.

  14. --max_target_length 64 \: Specifies that the maximum length of the output (target) sequence is 64 tokens.

  15. --per_device_train_batch_size 1 \: Specifies a training batch size of 1 per training device.

  16. --per_device_eval_batch_size 1 \: Specifies an evaluation batch size of 1 per evaluation device.

  17. --gradient_accumulation_steps 16 \: Accumulates gradients over 16 steps before each parameter update, so with a per-device batch size of 1 the effective batch size is 1 × 16 = 16.

  18. --predict_with_generate \: During evaluation and prediction, produces outputs with the model's generate method instead of only computing the loss, so that text-based metrics can be calculated on the generated sequences.

  19. --max_steps 3000 \: Specifies that the maximum number of training steps is 3000.

  20. --logging_steps 10 \: Specifies to log every 10 steps.

  21. --save_steps 1000 \: Specifies to save the model every 1000 steps.

  22. --learning_rate $LR \: Sets the learning rate to the value of the LR variable defined earlier (2e-2).

  23. --pre_seq_len $PRE_SEQ_LEN \: Sets the prefix sequence length for P-Tuning v2 to the value of the PRE_SEQ_LEN variable defined earlier (128).

  24. --quantization_bit 4: Quantizes the model weights to 4 bits (INT4) to reduce GPU memory usage; this argument is specific to the ChatGLM training code rather than a standard Transformers flag.
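
As a quick illustration of item 3: CUDA_VISIBLE_DEVICES simply restricts which physical GPUs the process may see, and the visible devices are renumbered from 0. The sketch below is not part of the original script; the index 1 is hypothetical and depends on your machine, and the check assumes PyTorch is installed (which the training script already requires).

# List the physical GPUs and their indices.
nvidia-smi -L

# Expose only the second physical GPU; inside the process it is renumbered as device 0.
CUDA_VISIBLE_DEVICES=1 python3 -c "import torch; print(torch.cuda.device_count())"   # prints 1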
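
To make items 5 to 8 concrete: the training and validation files are expected to contain one JSON object per line, and --prompt_column / --response_column name the fields to read from each object. The record below is an invented, English-language sketch of that layout; the real AdvertiseGen records are in Chinese.

# Inspect the first training record.
head -n 1 AdvertiseGen/train.json
# Hypothetical output, for illustration only:
# {"content": "type#dress*color#blue*style#fresh", "summary": "A fresh blue dress for spring outings."}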
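
Item 11 relies on ordinary shell variable expansion: the shell substitutes the values of PRE_SEQ_LEN and LR into the path before main.py ever sees the argument. A quick way to preview the expanded path:

PRE_SEQ_LEN=128
LR=2e-2
echo "output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR"
# prints: output/adgen-chatglm-6b-pt-128-2e-2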

In general, this script fine-tunes the ChatGLM-6B chat generation model on the AdvertiseGen dataset using P-Tuning v2, with the dataset, model, and hyperparameter settings specified above.

