ChatGLM-Efficient-Tuning Parameter Analysis

Arguments

class glmtuner.hparams.ModelArguments  <source>

  • model_name_or_path (str, optional): Path to a pretrained model or a model identifier from huggingface.co/models. default: THUDM/chatglm-6b
  • config_name (str, optional): Pretrained config name or path, if different from model_name. default: None
  • tokenizer_name (str, optional): Pretrained tokenizer name or path, if different from model_name. default: None
  • cache_dir (str, optional): Where to store the pretrained models downloaded from huggingface.co. default: None
  • use_fast_tokenizer (bool, optional): Whether to use one of the fast tokenizers (backed by the tokenizers library). default: True
  • model_revision (str, optional): The specific model version to use (can be a branch name, tag name, or commit id). default: main
  • use_auth_token (bool, optional): Whether to use the token generated when running huggingface-cli login. default: False
  • quantization_bit (int, optional): The number of bits to quantize the model to. default: None
  • quantization_type (str, optional): Quantization data type to use in int4 training. default: nf4
  • double_quantization (bool, optional): Whether to use double quantization in int4 training. default: True
  • checkpoint_dir (str, optional): Path to the directory containing the model checkpoints as well as the configuration. default: None
  • reward_model (str, optional): Path to the directory containing the checkpoints of the reward model. default: None
  • resume_lora_training (bool, optional): Whether to resume training from the last LoRA weights or to create new LoRA weights after merging them. default: True
  • plot_loss (bool, optional): Whether to plot the training loss after fine-tuning. default: False
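
As a rough illustration of how these fields are consumed, the sketch below parses a 4-bit (QLoRA-style) configuration with transformers.HfArgumentParser. It assumes that glmtuner.hparams exposes the ModelArguments dataclass exactly as documented above and that it performs no extra validation beyond its fields; the argument values are examples only.

    # Minimal sketch: parse ModelArguments from a command-line-style argument list.
    # Assumes glmtuner.hparams exports the dataclass documented above.
    from transformers import HfArgumentParser
    from glmtuner.hparams import ModelArguments

    parser = HfArgumentParser(ModelArguments)
    (model_args,) = parser.parse_args_into_dataclasses(args=[
        "--model_name_or_path", "THUDM/chatglm-6b",
        "--quantization_bit", "4",      # quantize the model to int4
        "--quantization_type", "nf4",   # NF4 quantization data type
    ])
    print(model_args.model_name_or_path, model_args.quantization_bit)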

class glmtuner.hparams.DataArguments  <source>

  • dataset (str, optional): The name of the provided dataset(s) to use. Use commas to separate multiple datasets. default: alpaca_zh
  • dataset_dir (str, optional): The name of the folder containing the datasets. default: data
  • split (str, optional): Which dataset split to use for training and evaluation. default: train
  • overwrite_cache (bool, optional): Whether to overwrite the cached training and evaluation sets. default: False
  • preprocessing_num_workers (int, optional): The number of processes to use for preprocessing. default: None
  • max_source_length (int, optional): The maximum total input sequence length after tokenization. default: 512
  • max_target_length (int, optional): The maximum total output sequence length after tokenization. default: 512
  • max_samples (int, optional): For debugging purposes, truncate each dataset to this number of samples. default: None
  • eval_num_beams (int, optional): Number of beams to use for evaluation; this argument will be passed to model.generate. default: None
  • ignore_pad_token_for_loss (bool, optional): Whether to ignore the tokens corresponding to padded labels in the loss computation. default: True
  • source_prefix (str, optional): A prefix to prepend to every source text (useful for T5 models). default: None
  • dev_ratio (float, optional): Proportion of the dataset to include in the dev set; should be between 0.0 and 1.0. default: 0
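
The dev_ratio field deserves a concrete example. The sketch below shows what a ratio of 0.05 means in practice, using datasets.Dataset.train_test_split as a stand-in for the split; the repository's internal splitting logic may differ, and the toy dataset is invented purely for illustration.

    # Illustrative only: what dev_ratio = 0.05 means for a 100-sample dataset.
    from datasets import Dataset

    dummy = Dataset.from_dict({"prompt": ["q"] * 100, "response": ["a"] * 100})
    split = dummy.train_test_split(test_size=0.05, seed=42)  # dev_ratio = 0.05
    print(len(split["train"]), len(split["test"]))           # 95 5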

class glmtuner.hparams.FinetuningArguments <source> 

  • finetuning_type (str, optional): Which fine-tuning method to use for training. default: lora
  • num_layer_trainable (int, optional): Number of trainable layers for Freeze fine-tuning. default: 3
  • name_module_trainable (str, optional): Name of the trainable modules for Freeze fine-tuning. default: mlp
  • pre_seq_len (int, optional): Number of prefix tokens to use for P-tuning v2. default: 64
  • prefix_projection (bool, optional): Whether to add a projection layer for the prefix in P-tuning v2. default: False
  • lora_rank (int, optional): The intrinsic dimension for LoRA fine-tuning. default: 8
  • lora_alpha (float, optional): The scale factor for LoRA fine-tuning (similar to the learning rate). default: 32.0
  • lora_dropout (float, optional): Dropout rate for LoRA fine-tuning. default: 0.1
  • lora_target (str, optional): Name(s) of the target modules to apply LoRA to. Use commas to separate multiple modules. default: query_key_value
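
The LoRA fields map naturally onto a peft.LoraConfig. The sketch below shows one plausible mapping, including how a comma-separated lora_target string becomes a list of target modules; it illustrates how the hyperparameters relate and is not the repository's actual wrapping code.

    # Illustrative mapping of the LoRA hyperparameters above onto a peft.LoraConfig.
    from peft import LoraConfig, TaskType

    lora_target = "query_key_value"            # value of lora_target (comma-separated)
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,                                   # lora_rank
        lora_alpha=32,                         # lora_alpha
        lora_dropout=0.1,                      # lora_dropout
        target_modules=[t.strip() for t in lora_target.split(",")],
    )
    print(lora_config.target_modules)

The effective scaling applied to the LoRA update is lora_alpha / lora_rank (here 32 / 8 = 4), which is why lora_alpha behaves somewhat like a learning rate.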

class transformers.Seq2SeqTrainingArguments  <source>

We only list some of the important parameters here; see the HuggingFace docs for the complete list.

  • output_dir (str): The output directory where the model predictions and checkpoints will be written.
  • overwrite_output_dir (bool, optional): If True, overwrite the content of the output directory. Use this to continue training if output_dir points to a checkpoint directory. default: False
  • do_train (bool, optional): Whether to run training. default: False
  • do_eval (bool, optional): Whether to run evaluation. default: False
  • do_predict (bool, optional): Whether to run predictions. default: False
  • per_device_train_batch_size (int, optional): The batch size per GPU/TPU core/CPU for training. default: 8
  • per_device_eval_batch_size (int, optional): The batch size per GPU/TPU core/CPU for evaluation or prediction. default: 8
  • gradient_accumulation_steps (int, optional): Number of update steps to accumulate the gradients for before performing a backward/update pass. default: 1
  • learning_rate (float, optional): The initial learning rate for the AdamW optimizer. default: 5e-5
  • weight_decay (float, optional): The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in the AdamW optimizer. default: 0.0
  • max_grad_norm (float, optional): Maximum gradient norm (for gradient clipping). default: 1.0
  • num_train_epochs (float, optional): Total number of training epochs to perform (if not an integer, the decimal part is the fraction of the last epoch to perform before stopping). default: 3.0
  • logging_steps (int, optional): Number of update steps between two logs. default: 500
  • save_steps (int, optional): Number of update steps between two checkpoint saves. default: 500
  • no_cuda (bool, optional): Whether to avoid using CUDA even when it is available. default: False
  • fp16 (bool, optional): Whether to use fp16 16-bit (mixed) precision training instead of 32-bit training. default: False
  • predict_with_generate (bool, optional): Whether to use generate to compute generative metrics (ROUGE, BLEU). default: False
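
For completeness, the same kind of configuration can also be built programmatically. The sketch below constructs a Seq2SeqTrainingArguments instance with example values that mirror a typical fine-tuning run; the paths and numbers are illustrative, not recommendations.

    # Minimal sketch: Seq2SeqTrainingArguments built directly in Python.
    from transformers import Seq2SeqTrainingArguments

    training_args = Seq2SeqTrainingArguments(
        output_dir="output/chatglm-6b-sft",  # hypothetical output directory
        overwrite_output_dir=True,
        do_train=True,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,       # effective batch size 16 per device
        learning_rate=5e-5,
        num_train_epochs=3.0,
        logging_steps=10,
        save_steps=1000,
        # On a GPU machine you can additionally pass fp16=True for mixed precision.
    )
    print(training_args.train_batch_size, training_args.num_train_epochs)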

Source: blog.csdn.net/sinat_37574187/article/details/131984244