Arguments
class glmtuner.hparams.ModelArguments
- model_name_or_path (str, optional): Path to the pretrained model or model identifier from huggingface.co/models. Default: THUDM/chatglm-6b
- config_name (str, optional): Pretrained config name or path, if different from model_name. Default: None
- tokenizer_name (str, optional): Pretrained tokenizer name or path, if different from model_name. Default: None
- cache_dir (str, optional): Where to store the pretrained models downloaded from huggingface.co. Default: None
- use_fast_tokenizer (bool, optional): Whether to use one of the fast tokenizers (backed by the tokenizers library). Default: True
- model_revision (str, optional): The specific model version to use (can be a branch name, tag name, or commit id). Default: main
- use_auth_token (bool, optional): Whether to use the token generated when running huggingface-cli login. Default: False
- quantization_bit (int, optional): The number of bits to quantize the model to. Default: None
- quantization_type (str, optional): Quantization data type to use in int4 training. Default: nf4
- double_quantization (bool, optional): Whether to use double quantization in int4 training. Default: True
- checkpoint_dir (str, optional): Path to the directory containing the model checkpoints as well as the configurations. Default: None
- reward_model (str, optional): Path to the directory containing the checkpoints of the reward model. Default: None
- resume_lora_training (bool, optional): Whether to resume training from the last LoRA weights, or create new weights after merging them. Default: True
- plot_loss (bool, optional): Whether to plot the training loss after fine-tuning. Default: False
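These fields are standard HuggingFace-style dataclass arguments, so they can be populated from command-line flags with transformers.HfArgumentParser. A minimal sketch, assuming ModelArguments is importable from glmtuner.hparams as documented above; the flag values are only illustrative:

```python
from transformers import HfArgumentParser
from glmtuner.hparams import ModelArguments

# Parse CLI-style flags into the dataclass; the values below are examples only.
parser = HfArgumentParser(ModelArguments)
(model_args,) = parser.parse_args_into_dataclasses(args=[
    "--model_name_or_path", "THUDM/chatglm-6b",
    "--quantization_bit", "4",        # enable int4 quantization
    "--quantization_type", "nf4",     # data type used in int4 training
])
print(model_args.model_name_or_path, model_args.quantization_bit)
```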
class glmtuner.hparams.DataArguments
- dataset (str, optional): The name of the provided dataset(s) to use. Use commas to separate multiple datasets. Default: alpaca_zh
- dataset_dir (str, optional): The name of the folder containing the datasets. Default: data
- split (str, optional): Which dataset split to use for training and evaluation. Default: train
- overwrite_cache (bool, optional): Whether to overwrite the cached training and evaluation sets. Default: False
- preprocessing_num_workers (int, optional): The number of processes to use for preprocessing. Default: None
- max_source_length (int, optional): The maximum total input sequence length after tokenization. Default: 512
- max_target_length (int, optional): The maximum total output sequence length after tokenization. Default: 512
- max_samples (int, optional): Truncate the number of examples for each dataset, for debugging purposes. Default: None
- eval_num_beams (int, optional): Number of beams to use for evaluation. This argument will be passed to model.generate. Default: None
- ignore_pad_token_for_loss (bool, optional): Whether to ignore the tokens corresponding to padded labels in the loss computation. Default: True
- source_prefix (str, optional): A prefix to prepend to every source text (useful for T5 models). Default: None
- dev_ratio (float, optional): Proportion of the dataset to include in the dev set; should be between 0.0 and 1.0. Default: 0
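To make dev_ratio concrete: a value of 0.1 reserves 10% of the examples as the dev set. A minimal sketch of how such a split can be produced with the datasets library; the data file path and seed are assumptions for illustration, not glmtuner's exact internals:

```python
from datasets import load_dataset

# Hypothetical data file; glmtuner normally resolves datasets by name inside dataset_dir.
ds = load_dataset("json", data_files="data/alpaca_zh.json", split="train")

dev_ratio = 0.1  # must be between 0.0 and 1.0
splits = ds.train_test_split(test_size=dev_ratio, seed=42)
train_ds, dev_ds = splits["train"], splits["test"]
print(len(train_ds), len(dev_ds))
```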
class glmtuner.hparams.FinetuningArguments
- finetuning_type (str, optional): Which fine-tuning method to use for training. Default: lora
- num_layer_trainable (int, optional): Number of trainable layers for the freeze fine-tuning. Default: 3
- name_module_trainable (str, optional): Name of the trainable modules for the freeze fine-tuning. Default: mlp
- pre_seq_len (int, optional): Number of prefix tokens to use for P-tuning v2. Default: 64
- prefix_projection (bool, optional): Whether to add a projection layer for the prefix in P-tuning v2. Default: False
- lora_rank (int, optional): The intrinsic dimension for LoRA fine-tuning. Default: 8
- lora_alpha (float, optional): The scale factor for LoRA fine-tuning (similar to the learning rate). Default: 32.0
- lora_dropout (float, optional): Dropout rate for the LoRA fine-tuning. Default: 0.1
- lora_target (str, optional): Name(s) of the target modules to apply LoRA to. Use commas to separate multiple modules. Default: query_key_value
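The LoRA fields map onto the usual peft hyperparameters: the low-rank update is scaled by roughly lora_alpha / lora_rank, which is why lora_alpha behaves like a learning-rate-style knob. A minimal peft.LoraConfig equivalent of the defaults above is sketched here; whether glmtuner builds its config in exactly this way is an assumption:

```python
from peft import LoraConfig, TaskType

# Illustrative peft configuration matching the default values above.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                 # lora_rank: intrinsic dimension
    lora_alpha=32,                       # update is scaled by alpha / r = 4.0 here
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # lora_target: ChatGLM's fused QKV projection
)
```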
class transformers.Seq2SeqTrainingArguments
We list only the most important parameters here; see the HuggingFace docs for the complete list.
- output_dir (str): The output directory where the model predictions and checkpoints will be written.
- overwrite_output_dir (bool, optional): If True, overwrite the content of the output directory. Use this to continue training if output_dir points to a checkpoint directory. Default: False
- do_train (bool, optional): Whether to run training. Default: False
- do_eval (bool, optional): Whether to run evaluation. Default: False
- do_predict (bool, optional): Whether to run predictions. Default: False
- per_device_train_batch_size (int, optional): The batch size per GPU/TPU core/CPU for training. Default: 8
- per_device_eval_batch_size (int, optional): The batch size per GPU/TPU core/CPU for evaluation or prediction. Default: 8
- gradient_accumulation_steps (int, optional): Number of update steps to accumulate the gradients for before performing a backward/update pass. Default: 1
- learning_rate (float, optional): The initial learning rate for the AdamW optimizer. Default: 5e-5
- weight_decay (float, optional): The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in the AdamW optimizer. Default: 0.0
- max_grad_norm (float, optional): Maximum gradient norm (for gradient clipping). Default: 1.0
- num_train_epochs (float, optional): Total number of training epochs to perform (if not an integer, the decimal part determines the fraction of the last epoch performed before stopping training). Default: 3.0
- logging_steps (int, optional): Number of update steps between two logs. Default: 500
- save_steps (int, optional): Number of update steps between two checkpoint saves. Default: 500
- no_cuda (bool, optional): Whether to avoid using CUDA even when it is available. Default: False
- fp16 (bool, optional): Whether to use fp16 16-bit (mixed) precision training instead of 32-bit training. Default: False
- predict_with_generate (bool, optional): Whether to use generate to calculate generative metrics (ROUGE, BLEU). Default: False
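Putting a few of these together, a Seq2SeqTrainingArguments instance for a short training run might look like the following sketch; the output path and step counts are placeholder choices, not recommendations:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="output/chatglm-lora",  # placeholder path
    do_train=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,     # effective per-device batch size: 8 * 4 = 32
    learning_rate=5e-5,
    num_train_epochs=3.0,
    logging_steps=10,
    save_steps=500,
    fp16=True,
    predict_with_generate=True,        # required to compute ROUGE/BLEU at evaluation time
)
```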