Transformers: Loading Pre-trained Models | Part 7

Author | huggingface   Compiled by | VK   Source | GitHub

Loading Google AI or OpenAI pre-trained weights or a PyTorch dump

The from_pretrained() method

To load a Google AI or OpenAI pre-trained model, or a PyTorch model saved with torch.save() (for example a saved BertForPreTraining instance), the PyTorch model and tokenizer classes can be instantiated with from_pretrained():

model = BERT_CLASS.from_pretrained(PRE_TRAINED_MODEL_NAME_OR_PATH, cache_dir=None, from_tf=False, state_dict=None, *inputs, **kwargs)

where:

  • BERT_CLASS is either a tokenizer class used to load the vocabulary (such as BertTokenizer or OpenAIGPTTokenizer), or one of the eight BERT or three OpenAI GPT PyTorch model classes used to load the pre-trained weights: BertModel, BertForMaskedLM, BertForNextSentencePrediction, BertForPreTraining, BertForSequenceClassification, BertForTokenClassification, BertForMultipleChoice, BertForQuestionAnswering, OpenAIGPTModel, OpenAIGPTLMHeadModel or OpenAIGPTDoubleHeadsModel

  • PRE_TRAINED_MODEL_NAME_OR_PATH is either:

    • a shortcut name from the predefined list of Google AI or OpenAI pre-trained models:

      • bert-base-uncased: 12 layers, 768 hidden units, 12 heads, 110M parameters.
      • bert-large-uncased: 24 layers, 1024 hidden units, 16 heads, 340M parameters.
      • bert-base-cased: 12 layers, 768 hidden units, 12 heads, 110M parameters.
      • bert-large-cased: 24 layers, 1024 hidden units, 16 heads, 340M parameters.
      • bert-base-multilingual-uncased: (original, not recommended) 12 layers, 768 hidden units, 12 heads, 110M parameters.
      • bert-base-multilingual-cased: (new, recommended) 12 layers, 768 hidden units, 12 heads, 110M parameters.
      • bert-base-chinese: Simplified and Traditional Chinese, 12 layers, 768 hidden units, 12 heads, 110M parameters.
      • bert-base-german-cased: Trained on German data only, 12 layers, 768 hidden units, 12 heads, 110M parameters. Performance evaluation: https://deepset.ai/german-bert
      • bert-large-uncased-whole-word-masking: 24 layers, 1024 hidden units, 16 heads, 340M parameters. Trained with Whole Word Masking (all tokens corresponding to a word are masked at once).
      • bert-large-cased-whole-word-masking: 24 layers, 1024 hidden units, 16 heads, 340M parameters. Trained with Whole Word Masking (all tokens corresponding to a word are masked at once).
      • bert-large-uncased-whole-word-masking-finetuned-squad: the bert-large-uncased-whole-word-masking model fine-tuned on SQuAD (using run_bert_squad.py). Results: exact_match: 86.91579943235573, f1: 93.1532499015869
      • bert-base-german-dbmdz-cased: Trained on German data only, 12 layers, 768 hidden units, 12 heads, 110M parameters. Performance evaluation: https://deepset.ai/german-bert
      • bert-base-german-dbmdz-uncased: Trained on German data only (not case sensitive), 12 layers, 768 hidden units, 12 heads, 110M parameters. Performance evaluation: https://github.com/dbmdz/german-bert
      • openai-gpt: OpenAI GPT English model, 12 layers, 768 hidden units, 12 heads, 110M parameters.
      • gpt2: OpenAI GPT-2 English model, 12 layers, 768 hidden units, 12 heads, 117M parameters.
      • gpt2-medium: OpenAI GPT-2 English model, 24 layers, 1024 hidden units, 16 heads, 345M parameters.
      • transfo-xl-wt103: Transformer-XL English model trained on wikitext-103, 24 layers, 1024 hidden units, 16 heads, 257M parameters.
    • a path or URL to a directory containing a pre-trained model, with:

      • bert_config.json or openai_gpt_config.json: the configuration file of the model
      • pytorch_model.bin: a PyTorch dump of a pre-trained BertForPreTraining, OpenAIGPTModel, TransfoXLModel or GPT2LMHeadModel instance (saved with the usual torch.save())

    If PRE_TRAINED_MODEL_NAME_OR_PATH is a shortcut name, the pre-trained weights are downloaded from AWS S3 (the links can be found here: https://github.com/huggingface/transformers/blob/master/transformers/modeling_bert.py) and stored in a cache folder to avoid downloading them again later (the cache folder can be found at ~/.pytorch_pretrained_bert/).

    • cache_dir is an optional path to a specific directory in which to download and cache the pre-trained model weights. This option is particularly useful when using distributed training: to avoid concurrent access to the same weights, you can set for example cache_dir='./pretrained_model_{}'.format(args.local_rank) (see the sketch after this list).

    • from_tf: whether to re-load the weights from a locally saved TensorFlow checkpoint

    • state_dict: an optional state dictionary (a collections.OrderedDict object) to use instead of Google's pre-trained weights

    • *inputs, **kwargs: additional inputs for the specific BERT class (e.g. num_labels for BertForSequenceClassification)
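
A minimal sketch of how these optional arguments can be combined, assuming the classes are imported from the transformers package (the older pytorch_pretrained_bert package exposes the same from_pretrained() interface); the paths ./bert_cache_0 and ./my_finetuned_bert/ are hypothetical:

import torch
from transformers import BertForSequenceClassification

# Load by shortcut name, with a rank-specific cache directory (useful in
# distributed training) and a class-specific extra argument (num_labels).
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    cache_dir='./bert_cache_0',  # e.g. './pretrained_model_{}'.format(args.local_rank)
    num_labels=2)

# Load from a local directory containing the configuration file and a
# pytorch_model.bin dump (hypothetical path).
model = BertForSequenceClassification.from_pretrained('./my_finetuned_bert/')

# Load the architecture by shortcut name but override the weights with an
# explicit state dictionary instead of the downloaded ones.
state_dict = torch.load('./my_finetuned_bert/pytorch_model.bin', map_location='cpu')
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased', state_dict=state_dict, num_labels=2)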

Uncased means that the text was lower-cased before WordPiece tokenization, e.g. John Smith becomes john smith. Uncased models also strip any accent markers. Cased means that the true case and accent markers are preserved. In general, the Uncased model is better unless you know that case information is important for your task (for example, Named Entity Recognition or Part-of-Speech tagging). For information about the multilingual and Chinese models, see https://github.com/google-research/bert/blob/master/multilingual.md or the original TensorFlow repository.

When using an Uncased model, make sure to pass --do_lower_case to the example training scripts (or pass do_lower_case=True to FullTokenizer if you are using your own script).
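
To illustrate the difference, a small sketch (assuming the tokenizer classes come from the transformers package; the exact WordPiece split may vary with the vocabulary version):

from transformers import BertTokenizer

uncased_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
cased_tokenizer = BertTokenizer.from_pretrained('bert-base-cased', do_lower_case=False)

print(uncased_tokenizer.tokenize("John Smith"))  # expected: ['john', 'smith'] (lower-cased, accents stripped)
print(cased_tokenizer.tokenize("John Smith"))    # expected: ['John', 'Smith'] (case preserved)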

Example:

from transformers import (BertTokenizer, BertForSequenceClassification,
                          OpenAIGPTTokenizer, OpenAIGPTModel,
                          TransfoXLTokenizer, TransfoXLModel,
                          GPT2Tokenizer, GPT2Model)

# BERT
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True, do_basic_tokenize=True)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# OpenAI GPT
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTModel.from_pretrained('openai-gpt')

# Transformer-XL
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLModel.from_pretrained('transfo-xl-wt103')

# OpenAI GPT-2
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
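
After loading, the model behaves like an ordinary torch.nn.Module. A minimal inference sketch (assuming the transformers package; the exact structure of the returned value differs between library versions, so the unpacking below is illustrative):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()  # disable dropout for deterministic inference

# Tokenize, map tokens to vocabulary ids, and add a batch dimension
tokens = tokenizer.tokenize("Hello, how are you?")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    outputs = model(input_ids)

# In recent transformers versions the first element is the final hidden states,
# of shape (batch_size, sequence_length, hidden_size), here (1, len(tokens), 768)
hidden_states = outputs[0]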

Cache directory

pytorch_pretrained_bert stores the pre-trained weights in a cache directory, located at (in this order of priority):

  • cache_dir, the optional parameter of the from_pretrained() method (see above),
  • the shell environment variable PYTORCH_PRETRAINED_BERT_CACHE,
  • the PyTorch cache directory + /pytorch_pretrained_bert/, where the PyTorch cache directory is (in this order):
    • the shell environment variable ENV_TORCH_HOME
    • the shell environment variable ENV_XDG_CACHE_HOME + /torch/
    • the default: ~/.cache/torch/

In general, if you have not set any of these environment variables, the pytorch_pretrained_bert cache will be located at ~/.cache/torch/pytorch_pretrained_bert/.

You can always safely delete the pytorch_pretrained_bert cache, but the pre-trained model weights and vocabulary files will then have to be downloaded again from our S3.
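
Two ways to point the cache somewhere else, as a small sketch (the /data/bert_cache path is hypothetical):

import os

# Option 1: set the environment variable before the library is imported,
# so every from_pretrained() call uses this cache directory
os.environ['PYTORCH_PRETRAINED_BERT_CACHE'] = '/data/bert_cache'

# Option 2: pass cache_dir explicitly to a single from_pretrained() call
from transformers import BertModel
model = BertModel.from_pretrained('bert-base-uncased', cache_dir='/data/bert_cache')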

Original link: https://huggingface.co/transformers/serialization.html
