Author: huggingface | Compiled by: VK | Source: Github
Loading Google AI or OpenAI pre-trained weights or a PyTorch dump
The from_pretrained() method
To load one of the Google AI or OpenAI pre-trained models, or a PyTorch saved model (an instance of BertForPreTraining saved with torch.save()), the PyTorch model classes and the tokenizer can be instantiated with from_pretrained():
model = BERT_CLASS.from_pretrained(PRE_TRAINED_MODEL_NAME_OR_PATH, cache_dir=None, from_tf=False, state_dict=None, *input, **kwargs)
where
- BERT_CLASS is either a tokenizer that loads the vocabulary (the BertTokenizer or OpenAIGPTTokenizer class) or one of the eight BERT or three OpenAI GPT PyTorch model classes that load the pre-trained weights: BertModel, BertForMaskedLM, BertForNextSentencePrediction, BertForPreTraining, BertForSequenceClassification, BertForTokenClassification, BertForMultipleChoice, BertForQuestionAnswering, OpenAIGPTModel, OpenAIGPTLMHeadModel or OpenAIGPTDoubleHeadsModel.
- PRE_TRAINED_MODEL_NAME_OR_PATH is either:
- the shortcut name of a Google AI or OpenAI pre-trained model, chosen from this list:
  - bert-base-uncased: 12 layers, 768 hidden units, 12 heads, 110M parameters.
  - bert-large-uncased: 24 layers, 1024 hidden units, 16 heads, 340M parameters.
  - bert-base-cased: 12 layers, 768 hidden units, 12 heads, 110M parameters.
  - bert-large-cased: 24 layers, 1024 hidden units, 16 heads, 340M parameters.
  - bert-base-multilingual-uncased: (original, not recommended) 12 layers, 768 hidden units, 12 heads, 110M parameters.
  - bert-base-multilingual-cased: (new, recommended) 12 layers, 768 hidden units, 12 heads, 110M parameters.
  - bert-base-chinese: Simplified and Traditional Chinese, 12 layers, 768 hidden units, 12 heads, 110M parameters.
  - bert-base-german-cased: trained on German data only, 12 layers, 768 hidden units, 12 heads, 110M parameters. Performance evaluation: https://deepset.ai/german-bert
  - bert-large-uncased-whole-word-masking: 24 layers, 1024 hidden units, 16 heads, 340M parameters. Trained with Whole Word Masking (all of the tokens corresponding to a word are masked at once).
  - bert-large-cased-whole-word-masking: 24 layers, 1024 hidden units, 16 heads, 340M parameters. Trained with Whole Word Masking (all of the tokens corresponding to a word are masked at once).
  - bert-large-uncased-whole-word-masking-finetuned-squad: the bert-large-uncased-whole-word-masking model fine-tuned on SQuAD (using run_bert_squad.py). Results: EXACT_MATCH: 86.91579943235573, f1: 93.1532499015869
  - bert-base-german-dbmdz-cased: trained on German data only, 12 layers, 768 hidden units, 12 heads, 110M parameters. Performance evaluation: https://deepset.ai/german-bert
  - bert-base-german-dbmdz-uncased: trained on (uncased) German data only, 12 layers, 768 hidden units, 12 heads, 110M parameters. Performance evaluation: https://github.com/dbmdz/german-bert
  - openai-gpt: OpenAI GPT English model, 12 layers, 768 hidden units, 12 heads, 110M parameters.
  - gpt2: OpenAI GPT-2 English model, 12 layers, 768 hidden units, 12 heads, 117M parameters.
  - gpt2-medium: OpenAI GPT-2 English model, 24 layers, 1024 hidden units, 16 heads, 345M parameters.
  - transfo-xl-wt103: Transformer-XL English model trained on wikitext-103, 24 layers, 1024 hidden units, 16 heads, 257M parameters.
- or a path or URL to a pre-trained model archive containing:
  - bert_config.json or openai_gpt_config.json, a configuration file for the model, and
  - pytorch_model.bin, a PyTorch dump of a pre-trained instance of BertForPreTraining, OpenAIGPTModel, TransfoXLModel or GPT2LMHeadModel (saved with the usual torch.save()).
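As a rough illustration of the archive layout described above, the following sketch checks whether a local directory contains the expected files before it is passed to from_pretrained(). The helper name missing_model_files is hypothetical and is not part of the library; only the file names come from the text, and other model families may use different configuration file names.

```python
import os

# File names taken from the description above.
CONFIG_NAMES = ("bert_config.json", "openai_gpt_config.json")
WEIGHTS_NAME = "pytorch_model.bin"


def missing_model_files(model_dir):
    """Return descriptions of the expected files that are absent from model_dir.

    Hypothetical helper for illustration; an empty list means the directory
    contains one configuration file and the weights dump.
    """
    present = set(os.listdir(model_dir))
    missing = []
    # Either configuration file name is acceptable.
    if not any(name in present for name in CONFIG_NAMES):
        missing.append(" or ".join(CONFIG_NAMES))
    if WEIGHTS_NAME not in present:
        missing.append(WEIGHTS_NAME)
    return missing
```

Calling missing_model_files('./my_model') before loading gives a clearer message than a failed load when a file was forgotten; './my_model' is only a placeholder path.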
If PRE_TRAINED_MODEL_NAME_OR_PATH is a shortcut name, the pre-trained weights are downloaded from AWS S3 (see the links at https://github.com/huggingface/transformers/blob/master/transformers/modeling_bert.py) and stored in a cache folder to avoid downloading them again later (the cache folder can be found at ~/.pytorch_pretrained_bert/).
- cache_dir: an optional path to a specific directory in which to download and cache the pre-trained model weights. This option is particularly useful for distributed training: to avoid concurrent access to the same weights, you can set, for example, cache_dir='./pretrained_model_{}'.format(args.local_rank).
- from_tf: whether to load the weights from a locally saved TensorFlow checkpoint.
- state_dict: an optional state dictionary (a collections.OrderedDict object) to use instead of the Google pre-trained weights.
- *inputs, **kwargs: additional inputs for the specific BERT class (e.g. num_labels for BertForSequenceClassification).
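The per-process cache directory mentioned for cache_dir can be sketched as below. The helper per_rank_cache_dir is a hypothetical name introduced only for illustration; the format string is the one given in the text, and local_rank is assumed to be the integer rank of the current process.

```python
def per_rank_cache_dir(local_rank, prefix="./pretrained_model_"):
    """Build a cache directory name that is unique to one training process.

    Hypothetical helper: it only reproduces the
    './pretrained_model_{}'.format(args.local_rank) pattern from the text.
    """
    return "{}{}".format(prefix, local_rank)


# Each process then passes its own directory, for example:
# model = BertModel.from_pretrained('bert-base-uncased',
#                                   cache_dir=per_rank_cache_dir(args.local_rank))
print(per_rank_cache_dir(0))
print(per_rank_cache_dir(3))
```

Because every rank downloads into its own directory, no two processes write the same weight file at the same time, at the cost of one download per rank.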
Uncased means that the text has been lowercased before WordPiece tokenization; for example, John Smith becomes john smith. The Uncased model also strips any accent marks. Cased means that the true case and accent marks are preserved. In general, the Uncased model is the better choice unless you know that case information is important for your task (for example, named entity recognition or part-of-speech tagging). For information on the multilingual and Chinese models, see https://github.com/google-research/bert/blob/master/multilingual.md or the original TensorFlow repository.
When using an Uncased model, be sure to pass --do_lower_case to the example training scripts (or pass do_lower_case=True to FullTokenizer if you use your own script).
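To make the Uncased preprocessing concrete, here is a minimal sketch of lowercasing followed by accent stripping, using only the Python standard library. It is an approximation of the behaviour described above and not the tokenizer's actual code; the function name uncased_normalize is hypothetical.

```python
import unicodedata


def uncased_normalize(text):
    """Lowercase text and strip accent marks (illustrative sketch only)."""
    text = text.lower()
    # NFD decomposition splits an accented letter into a base letter
    # followed by combining marks (Unicode category "Mn").
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(uncased_normalize("John Smith"))  # john smith
print(uncased_normalize("Café"))        # cafe
```

A Cased model skips both steps, which is why it can still tell "Smith" from "smith".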
Example:
# BERT
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True, do_basic_tokenize=True)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# OpenAI GPT
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTModel.from_pretrained('openai-gpt')
# Transformer-XL
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLModel.from_pretrained('transfo-xl-wt103')
# OpenAI GPT-2
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
Cache directory
pytorch_pretrained_bert saves the pre-trained weights in a cache directory, which is located at (in this order of priority):
- the optional cache_dir argument of the from_pretrained() method (see above),
- the shell environment variable PYTORCH_PRETRAINED_BERT_CACHE,
- the PyTorch cache home + /pytorch_pretrained_bert/, where the PyTorch cache home is defined by (in this order):
  - the shell environment variable ENV_TORCH_HOME,
  - the shell environment variable ENV_XDG_CACHE_HOME + /torch/,
  - the default: ~/.cache/torch/
In general, if you have not set any specific environment variable, the pytorch_pretrained_bert cache will be located at ~/.cache/torch/pytorch_pretrained_bert/.
You can always safely delete the pytorch_pretrained_bert cache, but the pre-trained model weights and vocabulary files will then have to be downloaded again from our S3.
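The priority order described in this section can be sketched as a small function. This is an illustration of the lookup order only, not the library's code: resolve_cache_dir is a hypothetical name, and reading the variables as TORCH_HOME and XDG_CACHE_HOME is an assumption about what ENV_TORCH_HOME and ENV_XDG_CACHE_HOME refer to in the text.

```python
import os


def resolve_cache_dir(cache_dir=None, env=None):
    """Return the cache directory chosen by the priority order above.

    Hypothetical sketch; `env` defaults to the real process environment
    and can be replaced by a plain dict to try the rules out.
    """
    env = os.environ if env is None else env
    # 1. An explicit cache_dir argument wins.
    if cache_dir is not None:
        return cache_dir
    # 2. Then the dedicated environment variable.
    if "PYTORCH_PRETRAINED_BERT_CACHE" in env:
        return env["PYTORCH_PRETRAINED_BERT_CACHE"]
    # 3. Otherwise fall back to the PyTorch cache home.
    #    Variable names below are an assumption (see the lead-in).
    if "TORCH_HOME" in env:
        torch_home = env["TORCH_HOME"]
    elif "XDG_CACHE_HOME" in env:
        torch_home = os.path.join(env["XDG_CACHE_HOME"], "torch")
    else:
        torch_home = os.path.join("~", ".cache", "torch")
    return os.path.join(os.path.expanduser(torch_home), "pytorch_pretrained_bert")
```

With no argument and none of these variables set, the function returns the expanded form of ~/.cache/torch/pytorch_pretrained_bert, which matches the default location stated above.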
Original link: https://huggingface.co/transformers/serialization.html