BERT: Bidirectional Encoder Representations from Transformers


BERT, or Bidirectional Encoder Representations from Transformers, is Google's recently released NLP pre-training method: a general-purpose "language understanding" model is trained on a large text corpus (such as Wikipedia) and then applied to the downstream NLP tasks we care about (such as classification or reading comprehension). BERT improves on earlier methods because it is the first **unsupervised, deeply bidirectional** system for pre-training NLP.

Put simply, it beats the previous models, such as Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFiT: BERT-based models achieve state-of-the-art results on several language tasks (SQuAD, MultiNLI, and MRPC).

The core pre-training procedure of BERT:

  1. Randomly select 15% of the tokens in each sentence, mask them out, and use them as the model's prediction targets, for example:

    Input: the man went to the [MASK1] . he bought a [MASK2] of milk.
    Labels: [MASK1] = store; [MASK2] = gallon
    
  2. To learn the relationship between sentences, two sentences are drawn from the dataset such that, 50% of the time, the second sentence is the actual next sentence of the first:

    Sentence A: the man went to the store .
    Sentence B: he bought a gallon of milk .
    Label: IsNextSentence
    
    Sentence A: the man went to the store .
    Sentence B: penguins are flightless .
    Label: NotNextSentence
    
  3. Finally, the processed sentence pairs are fed into a large Transformer model, and training optimizes both objectives above at the same time with two loss functions (a simplified sketch of this data construction follows the list).
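
A short, simplified sketch of how one such training instance might be built makes the two objectives concrete. This is only an illustration of the idea, not Google's actual data-generation code; for instance, the real implementation replaces only 80% of the selected tokens with [MASK], swaps 10% for random words, and leaves 10% unchanged:

    import random

    MASK_PROB = 0.15  # fraction of tokens selected as masked-LM targets

    def make_instance(sentence_a, sentence_b, corpus):
        """Build (tokens, mlm_labels, nsp_label) for one training example."""
        # Next-sentence prediction: keep the true next sentence 50% of the
        # time, otherwise swap in a random sentence from the corpus.
        if random.random() < 0.5:
            nsp_label = "IsNextSentence"
        else:
            sentence_b = random.choice(corpus)
            nsp_label = "NotNextSentence"

        tokens = ["[CLS]"] + sentence_a + ["[SEP]"] + sentence_b + ["[SEP]"]

        # Masked LM: pick 15% of the ordinary tokens, replace them with
        # [MASK], and keep the original tokens as prediction targets.
        candidates = [i for i, t in enumerate(tokens) if t not in ("[CLS]", "[SEP]")]
        num_to_mask = max(1, int(len(candidates) * MASK_PROB))
        mlm_labels = {}
        for i in random.sample(candidates, num_to_mask):
            mlm_labels[i] = tokens[i]
            tokens[i] = "[MASK]"

        return tokens, mlm_labels, nsp_label

    corpus = [["penguins", "are", "flightless", "."]]
    a = ["the", "man", "went", "to", "the", "store", "."]
    b = ["he", "bought", "a", "gallon", "of", "milk", "."]
    print(make_instance(a, b, corpus))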

The key component is the Transformer model itself; its architecture and design ideas deserve a separate analysis later.

Pre-trained models

  • BERT-Base, Uncased:
    12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Large, Uncased:
    24-layer, 1024-hidden, 16-heads, 340M parameters
  • BERT-Base, Cased:
    12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Large, Cased:
    24-layer, 1024-hidden, 16-heads, 340M parameters
    (Not available yet. Needs to be re-generated.)
  • BERT-Base, Multilingual:
    102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  • BERT-Base, Chinese:
    Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M
    parameters
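
The listed parameter counts can be roughly reproduced with a back-of-the-envelope calculation. The sketch below assumes the standard configuration of the English models (a WordPiece vocabulary of 30,522 tokens, feed-forward size of 4 × hidden, 512 position embeddings, 2 segment types) and ignores the masked-LM and next-sentence output heads; the multilingual and Chinese models use different vocabulary sizes, so their exact counts differ somewhat:

    def bert_params(layers, hidden, vocab=30522, max_pos=512, segments=2):
        """Approximate parameter count of a BERT encoder."""
        ffn = 4 * hidden
        # token + position + segment embeddings, plus their LayerNorm
        embeddings = (vocab + max_pos + segments) * hidden + 2 * hidden
        # self-attention: Q, K, V, output projections (+ biases) and LayerNorm
        attention = 4 * (hidden * hidden + hidden) + 2 * hidden
        # feed-forward: two dense layers (+ biases) and LayerNorm
        feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + 2 * hidden
        pooler = hidden * hidden + hidden
        return embeddings + layers * (attention + feed_forward) + pooler

    print(bert_params(12, 768))    # ~109.5M -> quoted as 110M for BERT-Base
    print(bert_params(24, 1024))   # ~335M   -> quoted as 340M for BERT-Large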

Each downloaded archive contains the following (a quick inspection sketch follows this list):

  • A TensorFlow checkpoint (bert_model.ckpt) containing the pre-trained
    weights (which is actually 3 files).
  • A vocab file (vocab.txt) to map WordPiece tokens to word IDs.
  • A config file (bert_config.json) which specifies the hyperparameters of
    the model.
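
To see what is inside one of these downloads, the config and vocabulary can be read directly. In the sketch below, the directory name is just a placeholder for wherever the archive (e.g. chinese_L-12_H-768_A-12.zip) was extracted:

    import json

    MODEL_DIR = "chinese_L-12_H-768_A-12"  # placeholder: path to the unzipped model

    # bert_config.json holds the architecture hyperparameters.
    with open(MODEL_DIR + "/bert_config.json") as f:
        config = json.load(f)
    print(config["num_hidden_layers"], config["hidden_size"], config["num_attention_heads"])

    # vocab.txt lists one WordPiece token per line; the line number is its id.
    with open(MODEL_DIR + "/vocab.txt", encoding="utf-8") as f:
        vocab = {token.rstrip("\n"): idx for idx, token in enumerate(f)}
    print(len(vocab), vocab.get("[MASK]"))

    # bert_model.ckpt.* is a TensorFlow checkpoint; with TensorFlow installed,
    # tf.train.list_variables(MODEL_DIR + "/bert_model.ckpt") lists the
    # pre-trained weight tensors and their shapes.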

For other languages, see the Multilingual README. A Chinese model has also been released.

(A separate Chinese version was trained even under tight compute constraints; the influence of Chinese is plain to see, and the rest of us still need to work harder.)

For more details, see: https://github.com/google-research/bert
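
For a quick experiment with a released checkpoint, here is a minimal usage sketch based on the PyTorch port listed in the references below (pytorch-pretrained-BERT); the exact API may differ between versions of that package, so treat this as an outline rather than a definitive recipe:

    import torch
    from pytorch_pretrained_bert import BertTokenizer, BertModel

    # Download (or load from cache) the uncased English base model.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    text = "[CLS] the man went to the store . [SEP] he bought a gallon of milk . [SEP]"
    tokens = tokenizer.tokenize(text)
    token_ids = tokenizer.convert_tokens_to_ids(tokens)
    tokens_tensor = torch.tensor([token_ids])

    # Forward pass: returns the hidden states of every encoder layer plus the
    # pooled [CLS] representation.
    with torch.no_grad():
        encoded_layers, pooled_output = model(tokens_tensor)
    print(len(encoded_layers), encoded_layers[-1].shape, pooled_output.shape)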

Reference

  1. GitHub (TensorFlow): https://github.com/google-research/bert
  2. PyTorch version of BERT: https://github.com/huggingface/pytorch-pretrained-BERT
  3. BERT-Base, Chinese: https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip
  4. Paper: https://arxiv.org/abs/1810.04805
