BERT reading comprehension

CMRC2018

Train: 10,142 examples; Dev: 3,219; Test: 1,002

BERT

Chinese paper translation

Parameters

Layer                    Parameters       Share
MultiHeadSelfAttention   2,362,368 × 12   27.55%
TokenEmbedding           16,226,304       15.77%
FeedForward              4,722,432 × 12   55.08%
PositionEmbedding        393,216          0.38%
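
The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes the standard BERT-base Chinese configuration (hidden size 768, 12 layers and heads, feed-forward size 3072, vocabulary size 21128, maximum position 512); the variable names are illustrative.

# Rough parameter counts per component, assuming BERT-base Chinese.
hidden, ffn, vocab, max_pos, layers = 768, 3072, 21128, 512, 12

# Q, K, V and output projections: four 768x768 weights plus four 768 biases.
attention_per_layer = 4 * (hidden * hidden + hidden)                    # 2,362,368
# Two dense layers (768 -> 3072 and 3072 -> 768) with biases.
feedforward_per_layer = (hidden * ffn + ffn) + (ffn * hidden + hidden)  # 4,722,432
token_embedding = vocab * hidden                                        # 16,226,304
position_embedding = max_pos * hidden                                   # 393,216

print(attention_per_layer * layers, feedforward_per_layer * layers,
      token_embedding, position_embedding)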

Masked Language Model

Two-sentence Tasks

Task-specific Models

Applying BERT to reading comprehension

In SQuAD, given a question and a paragraph that contains the answer, the task is to predict the start and end positions of the answer text (the answer span). BERT first prepends the special classification token [CLS] to the question, then concatenates the question and the paragraph, separating them with the special [SEP] token. This sequence, together with segment embeddings and positional embeddings, is fed into BERT. Finally, a fully connected layer and a softmax convert BERT's final hidden states into probabilities over the answer span. Because the fine-tuned BERT can capture the relationship between the question and the paragraph, it performs well on SQuAD.
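
As a minimal sketch of that span-prediction head (not the exact code of any particular repository, and assuming a PyTorch encoder that returns per-token hidden states of size 768):

import torch
import torch.nn as nn

class SpanHead(nn.Module):
    # Converts the encoder's final hidden states into start/end probabilities.
    def __init__(self, encoder, hidden_size=768):
        super().__init__()
        self.encoder = encoder                        # any module returning (B, T, H)
        self.qa_outputs = nn.Linear(hidden_size, 2)   # one logit each for start and end

    def forward(self, input_ids, input_mask, segment_ids):
        sequence_output = self.encoder(input_ids, input_mask, segment_ids)  # (B, T, H)
        start_logits, end_logits = self.qa_outputs(sequence_output).split(1, dim=-1)
        start_probs = torch.softmax(start_logits.squeeze(-1), dim=-1)  # over positions
        end_probs = torch.softmax(end_logits.squeeze(-1), dim=-1)
        return start_probs, end_probs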

BERT input and output

  • Input: input_ids, input_mask, segment_ids
  • Output: start_position, end_position
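
A hedged sketch of how those three inputs can be assembled for one question/context pair (the vocabulary lookup and the max_seq_length value are assumptions; a real pipeline such as run_squad.py also handles document strides and truncation):

def build_inputs(question_tokens, context_tokens, vocab, max_seq_length=384):
    # [CLS] question [SEP] context [SEP]
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    # segment 0 covers [CLS] + question + first [SEP]; segment 1 covers the rest
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    input_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    input_mask = [1] * len(input_ids)          # 1 = real token, 0 = padding
    while len(input_ids) < max_seq_length:     # pad up to the fixed length
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)
    return input_ids[:max_seq_length], input_mask[:max_seq_length], segment_ids[:max_seq_length]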

BERT input analysis

Calculating start and end positions

example = SquadExample(
    qas_id=qas_id,
    question_text=question_text,
    doc_tokens=doc_tokens,
    orig_answer_text=orig_answer_text,
    start_position=start_position,
    end_position=end_position)

The start and end positions have to be relabeled because of tokenization. Although the Chinese BERT vocabulary is essentially character based, some tokens, such as English words and years like 1990, are still kept as single tokens, while the original annotations count positions character by character, regardless of tokenization. So during preprocessing the start and end positions must be recomputed; in one passage, for example, the original start position is 41 but becomes 38 after tokenization. Note that this has no effect on the generated results.
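
A hedged sketch of that remapping, assuming the answer is originally annotated with character offsets and the passage has already been split into doc_tokens (illustrative code, not the original preprocessing script):

def remap_positions(char_start, char_end, doc_tokens):
    # Each character of the passage points at the token that contains it,
    # so a multi-character token such as "1990" collapses several character
    # offsets onto a single token index.
    char_to_token = []
    for idx, token in enumerate(doc_tokens):
        char_to_token.extend([idx] * len(token))
    return char_to_token[char_start], char_to_token[char_end]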

Since the input to BERT is concatenated as:

[CLS]question[SEP]context[SEP]

So the start position finally fed to the model is the original start position plus the length of the question plus 2 (for [CLS] and [SEP]), as in the two examples below; a short sketch of this computation follows them.

  • First example: the question "范廷颂是什么时候被任为主教的?" has length 15; adding [CLS] and [SEP] gives 17; adding the original start position 30 gives a final position of 47.
  • Second example: the question "1990年,范廷颂担任什么职务?" has length 13; adding [CLS] and [SEP] gives 15; adding the original start position 38 gives a final position of 53.
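
A small sketch of that offset computation, using the question lengths and start positions from the two examples above:

def final_position(orig_start, question_tokens):
    # +2 accounts for the [CLS] before the question and the [SEP] after it
    return orig_start + len(question_tokens) + 2

print(final_position(30, list("范廷颂是什么时候被任为主教的?")))        # 47
print(final_position(38, ["1990"] + list("年,范廷颂担任什么职务?")))   # 53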

Debugging and checking result consistency

max_query_length: maximum length of the question; longer questions are truncated
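
In practice this simply means the question tokens are cut off at max_query_length before being concatenated with the context, roughly as below (the default of 64 is an assumed value, not one stated in the post):

def truncate_query(query_tokens, max_query_length=64):
    # Questions longer than max_query_length are truncated from the end.
    if len(query_tokens) > max_query_length:
        query_tokens = query_tokens[:max_query_length]
    return query_tokens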

Reproduced from: https://juejin.im/post/5d05cd6c518825509075f7f9


Origin blog.csdn.net/weixin_33943347/article/details/93181497