CMRC2018
Train: 10,142 examples; Dev: 3,219; Test: 1,002
BERT
Parameters
Module | Parameter count | Share of total |
---|---|---|
MultiHeadSelfAttention | 2362368*12 | 27.55% |
TokenEmbedding | 16226304 | 15.77% |
FeedForward | 4722432*12 | 55.08% |
PositionEmbedding | 393216 | 0.38% |
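The counts in the table can be derived from the standard BERT-base dimensions. A minimal sketch, assuming the usual BERT-base-Chinese sizes (hidden size 768, feed-forward size 3072, 512 positions, and a vocabulary of 21,128 tokens; the vocabulary size is an assumption, not stated in the table):

```python
# Assumed BERT-base-Chinese dimensions (not stated in the table above)
hidden = 768      # hidden size
ff = 3072         # feed-forward inner size
vocab = 21128     # assumed Chinese BERT vocabulary size
max_pos = 512     # maximum sequence length

# Q, K, V and output projections: four (hidden x hidden) weights + biases
self_attention = 4 * (hidden * hidden + hidden)               # per layer

# two linear layers, hidden->ff and ff->hidden, each with a bias
feed_forward = (hidden * ff + ff) + (ff * hidden + hidden)    # per layer

token_embedding = vocab * hidden
position_embedding = max_pos * hidden

print(self_attention, feed_forward, token_embedding, position_embedding)
```

Running this reproduces the per-layer and embedding counts in the table (2362368, 4722432, 16226304, 393216).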
Masked Language Model
Two-sentence Tasks
Task-specific Models
Applying BERT to reading comprehension
In SQuAD, given a question and a paragraph that contains the answer, the task is to predict the answer's start and end positions in the text (the answer span). BERT first prepends the special classification token [CLS] to the question, then concatenates the question and the paragraph, separated by the special token [SEP]. The sequence's token, segment, and position embeddings are fed into BERT. Finally, a fully connected layer and a softmax function convert BERT's final hidden states into answer-span probabilities. Because fine-tuning lets BERT capture the relationship between the question and the paragraph, it performs well on SQuAD.
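The span-prediction head described above can be sketched as follows. This is a minimal NumPy illustration, not the actual BERT code: a fully connected layer maps each final hidden state to two logits (start and end), and a softmax over the sequence turns them into answer-span probabilities. All shapes and names here are illustrative.

```python
import numpy as np

def span_probabilities(hidden_states, W, b):
    """hidden_states: (seq_len, hidden); W: (hidden, 2); b: (2,)."""
    logits = hidden_states @ W + b               # (seq_len, 2)
    start_logits, end_logits = logits[:, 0], logits[:, 1]

    def softmax(x):
        e = np.exp(x - x.max())                  # stabilized softmax
        return e / e.sum()

    # probability of each position being the answer start / end
    return softmax(start_logits), softmax(end_logits)

# toy example: seq_len=8, hidden=4, random weights
rng = np.random.default_rng(0)
h = rng.normal(size=(8, 4))
W = rng.normal(size=(4, 2))
p_start, p_end = span_probabilities(h, W, np.zeros(2))
```

The predicted span is then the (start, end) pair maximizing the joint probability, subject to start ≤ end.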
BERT input and output
- Input:
input_ids, input_mask, segment_ids
- Output:
start_position, end_position
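The three inputs listed above can be assembled as in this sketch. The function name is hypothetical; the special-token ids 101 ([CLS]) and 102 ([SEP]) are the standard BERT vocabulary values, and a real run would obtain token ids from the WordPiece tokenizer.

```python
def build_inputs(question_ids, context_ids, max_seq_length):
    # packed sequence: [CLS] question [SEP] context [SEP]
    CLS, SEP = 101, 102                       # standard BERT special-token ids
    input_ids = [CLS] + question_ids + [SEP] + context_ids + [SEP]
    # segment 0 covers "[CLS] question [SEP]", segment 1 covers "context [SEP]"
    segment_ids = [0] * (len(question_ids) + 2) + [1] * (len(context_ids) + 1)
    input_mask = [1] * len(input_ids)         # 1 = real token, 0 = padding
    while len(input_ids) < max_seq_length:    # zero-pad to the fixed length
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)
    return input_ids, input_mask, segment_ids
```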
BERT input analysis
Calculating start and end positions
example = SquadExample(
    qas_id=qas_id,
    question_text=question_text,
    doc_tokens=doc_tokens,
    orig_answer_text=orig_answer_text,
    start_position=start_position,
    end_position=end_position)
The start and end positions must be relabeled because of tokenization. Although the Chinese BERT tokenizes text character by character, some strings such as the year "1990" are still kept as a single token, while the gold answer positions are counted uniformly in characters, regardless of tokenization. So during preprocessing the start and end positions have to be rewritten. For example, in one passage the original start position is 41, but after tokenization it becomes 38; note that this has no effect on the generated result.
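The relabeling step can be sketched with a character-to-token map. This toy tokenizer only merges digit runs (like "1990") and splits everything else per character; the real code uses BERT's WordPiece tokenization, so treat this as an illustration of the idea only.

```python
import re

def tokenize(text):
    # toy rule: keep digit runs together, split everything else per character
    return re.findall(r"\d+|\S", text)

def char_to_token_map(text):
    """Map each character position to the index of the token containing it."""
    mapping, tok_index = [], 0
    for tok in tokenize(text):
        mapping.extend([tok_index] * len(tok))
        tok_index += 1
    return mapping

m = char_to_token_map("1990年范廷颂担任职务")
# the four characters of "1990" all map to token 0, so every later
# character-level position shifts down by 3 in token coordinates
```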
Since the input to BERT is concatenated as:
[CLS]question[SEP]context[SEP]
So the final start position must also have the question length plus 2 (for [CLS] and [SEP]) added to it, as in the two examples below.
- The first example:
"question": "范廷颂是什么时候被任为主教的?"
has a length of 15; adding [CLS] and [SEP] gives 17, and adding the original start position 30 yields the final position 47.
- The second example:
"question": "1990年,范廷颂担任什么职务?"
has a length of 13; adding [CLS] and [SEP] gives 15, and adding the original start position 38 yields the final position 53.
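The arithmetic from the two examples above reduces to one line. The function name is illustrative:

```python
def final_start(context_start, question_length):
    # shift the in-context start by the question length plus 2,
    # accounting for the leading [CLS] and the first [SEP]
    return context_start + question_length + 2

# first example:  start 30, question length 15 -> 47
# second example: start 38, question length 13 -> 53
```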
Debugging for consistent results
max_query_length: the maximum length of the question; longer questions are truncated
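The truncation behavior just described is simply a slice over the question tokens; a minimal sketch (function name assumed):

```python
def truncate_query(query_tokens, max_query_length):
    # questions longer than max_query_length keep only their first tokens
    if len(query_tokens) > max_query_length:
        query_tokens = query_tokens[:max_query_length]
    return query_tokens
```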
Reproduced from: https://juejin.im/post/5d05cd6c518825509075f7f9