Cloze (fill-in-the-blank) with the Transformer-based BERT model: it simply works out of the box

BERT is a Transformer-based model pre-trained on a large English corpus in a self-supervised manner. That means it was pre-trained only on raw text, with no human labeling: the inputs and labels are generated automatically from the text itself. Specifically, BERT is pre-trained with two objectives:

Masked Language Modeling (MLM): given a sentence, the model randomly masks 15% of the tokens in the input, runs the masked sentence through the model, and is trained to predict the masked tokens. This differs from traditional recurrent neural networks (RNNs), which usually see the words one after the other, and from autoregressive models like GPT, which hide the future tokens.
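Below is a minimal sketch of the MLM masking step, assuming the Hugging Face `transformers` library and the `bert-base-uncased` tokenizer. It is simplified: every selected position is replaced with `[MASK]`, rather than BERT's full 80/10/10 replacement rule.

```python
# Sketch of masking ~15% of tokens for MLM pre-training (simplified).
import torch
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def mask_tokens(text, mlm_probability=0.15):
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"].clone()
    labels = input_ids.clone()

    # Pick ~15% of positions at random, excluding special tokens ([CLS], [SEP]).
    probability_matrix = torch.full(labels.shape, mlm_probability)
    special_tokens_mask = torch.tensor(
        tokenizer.get_special_tokens_mask(labels[0].tolist(),
                                          already_has_special_tokens=True),
        dtype=torch.bool,
    ).unsqueeze(0)
    probability_matrix.masked_fill_(special_tokens_mask, value=0.0)
    masked_indices = torch.bernoulli(probability_matrix).bool()

    # Only masked positions contribute to the loss; the rest are ignored (-100).
    labels[~masked_indices] = -100
    input_ids[masked_indices] = tokenizer.mask_token_id
    return input_ids, labels

ids, labels = mask_tokens("BERT learns to predict the words that are masked out.")
print(tokenizer.decode(ids[0]))
```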

Next Sentence Prediction (NSP): during pre-training, the model concatenates two masked sentences as its input. Sometimes the two sentences were adjacent in the original text, and sometimes they were not; the model is trained to predict whether the second sentence actually follows the first.
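A minimal sketch of the NSP input format, assuming the Hugging Face `transformers` library and its `BertForNextSentencePrediction` head; the two example sentences are made up for illustration.

```python
# Check whether sentence_b plausibly follows sentence_a (NSP head).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "The man went to the store."
sentence_b = "He bought a gallon of milk."  # actually follows sentence_a
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 = "is the next sentence", index 1 = "is a random sentence".
print("adjacent?", logits.argmax(dim=-1).item() == 0)
```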

The BERT model was initially released in two versions, cased and uncased, for case-sensitive and case-insensitive input text. With the model pre-trained this way, it can be used directly for cloze (fill-in-the-mask) tasks.
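A minimal cloze example, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; the sentence is just a placeholder, and `[MASK]` marks the blank to fill.

```python
# Fill-in-the-blank (cloze) with the fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model returns the top candidate tokens for the [MASK] position.
for candidate in fill_mask("Paris is the [MASK] of France."):
    print(f"{candidate['token_str']:>10}  score={candidate['score']:.3f}")
```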


Origin blog.csdn.net/weixin_44782294/article/details/131749217