[nlp] GPT

1. Joint training tasks

1.1 NTP (Next Token Prediction)

GPT training uses two objective functions. The first is the basic next-token prediction task used for pre-training: a context window of size k is selected, and the embeddings of the k tokens in the window are used as the condition for predicting the next token.
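In the notation of the GPT paper (Radford et al., 2018), this language-modeling objective maximizes:

```latex
L_1(\mathcal{U}) = \sum_i \log P\bigl(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\bigr)
```

where U = {u_1, …, u_n} is the token sequence, k is the context-window size, and Θ are the model parameters.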

1.2 TC (Text Classification)

The second is a text classification task: a piece of text (e.g., a paragraph) is paired with a label, and the model is trained to predict that label from the transformer features.
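In the paper's notation, for a labeled example with input tokens x^1, …, x^m and label y, the label is predicted from the final transformer block's activation h_l^m through a linear output layer, giving the supervised objective:

```latex
P(y \mid x^1, \ldots, x^m) = \mathrm{softmax}\bigl(h_l^m W_y\bigr), \qquad
L_2(\mathcal{C}) = \sum_{(x,\, y)} \log P(y \mid x^1, \ldots, x^m)
```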

The objective function during fine-tuning is the weighted sum of these two: the supervised classification loss plus the language-modeling loss from pre-training, which is kept as an auxiliary objective.
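In the paper's notation:

```latex
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})
```

where λ weights the auxiliary language-modeling term (the paper uses λ = 0.5).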

For a downstream task, the input is fed into the Transformer decoder. Like BERT, GPT reuses the pre-trained parameters and then feeds the resulting features into a task-specific feed-forward (linear) layer.

GPT uses 12 Transformer decoder layers with 768-dimensional hidden states. BERT-Base deliberately chose the same configuration so that the two models could be compared directly.
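The following is a minimal PyTorch sketch (not the original implementation) of this fine-tuning setup: a pre-trained decoder, passed in as a hypothetical `decoder` module that maps token ids to hidden states, is reused with a linear classification head and trained with the supervised loss plus the weighted auxiliary language-modeling loss. The 768-dimensional hidden size follows the GPT-1 configuration above; the other defaults are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GPTForClassification(nn.Module):
    """Sketch of GPT-style fine-tuning: pre-trained decoder + task-specific linear head."""

    def __init__(self, decoder: nn.Module, d_model: int = 768,
                 vocab_size: int = 40000, num_labels: int = 2):
        super().__init__()
        self.decoder = decoder                                     # pre-trained Transformer decoder (hypothetical stand-in)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)  # next-token prediction head (L1)
        self.cls_head = nn.Linear(d_model, num_labels)             # task-specific linear layer (L2)

    def forward(self, input_ids, labels=None, lm_targets=None, lm_weight: float = 0.5):
        hidden = self.decoder(input_ids)              # (batch, seq_len, d_model)
        logits_cls = self.cls_head(hidden[:, -1])     # classify from the last token's features
        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits_cls, labels)             # supervised objective L2
            if lm_targets is not None:
                lm_logits = self.lm_head(hidden)                   # auxiliary LM objective L1
                lm_loss = F.cross_entropy(
                    lm_logits.reshape(-1, lm_logits.size(-1)),
                    lm_targets.reshape(-1))
                loss = loss + lm_weight * lm_loss                  # L3 = L2 + λ·L1
        return logits_cls, loss
```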

2. GPT2

GPT-2 is OpenAI's response to BERT: it keeps the decoder-only architecture, scales up the model and the training data, and focuses on zero-shot task performance rather than fine-tuning.
