Perplexity

In natural language processing, perplexity is a metric used to evaluate how good or bad a language model is. Its value is obtained by exponentiating the cross-entropy loss.

Cross-entropy loss function

Single training sample loss:

$$loss = -\sum_{i=1}^{n} y_{i}\log\hat{y_{i}} = -\log\hat{y_{j}}$$

Here $n$ is the number of classes; in a language model this is the vocabulary size (the total number of distinct characters). $\hat{y_{i}}$ is the predicted probability of class $i$, and $y_{i}$ is the true label. For example, with 3 classes and exactly one label per sample, if the correct class is class 1, then $y_{1}=1$, $y_{2}=0$, $y_{3}=0$; substituting into the formula gives $loss = -\log\hat{y_{1}}$. In other words, the cross-entropy loss depends only on the predicted probability of the correct class.
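As a small illustration, the loss for one sample can be computed directly from the formula above; the following NumPy sketch uses made-up probabilities for the 3-class example:

```python
import numpy as np

# Hypothetical 3-class example: the correct class is class 1.
y = np.array([1.0, 0.0, 0.0])        # one-hot true labels y_i
y_hat = np.array([0.7, 0.2, 0.1])    # predicted probabilities (made-up values)

# Cross-entropy loss for a single sample: only the correct class contributes.
loss = -np.sum(y * np.log(y_hat))
print(loss)                # 0.3567... = -log(0.7)
print(-np.log(y_hat[0]))   # same value: -log of the correct-class probability
```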

Perplexity

$$perplexity = e^{loss} = \frac{1}{\hat{y_{j}}}$$

  • In the best case, the model always predicts the correct class with probability 1, and the perplexity is 1;
  • In the worst case, the model always predicts the correct class with probability 0, and the perplexity is positive infinity;
  • As a baseline, a model that predicts the same probability for every class has $\hat{y_{i}}=\frac{1}{n}$ and $perplexity = n$, i.e. the perplexity equals the number of classes.

Clearly, any useful model must achieve a perplexity lower than the number of classes. For a language model, the perplexity must therefore be lower than the vocabulary size vocab_size (see the sketch below).
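A minimal sketch illustrating the three cases above (the vocabulary size n = 5000 is a made-up value):

```python
import numpy as np

def perplexity(p_correct):
    """Perplexity for a single prediction: e^loss = 1 / p_correct,
    where p_correct is the predicted probability of the correct class."""
    loss = -np.log(p_correct)
    return np.exp(loss)

print(perplexity(1.0))       # best case: probability 1 -> perplexity 1
print(perplexity(1e-12))     # near-worst case: probability ~0 -> perplexity ~1e12
n = 5000                     # hypothetical vocabulary size
print(perplexity(1.0 / n))   # uniform baseline: 1/n -> perplexity = n = 5000
```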