In natural language processing, perplexity is a metric used to measure the quality of a language model. Its value is the exponential of the cross-entropy loss.
Cross-entropy loss function
Single training sample loss:

$$\ell(\boldsymbol{y}, \hat{\boldsymbol{y}}) = -\sum_{j=1}^{q} y_j \log \hat{y}_j$$

where $q$ is the number of labels; in a language model it is the total number of characters in the vocabulary. $\hat{y}_j$ is the predicted probability of class $j$, and $y_j$ is 1 for the correct class and 0 otherwise. If there are 3 labels, each sample has exactly one label, and the correct class is label 1, then $y_1 = 1$ and $y_2 = y_3 = 0$; substituting into the formula gives $\ell = -\log \hat{y}_1$. We can see that the cross-entropy loss function is only concerned with the predicted probability of the correct class.
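To make this concrete, here is a minimal Python sketch of the single-sample cross-entropy; the probability vector and label index are made-up values for illustration:

```python
import math

def cross_entropy(y_hat, label):
    # The loss depends only on the probability predicted for the true class.
    return -math.log(y_hat[label])

# Hypothetical values: q = 3 categories, and the true class is the first
# one (y_1 = 1 in the formula above, index 0 here).
y_hat = [0.7, 0.2, 0.1]         # predicted probabilities, summing to 1
print(cross_entropy(y_hat, 0))  # -log(0.7) ≈ 0.357
```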
Perplexity
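Concretely, the perplexity over $n$ predictions is the exponential of the average cross-entropy loss:

$$\text{perplexity} = \exp\left(-\frac{1}{n}\sum_{i=1}^{n} \log \hat{y}^{(i)}_{\text{label}}\right)$$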
- In the best case, the model always predicts the probability of the label category to be 1; the perplexity is then 1;
- In the worst case, the model always predicts the probability of the label category to be 0; the perplexity is then positive infinity;
- In the baseline case, the model predicts the same probability $1/q$ for every one of the $q$ categories; the loss is then $-\log(1/q) = \log q$, so the perplexity is $e^{\log q} = q$, the number of categories.
Obviously, the perplexity of any valid model must be less than the number of categories. For a language model, the perplexity must therefore be less than the dictionary size vocab_size.
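As a quick sanity check, here is a minimal Python sketch that computes perplexity from the per-step probabilities assigned to the correct class and reproduces the best and baseline cases above (the sequence of probabilities is made up for illustration):

```python
import math

def perplexity(correct_probs):
    # Average cross-entropy over the sequence, then exponentiate.
    avg_loss = sum(-math.log(p) for p in correct_probs) / len(correct_probs)
    return math.exp(avg_loss)

q = 3  # number of categories

print(perplexity([1.0, 1.0, 1.0]))  # best case: perplexity = 1.0
print(perplexity([1 / q] * 3))      # baseline case: perplexity = q = 3.0
# Worst case: a correct-class probability of 0 makes -log(0) diverge,
# so the perplexity tends to positive infinity.
```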