NLP | Evaluation Metrics for Generation Tasks: BLEU, ROUGE

1. BLEU:

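The idea behind BLEU (higher is better): compare the n-grams of the candidate translation with those of the reference translation(s); the more they overlap, the higher the translation quality. BLEU is precision-oriented: for each n it counts how many candidate n-grams also appear in the references (clipping each n-gram's count at its maximum count in any single reference), combines these precisions (typically for n = 1 to 4) with a geometric mean, and multiplies the result by a brevity penalty so that overly short candidates are not rewarded. Unigram matches measure how accurately individual words are translated, and higher-order n-gram matches measure the fluency of the translation.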
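
The computation described above can be sketched in Python as follows. This is a minimal sentence-level version under stated assumptions: whitespace-tokenized input, uniform weights over n = 1..max_n, the reference length closest to the candidate's length as the brevity-penalty reference, and no smoothing (so any n-gram order with zero matches yields a score of 0). The names `ngrams` and `sentence_bleu` are illustrative, not a library API; for reported results, an established implementation such as sacreBLEU or NLTK's `nltk.translate.bleu_score` module is the usual choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Illustrative sentence-level BLEU: clipped n-gram precision, uniform
    geometric mean over n = 1..max_n, brevity penalty, and NO smoothing."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate too short to contain any n-gram of this order
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision makes the geometric mean 0
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

For example, with the reference "the cat is on the mat", the candidate "the cat sat on the mat" scores 0 under BLEU-4 because it shares no 4-gram with the reference, yet scores sqrt(0.5) ≈ 0.707 with `max_n=2`. This collapse to zero on short sentences is why sentence-level BLEU is often computed with smoothing.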

2. ROUGE:

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) focuses on recall, that is, how many of the n-grams in the reference appear in the output, rather than on precision, that is, how many of the n-grams in the candidate appear in the reference.
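
To make the recall/precision distinction concrete, here is a minimal sketch of n-gram overlap scoring against a single reference (the ROUGE-N idea), assuming whitespace tokenization. It reports recall alongside precision and F1, as many ROUGE packages do; the names `ngram_counts` and `rouge_n` are hypothetical, not a library API.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Illustrative ROUGE-N against one reference: returns (recall, precision, f1)."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    # Counter '&' keeps the minimum count per n-gram, i.e. the clipped overlap.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)       # share of reference n-grams recovered
    precision = overlap / max(sum(cand.values()), 1)   # share of candidate n-grams that are correct
    f1 = 0.0 if overlap == 0 else 2 * recall * precision / (recall + precision)
    return recall, precision, f1
```

With the reference "the cat is on the mat", the candidate "the cat" has unigram precision 1.0 but recall only 1/3: precision alone would call this output perfect, while recall exposes how much of the reference content is missing.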

  • ROUGE-N: based on n-gram co-occurrence statistics (ROUGE-1 for unigrams, ROUGE-2 for bigrams)
  • ROUGE-L: F-score (commonly F1) computed from the recall and precision of the longest common subsequence (LCS)
  • ROUGE-W: F-score computed from the recall and precision of a weighted LCS, which favors consecutive matches
  • ROUGE-S: F-score computed from the recall and precision of skip-bigrams (word pairs in sentence order, with gaps allowed between them)
  • Commonly used: BLEU-4, ROUGE-L, ROUGE-1, ROUGE-2
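
ROUGE-L and ROUGE-S from the list above can be sketched at sentence level as follows, assuming whitespace-tokenized input, a single reference, and skip-bigrams with no limit on the gap (the unlimited-distance variant). The helper names are hypothetical, and ROUGE-W (weighted LCS) is left out to keep the sketch short.

```python
from collections import Counter
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (O(len(a) * len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best of up/left.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def f_score(recall, precision, beta=1.0):
    """Weighted harmonic mean of recall and precision; beta=1 gives F1."""
    if recall == 0 or precision == 0:
        return 0.0
    return (1 + beta ** 2) * recall * precision / (recall + beta ** 2 * precision)


def rouge_l(candidate, reference, beta=1.0):
    """Illustrative sentence-level ROUGE-L: returns (recall, precision, f)."""
    lcs = lcs_length(candidate, reference)
    recall = lcs / len(reference) if reference else 0.0
    precision = lcs / len(candidate) if candidate else 0.0
    return recall, precision, f_score(recall, precision, beta)


def rouge_s(candidate, reference, beta=1.0):
    """Illustrative ROUGE-S with unlimited skip distance: returns (recall, precision, f)."""
    # combinations() preserves sentence order, so (a, b) only appears when a precedes b.
    cand = Counter(combinations(candidate, 2))
    ref = Counter(combinations(reference, 2))
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    return recall, precision, f_score(recall, precision, beta)
```

With the reference "police killed the gunman", the candidate "police kill the gunman" gets ROUGE-L F1 = 0.75 and ROUGE-S F1 = 0.5, while the reordered "the gunman kill police" drops to 0.5 and 1/6. Both measures reward preserving word order without requiring the matched words to be contiguous.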

Origin blog.csdn.net/weixin_43646592/article/details/131795893