Python learning-114: Implementing the Rouge evaluation system for automatic text summarization (very simple)

Foreword:

Recently I have been studying automatic summary generation for short texts. For experimental evaluation, researchers at home and abroad generally use the Rouge metrics, such as Rouge-1, Rouge-2, and Rouge-L. Today we will talk about their Python implementation.

If you search Baidu for information, you will see all kinds of configuration and installation instructions, which look particularly troublesome.

It actually doesn't have to be that complicated.

If you simply need the three metrics Rouge-1, Rouge-2, and Rouge-L, the implementation is very simple.

Step 1: Use pip to install rouge

pip install rouge

Step 2: Calculate the Rouge value of the generated text and the reference text

# coding:utf8
from rouge import Rouge

a = ["i am a student from china"]  # predicted summary (can be a list or a single sentence)
b = ["i am student from school on japan"]  # reference summary

'''
f: F1 score   p: precision   r: recall
'''
rouge = Rouge()
rouge_score = rouge.get_scores(a, b)
print(rouge_score[0]["rouge-1"])
print(rouge_score[0]["rouge-2"])
print(rouge_score[0]["rouge-l"])

The above calculates the Rouge values for a single pair of texts; the output is as follows:

{'p': 1.0, 'f': 0.7272727226446282, 'r': 0.5714285714285714}
{'p': 1.0, 'f': 0.6666666622222223, 'r': 0.5}
{'p': 1.0, 'f': 0.6388206388206618, 'r': 0.5714285714285714}
f: F1 score   p: precision   r: recall

These values are different from what we see in papers, where each metric is reported as a single number. Why are there three values here, and how do we compute a single value like the one reported in a paper?

Step 3: Selecting the Rouge value, and the selection principle

---------------------------------------- principle ----------------------------------------

Rouge (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics for evaluating automatic summarization and machine translation. It works by comparing the automatically generated summary or translation against a set of reference summaries (usually written by humans) and computing a score that measures the "similarity" between the two.

Rouge-1, Rouge-2, and more generally Rouge-N are defined as follows: the denominator is the total number of n-grams in the reference summary, and the numerator is the number of n-grams shared by the reference summary and the automatic summary.
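The formula image from the original post did not survive; the standard Rouge-N definition from the ROUGE paper, matching the description above, is:

```latex
\mathrm{ROUGE\text{-}N} =
\frac{\sum_{S \in \{\text{Ref summaries}\}} \; \sum_{\mathrm{gram}_n \in S} \mathrm{Count}_{\mathrm{match}}(\mathrm{gram}_n)}
     {\sum_{S \in \{\text{Ref summaries}\}} \; \sum_{\mathrm{gram}_n \in S} \mathrm{Count}(\mathrm{gram}_n)}
```

where Count_match(gram_n) is the maximum number of times the n-gram co-occurs in the candidate summary and the reference.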

for example:

the cat was found under the bed   # generated summary
the cat was under the bed         # reference summary

Counting the n-grams shared by the two summaries gives the scores.

So Rouge-1 and Rouge-2 are essentially recall computed over unigrams and bigrams, respectively.
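To make the counting concrete, here is a hand-computed sketch of Rouge-1 and Rouge-2 recall for the example above (my own illustration of what the library does under the hood, not the library's actual code):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n):
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    # clipped overlap: each reference n-gram is matched at most as many
    # times as it appears in the candidate
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / sum(ref.values())

cand = "the cat was found under the bed"   # generated summary
ref = "the cat was under the bed"          # reference summary

print(rouge_n_recall(cand, ref, 1))  # 6 of 6 reference unigrams matched -> 1.0
print(rouge_n_recall(cand, ref, 2))  # 4 of 5 reference bigrams matched  -> 0.8
```

The denominator is always taken from the reference, which is why these are recall scores.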

Rouge-L
The L stands for LCS (longest common subsequence), because Rouge-L is based on the longest common subsequence of the candidate and the reference. Rouge-L is calculated as follows:
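The formula image did not survive here either; the standard definition from the ROUGE paper, where X is the reference of length m and Y is the candidate of length n, is:

```latex
R_{lcs} = \frac{\mathrm{LCS}(X, Y)}{m}, \qquad
P_{lcs} = \frac{\mathrm{LCS}(X, Y)}{n}, \qquad
F_{lcs} = \frac{(1 + \beta^2)\, R_{lcs}\, P_{lcs}}{R_{lcs} + \beta^2 P_{lcs}}
```

Rouge-L is the F-measure F_lcs; when β is set very large, F_lcs reduces to the recall R_lcs.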

Reference for this blog:

https://www.jianshu.com/p/2d7c3a1fcbe3

https://blog.csdn.net/qq_25222361/article/details/78694617

---------------------------------------- principle ----------------------------------------

 

Origin blog.csdn.net/u013521274/article/details/89460322