BLEU: A Method for Automatically Evaluating Machine Translation Quality

 

Modified unigram precision

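One way to judge a translation is to count how many of its words also appear in the reference sentences: the more words match, the more accurate the translation. In other words, we look at precision, the fraction of the candidate's words that are correct. This naive measure has a problem, though, as the following example shows.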

Candidate: the the the the the the the

Reference 1: The cat is on the mat.

Reference 2: There is a cat on the mat.

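The candidate above is obviously gibberish, yet every one of its words appears in a reference, so its precision is 7/7 = 100%. That is clearly unreasonable, which is why modified unigram precision is introduced.

The idea is as follows:

The word "the" appears at most 2 times in any single reference (Reference 1). So even though the candidate consists entirely of "the", only the first two occurrences count as matches and the rest are ignored. Hence modified unigram precision = 2/7.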
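
The clipping rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the tokenizer (lowercase, letters and digits only) are assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters/digits, drop punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())

def modified_unigram_precision(candidate, references):
    cand_counts = Counter(tokenize(candidate))
    # For each word, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(tokenize(ref)).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    # Clip each candidate count by that maximum.
    clipped = sum(min(count, max_ref_counts[word])
                  for word, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped, total

clipped, total = modified_unigram_precision(
    "the the the the the the the",
    ["The cat is on the mat.", "There is a cat on the mat."],
)
print(clipped, total)  # 2 7
```

Returning the numerator and denominator separately, rather than a float, makes it easy to pool counts over several sentences later.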

Test example 1:

Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.

Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.

Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.

Reference 3: It is the practical guide for the army always to heed the directions of the party.

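For Candidate 1: the largest number of times "the" appears in any single reference is 4 (in Reference 2, for example), so "the" in Candidate 1 can be counted at most 4 times. It appears 3 times in Candidate 1, so all 3 occurrences count. Matching ignores case, and "obeys" is the only word that appears in no reference. Summing the clipped counts gives: It(1)+is(1)+a(1)+guide(1)+to(1)+action(1)+which(1)+ensures(1)+that(1)+the(1)+military(1)+always(1)+the(1)+commands(1)+of(1)+the(1)+party(1)=17

So modified unigram precision = 17 / (number of words in Candidate 1, which is 18) = 17/18

For Candidate 2:

It(1)+is(1)+to(1)+the(1)+forever(1)+the(1)+that(1)+party(1)=8

So modified unigram precision = 8 / (number of words in Candidate 2, which is 14) = 8/14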
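
As a check on the two hand counts, here is a small Python sketch that recomputes them. The helper names and the tokenizer (lowercase, punctuation dropped) are assumptions of this example; the sentences are copied from test example 1.

```python
import re
from collections import Counter

REFERENCES = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
CANDIDATE_1 = "It is a guide to action which ensures that the military always obeys the commands of the party."
CANDIDATE_2 = "It is to insure the troops forever hearing the activity guidebook that party direct."

def tokenize(text):
    # Assumed tokenizer: case-insensitive, punctuation ignored.
    return re.findall(r"[a-z0-9]+", text.lower())

def clipped_unigram_counts(candidate, references):
    cand = Counter(tokenize(candidate))
    ref_counters = [Counter(tokenize(ref)) for ref in references]
    # Each word counts at most as often as in the reference where it is most frequent.
    clipped = sum(min(n, max(rc[word] for rc in ref_counters))
                  for word, n in cand.items())
    return clipped, sum(cand.values())

result_1 = clipped_unigram_counts(CANDIDATE_1, REFERENCES)
result_2 = clipped_unigram_counts(CANDIDATE_2, REFERENCES)
print(result_1)  # (17, 18)
print(result_2)  # (8, 14)
```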

Bigrams and higher-order n-grams are matched as follows:

For w1 w2 w3: w1w2 counts as one bigram and w2w3 counts as another.
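
The sliding-window extraction can be written as a one-line helper. The name `ngrams` is chosen for this sketch only.

```python
def ngrams(tokens, n):
    # Slide a window of width n over the token list; each window is one n-gram.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams(["w1", "w2", "w3"], 2)
print(bigrams)  # [('w1', 'w2'), ('w2', 'w3')]
```

A sentence shorter than n simply yields an empty list.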

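The above applies to a single sentence. For a collection of sentences, such as a whole document, sum the clipped counts over all sentences and divide by the total number of candidate words.

Taking the two candidates of test example 1 as an example:

Modified unigram precision = (17+8)/(18+14) = 25/32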
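
The pooling can be illustrated with exact fractions. In this sketch the (clipped count, word count) pairs are the two results from test example 1, and the variable names are made up for illustration. It also shows that pooling is not the same as averaging the per-sentence precisions.

```python
from fractions import Fraction

# (clipped matches, candidate length) for each sentence, from test example 1.
sentence_counts = [(17, 18), (8, 14)]

# Corpus level: pool numerators and denominators before dividing.
total_clipped = sum(clipped for clipped, _ in sentence_counts)
total_words = sum(length for _, length in sentence_counts)
corpus_precision = Fraction(total_clipped, total_words)

# For contrast: the plain average of the per-sentence precisions.
mean_of_precisions = sum(Fraction(c, t) for c, t in sentence_counts) / len(sentence_counts)

print(corpus_precision)    # 25/32
print(mean_of_precisions)  # 191/252
```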

The biggest weakness of modified precision is that a very short candidate can still score very high even when it is gibberish, as the following example shows:

Candidate: of the

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.

Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.

Reference 3: It is the practical guide for the army always to heed the directions of the party.

Here, modified unigram precision = 2/2 and modified bigram precision = 1/1.
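
The same clipping idea extends to any n-gram order. The sketch below reproduces both numbers for the candidate "of the". Function names and the tokenizer are assumptions of this example, not part of any library.

```python
import re
from collections import Counter

REFERENCES = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]

def tokenize(text):
    # Assumed tokenizer: case-insensitive, punctuation ignored.
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    cand = Counter(ngrams(tokenize(candidate), n))
    refs = [Counter(ngrams(tokenize(ref), n)) for ref in references]
    # Clip each candidate n-gram count by its maximum count in any one reference.
    clipped = sum(min(count, max(rc[gram] for rc in refs))
                  for gram, count in cand.items())
    return clipped, sum(cand.values())

unigram = modified_precision("of the", REFERENCES, 1)
bigram = modified_precision("of the", REFERENCES, 2)
print(unigram)  # (2, 2)
print(bigram)   # (1, 1)
```

Both precisions are perfect even though the two-word candidate conveys almost nothing, which is exactly the weakness described above.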

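To address this problem with very short candidates, one could consider bringing in recall as a counterbalance.

But recall turns out to be inappropriate for the following example.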

Candidate 1: I always invariably perpetually do.

Candidate 2: I always do.

Reference 1: I always do.

Reference 2: I invariably do.

Reference 3: I perpetually do.

Candidate 1 covers more of the reference words than Candidate 2, so its recall is higher, yet Candidate 1 is clearly a worse translation than Candidate 2.
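
To make the point concrete, here is one naive way to define recall over several references: the share of the pooled reference vocabulary that the candidate covers. This definition and the name `pooled_recall` are assumptions of this sketch. The text does not specify how recall would be computed, and other definitions are possible.

```python
import re

REFERENCES = ["I always do.", "I invariably do.", "I perpetually do."]

def tokenize(text):
    # Assumed tokenizer: case-insensitive, punctuation ignored.
    return re.findall(r"[a-z0-9]+", text.lower())

def pooled_recall(candidate, references):
    # Hypothetical recall: fraction of distinct reference words found in the candidate.
    ref_vocab = set().union(*(tokenize(ref) for ref in references))
    return len(ref_vocab & set(tokenize(candidate))) / len(ref_vocab)

recall_1 = pooled_recall("I always invariably perpetually do.", REFERENCES)
recall_2 = pooled_recall("I always do.", REFERENCES)
print(recall_1)  # 1.0
print(recall_2)  # 0.6
```

Under this definition the padded Candidate 1 gets perfect recall while the correct Candidate 2 is penalized for not using every synonym.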

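BLEU in detail:

Each candidate sentence has several references. For each candidate, take the length of the reference whose length is closest to the candidate's. Summing these lengths over the whole corpus gives r, and summing the candidate lengths over the whole corpus gives c.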
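
A small sketch of how r and c could be accumulated, treating the two candidates of test example 1 as a two-sentence corpus. Lengths are counted by whitespace splitting, and ties between equally close references are broken toward the shorter one. Both choices are assumptions of this example, since the text does not specify them.

```python
REFERENCES = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
CANDIDATES = [
    "It is a guide to action which ensures that the military always obeys the commands of the party.",
    "It is to insure the troops forever hearing the activity guidebook that party direct.",
]

def closest_reference_length(candidate_len, reference_lens):
    # Smallest absolute difference wins; on a tie, prefer the shorter reference (assumed rule).
    return min(reference_lens, key=lambda ref_len: (abs(ref_len - candidate_len), ref_len))

ref_lens = [len(ref.split()) for ref in REFERENCES]   # [16, 18, 16]
cand_lens = [len(cand.split()) for cand in CANDIDATES]  # [18, 14]

c = sum(cand_lens)
r = sum(closest_reference_length(n, ref_lens) for n in cand_lens)
print(c, r)  # 32 34
```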

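Modified precision on its own only measures precision. To handle the problem caused by short candidates, BLEU adds a penalty term, the brevity penalty (BP). With c and r as defined above, the standard definition is:

$$\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}$$

$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right)$$

Here p_n is the modified n-gram precision and the w_n are positive weights that sum to 1 (uniform weights w_n = 1/N are the usual choice).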
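
The standard BLEU definition multiplies a brevity penalty, which is 1 when c > r and exp(1 - r/c) otherwise, by the weighted geometric mean of the modified n-gram precisions. The sketch below assumes uniform weights by default and returns 0 when any precision is 0. The function names are illustrative, not taken from a library.

```python
import math

def brevity_penalty(c, r):
    # No penalty when the candidate corpus is longer than the effective reference length.
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)

def bleu(precisions, c, r, weights=None):
    if weights is None:
        # Assumed default: uniform weights 1/N.
        weights = [1 / len(precisions)] * len(precisions)
    if min(precisions) == 0:
        # log(0) is undefined; the geometric mean is 0 in that case.
        return 0.0
    log_mean = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(c, r) * math.exp(log_mean)

# The two-word candidate "of the": perfect unigram and bigram precision, c = 2, r = 16.
score = bleu([1.0, 1.0], c=2, r=16)
print(score)  # about 0.00091, i.e. exp(-7)
```

With the penalty in place, the short gibberish candidate drops from a perfect precision to a score close to zero.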

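Why take logarithms: it keeps sparse data from producing very large differences between values, and because the logarithm is monotonic, the ordering of the values does not change.

Why use a geometric mean: it gives a balanced combination of quantities of different kinds. Here those quantities are the modified precisions for the different n-gram orders, which are combined into a single score for the translation of the whole text.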

Reposted from www.cnblogs.com/sxytalent/p/10889408.html