ASR performance test plan - see the cloud drive for details

 

Table of contents:

1.  What is WER

2.  WER calculation principles

3.  WER test design

4.  Current industry recognition levels

 

1.  What is WER

In automatic speech recognition (ASR), the most commonly used evaluation metric is the word error rate (WER). When testing Chinese, the character error rate (CER) is also used. The two are computed on the same principle; the only difference is the basic unit, which is the word for English and the character for Chinese. This article uses WER throughout.

WER is calculated as follows. To make the recognized word sequence match the reference word sequence, certain words have to be substituted, deleted, or inserted. The total number of substituted, deleted, and inserted words, divided by the number of words in the reference sequence and expressed as a percentage, is the WER:

WER = (S + D + I) / N × 100%

where S, D, and I are the numbers of substitutions, deletions, and insertions, and N is the number of words in the reference sequence.

Word accuracy, abbreviated W.Acc, is defined as: W.Acc = 1 - WER

Because W.Acc follows directly from WER, only WER needs to be measured in testing.

 

2.  WER calculation principles

The first line is the reference word sequence (REF, reference), the second line is the recognized word sequence (HYP, hypothesis), and the third line is the classification of each position (Eval, evaluation).

 

The WER for the example above is calculated as follows:

There are 3 insertions, 6 substitutions, and 1 deletion, so the WER is:

WER = (6 + 3 + 1) / 13 = 76.9%
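The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function names `wer_from_counts` and `word_accuracy` are made up for illustration and do not come from any particular toolkit.

```python
def wer_from_counts(substitutions: int, deletions: int, insertions: int,
                    ref_words: int) -> float:
    """Return WER as a fraction: (S + D + I) / N."""
    if ref_words <= 0:
        raise ValueError("the reference must contain at least one word")
    return (substitutions + deletions + insertions) / ref_words


def word_accuracy(wer: float) -> float:
    """W.Acc = 1 - WER (this goes negative when WER exceeds 1)."""
    return 1.0 - wer


# Counts from the example: 6 substitutions, 1 deletion, 3 insertions, 13 reference words.
wer = wer_from_counts(substitutions=6, deletions=1, insertions=3, ref_words=13)
print(f"WER = {wer:.1%}, W.Acc = {word_accuracy(wer):.1%}")
# prints: WER = 76.9%, W.Acc = 23.1%
```

Because insertions count as errors but do not add to the reference length, WER can exceed 100%, which is why the sketch lets W.Acc go negative.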

Similarly, a Chinese example is as follows (one Chinese character per column, shown here as English glosses; * marks an alignment gap):

REF   now    day  *      day  gas  How  What  kind
HYP   shock  day  field  day  gas  *    *     *
Eval  S           I                D    D     D

WER = (1 + 1 + 3) / 7 = 71.4%
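The S/I/D classification in the examples above is normally obtained from a minimum edit distance (Levenshtein) alignment between REF and HYP. The following is a minimal sketch of that idea, not the implementation of any specific scoring tool; `align_counts` is a hypothetical name, and the tokens are the English glosses from the Chinese example.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference token
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner and count each error type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]  # a match adds 0, a substitution adds 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


ref = "now day day gas How What kind".split()
hyp = "shock day field day gas".split()
s, d, ins = align_counts(ref, hyp)
print(f"S={s} D={d} I={ins} WER={(s + d + ins) / len(ref):.1%}")
# prints: S=1 D=3 I=1 WER=71.4%
```

When several alignments share the same minimum cost, the split between S, D, and I may depend on the tie-breaking order, but the total, and therefore the WER, does not.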

 

3.  WER test design

The test procedure consists of the following steps:

a)  Prepare a standard test text file (txt or another format) as the "reference string set". Taking a txt file as an example, the contents are recorded as follows:

1 today, how is the weather 

2 What is your name 

3  .......

b)  Prepare the demo program under test and save the sentences produced by the ASR into a file as the "recognition result set". Keep its format consistent with the file in step a) to make string comparison easier.

c)  Compare the "reference string set" with the "recognition result set" using the method described in "2. WER calculation principles".

d)  Output the WER test results.
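The steps above can be sketched as a small scoring script. It is only an illustration under several assumptions: each line has the form "<id> <text>", English text is tokenized on whitespace with punctuation and case ignored (for Chinese, each sentence would be split into characters instead), and the recognition results below are invented sample data, not real ASR output. In practice the two line lists would be read from the reference file and the result file.

```python
from typing import Dict, Iterable, List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j - 1] + (r != h), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def parse(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each utterance ID to its token list; expects '<id> <text>' per line."""
    table: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        words = [p.strip(",.?!").lower() for p in parts[1:]]
        table[parts[0]] = [w for w in words if w]
    return table


def corpus_wer(ref_lines: Iterable[str], hyp_lines: Iterable[str]) -> float:
    """Total edits divided by total reference words, over all reference utterances."""
    refs, hyps = parse(ref_lines), parse(hyp_lines)
    total_words = sum(len(tokens) for tokens in refs.values())
    if total_words == 0:
        raise ValueError("the reference set contains no words")
    # An utterance missing from the results counts as all deletions.
    errors = sum(edit_distance(tokens, hyps.get(uid, [])) for uid, tokens in refs.items())
    return errors / total_words


reference_set = ["1 today, how is the weather", "2 What is your name"]
result_set = ["1 today how is weather", "2 what is you name"]  # made-up ASR output
print(f"WER = {corpus_wer(reference_set, result_set):.1%}")
# prints: WER = 22.2%
```

Summing the errors over the whole set before dividing weights every reference word equally, which differs from averaging the per-sentence WERs; which of the two is wanted should be decided as part of the test design.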

 

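4.  Current industry recognition levels

English WER:

IBM: Switchboard industry-standard speech recognition task, 6.9% in 2016, 5.5% in 2017

Microsoft: Switchboard industry-standard speech recognition task, 6.3% -> 5.9% in 2016, 5.1% in 2017, currently the lowest.

Note: at ICASSP 2017, IBM stated that the WER of human stenographers is 5.1%; a WER of 5.9% is generally regarded as the level of human stenographers.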

 

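Chinese WER/CER:

Xiaomi: 2018, Xiaomi TV, 2.81%

Baidu: 2016, short-phrase recognition, 3.7%

 

Chinese W.Corr (W.Corr = W.Acc = 1 - WER):

Baidu: 2016, recognition accuracy 97%

Sogou: 2016, recognition accuracy 97%

iFlytek: 2016, recognition accuracy 97%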

 

Data sources:

Microsoft WER 5.9%: https://arxiv.org/abs/1610.05256

Microsoft WER 5.1%: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/08/ms_swbd17-2.pdf

Xiaomi TV CER 2.81%: https://arxiv.org/pdf/1707.07167.pdf

Baidu and other Chinese vendors simultaneously announcing 97% recognition accuracy: https://www.zhihu.com/question/53001402

 

 


Origin www.cnblogs.com/yinlili/p/11855189.html