Table of Contents:
1. What is WER
2. WER calculation principles
3. WER test design
4. The current industry recognition level
1. What is WER
In automatic speech recognition (ASR), the most commonly used evaluation metric is the word error rate, WER (Word Error Rate). When testing Chinese, the character error rate, CER (Character Error Rate), is also used. The two are calculated on the same principle; the only difference is the basic unit, which is the word for English and the character for Chinese. This article uses WER throughout.
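To make the difference in basic unit concrete, here is a minimal sketch of the two ways a sentence can be split before scoring. The function names `word_units` and `char_units` and the sample sentences are illustrative assumptions, not part of any particular toolkit.

```python
def word_units(text):
    """Split on whitespace: the basic unit for English WER."""
    return text.split()


def char_units(text):
    """Split into single characters, ignoring whitespace: the basic unit for Chinese CER."""
    return [ch for ch in text if not ch.isspace()]


# Illustrative sample sentences (assumed, not taken from a real test set).
print(word_units("what is your name"))
print(char_units("今天天气怎么样"))
```

Everything after this split is identical for WER and CER, which is why the rest of the article can treat them as one metric.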
WER is calculated as follows: to make the recognized word sequence identical to the reference (standard) word sequence, certain words have to be substituted, deleted, or inserted. The total number of substituted, deleted, and inserted words, divided by the number of words in the reference sequence and expressed as a percentage, is the WER:
WER = (S + D + I) / N × 100%
where S, D, and I are the numbers of substitutions, deletions, and insertions, and N is the number of words in the reference sequence.
Word accuracy (Word Accuracy, abbreviated W.Acc) is given by: W.Acc = 1 - WER
Because W.Acc follows directly from WER, we only need to measure WER in testing.
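The definition above, total errors divided by the reference length, can be sketched in a few lines. The function names `wer_from_counts` and `word_accuracy` are hypothetical helpers introduced for illustration, and the counts in the usage lines are made-up values.

```python
def wer_from_counts(substitutions, deletions, insertions, ref_length):
    """WER as a fraction: (S + D + I) / N, where N is the reference length."""
    if ref_length <= 0:
        raise ValueError("reference must contain at least one word")
    return (substitutions + deletions + insertions) / ref_length


def word_accuracy(wer):
    """W.Acc = 1 - WER. Can be negative, because insertions can push WER above 1."""
    return 1.0 - wer


# Made-up counts: 2 substitutions, 1 deletion, 1 insertion, 10 reference words.
wer = wer_from_counts(2, 1, 1, 10)
print(f"WER = {wer:.1%}, W.Acc = {word_accuracy(wer):.1%}")
```

Note that the denominator is always the reference length, never the hypothesis length, so a hypothesis with many extra words can score a WER above 100%.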
2. WER calculation principles
The first row is the reference word sequence (REF, reference), the second row is the recognized word sequence (HYP, hypothesis), and the third row is the classification of each position (Eval, evaluation): S for substitution, I for insertion, D for deletion.
Then the above WER is calculated as follows:
There are 3 inserted words, 6 substituted words, and 1 deleted word against 13 reference words, so the WER is:
WER = ( 6 + 3 + 1) / 13 = 76.9%
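Similarly, a Chinese example is as follows (each cell stands for one Chinese character, shown here as an English gloss; * marks an empty position in the alignment):
REF  | now   | day | *     | day | gas | How | What | kind
HYP  | shock | day | field | day | gas | *   | *    | *
Eval | S     |     | I     |     |     | D   | D    | D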
WER = ( 1 + 1 + 3 ) / 7 = 71.4%
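The S, I, and D labels in the Eval row do not have to be assigned by hand. A minimal sketch, assuming a standard minimum edit-distance (Levenshtein) alignment with unit costs, is shown below; `align_counts` and `wer` are hypothetical names, and real scoring tools may break ties between equally cheap alignments differently.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) from one minimum-cost alignment."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = minimum number of edits that turn ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every reference token
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Walk back from the bottom-right corner and count each edit type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def wer(ref, hyp):
    """WER as a fraction of the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return sum(align_counts(ref, hyp)) / len(ref)


# The Chinese example above, one gloss per character.
ref = "now day day gas How What kind".split()
hyp = "shock day field day gas".split()
print(align_counts(ref, hyp), f"{wer(ref, hyp):.1%}")
```

For this example every minimum-cost alignment has one substitution, three deletions, and one insertion, so the counts agree with the Eval row regardless of how ties are broken.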
3. WER test design
The test procedure consists of the following steps:
a) Prepare a standard test text file in txt (or another format) to serve as the "reference string set". Taking txt as an example, the records look like this:
1 today, how is the weather
2 What is your name
3 .......
b) Prepare the test demo program. Record the sentences generated by the ASR under test into a file that serves as the "recognition result set", in the same format as the file in step a) to make string comparison easier.
c) Compare the "reference string set" with the "recognition result set" according to "2. WER calculation principles" above, and output the WER test result.
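Steps a) to c) can be sketched as a small script. This is only an illustration under stated assumptions: each line is taken to be a record number, a space, then the sentence; tokens are split on whitespace with no punctuation or case normalization; and a record missing from the recognition results counts entirely as deletions. The names `edit_distance`, `parse_records`, and `corpus_wer` and the file names in the usage note are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j - 1] + (r != h),  # match or substitution
                            prev[j] + 1,             # deletion
                            curr[j - 1] + 1))        # insertion
        prev = curr
    return prev[-1]


def parse_records(lines):
    """Map record number -> token list for lines shaped like '<id> <sentence>'."""
    records = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec_id, _, text = line.partition(" ")
        records[rec_id] = text.split()
    return records


def corpus_wer(ref_lines, hyp_lines):
    """Total errors over all records divided by total reference tokens."""
    refs = parse_records(ref_lines)
    hyps = parse_records(hyp_lines)
    errors = total = 0
    for rec_id, ref_tokens in refs.items():
        hyp_tokens = hyps.get(rec_id, [])  # assumption: missing result = all deleted
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("reference set contains no tokens")
    return errors / total


# In-memory stand-ins for the two files.
reference = ["1 today how is the weather", "2 what is your name"]
recognized = ["1 today how is weather", "2 what is you name"]
print(f"WER = {corpus_wer(reference, recognized):.1%}")
```

With real files, the two lists would come from something like `open("reference.txt", encoding="utf-8")` and `open("recognized.txt", encoding="utf-8")`. Summing errors over the whole set before dividing weights long sentences more than short ones, which is a different figure from averaging per-sentence WER. For Chinese, the whitespace split would be replaced by a per-character split.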
4. The current industry recognition level
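English WER:
IBM: industry-standard Switchboard speech recognition task, 6.9% in 2016, 5.5% in 2017.
Microsoft: industry-standard Switchboard speech recognition task, 6.3% -> 5.9% in 2016, 5.1% in 2017, currently the lowest.
Note: at ICASSP 2017, IBM stated that the WER of human stenographers is 5.1%; a WER of 5.9% is generally regarded as the level of human stenographers.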
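Chinese WER/CER:
Xiaomi: 2018, Xiaomi TV, 2.81%
Baidu: 2016, short-phrase recognition, 3.7%
Chinese W.Corr (W.Corr = W.Acc = 1 - WER):
Baidu: 2016, recognition accuracy 97%
Sogou: 2016, recognition accuracy 97%
iFlytek: 2016, recognition accuracy 97%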
Data sources:
Microsoft WER 5.9%: https://arxiv.org/abs/1610.05256
Microsoft WER 5.1%:
https://www.microsoft.com/en-us/research/wp-content/uploads/2017/08/ms_swbd17-2.pdf
Xiaomi TV CER 2.81%: https://arxiv.org/pdf/1707.07167.pdf
Baidu and other Chinese vendors simultaneously announcing 97% recognition accuracy: https://www.zhihu.com/question/53001402