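Overview

Many vendors now offer speech recognition APIs, and light testing is free. In China, BAT (Baidu, Alibaba, Tencent) and iFlytek, and abroad Microsoft and Google, all provide Chinese speech recognition through both SDKs and web APIs. All of my tests were run under Python 3.

I ended up choosing the Baidu and iFlytek APIs, mainly on the expectation that Chinese vendors would do better at recognizing Chinese.

During the free trial, iFlytek allows 500 calls per day. Baidu only limits the rate to 20 calls per second, with no cap on the total number of calls.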
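To stay inside those trial limits when looping over many files, a small client-side guard can help. The following is only a sketch: the CallBudget class and its parameters are hypothetical names of mine, not part of either vendor's SDK, and the daily counter never resets by itself, so a fresh object is assumed for each day.

```python
import time


class CallBudget:
    """Client-side guard: at most `per_second` calls per second and `per_day` calls in total."""

    def __init__(self, per_second=None, per_day=None,
                 clock=time.monotonic, sleep=time.sleep):
        # Minimum spacing between two calls, in seconds (0 means no rate limit)
        self.min_interval = 1.0 / per_second if per_second else 0.0
        self.per_day = per_day
        self.clock = clock
        self.sleep = sleep
        self.calls_today = 0
        self.last_call = None

    def acquire(self):
        """Block until the next call is allowed; raise if the quota is used up."""
        if self.per_day is not None and self.calls_today >= self.per_day:
            raise RuntimeError("daily quota of {} calls exhausted".format(self.per_day))
        if self.last_call is not None:
            wait = self.min_interval - (self.clock() - self.last_call)
            if wait > 0:
                self.sleep(wait)
        self.last_call = self.clock()
        self.calls_today += 1


# Limits taken from the trial terms described above
xfyun_budget = CallBudget(per_day=500)     # iFlytek: 500 calls per day
baidu_budget = CallBudget(per_second=20)   # Baidu: 20 calls per second
```

Calling xfyun_budget.acquire() or baidu_budget.acquire() right before each request is enough; the server-side limits still apply regardless of what the client counts.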

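My impression after trying both is that iFlytek's API is faster and also better at things like sentence segmentation. The small trial quota is a problem, though: it is only enough for a quick evaluation.

iFlytek speech recognition web API

Before using it, register at https://xfyun.cn and create an application to obtain an APPID and an APIKey.

The official web API documentation is at: https://doc.xfyun.cn/rest_api/%E8%AF%AD%E9%9F%B3%E5%90%AC%E5%86%99.html

The code comes from the official site, but that version targets Python 2, so I ported it to Python 3 (Python 3 is stricter about how parameters are encoded).

Note: in the script, replace AUDIO_PATH, API_KEY, and APPID with the path to your audio file, the apiKey provided by the iFlytek open platform, and the appid of your iFlytek open platform application, respectively.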

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import base64
import hashlib
import json
import time
import urllib.request
from urllib import parse

def main():
    # Read the audio file and base64-encode it for the form body
    with open("AUDIO_PATH", 'rb') as f:
        file_content = f.read()
    base64_audio = base64.b64encode(file_content)
    body = parse.urlencode({'audio': base64_audio})

    url = 'http://api.xfyun.cn/v1/service/v1/iat'
    api_key = 'API_KEY'
    param = {"engine_type":"sms16k","aue":"raw"}

    x_appid = 'APPID'
    # Compact JSON (no spaces), base64-encoded, becomes the X-Param header
    json_str = json.dumps(param).replace(' ', '')
    print('json_str:{}'.format(json_str))
    x_param = base64.b64encode(json_str.encode('ascii')).decode('ascii')
    # Current timestamp in whole seconds, sent as a string
    x_time = str(int(time.time()))
    # Checksum is the MD5 of api_key + X-CurTime + X-Param
    x_checksum_str = api_key + x_time + x_param
    print('x_checksum_str:[{}]'.format(x_checksum_str))
    x_checksum = hashlib.md5(x_checksum_str.encode('ascii')).hexdigest()
    print('x_checksum:{}'.format(x_checksum))
    x_header = {'X-Appid': x_appid,
                'X-CurTime': x_time,
                'X-Param': x_param,
                'X-CheckSum': x_checksum}

    # Send the POST request and time the round trip
    start_time = time.time()
    req = urllib.request.Request(url, body.encode('ascii'), x_header)
    with urllib.request.urlopen(req) as response:
        result = response.read()
    print("used time: {}s".format(round(time.time() - start_time, 2)))
    print('result:' + result.decode('utf-8'))

if __name__ == '__main__':
    main()

Output from one of my test runs:

json_str:{"engine_type":"sms16k","aue":"raw"}

used time: 0.82s
result:{"code":"0","data":"特别是跨省区电网超计划用电，不仅损害自己，也损害别人，损害电网，损害国家。","desc":"success","sid":"zat006392f7@ch6b010ed8627f3d3700"}
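The request above sets engine_type to sms16k, and the Baidu call later in this post passes 16000 as the sample rate, so both services are being given 16 kHz audio. A quick check of the WAV header before uploading can catch a mismatched file early. This is a sketch using only the standard library; the function names are hypothetical, and the mono, 16-bit requirement is my assumption about what these services expect rather than something tested here.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def is_16k_mono_16bit(path):
    """True if the file is 16 kHz, mono, 16-bit PCM (the assumed target format)."""
    return describe_wav(path) == (16000, 1, 16)
```

Running is_16k_mono_16bit on each test file before sending it avoids spending trial calls on audio that would need resampling first.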

One thing to note

If your IP has not been added to the whitelist, the following error message is returned, with error code 10105:

al access|illegal client_ip:xxxxx

Adding the IP shown in the message under 'Console -> My Apps' fixes it.
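In a script it is handy to turn that error into a readable hint instead of printing raw JSON. A minimal sketch follows, assuming error replies use the same code/desc JSON fields as the successful reply shown earlier; xfyun_text is a hypothetical helper name, not part of any iFlytek library.

```python
import json


def xfyun_text(raw_body):
    """Return the recognized text from a response body, or raise with a hint."""
    reply = json.loads(raw_body)
    code = str(reply.get("code"))
    if code == "0":
        return reply.get("data", "")
    hint = ""
    if code == "10105":
        # Assumption: 10105 always means the caller's IP is not whitelisted
        hint = " (add this machine's IP to the whitelist under Console -> My Apps)"
    raise RuntimeError("xfyun error {}: {}{}".format(code, reply.get("desc"), hint))
```

The function can be fed the decoded body that the earlier script prints as result.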

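Baidu speech recognition web API

First register on Baidu Cloud: https://cloud.baidu.com/.

Then create a speech recognition application in the management console. You will get an AppId, AppKey, and SecretKey, which are needed when calling the API.

Baidu's Python interface is fairly easy to use; just run pip install baidu-aip first.
For details, see the documentation: https://cloud.baidu.com/doc/SPEECH/ASR-Online-Python-SDK.html#.E9.85.8D.E7.BD.AEAipSpeech

Following the official example, my code is as follows (remember to substitute your own APP_ID, API_KEY, and SECRET_KEY):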

from aip import AipSpeech
import time

APP_ID = 'APP_ID'
API_KEY = 'API_KEY'
SECRET_KEY = 'SECRET_KEY'

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

# Recognize a local file
start_time = time.time()
# 16000 is the sample rate in Hz; dev_pid 1537 selects Mandarin
ret = client.asr(get_file_content('./aideo_files/A2_58.wav'), 'pcm', 16000, {
    'dev_pid': 1537,
})
used_time = time.time() - start_time

print("used time: {}s".format(round(used_time, 2)))
print('ret:{}'.format(ret))

My test output was:

used time: 8.18s
ret:{'corpus_no': '6592465378279780417', 'err_msg': 'success.', 'err_no': 0, 'result':
['特别是跨省区电网超计划用电，不仅损害自己也损害别人损害电网损害国家，'], 'sn': '148955205071534927957'}

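This took roughly ten times as long as iFlytek (8.18 s versus 0.82 s), and the sentence segmentation is not great either. Its advantage is that trial usage is essentially unlimited.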
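The returned dict carries the text in a list under result, with err_no set to 0 on success, as the output above shows. A small helper can pull the text out and fail loudly otherwise. This is a sketch based only on the fields visible in that output; baidu_text is a hypothetical name, not part of baidu-aip.

```python
def baidu_text(ret):
    """Return the recognized text from an AipSpeech.asr() result dict, or raise."""
    if ret.get("err_no") != 0:
        raise RuntimeError(
            "baidu error {}: {}".format(ret.get("err_no"), ret.get("err_msg")))
    # 'result' is a list of strings; join them in case there is more than one
    return "".join(ret.get("result", []))
```

With the ret from the script above, baidu_text(ret) would return the single recognized sentence.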

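Summary

Vendor	Pros	Cons
Baidu	Generous trial quota	Slow; sentence segmentation not great
iFlytek	Fast; good sentence segmentation	Limited trial quota

Today's tests were fairly simple; I will add more once I have done larger-scale testing.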
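For a larger-scale comparison, a single timing per vendor is not very reliable, so it may be worth collecting latencies over many files. The sketch below is one possible harness: time_calls and summarize are hypothetical names, and recognize stands for any function that sends one audio file to one of the two services.

```python
import statistics
import time


def time_calls(recognize, inputs, clock=time.perf_counter):
    """Call recognize(item) for each input; return per-call latencies in seconds."""
    latencies = []
    for item in inputs:
        start = clock()
        recognize(item)
        latencies.append(clock() - start)
    return latencies


def summarize(latencies):
    """Return (mean, median, max) of a non-empty list of latencies."""
    return statistics.mean(latencies), statistics.median(latencies), max(latencies)
```

Running both vendors over the same list of files and comparing the summaries gives a fairer picture than the two single measurements reported here.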

Speech recognition web API testing in Python: Baidu and iFlytek
