Batch-converting mobile WeChat voice messages to text with Baidu speech recognition

Batch recognition of WeChat voice messages with Baidu AI Cloud


If you are not reading this article on cnblogs, where it was written by carr0t2, it is recommended to visit the original page for a better layout and image experience.
I am not yet sure whether to go on and develop a graphical interface with more features; if there is real demand, please leave a comment below.

Tools and environment

  1. Python 3.7
  2. silk-v3-decoder https://github.com/kn007/silk-v3-decoder
  3. A Baidu AI Cloud account (an ordinary Baidu account is enough); create an application to obtain an API Key and Secret Key
  4. The Baidu short speech recognition API demo; the code here is modified from the official demo at https://github.com/Baidu-AIP/speech-demo/tree/master/rest-api-asr/python
  5. The environment used in this article is Windows with Python 3.7

General idea

  1. Locate where WeChat stores voice files on the phone and export them

  2. Use silk-v3-decoder to convert the recordings to wav

  3. Use ffmpeg to convert the wav files to PCM at a sampling rate of 16000 Hz

  4. Run the recognition in Python

  5. Handle individually only the files that turn out to have problems
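Step 3 of the outline can be sketched as follows, assuming ffmpeg is installed and on the PATH. The function names `wav_to_pcm_command` and `wav_to_pcm` are hypothetical and not part of the original workflow; the options ask ffmpeg for headerless signed 16-bit little-endian mono samples at 16000 Hz.

```python
import subprocess


def wav_to_pcm_command(wav_path, pcm_path, rate=16000):
    """Build an ffmpeg command that converts a wav file to raw 16-bit mono PCM."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", wav_path,          # input wav
        "-f", "s16le",           # raw signed 16-bit little-endian samples, no header
        "-acodec", "pcm_s16le",
        "-ac", "1",              # mono
        "-ar", str(rate),        # sampling rate in Hz
        pcm_path,
    ]


def wav_to_pcm(wav_path, pcm_path, rate=16000):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(wav_to_pcm_command(wav_path, pcm_path, rate), check=True)
```

Building the command as a list rather than one long string keeps paths with spaces intact.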

Specific operation

Export the WeChat voice files

  • On the phone, WeChat voice files are generally stored in Internal storage\tencent\MicroMsg\****************************\voice2

    The asterisks stand for a long alphanumeric string

    This folder contains many subfolders

  • Copy everything to the computer and extract the audio files

    In Windows, search the copied folder for .amr

    Select the results and copy them into a new folder

  • These are the audio files, but their format is unusual and they need to be converted to a common format
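The manual search-and-copy step could also be scripted. The following is a minimal sketch, assuming the voice2 folder has already been copied to the computer and that the destination folder lies outside it; the function name `collect_amr_files` is made up for illustration. It uses `shutil.copy2` because that call keeps the modification time, which matters for restoring the order of the messages later.

```python
import os
import shutil


def collect_amr_files(src_root, dest_dir):
    """Copy every .amr file found under src_root into dest_dir.

    shutil.copy2 keeps the modification time of each file.
    Returns the list of destination paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for folder, _subdirs, files in os.walk(src_root):
        for name in files:
            if not name.lower().endswith(".amr"):
                continue
            target = os.path.join(dest_dir, name)
            # Avoid overwriting if two subfolders hold files with the same name
            stem, ext = os.path.splitext(name)
            counter = 1
            while os.path.exists(target):
                target = os.path.join(dest_dir, "%s_%d%s" % (stem, counter, ext))
                counter += 1
            shutil.copy2(os.path.join(folder, name), target)
            copied.append(target)
    return copied
```

A call such as `collect_amr_files(r'.\voice2', r'.\lecture')` would then gather the files in one folder; both paths are placeholders.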

Process the exported voice files

Rename the files

  • The relative order of the messages must be kept, but converting the files directly changes their modification time, after which the original order can no longer be restored
  • Use Python to rename each extracted file after its modification time
import os
import time

path = '.\\lecture'
for file in os.listdir(path):
    src = os.path.join(path, file)
    # Use the modification time as the new name so the original order survives
    time_array = time.localtime(os.stat(src).st_mtime)
    nametime = time.strftime("%Y_%m_%d_%H_%M_%S", time_array)
    os.rename(src, os.path.join(path, nametime + '.amr'))
    print(nametime)
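One thing to watch for with time-based names is that two messages saved within the same second would get the same name. The variant below is only a sketch of one way to handle that, by appending a counter; the function name `rename_by_mtime` and the suffix scheme are my own assumptions, not part of the original script.

```python
import os
import time


def rename_by_mtime(folder, ext=".amr"):
    """Rename each file with the given extension to its modification time.

    A numeric suffix is added when two files share the same second.
    Returns the new file names in processing order.
    """
    names = [n for n in os.listdir(folder) if n.lower().endswith(ext)]
    # Snapshot the modification times before renaming anything
    stamped = sorted((os.stat(os.path.join(folder, n)).st_mtime, n) for n in names)
    new_names = []
    for mtime, old in stamped:
        base = time.strftime("%Y_%m_%d_%H_%M_%S", time.localtime(mtime))
        new, counter = base + ext, 1
        while new != old and os.path.exists(os.path.join(folder, new)):
            new = "%s_%d%s" % (base, counter, ext)
            counter += 1
        if new != old:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
        new_names.append(new)
    return new_names
```

Files with other extensions are left alone, so a stray desktop.ini in the folder is not renamed.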

Convert to pcm format

  • Python calls silk_v3_decoder.exe on the command line to do the decoding; the exact command is given below
  • pcm files apparently cannot be played directly, but Audacity can open them
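If you want to listen to a pcm file in an ordinary player, one option is to wrap it in a WAV header with the standard `wave` module. This is a minimal sketch that assumes the decoder output is 16-bit mono at 16000 Hz; the function name `pcm_to_wav` is hypothetical.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV header so ordinary players can open them."""
    with open(pcm_path, "rb") as src:
        frames = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        dst.setframerate(rate)
        dst.writeframes(frames)
```

If the playback sounds too fast, too slow, or like noise, the assumed rate or sample width probably does not match the file.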

Modifying the demo code

Changes near the start

  • Use silk_v3_decoder.exe to convert the files to 16 kHz pcm
    FORMAT = 'pcm'
    pathamr=r'.\amr'
    pathpcm=r'.\pcm'
    dirs = os.listdir(pathamr)
    #dirs.remove('desktop.ini')  # Windows may put this file in the folder
    for file in dirs:
        time.sleep(0.3)
        name=file[:-3]
        commandstring= ' silk_v3_decoder.exe ' + str(pathamr) + '\\' + name + 'amr ' + str(pathpcm) +'\\'+ str(name) + 'pcm' +' -Fs_API 16000 '
        os.system(commandstring)
        AUDIO_FILE =str(pathpcm)+'\\'+ str(name) + 'pcm'
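The same decoder call can be written with `subprocess` and an argument list, which avoids quoting problems when a path contains spaces. This is only a sketch: the helper names `decoder_command` and `decode` are made up, and the arguments simply mirror the command string used above.

```python
import os
import subprocess


def decoder_command(amr_path, pcm_dir, exe="silk_v3_decoder.exe", rate=16000):
    """Return (command, pcm_path) for decoding one file to raw PCM."""
    stem = os.path.splitext(os.path.basename(amr_path))[0]
    pcm_path = os.path.join(pcm_dir, stem + ".pcm")
    return [exe, amr_path, pcm_path, "-Fs_API", str(rate)], pcm_path


def decode(amr_path, pcm_dir):
    """Run the decoder and return the path of the PCM file it wrote."""
    command, pcm_path = decoder_command(amr_path, pcm_dir)
    # check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(command, check=True)
    return pcm_path
```

`os.path.splitext` is used here in place of slicing off the last three characters, so the helper also copes with extensions of a different length.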

Changes near the end

  • The output is appended to a file and a time field is added. No further processing has been done, so the exported file is still JSON.
    with open("result.txt", "a") as of:
        result_dict = json.loads(result_str)  # parse the JSON response
        result_dict["time"] = name            # add the time field
        of.write(json.dumps(result_dict, ensure_ascii=False) + '\n')
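As a possible follow-up step, the one-record-per-line output can be read back to list the recordings worth listening to again. The sketch below accepts either JSON or a Python dict literal on each line. It assumes each record carries the added "time" field and that a failed request has a non-zero "err_no" or no "result"; treat those field names as assumptions to check against the actual responses.

```python
import ast
import json


def load_results(lines):
    """Parse one record per line, accepting JSON or a Python dict literal."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:
            # Fall back for lines written with str(dict) instead of JSON
            record = ast.literal_eval(line)
        records.append(record)
    return records


def needs_relisten(records):
    """Return the time stamps of records without a usable result."""
    return [r.get("time") for r in records
            if r.get("err_no", 0) != 0 or not r.get("result")]
```

For example, `needs_relisten(load_results(open("result.txt", encoding="utf-8")))` would give the time stamps to look up among the renamed files, provided the file was written as UTF-8.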

Postscript

  • I have only just started learning Python and wrote this casually; please point out any mistakes
  • Follow-up processing of the output file is not finished. The output is meant to put the time on one line and the recognized text on the next, so that when the recognition is far off it is easy to find the recording and listen to it again
  • Full automation was not achieved; the content still has to be processed by hand
  • Baidu's speech self-training platform was not used

Code (for reference)

import sys
import json
import base64
import time
import os
import subprocess

IS_PY3 = sys.version_info.major == 3

if IS_PY3:
    from urllib.request import urlopen
    from urllib.request import Request
    from urllib.error import URLError
    from urllib.parse import urlencode
    timer = time.perf_counter
else:
    from urllib2 import urlopen
    from urllib2 import Request
    from urllib2 import URLError
    from urllib import urlencode
    if sys.platform == "win32":
        timer = time.clock
    else:
        # On most other platforms the best timer is time.time()
        timer = time.time

API_KEY = '****************'  # fill in your own
SECRET_KEY = '*****************'

# File to be recognized
# File format
FORMAT = 'pcm'  # only the pcm/wav/amr suffixes are supported; the express edition also supports m4a
# Hard-coded here for convenience
CUID = '****************'
# Sampling rate
RATE = 16000  # fixed value

DEV_PID = 1537  # 1537 means Mandarin with the input-method model; set the PID per the documentation to choose language and model
ASR_URL = 'http://vop.baidu.com/server_api'
SCOPE = 'audio_voice_assistant_get'  # this scope means ASR is enabled; if it is missing, tick it on the web console (very old applications may lack it)


class DemoError(Exception):
    pass


"""  TOKEN start """

TOKEN_URL = 'http://openapi.baidu.com/oauth/2.0/token'

def fetch_token():
    params = {'grant_type': 'client_credentials',
              'client_id': API_KEY,
              'client_secret': SECRET_KEY}
    post_data = urlencode(params)
    if (IS_PY3):
        post_data = post_data.encode( 'utf-8')
    req = Request(TOKEN_URL, post_data)
    try:
        f = urlopen(req)
        result_str = f.read()
    except URLError as err:
        print('token http response http code : ' + str(err.code))
        result_str = err.read()
    if (IS_PY3):
        result_str =  result_str.decode()

    print(result_str)
    result = json.loads(result_str)
    print(result)
    if ('access_token' in result.keys() and 'scope' in result.keys()):
        print(SCOPE)
        if SCOPE and (not SCOPE in result['scope'].split(' ')):  # SCOPE = False skips this check
            raise DemoError('scope is not correct')
        print('SUCCESS WITH TOKEN: %s  EXPIRES IN SECONDS: %s' % (result['access_token'], result['expires_in']))
        return result['access_token']
    else:
        raise DemoError('MAYBE API_KEY or SECRET_KEY not correct: access_token or scope not found in token response')

"""  TOKEN end """

if __name__ == '__main__':
    token = fetch_token()

    pathamr=r'.\amr'
    pathpcm=r'.\pcm'
    dirs = os.listdir(pathamr)
    #dirs.remove('desktop.ini')  # Windows may put this file in the folder
    for file in dirs:
        time.sleep(0.2)
        name=file[:-3]
        commandstring= ' silk_v3_decoder.exe ' + str(pathamr) + '\\' + name + 'amr ' + str(pathpcm) +'\\'+ str(name) + 'pcm' +' -Fs_API 16000 '
        os.system(commandstring)
        ###### The code below is mostly unchanged from the demo
        AUDIO_FILE =str(pathpcm)+'\\'+ str(name) + 'pcm'
        speech_data = []
        with open(AUDIO_FILE, 'rb') as speech_file:
            speech_data = speech_file.read()

        length = len(speech_data)
        if length == 0:
            raise DemoError('file %s length read 0 bytes' % AUDIO_FILE)
        speech = base64.b64encode(speech_data)
        if (IS_PY3):
            speech = str(speech, 'utf-8')
        params = {'dev_pid': DEV_PID,
                 #"lm_id" : LM_ID,    # enable this when testing the self-training platform
                  'format': FORMAT,
                  'rate': RATE,
                  'token': token,
                  'cuid': CUID,
                  'channel': 1,
                  'speech': speech,
                  'len': length
                  }
        post_data = json.dumps(params, sort_keys=False)
        # print post_data
        req = Request(ASR_URL, post_data.encode('utf-8'))
        req.add_header('Content-Type', 'application/json')
        try:
            begin = timer()
            f = urlopen(req)
            result_str = f.read()
            print ("Request time cost %f" % (timer() - begin))
        except URLError as err:
            print('asr http response http code : ' + str(err.code))
            result_str = err.read()

        if (IS_PY3):
            result_str = str(result_str, 'utf-8')
        print(result_str)
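        with open("result.txt","a") as of:
            # The response is JSON, so parse it with json.loads instead of eval()
            result_dict=json.loads(result_str)
            #result_dict["time"]=name
            #of.write(str(result_dict)+'\n')
            of.write('{'+name+'}'+'\n')
            try:
                of.write(str(result_dict["result"])[2:-2]+'\n\n')
            except KeyError:
                # Failed requests carry no "result" field
                of.write('Error'+'\n')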
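Before sending a file it can help to know how long the audio is, since the short speech API only accepts short clips. The helper below is a sketch under stated assumptions: 16-bit mono samples at 16000 Hz, and a 60-second limit that I have not verified against the current Baidu documentation. Both function names are hypothetical.

```python
def pcm_duration_seconds(num_bytes, rate=16000, sample_width=2, channels=1):
    """Length in seconds of raw PCM audio of the given byte size."""
    return num_bytes / float(rate * sample_width * channels)


def is_short_enough(num_bytes, limit_seconds=60.0):
    """True when the clip fits within the assumed duration limit."""
    return pcm_duration_seconds(num_bytes) <= limit_seconds
```

In the loop above, `length` already holds the byte count of the pcm file, so `is_short_enough(length)` could be checked before the request is built.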

Origin www.cnblogs.com/carr0t/p/baiduasr.html