Calling the Baidu API from Python for speech recognition

1. About the author

Gao Zhixiang, male, School of Electronic Information, Xi'an Polytechnic University, 2021 graduate student
Research direction: Machine Vision and Artificial Intelligence
Email: [email protected]

Liu Shuaibo, male, School of Electronic Information, Xi'an Polytechnic University, 2021 graduate student, Zhang Hongwei Artificial Intelligence Research Group
Research direction: Machine Vision and Artificial Intelligence
Email: [email protected]

2. Mandarin recognition based on Baidu API

2.1 Speech Recognition

Speech recognition converts a speech signal into the corresponding text. A recognition system consists of four main parts: feature extraction, an acoustic model, a language model, and a dictionary with a decoder. To extract features more effectively, the audio signal is usually preprocessed first, for example by filtering and framing, so that the portion to be analyzed is properly separated from the raw signal.
General process:
[Figure: general speech recognition process]
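
The filtering and framing steps mentioned above can be illustrated with a minimal sketch, assuming the signal is a plain Python list of samples. The helper names pre_emphasis and frame_signal, the coefficient 0.97, and the 25 ms frame with a 10 ms hop are illustrative choices and are not part of the article's code.

```python
def pre_emphasis(samples, alpha=0.97):
    """Apply a first-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out


def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


# One second of dummy audio at 16 kHz: 400 samples = 25 ms, 160 samples = 10 ms.
signal = list(range(16000))
frames = frame_signal(pre_emphasis(signal), frame_len=400, hop_len=160)
print(len(frames), len(frames[0]))
```

Each frame would then go to feature extraction. When the Baidu API is used, the client only uploads the recorded audio, so this sketch is for understanding rather than something the experiment requires.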

2.2 Baidu API calling method

Creating an application (for example, one for speech technology) on the Baidu AI development platform grants it permission to use the corresponding services.
After the application is created, Baidu shows it in your application list, where you can find the AppID, API Key, and Secret Key needed to make API calls.
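
As a rough sketch of how the API Key and Secret Key are used, the snippet below builds the token request URL. The endpoint and parameter names are the ones that appear in the code listing in section 4; the environment variable names BAIDU_API_KEY and BAIDU_SECRET_KEY and the helper build_token_url are hypothetical, chosen only so that the keys do not have to be written into the source file.

```python
import os
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://openapi.baidu.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Return the client-credentials token URL for the given key pair."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return f"{TOKEN_ENDPOINT}?{query}"


# Hypothetical environment variable names; placeholders are used if unset.
api_key = os.environ.get("BAIDU_API_KEY", "your-api-key")
secret_key = os.environ.get("BAIDU_SECRET_KEY", "your-secret-key")
token_url = build_token_url(api_key, secret_key)
```

Posting to token_url with requests, as getToken does in section 4, should return a JSON body containing access_token.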

3. Experiment

3.1 Experiment preparation

This experiment uses the Baidu API for recognition, so the baidu-aip module needs to be installed. Open the command line and enter pip install baidu-aip.
[Screenshot: pip output for the baidu-aip installation]
As the screenshot shows, the installation succeeded.
Because this project uses pyqt5 for the interface, the pyqt5 module also needs to be installed: open the command line and enter pip install pyqt5.
The code listing in section 4 additionally imports requests and pyaudio, so install them the same way if they are missing (pip install requests pyaudio).
Next, go to the Baidu AI official website, create an application, and obtain its AppID, API Key, and Secret Key.
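
Before running the experiment, it can help to confirm that the modules are importable. The following is a minimal sketch; the helper name missing_modules is made up for this example, and the list of modules is an assumption based on the packages named above and the imports in section 4.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names differ from the pip package names in two cases:
# baidu-aip is imported as aip, and pyqt5 is imported as PyQt5.
required = ["aip", "PyQt5", "requests", "pyaudio"]
missing = missing_modules(required)
print("Missing modules:", missing if missing else "none")
```

Any name printed as missing can then be installed with pip under its package name.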

3.2 Experimental results

When the program runs, enter the number of the desired language and press Enter. The program records for four seconds, sends the audio for recognition, and then opens the browser: a recognized site name opens that site, and any other text opens a Baidu search for it. If the browser opens with the expected page, the experiment succeeded.
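
The number entered at the prompt is passed straight to int() in the listing in section 4, so a typo stops the program. A small sketch of a more forgiving parser follows; the codes are the ones listed in the program's prompt, while the helper name parse_dev_pid and the fallback to 1537 are assumptions made for this example.

```python
# Language codes as listed in the program's prompt.
DEV_PIDS = {
    1536: "Mandarin (simple English)",
    1537: "Mandarin (with punctuation)",
    1737: "English",
    1637: "Cantonese",
    1837: "Sichuanese",
}


def parse_dev_pid(raw, default=1537):
    """Return a supported dev_pid, falling back to `default` on bad input."""
    try:
        value = int(raw.strip())
    except ValueError:
        return default
    return value if value in DEV_PIDS else default


print(parse_dev_pid("1737"), parse_dev_pid("abc"), parse_dev_pid(""))
```

Calling parse_dev_pid(devpid) in place of int(devpid) would keep the loop running when the input is empty or not one of the listed codes.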

4. Experimental code

import wave
import requests
import time
import base64
from pyaudio import PyAudio, paInt16
import webbrowser

framerate = 16000  # sampling rate in Hz
num_samples = 2000  # samples read per buffer
channels = 1  # number of channels (mono)
sampwidth = 2  # sample width in bytes (16-bit)
FILEPATH = 'speech.wav'

base_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=%s&client_secret=%s"
APIKey = "********"  # fill in your own API Key
SecretKey = "**********"  # fill in your own Secret Key

HOST = base_url % (APIKey, SecretKey)


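def getToken(host):
    # Request an access token; raise on an HTTP error instead of a KeyError later.
    res = requests.post(host)
    res.raise_for_status()
    return res.json()['access_token']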


def save_wave_file(filepath, data):
    wf = wave.open(filepath, 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(sampwidth)
    wf.setframerate(framerate)
    wf.writeframes(b''.join(data))
    wf.close()


def my_record():
    # Record about 4 seconds from the default microphone and save it to FILEPATH.
    pa = PyAudio()
    stream = pa.open(format=paInt16, channels=channels,
                     rate=framerate, input=True, frames_per_buffer=num_samples)
    my_buf = []
    t = time.time()
    print('Recording...')

    while time.time() < t + 4:  # record for 4 seconds
        string_audio_data = stream.read(num_samples)
        my_buf.append(string_audio_data)
    print('Recording finished.')
    save_wave_file(FILEPATH, my_buf)
    # Release the audio device.
    stream.stop_stream()
    stream.close()
    pa.terminate()


def get_audio(file):
    with open(file, 'rb') as f:
        data = f.read()
    return data


def speech2text(speech_data, token, dev_pid=1537):
    # Send the audio to the recognition endpoint. Returns the recognized text
    # on success, otherwise the raw response dict.
    FORMAT = 'wav'
    RATE = 16000  # integer sampling rate; must match the recording
    CHANNEL = 1
    CUID = '*******'  # fill in an identifier for this client
    SPEECH = base64.b64encode(speech_data).decode('utf-8')

    data = {
        'format': FORMAT,
        'rate': RATE,
        'channel': CHANNEL,
        'cuid': CUID,
        'len': len(speech_data),  # length of the raw audio in bytes
        'speech': SPEECH,
        'token': token,
        'dev_pid': dev_pid
    }
    url = 'https://vop.baidu.com/server_api'
    headers = {'Content-Type': 'application/json'}
    print('Recognizing...')
    r = requests.post(url, json=data, headers=headers)
    Result = r.json()
    if 'result' in Result:
        return Result['result'][0]
    else:
        return Result


def openbrowser(text):
    # Open a known site by name, otherwise search Baidu for the text.
    maps = {
        '百度': ['百度', 'baidu'],
        '腾讯': ['腾讯', 'tengxun'],
        '网易': ['网易', 'wangyi']
    }
    if text in maps['百度']:
        webbrowser.open_new_tab('https://www.baidu.com')
    elif text in maps['腾讯']:
        webbrowser.open_new_tab('https://www.qq.com')
    elif text in maps['网易']:
        webbrowser.open_new_tab('https://www.163.com/')
    else:
        # URL-encode the query so spaces and non-ASCII text are passed safely.
        from urllib.parse import quote
        webbrowser.open_new_tab('https://www.baidu.com/s?wd=%s' % quote(text))


if __name__ == '__main__':
    flag = 'y'
    while flag.lower() == 'y':
        print('Enter a number to choose the language:')
        devpid = input('1536: Mandarin (simple English), 1537: Mandarin (with punctuation), 1737: English, 1637: Cantonese, 1837: Sichuanese\n')
        my_record()
        TOKEN = getToken(HOST)
        speech = get_audio(FILEPATH)
        result = speech2text(speech, TOKEN, int(devpid))
        print(result)
        if isinstance(result, str):
            # Drop surrounding punctuation (ASCII and Chinese) before matching.
            openbrowser(result.strip(',，。'))
        flag = input('Continue?(y/n):')
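
When recognition fails, speech2text returns the whole response dict and the main loop simply prints it. As a sketch of clearer error reporting, the helper below separates the two cases. The name extract_text is hypothetical, and the field names err_no and err_msg are assumed from typical responses of this service, so check them against an actual response before relying on them.

```python
def extract_text(response):
    """Return the recognized text, or raise RuntimeError with the API error."""
    results = response.get("result")
    if results:
        return results[0]
    # err_no / err_msg are assumed field names for the error case.
    err_no = response.get("err_no", "unknown")
    err_msg = response.get("err_msg", "no error message")
    raise RuntimeError(f"Recognition failed ({err_no}): {err_msg}")


# Made-up sample responses, for illustration only.
ok = {"err_no": 0, "result": ["百度"]}
bad = {"err_no": 3301, "err_msg": "speech quality error."}

print(extract_text(ok))
try:
    extract_text(bad)
except RuntimeError as exc:
    print(exc)
```

Raising an exception keeps the success path returning a plain string, so the caller no longer needs to check the result's type.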

Origin blog.csdn.net/m0_37758063/article/details/123645822