1. About the author
Gao Zhixiang, male, School of Electronic Information, Xi'an Polytechnic University, 2021 graduate student
Research direction: Machine Vision and Artificial Intelligence
Email : [email protected]
Liu Shuaibo, male, School of Electronic Information, Xi'an Polytechnic University, 2021 graduate student, Zhang Hongwei Artificial Intelligence Research Group
Research direction: Machine Vision and Artificial Intelligence
Email: [email protected]
2. Mandarin recognition based on Baidu API
2.1 Speech Recognition
Speech recognition converts a speech signal into the corresponding text. A recognition system has four main parts: feature extraction, the acoustic model, the language model, and the dictionary together with the decoder. In addition, to extract features more effectively, the audio signal usually has to be preprocessed first (filtering, framing, and so on), so that the portion to be analyzed is properly separated from the original signal.
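The framing step mentioned above can be illustrated with a small sketch. It is not part of the original project: the function name frame_signal and the 25 ms / 10 ms frame and hop sizes are assumptions chosen for illustration, and a real front end would typically also apply a window function to each frame.

```python
def frame_signal(samples, frame_len, hop_len):
    """Split a sequence of samples into overlapping fixed-length frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


# Illustrative values (assumptions): 25 ms frames, 10 ms hop, 16 kHz audio.
SAMPLE_RATE = 16000
FRAME_LEN = SAMPLE_RATE * 25 // 1000   # 400 samples per frame
HOP_LEN = SAMPLE_RATE * 10 // 1000     # 160 samples between frame starts

one_second = [0] * SAMPLE_RATE         # one second of silence as dummy input
frames = frame_signal(one_second, FRAME_LEN, HOP_LEN)
print(len(frames), len(frames[0]))
```

Because consecutive frames overlap, each short stretch of audio contributes to several frames, which is what lets later feature extraction treat every frame as roughly stationary.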
General process:
2.2 Baidu API calling method
To call the API, first create an application (for example, one for speech technology) on Baidu's AI development platform. This grants the application permission to use the corresponding services.
After the application has been created, it appears in your application list. Use the AppID, API Key and Secret Key shown there to make API calls.
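As a minimal sketch of how the API Key and Secret Key are used, the snippet below builds the access-token request URL with the same endpoint and query parameters as the experimental code in Section 4. The helper name build_token_url and the placeholder key values are assumptions for illustration; the snippet only constructs the URL and does not contact Baidu.

```python
from urllib.parse import urlencode

# Token endpoint taken from the experimental code in Section 4.
TOKEN_ENDPOINT = "https://openapi.baidu.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Return the URL that requests an access token for the given key pair."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,         # the application's API Key
        "client_secret": secret_key,  # the application's Secret Key
    })
    return TOKEN_ENDPOINT + "?" + query


# Placeholder values, not real credentials.
url = build_token_url("my-api-key", "my-secret-key")
print(url)
```

Using urlencode rather than plain string formatting keeps the URL valid even if a key contains characters that need escaping.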
3. Experiment
3.1 Experiment preparation
This experiment uses the Baidu API for recognition, so the baidu-aip module must be installed. Open a command line and run pip install baidu-aip.
When pip reports that the package was installed successfully, the module is ready to use.
Because this project uses PyQt5 to write the interface, the PyQt5 module must be installed as well.
Open a command line and run pip install pyqt5.
Next, go to Baidu AI's official website, create an application, and note its AppID, API Key and Secret Key.
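Once the three values are available, one option is to keep them out of the source file and read them from environment variables. The sketch below is an assumption, not the original project's approach (the code in Section 4 stores the keys in module-level variables), and the variable names BAIDU_APP_ID, BAIDU_API_KEY and BAIDU_SECRET_KEY are hypothetical.

```python
import os

# Hypothetical environment variable names chosen for this sketch.
CREDENTIAL_NAMES = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")


def load_credentials(env=os.environ):
    """Return the three credentials from a mapping, failing early if any is missing."""
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_NAMES}


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {
    "BAIDU_APP_ID": "123456",
    "BAIDU_API_KEY": "my-api-key",
    "BAIDU_SECRET_KEY": "my-secret-key",
}
credentials = load_credentials(demo_env)
print(sorted(credentials))
```

Failing early with a clear message avoids sending a request with empty keys and then having to interpret an authentication error from the server.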
3.2 Experimental results
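After starting the program, enter the number of the language to recognize and press Enter; recording then begins. Once the speech has been recognized, the browser opens the corresponding page (by default a Baidu search for the recognized text), so you can search directly by voice. If this happens, the experiment was successful.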
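The mapping from recognized text to the page that is opened can be sketched as a pure function, which makes it easy to check without recording anything. This is a simplified variant of the openbrowser function in Section 4, not a copy of it: the name target_url, the stripping of trailing punctuation (ASCII and full-width), the lowercasing, and the URL-quoting of the search term are assumptions added for illustration.

```python
from urllib.parse import quote

# Sites and keywords as in the experimental code's openbrowser function.
SITES = {
    "https://www.baidu.com": ("百度", "baidu"),
    "https://www.qq.com": ("腾讯", "tengxun"),
    "https://www.163.com/": ("网易", "wangyi"),
}


def target_url(text):
    """Return the URL to open for a piece of recognized text."""
    # Assumption: drop surrounding whitespace and punctuation the recognizer may add.
    cleaned = text.strip().strip(",，。.").lower()
    for url, names in SITES.items():
        if cleaned in names:
            return url
    # Anything else becomes a Baidu search, with the query percent-encoded.
    return "https://www.baidu.com/s?wd=" + quote(cleaned)


print(target_url("百度。"))
print(target_url("hello world"))
```

Returning the URL instead of opening it directly keeps the decision logic separate from the side effect; the caller can pass the result to webbrowser.open_new_tab.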
4. Experimental code
import wave
import requests
import time
import base64
from pyaudio import PyAudio, paInt16
import webbrowser

framerate = 16000  # sampling rate (Hz)
num_samples = 2000  # samples read per buffer
channels = 1  # number of channels (mono)
sampwidth = 2  # sample width: 2 bytes
FILEPATH = 'speech.wav'

base_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=%s&client_secret=%s"
APIKey = "********"  # fill in your own API Key
SecretKey = "**********"  # fill in your own Secret Key
HOST = base_url % (APIKey, SecretKey)

def getToken(host):
    # Request an access token from Baidu's OAuth endpoint.
    res = requests.post(host)
    return res.json()['access_token']

def save_wave_file(filepath, data):
    # Write the recorded buffers to a WAV file using the settings above.
    wf = wave.open(filepath, 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(sampwidth)
    wf.setframerate(framerate)
    wf.writeframes(b''.join(data))
    wf.close()

def my_record():
    pa = PyAudio()
    stream = pa.open(format=paInt16, channels=channels,
                     rate=framerate, input=True, frames_per_buffer=num_samples)
    my_buf = []
    t = time.time()
    print('Recording...')
    while time.time() < t + 4:  # record for 4 seconds
        string_audio_data = stream.read(num_samples)
        my_buf.append(string_audio_data)
    print('Recording finished.')
    save_wave_file(FILEPATH, my_buf)
    # Release the audio device once recording is done.
    stream.stop_stream()
    stream.close()
    pa.terminate()

def get_audio(file):
    # Read the recorded file back as raw bytes.
    with open(file, 'rb') as f:
        data = f.read()
    return data

def speech2text(speech_data, token, dev_pid=1537):
    FORMAT = 'wav'
    RATE = '16000'
    CHANNEL = 1
    CUID = '*******'
    # The audio is sent base64-encoded; 'len' is the size of the raw bytes.
    SPEECH = base64.b64encode(speech_data).decode('utf-8')
    data = {
        'format': FORMAT,
        'rate': RATE,
        'channel': CHANNEL,
        'cuid': CUID,
        'len': len(speech_data),
        'speech': SPEECH,
        'token': token,
        'dev_pid': dev_pid
    }
    url = 'https://vop.baidu.com/server_api'
    headers = {'Content-Type': 'application/json'}
    # r=requests.post(url,data=json.dumps(data),headers=headers)
    print('Recognizing...')
    r = requests.post(url, json=data, headers=headers)
    Result = r.json()
    # On success return the first candidate; otherwise return the whole response.
    if 'result' in Result:
        return Result['result'][0]
    else:
        return Result

def openbrowser(text):
    # Keywords that open a site directly; anything else becomes a Baidu search.
    maps = {
        '百度': ['百度', 'baidu'],
        '腾讯': ['腾讯', 'tengxun'],
        '网易': ['网易', 'wangyi']
    }
    if text in maps['百度']:
        webbrowser.open_new_tab('https://www.baidu.com')
    elif text in maps['腾讯']:
        webbrowser.open_new_tab('https://www.qq.com')
    elif text in maps['网易']:
        webbrowser.open_new_tab('https://www.163.com/')
    else:
        webbrowser.open_new_tab('https://www.baidu.com/s?wd=%s' % text)
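
if __name__ == '__main__':
    flag = 'y'
    while flag.lower() == 'y':
        print('Enter a number to select the language:')
        devpid = input('1536: Mandarin (simple English), 1537: Mandarin (with punctuation), 1737: English, 1637: Cantonese, 1837: Sichuanese\n')
        my_record()
        TOKEN = getToken(HOST)
        speech = get_audio(FILEPATH)
        result = speech2text(speech, TOKEN, int(devpid))
        print(result)
        # A string means recognition succeeded; a dict is the error response.
        if isinstance(result, str):
            # Strip ASCII and full-width punctuation from both ends before matching.
            openbrowser(result.strip(',，。'))
        flag = input('Continue?(y/n):')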
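The request in speech2text declares a 16000 Hz sampling rate and one channel, so the recorded file has to match those settings. The following sketch shows one way to verify a WAV file before uploading it, using only the standard library. The function name check_wav is hypothetical and is not part of the original program; the demo file is one second of generated silence.

```python
import os
import tempfile
import wave


def check_wav(path, rate=16000, channels=1, sampwidth=2):
    """Return a list of mismatches between a WAV file and the expected format."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != rate:
            problems.append("rate %d != %d" % (wf.getframerate(), rate))
        if wf.getnchannels() != channels:
            problems.append("channels %d != %d" % (wf.getnchannels(), channels))
        if wf.getsampwidth() != sampwidth:
            problems.append("sampwidth %d != %d" % (wf.getsampwidth(), sampwidth))
    return problems


# Demo: write one second of 16 kHz, mono, 16-bit silence and check it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(b"\x00\x00" * 16000)
    ok = check_wav(demo_path)               # matches the expected format
    bad = check_wav(demo_path, rate=8000)   # deliberately wrong expectation

print(ok)
print(bad)
```

In the program above, such a check could be run on FILEPATH after my_record() and before speech2text(), so that a format mismatch is reported locally rather than as a recognition error.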