Voice control of and dialogue with DJI's EP robot in Python, step 2: recognizing and playing back a recording with Baidu AI


A correction to the previous article

The previous article got the program flow slightly wrong: I added the dialogue feature afterwards and forgot to update the flow. Here is the flow as I now see it:
record audio on the computer -> Baidu AI recognizes it as text -> send the text to the EP -> send the text to the Turing robot -> turn the Turing robot's reply into an MP3 -> play the MP3
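
The flow above can be sketched as one function that chains the six steps. This is only a rough sketch: run_pipeline and all six step names are made up for illustration, and the steps are passed in as plain functions so the order can be checked with stand-ins, without a microphone, a Baidu account, or the EP.

```python
from typing import Callable


def run_pipeline(
    record: Callable[[], str],
    recognize: Callable[[str], str],
    send_to_robot: Callable[[str], None],
    ask_chatbot: Callable[[str], str],
    synthesize: Callable[[str], str],
    play: Callable[[str], None],
) -> str:
    """Run one round of the voice pipeline and return the chatbot's reply."""
    wav_path = record()              # 1. record audio on the computer
    text = recognize(wav_path)       # 2. Baidu AI turns the audio into text
    send_to_robot(text)              # 3. send the text to the EP
    reply = ask_chatbot(text)        # 4. send the text to the Turing robot
    mp3_path = synthesize(reply)     # 5. turn the reply into an MP3
    play(mp3_path)                   # 6. play the MP3
    return reply


# Stand-in steps (all hypothetical) that only log the order they ran in.
calls = []
reply = run_pipeline(
    record=lambda: calls.append("record") or "output.wav",
    recognize=lambda path: calls.append("recognize") or "forward",
    send_to_robot=lambda text: calls.append("robot"),
    ask_chatbot=lambda text: calls.append("chatbot") or "ok, moving " + text,
    synthesize=lambda text: calls.append("synthesize") or "audio.mp3",
    play=lambda path: calls.append("play"),
)
print(calls)
print(reply)
```

In this article only the record, recognize, synthesize and play steps are implemented; the EP and Turing robot steps stay as placeholders.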

The code from the previous article

Process: sample audio -> write to file

import pyaudio   # PyAudio captures audio from the microphone
import wave      # wave writes the samples to a .wav file
def record():    # record a short clip and save it as a WAV file
    CHUNK = 1024                     # frames read per call
    FORMAT = pyaudio.paInt16         # sample format: 16-bit integers
    CHANNELS = 1                     # number of channels (mono)
    RATE = 16000                     # sample rate in Hz
    RECORD_SECONDS = 2               # recording length in seconds
    WAVE_OUTPUT_FILENAME = "output.wav" # name of the output file
    p = pyaudio.PyAudio()              # create a PyAudio instance
    stream = p.open(format=FORMAT,      # open an input stream on the PyAudio instance
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    frames = []                 # holds every chunk that is read
    print('* recording >>>')     # announce the start of recording
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)): # the loop count controls the recording time
        data = stream.read(CHUNK)   # read one chunk from the stream
        frames.append(data)
    print('* stop >>>')    # announce the end of recording
    stream.close()   # close the stream
    p.terminate()   # release the PyAudio session
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')   # write the samples to the WAV file
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
record() # run the function

That is a general explanation of the code. We don't have to remember its details, only what it does, so that we can call it directly.
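
One detail worth seeing once is how the loop count controls the recording time. The small calculation below reuses the constants from record(); SAMPLE_WIDTH is my own name for the 2 bytes per sample that a 16-bit format implies. It shows that a "2 second" recording is really a little shorter, because the loop only reads whole chunks.

```python
CHUNK = 1024          # frames read per call to stream.read()
RATE = 16000          # frames (samples) per second
RECORD_SECONDS = 2    # requested recording length
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample for 16-bit audio (assumed name)

num_reads = int(RATE / CHUNK * RECORD_SECONDS)    # loop iterations in record()
total_frames = num_reads * CHUNK                  # frames actually captured
actual_seconds = total_frames / RATE              # real length of the clip
data_bytes = total_frames * SAMPLE_WIDTH * CHANNELS

print(num_reads, total_frames, actual_seconds, data_bytes)
# prints: 31 31744 1.984 63488
```

So the loop runs 31 times and the file holds about 1.98 seconds of audio, roughly 62 KB of sample data.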

How to recognize the recorded audio with Baidu AI

Baidu Voice is a service on the Baidu Cloud AI open platform that supports speech recognition and speech synthesis. After registering you can call its REST API directly, and ordinary users get a free call quota, which seems to be 5000 calls.

After registering, open the speech service console, create a new application, and write down its AppID, API Key and Secret Key; these go into the code.
Signing up for Baidu AI and creating an application:
1. Search for the Baidu AI open platform in your browser and open the official site.
2. Register an account.
3. After logging in, find the speech technology section.
4. Create an application.
5. When creation finishes, write down the AppID, API Key and Secret Key.
With the application created, the next question is how to program against it. For that we can look at the Baidu AI platform's technical documentation.

from aip import AipSpeech

""" Your APPID, AK, SK """
APP_ID = 'Your App ID'
API_KEY = 'Your Api Key'
SECRET_KEY = 'Your Secret Key'

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

The interface works as follows. For example, to recognize the speech stored in a local audio file:

# Read the audio file as bytes
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

# Recognize a local file
result = client.asr(get_file_content('audio.pcm'), 'pcm', 16000, {
    'dev_pid': 1537,
})
result_text = result["result"][0]   # the first recognized sentence

print("you said: " + result_text)
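
Reading result["result"][0] directly will crash with a KeyError when recognition fails, because a failed response has no "result" entry. Below is a small sketch of a safer way to pull the text out. extract_text is my own helper name, and the err_no and err_msg fields are my assumption about how the response reports errors, so check them against the Baidu documentation before relying on this.

```python
def extract_text(response):
    """Return the first recognized sentence, or raise if recognition failed."""
    # Assumption: a failed call carries a non-zero 'err_no' and an 'err_msg'.
    err_no = response.get("err_no", 0)
    if err_no != 0:
        raise RuntimeError(
            "recognition failed: {} ({})".format(
                response.get("err_msg", "unknown error"), err_no
            )
        )
    results = response.get("result") or []
    if not results:
        raise RuntimeError("recognition returned no text")
    return results[0]


# Hand-made example responses, not real service output.
good = {"err_no": 0, "result": ["hello"]}
bad = {"err_no": 1, "err_msg": "example failure"}

print(extract_text(good))
try:
    extract_text(bad)
except RuntimeError as exc:
    print(exc)
```

With this helper, the lookup becomes result_text = extract_text(result), and a failed recognition shows up as a readable error message.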

After recognition, we also want to read the recognized text aloud. For this we again use Baidu Voice, this time its speech synthesis (TTS). Plenty of software offers text-to-speech, but Baidu's voice number 4 ('per': 4) is a pleasant female voice.

from aip import AipSpeech

APP_ID = 'Your AppID'
API_KEY = 'Your API Key'
SECRET_KEY = 'Your Secret Key'

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

def speak(text=""):
    result = client.synthesis(text, 'zh', 1, {
        'spd': 4,
        'vol': 5,
        'per': 4,
    })

    if not isinstance(result, dict):
        with open('audio.mp3', 'wb') as f:
            f.write(result)
speak("很高兴见到你")
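
The isinstance(result, dict) check in speak() matters: on success the call returns the audio bytes, and on failure it returns a dict describing the error. speak() silently does nothing in the failure case, which is hard to debug. The sketch below (save_audio is a made-up helper name, and the error dict is a hand-made stand-in) reports the failure and tells the caller whether a file was written.

```python
import os
import tempfile


def save_audio(result, path):
    """Write synthesized audio to path; return True on success, False on error."""
    # Same check as in speak(): a dict means the service reported an error.
    if isinstance(result, dict):
        print("synthesis failed:", result)
        return False
    with open(path, "wb") as f:
        f.write(result)
    return True


# Hand-made stand-ins for the two kinds of return value.
out_path = os.path.join(tempfile.mkdtemp(), "audio.mp3")
ok = save_audio(b"fake mp3 bytes", out_path)
failed = save_audio({"err_msg": "example error"}, out_path)
print(ok, failed)
```

In the real program you would pass in the value returned by client.synthesis(...) and only play the file when save_audio returns True.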

Now that the text has been converted to speech, the next step is to play the MP3. I used the playsound library, which is installed with pip install playsound. The playback code is as follows:

from playsound import playsound

def speak1():
    playsound("audio.mp3")   # play the MP3 written by speak()
speak1()

That completes the code for recording, recognizing and playing back through Baidu AI. The complete code:

import pyaudio   # PyAudio captures audio from the microphone
import wave
from aip import AipSpeech
from playsound import playsound




APP_ID = 'Your APP_ID'        # credentials for the AipSpeech client
API_KEY = 'Your API_KEY'
SECRET_KEY = 'Your SECRET_KEY'



client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)
   
def record():    # record a short clip and save it as a WAV file
    CHUNK = 1024                     # frames read per call
    FORMAT = pyaudio.paInt16         # sample format: 16-bit integers
    CHANNELS = 1                     # number of channels (mono)
    RATE = 16000                     # sample rate in Hz
    RECORD_SECONDS = 2               # recording length in seconds
    WAVE_OUTPUT_FILENAME = "output.wav" # name of the output file
    p = pyaudio.PyAudio()              # create a PyAudio instance
    stream = p.open(format=FORMAT,      # open an input stream on the PyAudio instance
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    frames = []                 # holds every chunk that is read
    print('* recording >>>')     # announce the start of recording
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)   # read one chunk from the stream
        frames.append(data)
    print('* stop >>>')    # announce the end of recording
    stream.close()   # close the stream
    p.terminate()   # release the PyAudio session
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')   # write the samples to the WAV file
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

def cognitive():                           # recognize the recorded file
    def get_file_content(filePath):        # read the audio file as bytes
        with open(filePath, 'rb') as fp:
            return fp.read()

    result = client.asr(get_file_content('output.wav'), 'wav', 16000, {
        'dev_pid': 1537,                   # Mandarin recognition model
    })
    result_text = result["result"][0]      # the first recognized sentence

    print("you said: " + result_text)

    return result_text


def speak(text=""):
    result = client.synthesis(text, 'zh', 1, {
        'spd': 4,
        'vol': 5,
        'per': 4,
    })

    if not isinstance(result, dict):
        with open('audio.mp3', 'wb') as f:
            f.write(result)        


    
def speak1():
    playsound("audio.mp3")



   
record()    # record from the microphone

result = cognitive()    # text recognized by Baidu

speak(result)       # turn the recognized text into speech

speak1()           # play the recognized text aloud

All right, that is everything. Give it a try.
References
playsound library
Baidu AI speech recognition technical documentation


Origin blog.csdn.net/qq_43603247/article/details/105561515