Free speech recognition in just 46 lines of code: everyone who has tried it slaps the table on the spot and shouts "Nice!"

1. In many voice projects you have to call a cloud API for every request, typically Baidu's or iFlytek's. That is not only expensive, the recognition quality is often disappointing, and over time it becomes a real pain. Running recognition locally avoids both problems.
2. Use a Python interpreter of version 3.8 or above.
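If you are not sure which version your environment runs, here is a quick sanity-check sketch; run it with the interpreter you plan to use:

import sys
print(sys.version_info)  # should report major=3 and minor=8 or higher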
3. Install the pyaudio package, which is used to record audio from the microphone in real time and save it. Use the following command:

pip install pyaudio -i https://pypi.tuna.tsinghua.edu.cn/simple

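Before moving on, it can help to confirm that pyaudio actually sees a microphone. The sketch below simply lists every input-capable device; the device names and indexes will of course differ on your machine:

import pyaudio

p = pyaudio.PyAudio()  # create the PyAudio object
for i in range(p.get_device_count()):  # walk through every audio device
    info = p.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:  # keep only devices that can record
        print(i, info["name"])
p.terminate()  # release PyAudio resources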
4. Install the whisper package, which performs the speech-to-text recognition. Note that OpenAI's Whisper is published on PyPI as openai-whisper (the package named plain whisper is an unrelated project), so install it with:

pip install openai-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple

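The script below also imports zhconv (used to convert Traditional Chinese output to Simplified Chinese), and Whisper decodes audio files through the ffmpeg command-line tool, so install both as well. The apt line is just one example; use your own platform's package manager for ffmpeg:

pip install zhconv -i https://pypi.tuna.tsinghua.edu.cn/simple
sudo apt install ffmpeg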
5. Create a new .py file, for example "speech recognition.py".
6. Start writing the code by importing the following four packages.

import whisper
import zhconv
import wave  # the wave module reads and writes .wav audio files
import pyaudio  # pyaudio records audio from the microphone, plays it back, and produces wav data

7. Define a recording function and set up the audio stream parameters inside it.

def record(time):  # recording routine
    # define the data stream parameters
    CHUNK = 1024  # frames read per buffer (1024 by default)
    FORMAT = pyaudio.paInt16  # 16-bit sample format, suitable for writing a wav file
    CHANNELS = 1  # number of channels (1 = mono)
    RATE = 16000  # sample rate (samples per second)
    RECORD_SECONDS = time  # recording duration in seconds
    WAVE_OUTPUT_FILENAME = "./output.wav"  # path where the audio is saved
    p = pyaudio.PyAudio()  # create the PyAudio object
    stream = p.open(format=FORMAT,  # sample format
                    channels=CHANNELS,  # number of channels
                    rate=RATE,  # sample rate
                    input=True,  # True marks this as an input (recording) stream
                    frames_per_buffer=CHUNK)  # frames per buffer
    print("* recording")  # recording started
    frames = []  # list that collects the raw audio chunks

8. Work out how many reads are needed and append the recorded audio chunks to the list.

    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):  # number of reads = sample rate / chunk size * recording time
        data = stream.read(CHUNK)  # read CHUNK frames at a time
        frames.append(data)  # store each chunk in the list
    print("* done recording")  # recording finished

    stream.stop_stream()  # stop the input stream
    stream.close()  # close the input stream
    p.terminate()  # shut down PyAudio

9. Write the audio data collected in the list to a wav file in binary form.

    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')  # open the file in binary write mode ('wb')
    wf.setnchannels(CHANNELS)  # set the number of channels
    wf.setsampwidth(p.get_sample_size(FORMAT))  # set the sample width, matching FORMAT
    wf.setframerate(RATE)  # set the sample rate, matching RATE
    wf.writeframes(b''.join(frames))  # write the audio data to the file
    wf.close()  # close the file once the data is written

10. Next, define a main function. It loads the base Whisper model (tip: the model is downloaded automatically the first time), transcribes the recording as Chinese text, and converts the result to Simplified Chinese before printing it.

def main():
    model = whisper.load_model("base")  # load the base Whisper model (downloaded automatically on first use)
    record(5)  # recording duration in seconds
    result = model.transcribe("output.wav", language='Chinese', fp16=False)  # transcribe the recording
    s = result["text"]  # recognized text (may be Traditional Chinese)
    s1 = zhconv.convert(s, 'zh-cn')  # convert to Simplified Chinese
    print(s1)

11. Finally, add the program entry point and call the main function.

if __name__ == '__main__':
    main()


12. Run the "speech recognition.py" file.

13. Once it starts recording, say "What should I do if I can't sleep at night" into the microphone and check the output.
14. Complete code listing:

import whisper
import zhconv
import wave  # the wave module reads and writes .wav audio files
import pyaudio  # pyaudio records audio from the microphone, plays it back, and produces wav data


def record(time):  # recording routine
    # define the data stream parameters
    CHUNK = 1024  # frames read per buffer (1024 by default)
    FORMAT = pyaudio.paInt16  # 16-bit sample format, suitable for writing a wav file
    CHANNELS = 1  # number of channels (1 = mono)
    RATE = 16000  # sample rate (samples per second)
    RECORD_SECONDS = time  # recording duration in seconds
    WAVE_OUTPUT_FILENAME = "./output.wav"  # path where the audio is saved
    p = pyaudio.PyAudio()  # create the PyAudio object
    stream = p.open(format=FORMAT,  # sample format
                    channels=CHANNELS,  # number of channels
                    rate=RATE,  # sample rate
                    input=True,  # True marks this as an input (recording) stream
                    frames_per_buffer=CHUNK)  # frames per buffer
    print("* recording")  # recording started
    frames = []  # list that collects the raw audio chunks
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):  # number of reads = sample rate / chunk size * recording time
        data = stream.read(CHUNK)  # read CHUNK frames at a time
        frames.append(data)  # store each chunk in the list
    print("* done recording")  # recording finished

    stream.stop_stream()  # stop the input stream
    stream.close()  # close the input stream
    p.terminate()  # shut down PyAudio

    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')  # open the file in binary write mode ('wb')
    wf.setnchannels(CHANNELS)  # set the number of channels
    wf.setsampwidth(p.get_sample_size(FORMAT))  # set the sample width, matching FORMAT
    wf.setframerate(RATE)  # set the sample rate, matching RATE
    wf.writeframes(b''.join(frames))  # write the audio data to the file
    wf.close()  # close the file once the data is written


def main():
    model = whisper.load_model("base")  # load the base Whisper model (downloaded automatically on first use)
    record(5)  # recording duration in seconds
    result = model.transcribe("output.wav", language='Chinese', fp16=False)  # transcribe the recording
    s = result["text"]  # recognized text (may be Traditional Chinese)
    s1 = zhconv.convert(s, 'zh-cn')  # convert to Simplified Chinese
    print(s1)


if __name__ == '__main__':
    main()

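If you want to trade a bit of speed for accuracy, or transcribe a file you already have instead of recording one, you can swap the model size and the file name. A minimal sketch (the "small" model and the file name my_audio.wav are just example choices):

import whisper
import zhconv

model = whisper.load_model("small")  # larger models are slower but usually more accurate
result = model.transcribe("my_audio.wav", language='Chinese', fp16=False)
print(zhconv.convert(result["text"], 'zh-cn'))  # print the result as Simplified Chinese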
15. Finally, if you got it running successfully, remember to give this post a like, a favorite, and a follow! See you in the comments section if you have any questions!

Origin blog.csdn.net/liaoqingjian/article/details/132971629