Convert audio files to text through Python's speech_recognition library


Foreword

Hello everyone, I'm Kongkong star. In this article I'll show how to convert an audio file to text with Python's speech_recognition library.
The previous article introduced speech_recognition and related libraries:

Introduction to Python-speech-to-text related libraries


1. Audio preparation

Here we first generate a piece of audio with gTTS. For an introduction to gTTS, see my earlier post:

Convert text to audio through Python's gtts library

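from gtts import gTTS

# Directory where the generated audio is saved
local = '/Users/kkstar/Downloads/video/'
# Text to synthesize: "Hello everyone, I'm 空空star. This post shares audio-to-text conversion; this is text converted by speech_recognition."
text = '大家好,我是空空star,本篇给大家分享一下音频转文字,这是通过speech_recognition转换的文字。'
language = "zh-cn"
# gTTS calls Google Translate's text-to-speech service, so this needs network access
tts = gTTS(text=text, lang=language)
tts.save(local+"audio_gtts_0509.mp3")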

2. Listen to the audio

[Audio clip: Audio to Text_0509]

3. Format conversion

Convert the MP3 file to WAV.
Renaming the file extension is not enough; the audio has to be re-encoded with an audio conversion tool.
audio_gtts_0509.mp3->audio_gtts_0509.wav
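
The original post does not name a conversion tool. As one possible approach, the sketch below assumes the ffmpeg command-line tool is installed and on PATH, and calls it from Python. The helper names build_ffmpeg_command and convert_mp3_to_wav are made up for this illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp3_path):
    """Return the ffmpeg argument list that writes a WAV file next to the MP3."""
    src = Path(mp3_path)
    dst = src.with_suffix(".wav")
    # -y overwrites an existing output file; ffmpeg infers WAV from the extension
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_mp3_to_wav(mp3_path):
    """Run ffmpeg (assumed to be installed) and return the path of the WAV file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    cmd = build_ffmpeg_command(mp3_path)
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(cmd, check=True)
    return cmd[-1]


# Example call, using the path from step 1:
# convert_mp3_to_wav('/Users/kkstar/Downloads/video/audio_gtts_0509.mp3')
```

Any other converter that produces a PCM WAV file should work just as well; ffmpeg is only used here because it is widely available.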

4. Audio to Text

1. Import library

import speech_recognition as sr

2. Define the audio path

local = '/Users/kkstar/Downloads/video/'

3. Create a Recognizer object

r = sr.Recognizer()

4. Open the audio file and read it with the Recognizer object

The audio file must be in WAV format here (sr.AudioFile also reads AIFF and FLAC, but not MP3).

# Open the audio file
with sr.AudioFile(local+'audio_gtts_0509.wav') as source:
    # Read the whole file into an AudioData object
    audio = r.record(source)
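
If sr.AudioFile rejects the file, a common cause is an MP3 that was only renamed to .wav. As a quick check that is not part of the original post, the sketch below uses Python's standard wave module to confirm the file parses as a PCM WAV and to show its basic properties. The function name describe_wav is hypothetical.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file, or None if it is not one."""
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_width_bytes": wav.getsampwidth(),
                "sample_rate_hz": rate,
                "duration_seconds": frames / rate,
            }
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file (for example a renamed MP3) or a truncated file
        return None


# Example call, using the path from step 2:
# print(describe_wav('/Users/kkstar/Downloads/video/audio_gtts_0509.wav'))
```

A return value of None means the file still needs a real conversion before it can be passed to sr.AudioFile.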

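5. Convert speech to text with the Google Web Speech API

try:
    # Send the audio to the Google Web Speech API; the speech is Mandarin Chinese
    text = r.recognize_google(audio, language='zh-CN')
    print('Result:', text)
except sr.UnknownValueError:
    # The API could not produce a transcription
    print('Could not understand the audio')
except sr.RequestError as e:
    # The request failed, for example because of a network error
    print('Could not reach the Google Web Speech API. {0}'.format(e))

6. Conversion result

The Chinese is transcribed correctly, without punctuation, but the English parts are misheard: "star" comes back as "Store", and "speech_recognition" comes back as "Keep下划线recognition" (下划线 means "underscore").

Result: 大家好我是空空Store本篇给大家分享一下音频转文字这是通过Keep下划线recognition转换的文字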

Process finished with exit code 0
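
To put a rough number on how close the transcription is to the text given to gTTS, the sketch below compares the two strings with the standard difflib module after stripping punctuation and whitespace. This comparison is an addition for illustration, not something the original post does, and the helper names strip_punctuation and similarity are made up.

```python
import difflib
import unicodedata


def strip_punctuation(text):
    """Drop punctuation and whitespace so only the spoken characters are compared."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith(("P", "Z"))
    )


def similarity(expected, recognized):
    """Return a ratio in [0, 1]; 1.0 means identical after cleanup."""
    a = strip_punctuation(expected)
    b = strip_punctuation(recognized)
    return difflib.SequenceMatcher(None, a, b).ratio()


# Text passed to gTTS in the audio preparation step, and the text printed above
expected = '大家好,我是空空star,本篇给大家分享一下音频转文字,这是通过speech_recognition转换的文字。'
recognized = '大家好我是空空Store本篇给大家分享一下音频转文字这是通过Keep下划线recognition转换的文字'

print('similarity: {:.2f}'.format(similarity(expected, recognized)))
```

The ratio is a character-level similarity, not a formal word or character error rate, so it is best read as a quick sanity check.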

Summary

recognize_google() is the method in the SpeechRecognition library that calls Google's Web Speech API. It takes audio read from a file or recorded from a microphone, sends it to the API over the network, and returns the recognized text.


Origin blog.csdn.net/weixin_38093452/article/details/130584997