How to convert speech to text with Python

Speech recognition is the ability of computer software to recognize words and phrases in spoken language and convert them into readable text. So how do you convert speech to text in Python? With the SpeechRecognition library, there is no need to build any machine learning models from scratch: the library provides convenient wrappers around several well-known public speech recognition APIs.

Use pip to install the library:

pip3 install SpeechRecognition

Okay, open a new Python file and import it:

import speech_recognition as sr

Read from file

Make sure the current directory contains an audio file with English speech:

filename = "speech.wav"
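If you bring your own recording, it can be worth checking what kind of file it is before sending it anywhere. As far as I know, sr.AudioFile reads PCM WAV, AIFF and FLAC files, so a quick look at a WAV file's properties with Python's standard wave module is a cheap sanity check. The sketch below is only an illustration; describe_wav is a hypothetical helper name, not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file as a dict.

    wave.open raises wave.Error if the file is not a PCM WAV file,
    which is itself a useful signal that the file needs converting.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }
```

Calling describe_wav(filename) on your own file should tell you at a glance whether it is mono or stereo and how long it is, which helps when estimating how long the upload will take.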

The file comes from the LibriSpeech dataset, but you can use any audio file you like; just change the file name. Next, initialize the speech recognizer:

# initialize the recognizer

r = sr.Recognizer()

The following code is responsible for loading the audio file and using Google Speech Recognition to convert speech to text:

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This takes a few seconds to complete, because the audio is uploaded to Google and the transcription is sent back. This is my result:

I believe you’re just talking nonsense
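Because the recognition happens on a remote service, the call can fail in two ways that are worth handling: the service may not understand the audio, or it may not be reachable at all. SpeechRecognition signals these with sr.UnknownValueError and sr.RequestError. The following is a minimal sketch of how the steps above could be wrapped with that handling; transcribe_file is a hypothetical name, and returning None for unintelligible audio is just one possible convention.

```python
def transcribe_file(path, language="en-US"):
    """Transcribe an audio file; return the text, or None if nothing was understood."""
    # Imported inside the function so that defining this helper does not
    # fail on a machine where SpeechRecognition is not installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # load the whole file into memory
        audio_data = recognizer.record(source)

    try:
        return recognizer.recognize_google(audio_data, language=language)
    except sr.UnknownValueError:
        # The service answered but could not make out any words.
        return None
    except sr.RequestError as exc:
        # The service could not be reached or rejected the request.
        raise RuntimeError(f"Speech service request failed: {exc}") from exc
```

With this in place, transcribe_file("speech.wav") would return the recognized sentence as a string, and a caller can tell "nothing understood" (None) apart from a network problem (RuntimeError).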

Read from microphone

This requires PyAudio to be installed on your computer. The installation process depends on your operating system:

Windows

You can install it with pip:

pip3 install pyaudio

Linux

You need to install the dependencies first:

sudo apt-get install python-pyaudio python3-pyaudio

macOS

You need to install portaudio first, then you can install PyAudio with pip:

brew install portaudio

pip3 install pyaudio
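Whichever system you are on, it may be worth confirming that both packages ended up in the interpreter you are actually running. The snippet below is a small sketch using only the standard library; is_installed is a hypothetical helper name, and note that it checks the import names (speech_recognition, pyaudio), which differ from the pip package names.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Import names, not pip package names
    for name in ("speech_recognition", "pyaudio"):
        status = "OK" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If pyaudio shows up as missing even though the installation seemed to succeed, a likely cause is that pip3 and the python3 you are running belong to different environments.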

Now let's use the microphone to convert speech to text:

with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)

This will listen to your microphone for 5 seconds and then try to convert the speech to text!

It is very similar to the previous code, but here we use the Microphone() object to read audio from the default microphone, and the duration parameter of the record() function to stop reading after 5 seconds. The audio data is then uploaded to Google to get the output text.
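The default microphone is not always the one you want, for example when a USB headset is plugged in. SpeechRecognition exposes sr.Microphone.list_microphone_names() and a device_index argument on sr.Microphone for this case. The sketch below shows one possible way to pick a device by name; find_device_index and record_from are hypothetical helper names, and the keyword matching is an assumption of mine rather than anything the library prescribes.

```python
def find_device_index(device_names, keyword):
    """Return the index of the first device whose name contains keyword, else None."""
    keyword = keyword.lower()
    for index, name in enumerate(device_names):
        # Entries can be None when a device name cannot be retrieved
        if name and keyword in name.lower():
            return index
    return None


def record_from(keyword, seconds=5):
    """Record a few seconds from the first microphone matching keyword."""
    # Imported here so find_device_index stays usable without the package
    import speech_recognition as sr

    names = sr.Microphone.list_microphone_names()
    # None makes sr.Microphone fall back to the default device
    index = find_device_index(names, keyword)
    recognizer = sr.Recognizer()
    with sr.Microphone(device_index=index) as source:
        return recognizer.record(source, duration=seconds)
```

The returned audio data can then be passed to r.recognize_google() exactly as in the code above.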

You can also use the offset parameter of the record() function to start recording after an offset of a few seconds.
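Taken together, offset and duration let you pick out any window of a recording, which suggests one way of handling a long file: transcribe it window by window. The following is only a sketch under some assumptions. chunk_windows and transcribe_in_chunks are hypothetical names, total_seconds is assumed to be known in advance (for a WAV file it can be read with Python's wave module), and the 30-second window is an arbitrary choice.

```python
def chunk_windows(total_seconds, chunk_seconds):
    """Split [0, total_seconds) into consecutive (offset, duration) windows."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window may be shorter than chunk_seconds
        duration = min(chunk_seconds, total_seconds - offset)
        windows.append((offset, duration))
        offset += chunk_seconds
    return windows


def transcribe_in_chunks(path, total_seconds, chunk_seconds=30, language="en-US"):
    """Transcribe a long audio file one window at a time and join the results."""
    # Imported here so chunk_windows stays usable without the package
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    pieces = []
    for offset, duration in chunk_windows(total_seconds, chunk_seconds):
        # Reopen the file for each window so offset is measured from the start
        with sr.AudioFile(path) as source:
            audio_data = recognizer.record(source, offset=offset, duration=duration)
        try:
            pieces.append(recognizer.recognize_google(audio_data, language=language))
        except sr.UnknownValueError:
            # Nothing intelligible in this window (e.g. silence); skip it
            continue
    return " ".join(pieces)
```

One caveat with fixed windows is that a boundary can fall in the middle of a word, so the text around each cut may be less accurate than the rest.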

In addition, you can recognize different languages by passing the language parameter to the recognize_google() function. For example, if you want to recognize Spanish speech, you can use:

text = r.recognize_google(audio_data, language="es-ES")
