Testing speech recognition in OpenCV 4.5.4 (with detailed steps)

Overview

This article shows how to run and verify the speech recognition sample in OpenCV 4.5.4, and notes some precautions.

Background

The DNN module in OpenCV 4.5.4 adds support for speech recognition. This article uses the Python sample for verification.

Usage steps

Location of the Python sample code: OpenCV4.5.4_Release\opencv\sources\samples\dnn\speech_recognition.py

[1] Download the speech recognition model:

https://drive.google.com/drive/folders/1wLtxyao4ItAg8tt4Sb63zt6qXzhcQoR6

picture

Download the model jasper_reshape.onnx, rename it to jasper.onnx, and put it in the same directory as speech_recognition.py.

[2] Download the test audio:

Download audio6.flac and audio10.flac, as shown in the figure above. Preliminary testing found that the program does not support mp3 audio; such files need to be converted to flac or wav first. Other formats have not been tried yet.
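The mp3-to-wav conversion mentioned above can be sketched in Python by calling ffmpeg. This is only a minimal sketch under assumptions that are not from the original test: it assumes ffmpeg is installed and on the PATH, and it assumes 16 kHz mono is a reasonable target for this model. The function names build_ffmpeg_cmd and convert are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command that re-encodes src as mono audio.

    The 16 kHz mono target is an assumption, not something the
    original article states.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(src),           # input file, e.g. an mp3
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        str(dst),                 # container/codec inferred from the extension
    ]


def convert(src, dst, sample_rate=16000):
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst, sample_rate), check=True)
    return Path(dst)
```

For example, convert("./audio/CET4.mp3", "./audio/CET4.wav") would produce a wav file that can be passed to --input_audio (the mp3 file name here is hypothetical).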

[3] Install the soundfile package:

Run: pip install soundfile

[4] Run the script from the command line:

python speech_recognition.py --input_audio=./audio/audio6.flac

Recognition result for audio6.flac (duration 00:11):

Predicting...Audio file 1/1['an american instead of going in a leisure hour to dance merrily at some place of public resort as the fellows of his calling continued to do throughout the greater part of europe shuts himself up at home to drink']

Recognition result for audio10.flac (duration 00:27):

Predicting...Audio file 1/1['she opened the door softly there sat missus wilson in the old rocking chair with one sick death like boy lying on her knee crying without let or pause but softly gently as fearing to disturb the troubled gasping child while behind her old alice let her fast dropping tears fall down on the dead body of the other twin which she was laying out on a board placed on a sort of sofa settee in the corner of the room']

The recognition results for these two clips are quite good. Note that this model does not support Chinese. Next, try two other English audio clips:

The first audio: https://www.tingclass.net/show-5406-3632-1.html

python speech_recognition.py --input_audio=./audio/CET4.wav

Recognition result:

Predicting...Audio file 1/1['o hom m bell amo hn haha am o waa iha  me howa e al ru e  hi hera morbo ao ha yur you move fore hung mo by wholl hab your hu mo ah  miseur luuel u lonlur wole olla iwer home all  bou o how bu olur aa men he ul um aha ol a oh a he notn ol all hole ar rule sa mer peaile hall her orha ah be a hen hom all murn a bown lok ano gerl orhehan or holy mule i ea the lol and theyn whole mon wingle all form ']

The output is very different from what is actually spoken, and many of the words in it are unintelligible.

The second audio: https://m.kekenet.com/Article/201504/369129.shtml

python speech_recognition.py --input_audio=./audio/english.wav

Recognition result:

Predicting...Audio file 1/1[" shakish am am shut shash an shi hang ca iunkun usha y oru u warm room  wo o emon o  chjonnoe e  ah wo an o a hush e i've o ask rule ur o sqawe grewh ula u ho a o ah"]

The recognition result for this audio is also very poor.
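If a reference transcript for a clip is available, the gap between "not bad" and "very poor" can be quantified with the word error rate (WER). The following is a minimal sketch, not part of the OpenCV sample: word_error_rate is a hypothetical helper that computes the word-level edit distance between the reference and the recognized text, divided by the number of reference words.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 0 means the recognized text matches the reference word for word. Values near or above 1, which output like the two results above would likely produce, mean almost nothing was recognized correctly.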

A preliminary explanation is that the audio used to train the model differs considerably from the audio tested here. To get good recognition results on such audio, you would have to train a model yourself. The sample code speech_recognition.py also contains the download address of the pre-trained model; if you are interested, try it yourself, and please share any new findings!
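One quick way to check how the test audio differs from what the model may expect is to look at its basic properties. The sketch below uses only Python's standard wave module to read the header of a wav file. The 16 kHz mono expectation is an assumption for illustration, not something confirmed by the original test, and describe_wav and looks_compatible are hypothetical helper names.

```python
import wave


def describe_wav(path):
    """Return basic header properties of a PCM wav file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def looks_compatible(info, expected_rate=16000):
    """Heuristic check against an assumed 16 kHz mono input format."""
    return info["channels"] == 1 and info["sample_rate"] == expected_rate
```

For example, describe_wav("./audio/CET4.wav") shows whether the clip is stereo or uses a different sample rate. The wave module only handles PCM wav files, so flac files would need a different reader, such as the soundfile package installed in step [3]. A format mismatch would be only one possible factor; background music, accents, or recording quality may matter just as much.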

Origin blog.csdn.net/stq054188/article/details/121981613