I recently came across an open-source speech recognition project and spent some time studying it. From my own testing, the system's recognition accuracy is roughly 75%, which makes it good material for learning. The project has been uploaded to GitHub, but the datasets and the generated model files are too large to upload there, so they are provided via Baidu Netdisk instead; download them from there. Be warned that this is hard on an ordinary computer: training ran for three days, so a GPU is strongly recommended.
See the project's Wiki documentation.
If you run into any problems while running or using the program, feel free to open an issue and I will respond as soon as possible. The project's QQ discussion group: 867 888 133
Before asking a question, please check the FAQ first to avoid duplicates.
For the principles behind ASRT, see this article:
For frequently asked questions about the theory of statistical language models, see:
- Statistical language models: from Chinese Pinyin to Chinese text
- A Chinese text algorithm based on simple word-frequency statistics, with no need for word segmentation
Introduction
This project is implemented with Keras on TensorFlow, based on a deep convolutional neural network (CNN), a long short-term memory (LSTM) network, an attention mechanism, and CTC.
- Steps
First, clone this project to your computer with Git, then download the datasets required for training; the download links are in the section at the end of this document.
$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git
Alternatively, you can click the "Fork" button to make your own copy of the project and then clone it locally over SSH with your own keys.
After cloning the repository, create a subdirectory dataset/ in the project root (a symbolic link also works), then extract the downloaded datasets directly into it.
Note that in the current version, the Thchs30 and ST-CMDS datasets must both be downloaded and used; neither can be omitted. Using other datasets requires modifying the code.
$ cd ASRT_SpeechRecognition
$ mkdir dataset
$ tar zxf <dataset archive name> -C dataset/
Then copy all the files in the datalist directory into dataset/, i.e., place them alongside the datasets.
$ cp -rf datalist/* dataset/
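The extraction and copy steps above can also be scripted. A minimal Python sketch, assuming .tar.gz archives and the datalist/ and dataset/ layout described above (archive names are placeholders):

```python
import shutil
import tarfile
from pathlib import Path

def prepare_dataset(archives, datalist_dir="datalist", dataset_dir="dataset"):
    """Extract each dataset archive into dataset/ and copy the datalist files in."""
    target = Path(dataset_dir)
    target.mkdir(exist_ok=True)
    for archive in archives:
        # equivalent of: tar zxf <archive> -C dataset/
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=target)
    # equivalent of: cp -rf datalist/* dataset/
    for item in Path(datalist_dir).iterdir():
        dest = target / item.name
        if item.is_dir():
            shutil.copytree(item, dest, dirs_exist_ok=True)
        else:
            shutil.copy2(item, dest)
```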
The currently available models are 24, 25, and 251.
Before running the project, install the required dependency libraries for Python 3.
To start training the project:
$ python3 train_mspeech.py
To run the project's tests:
$ python3 test_mspeech.py
Before testing, make sure the model file path filled in in the code actually exists.
To start the ASRT API server:
$ python3 asrserver.py
Please note that after starting the API server, you need the ASRT project's corresponding client software to perform speech recognition; see the Wiki document ASRT Client Demo.
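A client's basic job is to read a WAV file and send its samples to the server. Below is a minimal sketch; the server URL, port number, and raw-bytes request format are illustrative assumptions, not the project's actual protocol, which is described in the ASRT Client Demo Wiki document:

```python
import wave
from urllib import request

def read_wav_bytes(path):
    """Read a WAV file; return (raw frame bytes, sample rate, number of channels)."""
    with wave.open(path, "rb") as wav:
        return wav.readframes(wav.getnframes()), wav.getframerate(), wav.getnchannels()

def recognize(path, url="http://127.0.0.1:20000/"):
    """POST the audio frames to the API server and return the raw response body.

    The URL, port, and body format here are assumptions for illustration only.
    """
    frames, _rate, _channels = read_wav_bytes(path)
    req = request.Request(url, data=frames,
                          headers={"Content-Type": "application/octet-stream"})
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```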
If you want to train and use model 251, change the import SpeechModel line at the corresponding position in the code.
Model
Speech Model
CNN + LSTM/GRU + CTC
The maximum length of the audio input is 16 seconds; the output is the corresponding Pinyin sequence.
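CTC maps the network's per-frame probability distribution over Pinyin labels (plus a blank symbol) to a label sequence by collapsing repeated frames and dropping blanks. A minimal greedy (best-path) decoder, with a made-up label table for illustration:

```python
def ctc_greedy_decode(probs, labels, blank=0):
    """Best-path CTC decoding: argmax per frame, then collapse repeats and drop blanks.

    probs:  list of per-frame class probability lists, shape (time_steps, num_classes)
    labels: list mapping class index -> label string; index `blank` is the CTC blank
    """
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    decoded, prev = [], blank
    for idx in best_path:
        if idx != blank and idx != prev:  # skip blanks and consecutive repeats
            decoded.append(labels[idx])
        prev = idx
    return decoded

# Toy example: 3 classes (blank, "ni3", "hao3") over 6 frames
probs = [
    [0.1, 0.8, 0.1],    # ni3
    [0.1, 0.8, 0.1],    # ni3 again (collapsed as a repeat)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # hao3
    [0.1, 0.1, 0.8],    # hao3 again (collapsed)
    [0.9, 0.05, 0.05],  # blank
]
print(ctc_greedy_decode(probs, ["<blank>", "ni3", "hao3"]))  # ['ni3', 'hao3']
```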
- Download a trained model
The complete source code together with trained model parameters is included in the compressed packages of each published release, which you can view and download on this repository's GitHub Releases page.
Language Model
A maximum-entropy hidden Markov model based on a probabilistic graph.
The input is a Pinyin sequence; the output is the corresponding Chinese text.
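The idea of this Pinyin-to-text step can be illustrated with a tiny Viterbi decode over a toy HMM: hidden states are Chinese characters, observations are Pinyin syllables. The vocabulary and probabilities below are made up for illustration only and have nothing to do with the project's actual model:

```python
def viterbi(pinyin_seq, states, start_p, trans_p, emit_p):
    """Return the most probable character sequence for a Pinyin observation sequence."""
    # V[t][s] = best probability of any state path ending in state s at step t
    V = [{s: start_p.get(s, 0.0) * emit_p[s].get(pinyin_seq[0], 0.0) for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(pinyin_seq)):
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p].get(s, 0.0) * emit_p[s].get(pinyin_seq[t], 0.0), p)
                for p in states
            )
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(V[-1], key=V[-1].get)
    return path[best]

# Toy model: two candidate characters per syllable, biased so "中国" wins for "zhong1 guo2"
states = ["中", "钟", "国", "果"]
start_p = {"中": 0.6, "钟": 0.4}
trans_p = {"中": {"国": 0.9, "果": 0.1}, "钟": {"国": 0.3, "果": 0.7}, "国": {}, "果": {}}
emit_p = {"中": {"zhong1": 1.0}, "钟": {"zhong1": 1.0}, "国": {"guo2": 1.0}, "果": {"guo2": 1.0}}
print(viterbi(["zhong1", "guo2"], states, start_p, trans_p, emit_p))  # ['中', '国']
```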
About Accuracy
At present, the best model achieves roughly 80% Pinyin accuracy on the test set.
However, since some teams at home and abroad can already reach 98%, the accuracy still needs to be improved.
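Accuracy on Pinyin sequences is typically measured via edit distance between the recognized and reference syllable sequences. A minimal sketch; this metric definition is an illustration, not necessarily how the project computes its 80% figure:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two syllable sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def pinyin_accuracy(ref, hyp):
    """1 - (syllable edit distance / reference length), floored at 0."""
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))

ref = ["ni3", "hao3", "shi4", "jie4"]
hyp = ["ni3", "hao2", "shi4", "jie4"]  # one substituted tone
print(pinyin_accuracy(ref, hyp))  # 0.75
```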
Python library dependencies
- python_speech_features
- TensorFlow
- Keras
- Numpy
- wave
- matplotlib
- math
- Scipy
- h5py
- http
- urllib
Data Sets
- THCHS30 Chinese speech dataset from Tsinghua University
  data_thchs30.tgz: OpenSLR mirror (China) | OpenSLR mirror (international)
  test-noise.tgz: OpenSLR mirror (China) | OpenSLR mirror (international)
  resource.tgz: OpenSLR mirror (China) | OpenSLR mirror (international)
- Free ST Chinese Mandarin Corpus
  ST-CMDS-20170001_1-OS.tar.gz: OpenSLR mirror (China) | OpenSLR mirror (international)
- AIShell-1 open-source dataset
  data_aishell.tgz: OpenSLR mirror (China) | OpenSLR mirror (international)
Note: how to extract this dataset
$ tar xzf data_aishell.tgz
$ cd data_aishell/wav
$ for tar in *.tar.gz; do tar xvf $tar; done
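The same two-stage extraction (outer .tgz first, then every inner .tar.gz under data_aishell/wav/) can be done in Python. A sketch assuming the archive layout described above:

```python
import tarfile
from pathlib import Path

def extract_aishell(outer_archive, dest="."):
    """Extract the outer archive, then each inner .tar.gz under data_aishell/wav/."""
    with tarfile.open(outer_archive, "r:gz") as tar:
        tar.extractall(path=dest)
    wav_dir = Path(dest) / "data_aishell" / "wav"
    for inner in sorted(wav_dir.glob("*.tar.gz")):
        with tarfile.open(inner, "r:gz") as tar:
            tar.extractall(path=wav_dir)
```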
- Primewords Chinese Corpus Set 1
  primewords_md_2018_set1.tar.gz: OpenSLR mirror (China) | OpenSLR mirror (international)
- aidatatang_200zh
  aidatatang_200zh.tgz: OpenSLR mirror (China) | OpenSLR mirror (international)
- MagicData
  train_set.tar.gz: OpenSLR mirror (China) | OpenSLR mirror (international)
  dev_set.tar.gz: OpenSLR mirror (China) | OpenSLR mirror (international)
  test_set.tar.gz: OpenSLR mirror (China) | OpenSLR mirror (international)
  metadata.tar.gz: OpenSLR mirror (China) | OpenSLR mirror (international)
Special thanks! We are grateful to these organizations for making their speech datasets publicly available.
If a dataset cannot be downloaded from the links provided, please go through the OpenSLR links instead.
We tested one voice sample, roughly "we in the group are all brothers and sisters": the overall result was decent, but there is still a gap before practical application!