Chinese speech recognition system based on deep learning

I recently came across an open-source project and studied it in some detail. In my own tests, the accuracy of the speech recognition system was roughly 75%, which makes it a good entry point for learning. The project is hosted on GitHub, but the datasets and the generated model files are too large to upload there, so they are stored on Baidu Netdisk and have to be downloaded from it. Training is really hard on an ordinary computer: mine ran for three days, so use a GPU if you can.

See the project's Wiki documentation.

If you have any questions while running or using the program, please open an issue and I will respond as soon as possible. The project's QQ discussion group: 867 888 133

Before asking a question, please check the FAQ first to avoid duplicates.

For the principles behind ASRT, please see this article:

For frequently asked questions about the theory of statistical language models, see:

Introduction

This project is implemented with Keras and TensorFlow, using a deep convolutional neural network, a long short-term memory (LSTM) neural network, an attention mechanism, and CTC.

  • Steps

First, clone this project to your computer with Git, then download the datasets required for training. For the download links, see the end of this document.

$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git

Alternatively, click the "Fork" button to make your own copy of the project, then clone it locally using your own SSH key.

After cloning the repository, enter the project root directory and create a subdirectory dataset/ (a symbolic link can be used in its place), then extract the downloaded datasets directly into it.

Note that in the current version, both the Thchs30 and ST-CMDS datasets must be downloaded; neither is optional. Using other datasets requires modifying the code.

$ cd ASRT_SpeechRecognition

$ mkdir dataset

$ tar zxf <dataset archive name> -C dataset/

You then need to copy all the files in the datalist directory into dataset/, so that they sit alongside the datasets.

$ cp -rf datalist/* dataset/
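The preparation steps above (create dataset/, extract the archives, copy datalist) can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `prepare_dataset` and its parameters are hypothetical and not part of the project, and it assumes gzip-compressed tar archives and Python 3.8 or later.

```python
import shutil
import tarfile
from pathlib import Path


def prepare_dataset(project_root, archives, dataset_dir="dataset", datalist_dir="datalist"):
    """Create dataset/, extract each archive into it, then copy datalist/* alongside.

    Hypothetical helper for illustration; adjust the paths to your own layout.
    """
    root = Path(project_root)
    target = root / dataset_dir
    target.mkdir(parents=True, exist_ok=True)  # like: mkdir dataset

    for archive in archives:
        # like: tar zxf <archive> -C dataset/
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)

    source = root / datalist_dir
    if source.is_dir():
        # like: cp -rf datalist/* dataset/
        shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Calling `prepare_dataset(".", ["<dataset archive name>"])` from the project root would reproduce the shell commands; only extract archives you trust, since `extractall` writes whatever paths the archive contains.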

The currently available models are 24, 25 and 251.

Before running the project, install the required Python 3 dependencies.

To start training:

$ python3 train_mspeech.py

To run the test:

$ python3 test_mspeech.py

Before testing, make sure the model file path filled in in the code actually exists.

To start the ASRT API server:

$ python3 asrserver.py

Please note that after starting the API server, you need to use the client software that corresponds to this ASRT project for speech recognition; see the Wiki document ASRT Client Demo.

If you want to train and use model 251, change the corresponding import SpeechModel line in the code.

Model

Speech Model

CNN + LSTM/GRU + CTC
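To make the CTC stage of this pipeline more concrete, here is a minimal sketch of greedy (best-path) CTC decoding: pick the most likely symbol in each frame, merge consecutive repeats, then drop the blank symbol. It is an illustration under assumptions, not the project's actual decoder; the function `ctc_greedy_decode`, the toy vocabulary, the made-up probabilities, and the use of index 0 as the blank are all hypothetical.

```python
def ctc_greedy_decode(frame_probs, vocab, blank=0):
    """Best-path CTC decoding over a list of per-frame probability lists."""
    # 1. Pick the most probable symbol index in every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs]

    decoded = []
    previous = None
    for index in best_path:
        # 2. Merge consecutive repeats; 3. skip the blank symbol.
        if index != previous and index != blank:
            decoded.append(vocab[index])
        previous = index
    return decoded


# Toy example: vocabulary and probabilities are made up for illustration.
vocab = ["<blank>", "ni3", "hao3"]
frames = [
    [0.10, 0.80, 0.10],  # ni3
    [0.10, 0.70, 0.20],  # ni3 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # hao3
]
print(ctc_greedy_decode(frames, vocab))
```

The blank symbol is what lets CTC emit the same syllable twice in a row: two identical symbols separated by a blank frame are kept as two outputs, while directly adjacent repeats collapse into one.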

The maximum length of the audio input is 16 seconds, and the output is the corresponding Pinyin sequence.
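Because of the 16-second limit, it can be useful to check a recording's length before passing it to the model. The sketch below does this with the standard `wave` module; the helper names `wav_duration_seconds` and `fits_model_input` are hypothetical, and it assumes an uncompressed PCM WAV file.

```python
import wave

MAX_SECONDS = 16.0  # input limit stated in the text


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: number of frames divided by the sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def fits_model_input(path, max_seconds=MAX_SECONDS):
    """True if the recording is no longer than the model's maximum input length."""
    return wav_duration_seconds(path) <= max_seconds
```

Pass the path as a string, for example `fits_model_input("recording.wav")`, where the file name is only a placeholder. Longer recordings would need to be split into segments first.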

  • Download a trained model

The complete source code, including trained model parameters, can be obtained from the compressed packages of each published version on this repository's GitHub Releases page.

Language Model

A maximum entropy hidden Markov model based on a probabilistic graph.

The input is a Pinyin sequence, and the output is the corresponding Chinese text.
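To illustrate the general idea of turning a Pinyin sequence into text, here is a sketch of Viterbi decoding on a plain first-order hidden Markov model, where the hidden states are characters and the observations are Pinyin syllables. This is a simplified stand-in, not the project's maximum entropy model; the function name and all probability tables are hypothetical toy values. Syllables with no candidate character are not handled and raise `ValueError`.

```python
import math


def viterbi_pinyin_to_text(pinyin_seq, start_p, trans_p, emit_p, floor=1e-8):
    """Most probable character sequence for a pinyin sequence (first-order HMM).

    start_p: {char: P(char at start)}
    trans_p: {(prev_char, char): P(char | prev_char)}
    emit_p:  {(char, pinyin): P(pinyin | char)}
    Missing probabilities fall back to `floor`.
    """
    if not pinyin_seq:
        return ""

    def log_p(table, key):
        return math.log(table.get(key, floor))

    def candidates(pinyin):
        # Characters that can be read as this pinyin syllable.
        return [char for (char, syllable) in emit_p if syllable == pinyin]

    first = pinyin_seq[0]
    # best[char] = (log-probability of the best path ending in char, that path)
    best = {c: (log_p(start_p, c) + log_p(emit_p, (c, first)), c) for c in candidates(first)}

    for pinyin in pinyin_seq[1:]:
        step = {}
        for char in candidates(pinyin):
            step[char] = max(
                (score + log_p(trans_p, (prev, char)) + log_p(emit_p, (char, pinyin)), path + char)
                for prev, (score, path) in best.items()
            )
        best = step

    return max(best.values())[1]


# Toy tables, invented purely for illustration.
start_p = {"你": 0.6, "拟": 0.4}
trans_p = {("你", "好"): 0.7, ("你", "郝"): 0.05, ("拟", "好"): 0.1, ("拟", "郝"): 0.05}
emit_p = {("你", "ni3"): 1.0, ("拟", "ni3"): 1.0, ("好", "hao3"): 0.9, ("郝", "hao3"): 1.0}
print(viterbi_pinyin_to_text(["ni3", "hao3"], start_p, trans_p, emit_p))
```

Working in log-probabilities avoids numerical underflow on long sequences, and keeping only the best path per candidate character at each step is what keeps the search linear in the sequence length.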

About Accuracy

Currently, the best model achieves roughly 80% Pinyin accuracy on the test set.

However, since some teams in China and abroad can reach 98%, the accuracy still needs to be improved.
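The text does not say how the Pinyin accuracy is computed. One common convention, assumed here, is one minus the edit distance between the recognized and reference syllable sequences divided by the reference length. The sketch below follows that assumption; the function names and the example sentences are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            cost = 0 if ref_token == hyp_token else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def pinyin_accuracy(reference, hypothesis):
    """Assumed metric: 1 - edit_distance / len(reference)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 1.0 - edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one wrong tone out of six syllables.
reference = "jin1 tian1 tian1 qi4 hen3 hao3".split()
hypothesis = "jin1 tian1 tian1 qi2 hen3 hao3".split()
print(round(pinyin_accuracy(reference, hypothesis), 3))
```

Under this definition, a result of about 0.8 means that roughly one syllable in five needs to be substituted, inserted, or deleted to match the reference.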

Python Import

Python library dependencies

  • python_speech_features
  • TensorFlow
  • Keras
  • Numpy
  • wave
  • matplotlib
  • math
  • Scipy
  • h5py
  • http
  • urllib
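Before running the project, a quick way to see which of these libraries are missing is to probe for them without importing them. This is a minimal sketch; it assumes the importable module names are the lowercase forms shown in `REQUIRED_MODULES` (including `keras`), and the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Assumed import names for the libraries listed above.
REQUIRED_MODULES = [
    "python_speech_features", "tensorflow", "keras", "numpy", "wave",
    "matplotlib", "math", "scipy", "h5py", "http", "urllib",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing:", missing_modules() or "none")
```

Modules such as `wave`, `math`, `http` and `urllib` are part of the Python standard library, so only the third-party packages should ever show up as missing.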

Data Sets

Special thanks to those who made their speech datasets publicly available!

If the provided dataset links cannot be opened or downloaded, please use the OpenSLR link.

I tested one utterance, "We are all brothers and sisters in the group said," and the result was okay overall, but there is still a gap before practical use!


Origin www.cnblogs.com/chen8023miss/p/12082284.html