As speech technology has developed, speech recognition has matured into a core part of many smart applications, such as smart-home devices and voice assistants. Chinese remains one of the more challenging languages to recognize. To make Chinese speech recognition easier for programmers, here are ten recommended Python open source speech-to-text projects that support Chinese.
vosk
vosk is a lightweight speech recognition library that supports multiple languages, including Chinese. It uses deep learning models to transcribe speech quickly. Its main advantages are speed, accuracy, and the ability to run offline. GitHub link: https://github.com/alphacep/vosk-api
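As a minimal sketch of offline transcription with vosk's Python package, the snippet below streams a WAV file through a recognizer in small chunks. It assumes a Chinese model has already been downloaded and unpacked into a local directory. The function names check_wav and transcribe and both path arguments are illustrative and not part of vosk.

```python
import json
import wave


def check_wav(path):
    """Return the sample rate of a 16-bit mono PCM WAV file, or raise ValueError."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            raise ValueError("audio must be 16-bit mono PCM WAV")
        return wf.getframerate()


def transcribe(wav_path, model_dir):
    """Transcribe one WAV file offline; model_dir is an unpacked vosk model (assumed)."""
    # Imported here so the format check above works without vosk installed.
    from vosk import KaldiRecognizer, Model

    rate = check_wav(wav_path)
    recognizer = KaldiRecognizer(Model(model_dir), rate)
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once an utterance has been finalized.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result()).get("text", ""))
    # Flush whatever audio is still buffered in the recognizer.
    pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

A call such as transcribe("speech.wav", "path/to/chinese-model") would return the recognized text as one string; both arguments here are placeholders.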
Kaldi-python
Kaldi-python is a Kaldi-based Python speech recognition toolkit that supports multiple languages, including Chinese. Kaldi is a very popular speech recognition engine known for high recognition accuracy, and Kaldi-python makes its functions available from Python. GitHub link: https://github.com/janchorowski/kaldi-python
PocketSphinx
PocketSphinx is an open source speech recognition toolkit from the CMU Sphinx project that supports multiple languages, including Chinese. It is a lightweight engine suited to resource-constrained environments such as mobile devices. GitHub link: https://github.com/cmusphinx/pocketsphinx
py-kaldi-asr
py-kaldi-asr is another Kaldi-based Python speech recognition toolkit that supports multiple languages, including Chinese. Unlike Kaldi-python, py-kaldi-asr provides a higher-level API and supports features such as multi-threaded recognition. GitHub link: https://github.com/jpuigcerver/py-kaldi-asr
AssemblyAI
AssemblyAI is a speech recognition API that uses deep learning and supports multiple languages, including Chinese. It is described as using an algorithm called "adaptive density comparison" to complete speech-to-text tasks quickly. GitHub link: https://github.com/assemblyai/python-sdk
Google Cloud Speech-to-Text
Google Cloud Speech-to-Text is the speech recognition API of the Google Cloud platform and supports multiple languages, including Chinese. It runs on Google's own speech recognition engine and achieves high accuracy. GitHub link: https://github.com/googleapis/python-sdk
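A short request with the google-cloud-speech client library might look like the sketch below. It assumes Google Cloud credentials are already configured in the environment and that the audio is raw 16-bit PCM. The helper names collect_transcript and transcribe_google are illustrative, and "cmn-Hans-CN" is the language code assumed here for Simplified Mandarin.

```python
def collect_transcript(response):
    """Join the top alternative of every result in a recognize() response."""
    return "".join(
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    )


def transcribe_google(pcm_bytes, sample_rate=16000, language_code="cmn-Hans-CN"):
    """Send raw 16-bit PCM audio to the API (credentials assumed to be configured)."""
    # Imported here so collect_transcript can be used without the client library.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(content=pcm_bytes)
    # Synchronous recognition is intended for short clips only.
    response = client.recognize(config=config, audio=audio)
    return collect_transcript(response)
```

Keeping the response handling in its own function makes it possible to check that part against a stub object before spending API quota.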
Baidu AI Open Platform
The Baidu AI Open Platform provides a speech recognition API that supports multiple languages, including Chinese. It runs on Baidu's own speech recognition engine and achieves high accuracy. It also supports both offline and real-time speech recognition. GitHub link: https://github.com/Baidu-AIP/python-sdk
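With the baidu-aip package, a recognition call could be sketched roughly as follows. The credentials are placeholders issued by the Baidu console, the audio is assumed to be 16 kHz 16-bit mono PCM, and dev_pid 1537 is assumed to select the Mandarin model. The helper names parse_asr_result and transcribe_baidu are illustrative.

```python
def parse_asr_result(reply):
    """Return the recognized text from an asr() reply dict, or raise on an error code."""
    if reply.get("err_no", 0) != 0:
        raise RuntimeError(
            f"Baidu ASR error {reply.get('err_no')}: {reply.get('err_msg')}"
        )
    return "".join(reply.get("result", []))


def transcribe_baidu(pcm_bytes, app_id, api_key, secret_key):
    """Send 16 kHz PCM audio to Baidu's speech API and return the text."""
    # Imported here so parse_asr_result works without the SDK installed.
    from aip import AipSpeech

    client = AipSpeech(app_id, api_key, secret_key)
    # dev_pid 1537 is assumed here to select the Mandarin recognition model.
    reply = client.asr(pcm_bytes, "pcm", 16000, {"dev_pid": 1537})
    return parse_asr_result(reply)
```

Raising on a non-zero err_no keeps a failed request from being mistaken for an empty transcript.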
iFLYTEK
iFLYTEK provides a speech recognition API that supports multiple languages, including Chinese. It uses deep learning and achieves high accuracy, and it also supports both offline and real-time speech recognition. GitHub link: https://github.com/iFLYTEK-Speech/python_sdk
DeepSpeech
DeepSpeech is Mozilla's open source speech recognition toolkit, which supports multiple languages, including Chinese. It uses deep learning and achieves high accuracy. It can be used offline, and a pre-trained Chinese speech recognition model is available. GitHub link: https://github.com/mozilla/DeepSpeech
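Offline use with the deepspeech Python package can be sketched as below. It assumes a pre-trained Chinese model file (and, optionally, a scorer file) has been downloaded separately and that the recording matches the sample rate the model expects. The helper names read_pcm16 and transcribe_deepspeech and all paths are illustrative.

```python
import wave


def read_pcm16(path, expected_rate):
    """Return raw 16-bit mono PCM bytes from a WAV file recorded at expected_rate."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit mono")
        if wf.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wf.getframerate()} Hz"
            )
        return wf.readframes(wf.getnframes())


def transcribe_deepspeech(wav_path, model_path, scorer_path=None):
    """Transcribe one WAV file offline with a downloaded DeepSpeech model (assumed)."""
    # Imported here so read_pcm16 works without numpy or deepspeech installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path:
        # The external scorer is an optional language model that refines the output.
        model.enableExternalScorer(scorer_path)
    pcm = read_pcm16(wav_path, model.sampleRate())
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

Checking the sample rate up front matters because audio at the wrong rate tends to produce poor transcripts rather than an error.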
vosk-api-python
vosk-api-python is the Python package for vosk, kept in the python directory of the vosk-api repository. Like vosk itself, it uses deep learning to transcribe speech quickly. It provides a higher-level API and supports features such as multi-threaded recognition. GitHub link: https://github.com/alphacep/vosk-api/tree/master/python
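One way to approach multi-threaded recognition is to fan a list of files out over a thread pool, as in the sketch below. The helper transcribe_many is hypothetical and not part of vosk. It takes any single-file function, here called transcribe_one, which is assumed to create its own recognizer on each call so that no recognizer is shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_many(paths, transcribe_one, workers=4):
    """Run transcribe_one(path) for every path on a thread pool.

    Returns a dict mapping each path to its transcript. transcribe_one is a
    caller-supplied function (hypothetical here) that handles a single file.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its text.
        texts = list(pool.map(transcribe_one, paths))
    return dict(zip(paths, texts))
```

Any exception raised inside transcribe_one propagates to the caller when the results are collected, so a single unreadable file stops the batch unless the function handles its own errors.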