Introduction to Speech Open Source Code

1. Kaldi

Kaldi was born at Johns Hopkins University in 2009. Early on, the project focused on subspace Gaussian mixture model (SGMM) acoustic modeling and lexicon learning. Its design drew on HTK, and C++ remains its main language. As more contributors joined, and especially with the addition of deep neural network (DNN) support, Kaldi's development has outpaced several other well-known open-source projects. More importantly, Kaldi is maintained and updated very actively, with new commits landing almost daily, and it is quick to follow up on new algorithms from academic research. Many companies and research institutions outside China use the Kaldi platform, and even more Chinese companies build their improvements on it, especially start-ups and corporate research institutes that have emerged in recent years.


2. CMU-Sphinx

CMU-Sphinx is an open-source speech recognition system developed by Carnegie Mellon University (CMU), with later contributions from Sun, Mitsubishi, Hewlett-Packard, UC Santa Cruz, and MIT. Sphinx includes a family of speech recognizers and acoustic-model training tools built on HMMs (the Institute of Acoustics of the Chinese Academy of Sciences also led the adoption of HMMs within China), and it is known as the first high-performance continuous speech recognition system. Sphinx has also developed quickly: Sphinx-4 has been completely rewritten in Java, which makes it well suited for embedding in the Android platform. In addition, the author would like to emphasize Kai-Fu Lee's contribution to Sphinx, although it has been the subject of some debate.


3. Julius

Julius is a practical, high-performance two-pass large-vocabulary continuous speech recognition engine developed jointly by Kyoto University and the IPA (Information-technology Promotion Agency). Julius makes it easy to build a speech recognition system by combining a language model with an acoustic model. The language models it supports include N-gram models, rule-based grammars, and simple word lists for isolated-word recognition. The acoustic models it uses must be HMM-based. Julius is written in pure C and distributed as open source. The latest versions of Julius adopt a modular design, so each functional module can be configured through parameters.
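To make the idea of combining a language model and an acoustic model through parameters concrete, a Julius run is typically driven by a "jconf" configuration file that names the model files. The sketch below uses Julius's documented option names, but all file paths are placeholders, and the exact option set varies by Julius version.

```
# sample.jconf — a minimal sketch of a Julius configuration (paths are placeholders)
-h     model/hmmdefs        # acoustic model: HTK-format HMM definitions
-hlist model/allophone.list # HMM (triphone) name list for the acoustic model
-d     lm/ngram.bingram     # binary N-gram language model
-v     lm/words.dict        # pronunciation dictionary
# For rule-based grammar recognition, the N-gram options would instead be
# replaced with a grammar specification, e.g.:
#   -gram grammar/sample
```

The engine would then be launched with something like `julius -C sample.jconf`, swapping language-model options to move between N-gram, grammar, and isolated-word modes without touching the rest of the configuration.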


4. HTK

HTK is short for the Hidden Markov Model Toolkit. It was originally developed in 1989 by the Machine Intelligence Laboratory of the Cambridge University Engineering Department (CUED), where it was used to build CUED's large-vocabulary speech recognition systems. HTK mainly includes tools for speech feature extraction and analysis, model training, and speech recognition. The rights to HTK were acquired by Microsoft in 1999, which hindered its development, so Microsoft later licensed the toolkit back for open distribution. HTK's version updates have been quite slow; its most recent release, the 3.5 beta, appeared in 2015.
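As an illustration of HTK's feature-extraction tooling, the HCopy tool converts waveforms into parameter files according to a plain-text configuration. The settings below follow the style of the HTK Book tutorial and are a sketch, not a recommended production setup.

```
# config.hcopy — MFCC extraction settings for HCopy (a sketch in HTK Book style)
SOURCEFORMAT = WAV          # input waveform format
TARGETKIND   = MFCC_0_D_A   # MFCCs + C0 energy, with deltas and accelerations
TARGETRATE   = 100000.0     # 10 ms frame shift (HTK time units of 100 ns)
WINDOWSIZE   = 250000.0     # 25 ms analysis window
USEHAMMING   = T            # apply a Hamming window
PREEMCOEF    = 0.97         # pre-emphasis coefficient
NUMCHANS     = 26           # mel filterbank channels
NUMCEPS      = 12           # cepstral coefficients per frame
```

Feature extraction would then be run as `HCopy -C config.hcopy -S files.scp`, after which HTK's training tools (such as HERest) and its recognizer (HVite) consume the resulting feature files.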


5. RWTH ASR

RWTH ASR is an acoustic-model development package containing decoders and tools for speech recognition, developed since 2001 by the Human Language Technology and Pattern Recognition Group at RWTH Aachen University. RWTH ASR is also written in C++ and mainly includes components for speaker adaptation, speaker-adaptive training, unsupervised training, personalized training, and word-root processing.

The five open-source speech recognition toolkits above are the foundational versions. Many derivative projects have been built on them, such as Platypus, FreeSpeech, Vedics, NatI, Simon, Xvoice, Zanzibar, OpenIVR, and Dragon NaturallySpeaking. Interestingly, NaturallySpeaking was acquired by Nuance and lives on as one of its product names.

Reprinted from:

