5 Good Open Source Speech Recognition / Speech-to-Text Systems | Linux China

A speech-to-text (STT) system is, as its name implies, a way of converting spoken words into text files for later use. - Simon James

A speech-to-text (STT) system is, as its name implies, a way of converting spoken words into text files that can be used later.

Speech-to-text technology is very useful. It can be used in many applications, such as automatic transcription, writing books or other texts with your own voice, running complex analyses on the generated text files with other tools, and so on.

In the past, speech-to-text technology was dominated by proprietary software and libraries. Open source alternatives either did not exist, came with strict restrictions, or had no community around them. This is changing: today there are many open source speech-to-text tools and libraries that you can use right away.

Here I list five of them.

Open Source Speech Recognition Libraries

Project DeepSpeech


This project is developed by Mozilla, the organization behind the Firefox browser. It is 100% free and open source software and, as the name suggests, it relies on machine learning, using the TensorFlow framework to do its work.

In other words, you can use it to train your own models for better results, and you can even use it to transcribe other languages. You can also easily integrate it into your own TensorFlow machine learning projects. Unfortunately, the project currently supports only English by default.

It is also available for many programming languages, such as Python (3.6), and lets you get the job done in a matter of seconds:

 
  
pip3 install deepspeech
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
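
The same command can also be driven from a script. The following is a minimal sketch, not part of the project itself: it checks that a WAV file is 16 kHz, mono, 16-bit PCM (assuming the released English model, which was trained on 16 kHz audio) and then builds the command line shown above. The helper names `wav_is_compatible`, `build_deepspeech_command` and `transcribe` are hypothetical, the model paths are simply those used above, and the CLI is assumed to print the transcript on standard output.

```python
import subprocess
import wave


def wav_is_compatible(path, rate=16000):
    """Return True if the WAV file is mono, 16-bit PCM at the given sample rate."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1
            and wav.getsampwidth() == 2  # 2 bytes per sample = 16 bit
            and wav.getframerate() == rate
        )


def build_deepspeech_command(audio, model_dir="models"):
    """Build the same argument list as the command shown above."""
    return [
        "deepspeech",
        "--model", f"{model_dir}/output_graph.pbmm",
        "--alphabet", f"{model_dir}/alphabet.txt",
        "--lm", f"{model_dir}/lm.binary",
        "--trie", f"{model_dir}/trie",
        "--audio", audio,
    ]


def transcribe(audio, model_dir="models"):
    """Run the deepspeech CLI (assumed to be on PATH) and return its stdout."""
    if not wav_is_compatible(audio):
        raise ValueError(f"{audio} is not 16 kHz mono 16-bit PCM")
    result = subprocess.run(
        build_deepspeech_command(audio, model_dir),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text
        check=True,               # raise if the CLI exits with an error
    )
    return result.stdout.strip()
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the audio path contains spaces.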

You can also install it using npm:

 
  
npm install deepspeech
◈  Project Home

Kaldi


Kaldi is open source speech recognition software written in C++ and released under the Apache License 2.0. It runs on Windows, macOS and Linux. Its development began in 2009.

Kaldi's main advantage over other speech recognition software is that it is extensible and modular. The community provides a large number of third-party modules that you can use for your tasks. Kaldi also supports deep neural networks and offers excellent documentation on its website.

Although the code is written mainly in C++, it is wrapped by Bash and Python scripts. So if you only need basic speech-to-text conversion, you will find it easy to do from Python or Bash.
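
As an illustration of how that scripting layer is typically fed, Kaldi recipes read plain-text index files from a data directory: `wav.scp` maps an utterance ID to an audio path, `text` maps it to a transcript, and `utt2spk` maps it to a speaker. The Python sketch below writes those three files. The function name `write_kaldi_data_dir` is made up for illustration, and the utterances passed to it are whatever recordings you have.

```python
import os


def write_kaldi_data_dir(data_dir, utterances):
    """Write wav.scp, text and utt2spk for a list of utterances.

    Each utterance is a (utt_id, speaker_id, wav_path, transcript) tuple.
    Kaldi expects these files to be sorted by utterance ID.
    """
    os.makedirs(data_dir, exist_ok=True)
    rows = sorted(utterances)  # tuples sort by utt_id first
    files = {
        "wav.scp": [f"{utt} {wav}" for utt, _spk, wav, _txt in rows],
        "text": [f"{utt} {txt}" for utt, _spk, _wav, txt in rows],
        "utt2spk": [f"{utt} {spk}" for utt, spk, _wav, _txt in rows],
    }
    for name, lines in files.items():
        with open(os.path.join(data_dir, name), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
```

Kaldi's documentation recommends prefixing each utterance ID with its speaker ID (for example `spk1-utt1`), so that sorting by utterance and sorting by speaker give the same order.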

◈  Project Home

Julius


Julius may be one of the oldest speech recognition programs ever. Its development began at Kyoto University in 1991, and ownership was transferred to an independent project team in 2005.

Julius's main features include the ability to perform real-time STT, a low memory footprint (less than 64 MB for a 20,000-word vocabulary), output of N-best results and word graphs, the ability to run as a server unit, and more. The software is designed mainly for academic and research use. It is written in C and runs on Linux, Windows, macOS and even Android (on smartphones).
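
To make the term "N-best" concrete: a recognizer scores many candidate sentences for one utterance, and an N-best list is simply the N highest-scoring candidates rather than only the single top result. The toy sketch below uses invented hypotheses and scores; it does not reproduce Julius's output format or API.

```python
import heapq


def n_best(hypotheses, n):
    """Return the n highest-scoring (sentence, score) pairs, best first."""
    return heapq.nlargest(n, hypotheses, key=lambda pair: pair[1])


# Invented candidates with log-likelihood-style scores (higher is better).
candidates = [
    ("recognize speech", -12.1),
    ("wreck a nice beach", -12.9),
    ("recognise peach", -15.4),
    ("wreck an ice beach", -17.0),
]

for sentence, score in n_best(candidates, 2):
    print(f"{score:7.1f}  {sentence}")
```

An application can then re-rank the returned candidates with its own knowledge, such as a list of valid commands, instead of trusting only the top hypothesis.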

It currently supports only English and Japanese. You should be able to install it easily from your Linux distribution's repository: just search for the julius package in your package manager. The latest version was released about one and a half months before this article was published.

◈  Project Home

Wav2Letter++


If you are looking for something more modern, this one is for you. Wav2Letter++ is open source speech recognition software that Facebook's AI research team released two months before this article was written. The code is released under the BSD license.

Facebook describes its library as "the fastest state-of-the-art speech recognition system available". It was built with performance in mind from the start. Facebook's newest machine learning library, FlashLight, is used as the underlying core of Wav2Letter++.

Wav2Letter++ requires you to first build a training model for the language you want, so that the algorithms can be trained on it. It ships with no pre-trained model for any language (including English). It is simply a machine-learning-driven speech-to-text tool written in C++, hence the name Wav2Letter++.

◈  Project Home

DeepSpeech2


Researchers at the Chinese software giant Baidu are also working on their own speech-to-text engine, called "DeepSpeech2". It is an end-to-end open source engine that uses the "PaddlePaddle" deep learning framework to convert English or Chinese speech into text. The code is released under the BSD license.

The engine can be trained on any model and for any language you want. No models are released with the code, so you have to build your own, as with the other tools. The DeepSpeech2 source code is written in Python, so it should be easy to pick up if you have used Python before.

◈  Project Home

Summary

Speech recognition is still largely dominated by proprietary software giants such as Google and IBM (which offer it as closed-source commercial services), but the open source alternatives are promising. These 5 open source speech recognition engines should help you build your application, and all of them will keep developing over time. In a few years, we expect open source to become the norm for these technologies, just as it has in other industries.

If you have other suggestions or comments on this list, we'd love to hear them below.


via: https://fosspost.org/lists/open-source-speech-recognition-speech-to-text

Author: Simon James  Topic selection: lujun9972  Translator: LuuMing  Proofreader: wxy

This article was originally compiled by LCTT and is proudly presented by Linux China.


Origin blog.csdn.net/F8qG7f9YD02Pe/article/details/93377391