Research Report on Dynamic Text-to-Speech Technology

Dynamic text-to-speech technology available offline

Dynamic text-to-speech (TTS) technology converts text into speech on demand. In an offline environment, this is usually done with a local speech synthesis engine. The following engines are commonly used offline:

  • eSpeak: eSpeak is a lightweight, open source speech synthesis engine that runs on platforms such as Linux and Windows. It supports multiple languages and voices, and can be invoked from the command line.
  • Festival: Festival is an open source speech synthesis engine that supports multiple languages and voices. It is available on platforms such as Linux and Windows, but requires additional voice data to be installed.
  • MaryTTS: MaryTTS is an open source speech synthesis engine that supports multiple languages and voices. It is available on platforms such as Linux and Windows, but requires additional voice data to be installed. MaryTTS also does not support Chinese, according to the development team's response on GitHub.

Solution 1: espeak and espeak-ng command line calls

espeak is a lightweight speech synthesis engine that runs on platforms such as Linux and Windows. It supports multiple languages and voices, and can be invoked from the command line.

espeak command line call:

espeak can convert text to speech with a single command. For example, the following command converts the text "Hello World" to a speech file:

espeak "Hello World" -w hello.wav

espeak-ng command line call:

espeak-ng is an enhanced version of espeak that supports more languages and voices and provides more command line options. Compared to espeak, espeak-ng produces better speech, but its installation is larger. The following command converts text to speech with espeak-ng:

espeak-ng "Hello World" -w hello.wav
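
Since the second solution in this report is written in Java, the same command line call can also be issued from a Java application. The following is only a minimal sketch, assuming espeak-ng (or espeak) is installed and on the PATH. The class name EspeakCli and the methods buildCommand and synthesize are hypothetical names chosen for illustration, and the -v option (which selects the voice) is not part of the commands shown above.

```java
import java.io.IOException;
import java.util.List;

public class EspeakCli {
    // Builds the argument list for: <binary> -v <voice> -w <wavPath> <text>
    // (hypothetical helper; -v selects the voice, -w writes a WAV file)
    static List<String> buildCommand(String binary, String voice, String text, String wavPath) {
        return List.of(binary, "-v", voice, "-w", wavPath, text);
    }

    // Starts the engine as a child process and returns its exit code (0 means success).
    static int synthesize(String binary, String voice, String text, String wavPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(binary, voice, text, wavPath))
                .inheritIO()   // show the engine's own messages on our console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            int exit = synthesize("espeak-ng", "en", "Hello World", "hello.wav");
            System.out.println("exit code: " + exit);
        } catch (IOException e) {
            // Thrown by ProcessBuilder.start() when the binary cannot be launched
            System.out.println("could not start espeak-ng: " + e.getMessage());
        }
    }
}
```

Passing the arguments as a list rather than as one command string avoids shell quoting problems when the text contains spaces or quotation marks.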

Advantages and disadvantages:

Advantages:
espeak and espeak-ng are simple, lightweight, and easy to use from the command line. They run on multiple platforms and support multiple languages and voices.

Disadvantages:
The speech quality of espeak and espeak-ng is average and does not match that of commercial speech synthesis engines. They also offer only basic control over the output, such as speaking rate (-s) and pitch (-p), rather than the more advanced features of commercial engines.

Solution 2: calling the Microsoft HUIHUI voice library with jacob

jacob (the Java COM Bridge) allows Java applications to call COM components. Microsoft HUIHUI is a Chinese voice for the Windows speech API (SAPI), the speech synthesis interface on the Microsoft Windows platform, which supports multiple languages and voices through its installed voices.

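The following sample code uses jacob to call the Microsoft HUIHUI voice library:

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

// Create the SAPI voice COM object and speak the text with the default voice
ActiveXComponent ax = new ActiveXComponent("Sapi.SpVoice");
Dispatch spVoice = ax.getObject();
Dispatch.call(spVoice, "Speak", new Variant("Hello World"));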

Advantages and disadvantages:

Advantages: Calling the Microsoft HUIHUI voice library through jacob produces very good, high-quality speech and supports multiple languages and voices. jacob also makes the advanced functions of speech synthesis, such as pitch change and speech rate control, available to Java applications.

Disadvantages: This approach only runs on Windows and does not work on other platforms such as Linux. jacob also requires additional installation and configuration, which makes it relatively complicated to use.
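
The pitch and speech rate control mentioned under the advantages can be sketched with SAPI 5 XML markup, in which the text is wrapped in rate and pitch tags before it is spoken. This is only an illustration under assumptions: the class SapiMarkup and its methods are hypothetical names, and the original solution does not state that it uses this markup. The block is plain Java and needs neither jacob nor Windows to run, since it only builds the string.

```java
public class SapiMarkup {
    // SAPI 5 accepts values from -10 to 10 for both absspeed and absmiddle.
    static int clamp(int value) {
        return Math.max(-10, Math.min(10, value));
    }

    // Escapes the characters that would otherwise be read as XML markup.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps the text in SAPI 5 <rate> and <pitch> tags (hypothetical helper).
    static String wrap(String text, int rate, int pitch) {
        return "<rate absspeed=\"" + clamp(rate) + "\">"
                + "<pitch absmiddle=\"" + clamp(pitch) + "\">"
                + escape(text)
                + "</pitch></rate>";
    }

    public static void main(String[] args) {
        // Slightly faster and slightly lower than the voice's defaults
        System.out.println(wrap("Hello World", 3, -2));
    }
}
```

The resulting string would be passed as the text argument of the Speak call in place of the plain text. Whether the markup is detected automatically or needs an explicit XML flag on Speak should be checked against the SAPI documentation.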

Final choice and why

We choose to call espeak from the command line on Linux, and use jacob to call the Microsoft HUIHUI voice library on Windows, mainly for the following reasons:

  • Lightweight and simple: espeak is a lightweight speech synthesis engine that runs on platforms such as Linux and Windows and can be invoked from the command line. It is very simple to use and needs no configuration beyond installing the package. Calling the Microsoft HUIHUI voice library through jacob gives Java applications high-quality speech synthesis together with its advanced functions.

  • Cross-platform support: By combining espeak and jacob, we can implement dynamic text-to-speech on multiple platforms. espeak runs on platforms such as Linux and Windows, while jacob with the Microsoft HUIHUI voice library covers Windows.

  • Free to use: espeak is free and open source. The Microsoft HUIHUI voice library is not open source, but it comes with Windows and can be used at no additional cost. Using local engines also gives us better controllability and customizability.

  • Good speech quality: Both espeak and the Microsoft HUIHUI voice library (called through jacob) produce speech that is good enough for our needs, and both support multiple languages and voices. With these two technologies we can choose the appropriate engine and voice for each requirement.

In summary, we call espeak from the command line on Linux and use jacob to call the Microsoft HUIHUI voice library on Windows. This combination is lightweight and cross-platform, gives good speech quality, and supports the advanced functions of speech synthesis.
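
The platform split described above could be implemented by checking the operating system name at startup. The following is a minimal sketch and not the actual implementation behind this report: the class TtsEngineSelector, the enum Engine, and the method select are hypothetical names.

```java
import java.util.Locale;

public class TtsEngineSelector {
    // The two options chosen in this report (hypothetical enum for illustration)
    enum Engine { ESPEAK_CLI, JACOB_SAPI }

    // Windows maps to the jacob/SAPI path; every other OS falls back to the espeak command line.
    static Engine select(String osName) {
        String normalized = osName.toLowerCase(Locale.ROOT);
        return normalized.contains("windows") ? Engine.JACOB_SAPI : Engine.ESPEAK_CLI;
    }

    public static void main(String[] args) {
        // "os.name" is a standard JVM system property, e.g. "Linux" or "Windows 10"
        String osName = System.getProperty("os.name");
        System.out.println(osName + " -> " + select(osName));
    }
}
```

Matching on "windows" rather than "win" avoids a false match on names such as "Darwin".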


Origin blog.csdn.net/qq_40421671/article/details/130689377