A text-to-speech tool

Environment

  • Ubuntu 18.04 64-bit

  • NVIDIA GTX 1070 Ti 8 GB

Introduction

Tortoise is an open-source text-to-speech program that produces highly realistic voices and intonation.

Build

Create a brand-new Python virtual environment:

conda create -n tts python=3.8
conda activate tts

Then clone the source code and install the dependencies:

git clone https://github.com/neonbjb/tortoise-tts.git
cd tortoise-tts
pip install -r requirements.txt
python setup.py install

Test

Convert a single sentence of text to speech:

python tortoise/do_tts.py --text "I'm going to speak this" --voice random --preset fast

After the script runs successfully, it writes 3 audio files in wav format to the results directory, with the voices chosen at random.

All voices available on the system are stored under tortoise/voices. If you like a particular voice, you can specify it in the script parameters; voices whose names begin with train_ tend to give better results.

python tortoise/do_tts.py --text "I'm going to speak this" --voice tom --preset fast
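
If you want to compare several voices on the same sentence, the command above can be driven from a small Python script. This is only a sketch: it assumes you run it from the root of the tortoise-tts checkout, and the helper names (list_voices, build_command, speak_with_each_voice) are made up for illustration; they are not part of tortoise. It uses only the --text, --voice and --preset flags shown above.

```python
import subprocess
import sys
from pathlib import Path


def list_voices(voices_dir="tortoise/voices"):
    """Return the names of the voice folders, or an empty list if the
    directory does not exist (hypothetical helper, not part of tortoise)."""
    root = Path(voices_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def build_command(text, voice, preset="fast"):
    """Build the same do_tts.py command line as shown above."""
    return [
        sys.executable, "tortoise/do_tts.py",
        "--text", text,
        "--voice", voice,
        "--preset", preset,
    ]


def speak_with_each_voice(text, voices, preset="fast"):
    """Run do_tts.py once per voice; stops at the first failing run."""
    for voice in voices:
        subprocess.run(build_command(text, voice, preset), check=True)
```

For example, speak_with_each_voice("I'm going to speak this", ["tom", "random"]) would run the command once for each of the two voices.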

If you have a lot of text to process, you can put it in a text file, for example:

Hello world.
Hello Rust.
Nice to meet you.

Then run the script:

python tortoise/read.py --textfile test.txt --voice random

The script splits the text file into individual sentences and converts each one to speech. Once all the sentences have been generated, it combines them into a single output file.

Finally, let's see how it handles Chinese:
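
To give a rough idea of the splitting step, here is a minimal sketch that cuts text into sentences with a regular expression. It is an illustration only, not the code read.py uses; the repository's own splitting logic may work differently, and split_sentences is a made-up name.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace.

    A simplified illustration; it does not handle abbreviations
    such as "e.g." or punctuation from other languages.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


sample = "Hello world.\nHello Rust.\nNice to meet you."
for sentence in split_sentences(sample):
    print(sentence)
```

With the sample file from above, this yields one string per line of the file.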

python tortoise/do_tts.py --text "你好,世界" --voice random --preset fast

The result is very poor. See the issue at https://github.com/neonbjb/tortoise-tts/issues/5. Other languages are not officially supported at present, so you would need to train the wav2vec model yourself.

Custom voice

If you want to add a specific voice to tortoise, follow these steps:

  • Collect audio clips of the person

  • Cut the audio into short clips of about 10 seconds each; at least 3 clips are needed, and more is better

  • Save the clips in wav format with a sample rate of 22050

  • Create a new folder under the tortoise/voices directory, named after the speaker so it is easy to remember (for example zhangsan), then copy the wav files into it

  • Finally, pass the folder name to --voice, for example --voice zhangsan
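
Before running the script with a new voice, it can help to check the clips against the list above. The following is a sketch under a few assumptions: the function names are made up, the 5 to 15 second window is my own reading of "about 10 seconds", and the wav header is parsed by hand with struct so that both integer and floating-point wav files are accepted.

```python
import struct
from pathlib import Path


def read_wav_info(path):
    """Return (sample_rate, duration_in_seconds) from the RIFF header."""
    sample_rate = byte_rate = data_size = None
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                # Fields: format tag, channels, sample rate, byte rate
                _, _, sample_rate, byte_rate = struct.unpack("<HHII", fmt[:12])
            else:
                if chunk_id == b"data":
                    data_size = chunk_size
                f.seek(chunk_size, 1)
            if chunk_size % 2:
                f.seek(1, 1)  # chunks are padded to an even length
    if sample_rate is None or data_size is None or not byte_rate:
        raise ValueError(f"{path} has no usable fmt/data chunk")
    return sample_rate, data_size / byte_rate


def check_voice_dir(voice_dir, expected_rate=22050, min_clips=3,
                    min_seconds=5.0, max_seconds=15.0):
    """Return a list of human-readable problems; empty means all checks passed.

    The duration window is an assumption, not a rule from tortoise.
    """
    problems = []
    clips = sorted(Path(voice_dir).glob("*.wav"))
    if len(clips) < min_clips:
        problems.append(f"found {len(clips)} clip(s), need at least {min_clips}")
    for clip in clips:
        rate, seconds = read_wav_info(clip)
        if rate != expected_rate:
            problems.append(f"{clip.name}: sample rate {rate}, expected {expected_rate}")
        if not min_seconds <= seconds <= max_seconds:
            problems.append(f"{clip.name}: {seconds:.1f} s long, expected about 10 s")
    return problems
```

For the example above, check_voice_dir("tortoise/voices/zhangsan") returns an empty list when every clip passes, and otherwise one message per problem found.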

Model download

While the script runs, a number of model files are downloaded from the Hugging Face site. They have been packaged and stored on a cloud drive for you to download:

Link: https://pan.baidu.com/s/1EJD4N2yamDNh6X_0GtoaRQ
Extraction code: 3qrq

After downloading, unzip the archive and copy its contents into the ~/.cache directory. The resulting file structure is as follows:

[Image: 7406065332d0de79c036dae81a8d34c0.png]




Origin blog.csdn.net/djstavaV/article/details/129360201