Environment
Ubuntu 18.04 64-bit
NVIDIA GTX 1070 Ti, 8 GB
Introduction
Tortoise is an open source text-to-speech program with strong synthesis capabilities and highly realistic voice and intonation.
Build
Create a brand new Python virtual environment
conda create -n tts python=3.8
conda activate tts
Then, pull the source code and install dependencies
git clone https://github.com/neonbjb/tortoise-tts.git
cd tortoise-tts
pip install -r requirements.txt
python setup.py install
Test
Convert a single sentence of text to speech
python tortoise/do_tts.py --text "I'm going to speak this" --voice random --preset fast
After the script runs successfully, 3 wav audio files are generated under results, and the voices are chosen at random.
All voices available on the current system are stored under tortoise/voices. If you like a particular voice, you can specify it in the script parameters. Voices whose names begin with train_ give better results.
python tortoise/do_tts.py --text "I'm going to speak this" --voice tom --preset fast
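To see which names can be passed to --voice, you can list the sub-folders of tortoise/voices. The following is a minimal sketch that assumes each voice is one folder under that directory, as described above. The helper names list_voices and split_train_voices are hypothetical and are not part of tortoise.

```python
import os


def list_voices(voices_dir="tortoise/voices"):
    """Return the sorted names of the sub-folders found under voices_dir."""
    return sorted(
        name for name in os.listdir(voices_dir)
        if os.path.isdir(os.path.join(voices_dir, name))
    )


def split_train_voices(voices):
    """Separate voices whose names start with 'train_' from the rest."""
    train = [v for v in voices if v.startswith("train_")]
    other = [v for v in voices if not v.startswith("train_")]
    return train, other


if __name__ == "__main__":
    # Only runs when started from the root of the cloned repository.
    if os.path.isdir("tortoise/voices"):
        train, other = split_train_voices(list_voices())
        print("train_ voices:", ", ".join(train))
        print("other voices:", ", ".join(other))
```

Run it from the tortoise-tts directory. Any name it prints should be usable as the value of --voice.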
If you have a lot of text to process, you can put it in a text file, for example test.txt:
Hello world.
Hello Rust.
Nice to meet you.
Then execute the script
python tortoise/read.py --textfile test.txt --voice random
The script splits the text file into individual sentences and converts each one to speech. After all the sentences have been generated, it combines them into a single file and writes it out.
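The splitting step can be pictured with a small sketch. This is not the code used by tortoise/read.py. It is a simplified illustration that assumes a sentence ends with ".", "!" or "?" followed by whitespace, and the function name split_sentences is hypothetical.

```python
import re


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace.

    Simplified illustration only; the real script may split differently.
    """
    # Collapse newlines and repeated spaces so line breaks in the file do not matter.
    normalized = " ".join(text.split())
    if not normalized:
        return []
    # Split at whitespace that directly follows sentence-ending punctuation.
    return re.split(r"(?<=[.!?])\s+", normalized)


if __name__ == "__main__":
    sample = "Hello world.\nHello Rust.\nNice to meet you.\n"
    for index, sentence in enumerate(split_sentences(sample)):
        print(index, sentence)
```

Each returned sentence would then be synthesized on its own, and the resulting clips joined in order.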
Finally, let's take a look at how it performs on Chinese
python tortoise/do_tts.py --text "你好,世界" --voice random --preset fast
The generated result is very poor. Take a look at the issue https://github.com/neonbjb/tortoise-tts/issues/5. At present, other languages are not officially supported, so you need to train the wav2vec model yourself.
Custom voice
If you want to add a specific voice to tortoise, you need the following steps
Collect audio clips of the specific person
Organize the audio into small clips of about 10 seconds. At least 3 clips are needed, the more the better
The audio clips use the wav format with a sample rate of 22050
Create a new folder under the tortoise/voices directory and name it after the speaker so it is easy to remember, for example zhangsan, then copy the wav files into it
Finally, when using it, specify --voice as zhangsan
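The checks in the steps above can be automated. The following is a minimal sketch that uses only the Python standard library. It assumes the clips are plain PCM wav files (the wave module cannot read other encodings), and the helper names check_clip and install_voice are hypothetical, not part of tortoise.

```python
import os
import shutil
import wave


def check_clip(path, expected_rate=22050):
    """Return (rate_ok, sample_rate, seconds) for a PCM wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / float(rate)
    return rate == expected_rate, rate, seconds


def install_voice(clip_paths, voice_name, voices_dir="tortoise/voices",
                  min_clips=3, expected_rate=22050):
    """Validate the clips, then copy them into voices_dir/voice_name."""
    if len(clip_paths) < min_clips:
        raise ValueError(
            "need at least %d clips, got %d" % (min_clips, len(clip_paths)))
    # Validate every clip before copying anything.
    for path in clip_paths:
        rate_ok, rate, _ = check_clip(path, expected_rate)
        if not rate_ok:
            raise ValueError(
                "%s has sample rate %d, expected %d" % (path, rate, expected_rate))
    target = os.path.join(voices_dir, voice_name)
    os.makedirs(target, exist_ok=True)
    for path in clip_paths:
        shutil.copy(path, target)
    return target
```

For example, install_voice(["a.wav", "b.wav", "c.wav"], "zhangsan"), run from the tortoise-tts directory, would create tortoise/voices/zhangsan, after which --voice zhangsan can be used. The clip length of about 10 seconds is not enforced here. The third value returned by check_clip can be used to inspect it.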
Model download
While the script runs, a number of model files are downloaded from the huggingface site. They are packaged here and stored on a cloud drive for you to download yourself
Link: https://pan.baidu.com/s/1EJD4N2yamDNh6X_0GtoaRQ
Extraction code: 3qrq
After downloading, unzip the archive and copy it to the ~/.cache directory. The file structure is as follows