The most powerful text-to-speech tool

Download

Link: https://pan.baidu.com/s/1cb24WW2dihtRpMz4giMxyw
Extraction code: k3xu
Decompression password: Navigator Weiniao

Project source code: https://github.com/Plachtaa/VITS-fast-fine-tuning/tree/main

Usage

After unzipping, put your prepared voice recordings into a folder inside this directory. The audio must be in WAV format, and about one hour of audio in total is typical.
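Before going further, it can help to check how much audio you actually have. The following is a minimal sketch, not part of the tool itself, that sums the duration of the WAV files in a folder using Python's standard `wave` module. The folder name `my_voice` is a hypothetical placeholder, and the sketch assumes uncompressed PCM WAV files.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_duration_seconds(folder):
    """Sum the durations of all .wav files directly inside `folder`."""
    return sum(wav_duration_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


if __name__ == "__main__":
    # "my_voice" is a hypothetical folder name; use the folder you created.
    total = total_duration_seconds("my_voice")
    print(f"Total audio: {total / 60:.1f} minutes")
```

If the total is far below an hour, you may want to record or collect more audio before preprocessing.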

Here is how I did it: I exported the audio directly as a WAV file from my editing software, then split it into segments with a cutting tool.

Link: https://pan.baidu.com/s/1ArPPTDmZpq75eHZsyaEnjA
Extraction code: 08zf
Decompression password: cuijiahua.com

Go to the audio segmentation directory and click the script to run it.

In the interface, enter the path of the audio file and the output path for the split files. Leave everything else unchanged and click Start.
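If you would rather not use the graphical tool, the same idea can be sketched in a few lines of Python. This is only an illustration using the standard `wave` module: it cuts the file into fixed-length pieces, which is a simplification and not necessarily how the cutting tool above decides where to split. The function name `split_wav` and the 10-second default are assumptions made for this example.

```python
import wave
from pathlib import Path


def split_wav(source, out_dir, segment_seconds=10):
    """Split `source` into consecutive WAV files of at most `segment_seconds`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(source), "rb") as src:
        params = src.getparams()
        frames_per_segment = int(src.getframerate() * segment_seconds)
        index = 0
        while True:
            # Read the next chunk; an empty result means end of file.
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            target = out_dir / f"{Path(source).stem}_{index:04d}.wav"
            with wave.open(str(target), "wb") as dst:
                # Copy channel count, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(target)
            index += 1
    return written
```

Calling `split_wav("recording.wav", "segments")` with your own file names writes numbered segments such as `recording_0000.wav` into the output folder.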

Then put the segmented audio into the folder described earlier.

Click Preprocessing.

Answer the prompts as instructed: enter y to agree, give a name, and enter 0 to skip the auxiliary data.

Then wait for preprocessing to finish. When it is done, the terminal displays "Press any key to continue"; just close the terminal. Click Start Training and enter the number of training epochs. You can enter 200 first and continue training later if the result is not good enough. To continue, do not click Start Training again, because that clears the weight file. Click Continue Training instead and enter 300, which means training 100 more epochs on top of the original 200.
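The number entered when continuing is the total epoch count, not the number of extra epochs. As a tiny sketch of that arithmetic (the function name `additional_epochs` is made up for this illustration and is not part of the tool):

```python
def additional_epochs(target_total, already_trained):
    """Return how many more epochs a continued run performs."""
    if target_total < already_trained:
        raise ValueError("target total is below the epochs already trained")
    return target_total - already_trained


# Entering 300 after an initial run of 200 epochs trains 100 more.
print(additional_epochs(300, 200))  # 100
```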

Click Start Inference, enter some Chinese text, click Generate, and download the audio if you are happy with the result.

That basically completes the process.

Reference: Navigator Weiniao

Origin blog.csdn.net/weixin_62403633/article/details/132527993