Cross-Lingual TTS in Three Steps with the RTVC-7 Voice Cloning Model, Step 2: Tuned-GE2E-SayEN-EarSpeech

0. Description

1. Git Clone

2. Training data

2.1. VCTK

VCTK was previously used to train AutoVC.

  • The data-processing logic starts from /ceph/home/hujk17/Tuned-GE2E-SayEN-EarSpeech/FaPig_extract_GE2E_VCTK_nosli.py
  • The speakers are split into train, val, and unseen sets; only the train set is used during training
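The train/val/unseen split above can be sketched as follows. This is a hypothetical reconstruction: the split sizes, seed, and helper name are assumptions, not taken from FaPig_extract_GE2E_VCTK_nosli.py itself.

```python
import random

def split_speakers(speaker_ids, n_val=5, n_unseen=5, seed=1234):
    """Partition VCTK speaker IDs into train / val / unseen sets.

    Hypothetical sketch: the actual split in the repo may differ.
    Only the `train` partition is used to fit the model; `unseen`
    speakers are held out entirely for zero-shot cloning evaluation.
    """
    ids = sorted(speaker_ids)
    random.Random(seed).shuffle(ids)          # deterministic shuffle
    unseen = ids[:n_unseen]                   # never seen in training
    val = ids[n_unseen:n_unseen + n_val]      # held-out data from the same pool
    train = ids[n_unseen + n_val:]            # the only split used when training
    return train, val, unseen

# Example with VCTK-style speaker IDs p225..p254
speakers = [f"p{225 + i}" for i in range(30)]
train, val, unseen = split_speakers(speakers)
print(len(train), len(val), len(unseen))  # 20 5 5
```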

3. speaker embedding

The speaker embeddings are also extracted in step 2.1, using GE2E.
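The GE2E inference recipe pools per-window embeddings into a single utterance-level speaker embedding by averaging and L2-normalizing. A minimal numpy sketch, assuming the encoder has already produced a matrix of 256-dim window embeddings (the 256-dim size matches section 6; the function name is mine):

```python
import numpy as np

def utterance_embedding(frame_embeds):
    """Collapse per-window GE2E embeddings (n_windows, 256) into one
    utterance-level speaker embedding: mean over windows, then
    L2-normalize so the result lies on the unit hypersphere."""
    mean = frame_embeds.mean(axis=0)
    return mean / np.linalg.norm(mean)

windows = np.random.rand(10, 256).astype(np.float32)  # stand-in encoder output
embed = utterance_embedding(windows)
print(embed.shape, round(float(np.linalg.norm(embed)), 3))  # (256,) 1.0
```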

4. Preprocess the data -> mel

The mel spectrograms are also extracted in step 2.1, using the lab's mel configuration. (●ˇ∀ˇ●) Finally convenient.
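For reference, a self-contained log-mel extraction sketch in pure numpy. The lab's actual mel configuration is not stated in the post, so the sample rate, FFT size, hop, and 80 mel bands below are common TTS defaults, not the repo's values:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def log_mel(wav, sr=16000, n_fft=1024, hop=256, n_mels=80):
    """Frame -> Hann window -> |FFT| -> mel filterbank -> log."""
    n_frames = 1 + (len(wav) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft//2+1)
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(np.maximum(mel, 1e-5))             # floor avoids log(0)

wav = np.random.randn(16000).astype(np.float32)      # 1 s of noise as a stand-in
mel = log_mel(wav)
print(mel.shape)  # (59, 80)
```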

5. Change the code

  • The symbols don’t need to be changed; the English punctuation is kept as-is
  • The path to train.txt needs to be changed
  • The model is named Kiss, and the training logic starts from Kiss_train.py
  • Train directly with the original small batch size, batch_size = 12
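The batching implied by the last bullet can be sketched as below. The metadata path is hypothetical (the post does not give the new train.txt location), and the line format is a stand-in; only batch_size = 12 comes from the post.

```python
from pathlib import Path

# Hypothetical path: the real train.txt location after the path change
# in Kiss_train.py is not given in the post.
METADATA = Path("/ceph/datasets/VCTK/SV2TTS/synthesizer/train.txt")
BATCH_SIZE = 12  # the original small batch size, kept unchanged

def iter_batches(lines, batch_size=BATCH_SIZE):
    """Yield successive fixed-size batches of metadata lines;
    the last batch may be smaller."""
    for i in range(0, len(lines), batch_size):
        yield lines[i:i + batch_size]

lines = [f"utt_{i}" for i in range(40)]   # stand-in for train.txt rows
batches = list(iter_batches(lines))
print(len(batches), len(batches[0]), len(batches[-1]))  # 4 12 4
```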

6. GE2E version logic

  • The 256-dim embedding is still followed by an FC layer, consistent with the Chinese setup
  • The Chinese model is called FaPig, the English one Kiss; the logic starts from Kiss_train.py
  • from synthesizer.FaPig_train import train and from synthesizer.Kiss_train import train are actually identical; the Kiss copy exists only to unify the naming format
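The "256-dim embedding followed by an FC" can be illustrated with a minimal numpy sketch. Assumptions: the output width (128 here), the weight initialization, and the function name are all mine; only the 256-dim input and the presence of one FC layer come from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# One fully connected layer applied to the 256-dim GE2E speaker
# embedding before it is consumed by the synthesizer, mirroring the
# Chinese (FaPig) setup.  The 128-dim output width is an assumption.
W = rng.standard_normal((128, 256)) * 0.01   # FC weight
b = np.zeros(128)                            # FC bias

def project(embedding):
    """Apply the FC layer: y = W @ e + b."""
    return W @ embedding + b

e = rng.standard_normal(256)                 # a GE2E utterance embedding
y = project(e)
print(y.shape)  # (128,)
```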

Origin blog.csdn.net/u013625492/article/details/114868864