0. Description
- Almost identical to the GE2E version described at https://blog.csdn.net/u013625492/article/details/114433738
- The difference is that the text is English and the model is trained on English datasets such as VCTK
1. Git Clone
2. Training data
2.1. VCTK
Previously used to train AutoVC
- The data-processing logic starts from: /ceph/home/hujk17/Tuned-GE2E-SayEN-EarSpeech/FaPig_extract_GE2E_VCTK_nosli.py
- Keep train, val, and unseen splits; only train is used during training
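The train/val/unseen split above can be sketched as below. This is a minimal illustration, not the repo's actual code; the speaker-ID format, split sizes, and seed are all assumptions.

```python
# Hypothetical sketch of splitting VCTK speakers into train / val / unseen.
# Split sizes and seed are illustrative assumptions.
import random

def split_speakers(speaker_ids, n_val=2, n_unseen=2, seed=1234):
    """Partition speaker IDs; only the 'train' split feeds training."""
    ids = sorted(speaker_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    unseen = ids[:n_unseen]                 # speakers never shown to the model
    val = ids[n_unseen:n_unseen + n_val]    # held-out utterances for validation
    train = ids[n_unseen + n_val:]          # the only split used during training
    return {"train": train, "val": val, "unseen": unseen}
```

Splitting by speaker (rather than by utterance) is what makes the unseen set a real zero-shot test for the speaker embedding.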
3. speaker embedding
Also extracted in 2.1., using GE2E
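For reference, a GE2E-style speaker embedding is typically formed by L2-normalizing each utterance embedding, averaging, and re-normalizing. The sketch below stubs out the encoder itself; the 256-dim shape matches the note in section 6, everything else is an assumption.

```python
# Sketch of forming a speaker embedding from per-utterance GE2E encoder
# outputs (the encoder is not shown). Shapes are illustrative.
import numpy as np

def speaker_embedding(utterance_embeds: np.ndarray) -> np.ndarray:
    """utterance_embeds: (n_utterances, 256) raw encoder outputs."""
    # Normalize each utterance embedding to unit length
    normed = utterance_embeds / np.linalg.norm(utterance_embeds, axis=1, keepdims=True)
    # Average, then re-normalize the centroid
    centroid = normed.mean(axis=0)
    return centroid / np.linalg.norm(centroid)  # unit-length 256-dim embedding
```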
4. Preprocess the data -> mel
Also extracted in 2.1., using the lab's mel settings. Hey (●ˇ∀ˇ●), finally convenient
5. Change the code
- The symbol set doesn't need to change; English punctuation is kept
- The path to train.txt needs to be changed
- Kiss is used as the model name, and the logic starts from Kiss_train.py
- Train directly with the original small batch size, batch_size = 12
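The "keep English punctuation" point can be illustrated with a minimal symbol table. The names and exact character set below are assumptions for illustration; the repo's actual symbols module may differ.

```python
# Hypothetical sketch of an English symbol set that retains punctuation
# (per the note above) instead of stripping it.
_pad = "_"
_punctuation = "!'(),.:;? -"
_letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
symbols = [_pad] + list(_punctuation) + list(_letters)

# symbol <-> id map used when converting train.txt text to int sequences
symbol_to_id = {s: i for i, s in enumerate(symbols)}

def text_to_sequence(text: str):
    """Keep characters in the symbol set (including punctuation), drop the rest."""
    return [symbol_to_id[c] for c in text if c in symbol_to_id]
```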
6. GE2E version logic
- The 256-dim embedding is still followed by an FC, consistent with the Chinese version
- The Chinese version is called FaPig, the English version Kiss; the logic starts from Kiss_train.py
- from synthesizer.FaPig_train import train and from synthesizer.Kiss_train import train are actually identical; the copy exists only to keep the naming format uniform
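The "256-dim embedding followed by an FC" wiring can be sketched as below. This is plain NumPy for illustration; the real model is presumably PyTorch, and the 256 -> 256 layer size and tanh activation are assumptions.

```python
# Illustrative sketch of projecting the GE2E speaker embedding through one
# fully connected layer before it conditions the synthesizer.
import numpy as np

class EmbeddingFC:
    def __init__(self, in_dim=256, out_dim=256, seed=0):
        rng = np.random.RandomState(seed)
        self.w = rng.randn(in_dim, out_dim) * 0.01  # small random init
        self.b = np.zeros(out_dim)

    def __call__(self, speaker_embed: np.ndarray) -> np.ndarray:
        # speaker_embed: (batch, 256) GE2E output; returns projected embedding
        return np.tanh(speaker_embed @ self.w + self.b)
```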