Comparison of AutoVC and RTVC in Cross-lingual TTS

0. Description

  • In TTS, the input is normally assumed to be text. When timbre (voice) transfer is needed on top of TTS, the usual choice is RTVC (Real-Time Voice Cloning).
  • Text input limits which corpora can be used, though, because every utterance needs an accurate transcription.
  • If there are no labels, a PPG (phonetic posteriorgram) can stand in for the text, but this is only a compromise.
  • If the PPG is not accurate enough, the AutoVC approach also works. The unlabeled data becomes usable, and the whole system is more "elegant"; for example, a similar loss can be used.
  • The encoder can include the structure and code of the PPG extraction process alongside the regular AutoVC structure, whose output is concatenated with the PPG, or added to it as a residual, to supplement the PPG information.
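
The last point can be illustrated with a minimal NumPy sketch. It is not the AutoVC or RTVC implementation: the function names, the dimensions, and the random projection matrix are all hypothetical, and it assumes the AutoVC-style content code has already been brought to the same frame rate as the PPG.

```python
import numpy as np


def fuse_concat(ppg, content_code):
    """Splice PPG frames and content codes along the feature axis."""
    if ppg.shape[0] != content_code.shape[0]:
        raise ValueError("PPG and content code must have the same number of frames")
    return np.concatenate([ppg, content_code], axis=1)


def fuse_residual(ppg, content_code, proj):
    """Add a linear projection of the content code to the PPG as a residual."""
    if ppg.shape[0] != content_code.shape[0]:
        raise ValueError("PPG and content code must have the same number of frames")
    return ppg + content_code @ proj


# Hypothetical sizes: frames, PPG classes, content-code dimensions.
T, P, C = 100, 72, 32
rng = np.random.default_rng(0)

# Stand-in for a PPG: each frame is a distribution over phonetic classes.
ppg = rng.random((T, P))
ppg /= ppg.sum(axis=1, keepdims=True)

# Stand-in for the output of an AutoVC-style content encoder.
content_code = rng.standard_normal((T, C))

# Stand-in for a learned projection from content-code space to PPG space.
proj = rng.standard_normal((C, P)) * 0.01

spliced = fuse_concat(ppg, content_code)            # shape (T, P + C)
residual = fuse_residual(ppg, content_code, proj)   # shape (T, P)
print(spliced.shape, residual.shape)
```

Concatenation keeps the PPG untouched and lets the decoder decide how much of the extra code to use, while the residual form keeps the feature size fixed at the PPG dimension. In a real system the projection would be trained together with the rest of the network.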

Origin blog.csdn.net/u013625492/article/details/115019773