2021年语音合成论文统计(1~3月)

论文统计每月第一周更新一次,主要跟踪语音合成的发展状况(很多文章都是在会议后才发出,但不影响统计。统计过程难免存在疏漏,因此统计结果仅供参考。读者有什么建议可以直接向我发消息,我将不断修改该统计。历年文章统计可访问 http://yqli.tech/page/tts_paper.html)。如有转载,请注明出处。欢迎关注微信公众号:低调奋进。


语音合成文章情况表(单位:篇)

    1月 2月 3月
前端 多音字,韵律,g2p等等。 1 0 0
声学模型 语言特征转声学特征,attention工作以及双重学习 1 7 5
声码器 波形生成 1 3 3
个性化 少数据,脏数据应用等 1 1 5
多语言 多语言多说话人模型 0 0 0
歌唱合成 歌唱和音乐合成 0 1 2
情感 风格和情感 2 2 0
多模态 talking head等等 2 1 1
声音转换 基于GAN方案和特征解耦方案 4 2 4
其它 基于EEG合成,数据,MOS评测以及语音合成的应用 1 1 0


文章列表:

1月

    类型
1 Supervised and Unsupervised Approaches for Controlling Narrow Lexical Focus in Sequence-to-Sequence Speech Synthesis am
2 Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning frontend
3 Generating coherent spontaneous speech and gesture from text multimodality
4 Creating Song From Lip and Tongue Videos With a Convolutional Vocoder multimodality
5 On Interfacing the Brain with Quantum Computers: An Approach to Listen to the Logic of the Mind other
6 Whispered and Lombard Neural Speech Synthesis expression
7 Expressive Neural Voice Cloning expression/

personalization

8 High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion vc
9 EmoCat: Language-agnostic Emotional Voice Conversion vc
10 Hierarchical disentangled representation learning for singing voice conversion vc
11 Adversarially learning disentangled speech representations for robust multi-factor voice conversion vc
12 Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss vocoder

2月

1 Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time Lpcnet am
2 Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis am
3 VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention am
4 Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input am

5

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech am
6 LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search am
7 Data-Efficient Training Strategies for Neural TTS Systemsmatch am
8 Model architectures to extrapolate emotional expressions in DNN-based text-to-speech expression
9 Model architectures to extrapolate emotional expressions in DNN-based text-to-speech expression
10 SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer modal
11 MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network other
12 Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning

personalization

13 Anyone GAN Sing sing
14 Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram vc
15 Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion vc
16 Universal Neural Vocoding with Parallel WaveNet vocoder
17 LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation vocoder
18 High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion vocoder

3月

1 Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken Language Learners am
2 Text-to-speech for the hearing impaired am
3 Continual Speaker Adaptation for Text-to-Speech Synthesis am
4 Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling am
5 PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS am
6 Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system expression
7 STYLER: Style Modeling with Rapidity and Robustness via SpeechDecomposition for Expressive and Controllable Neural Text to Speech expression
8 What is Multimodality? modal
9 CUHK-EE voice cloning system for ICASSP 2021 M2VoC challenge personal
10 Real-time Timbre Transfer and Sound Synthesis using DDSP personal
11 AdaSpeech: Adaptive Text to Speech for Custom Voice personal
12 Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech personal
13 A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music

personal/am

14 Latent Space Explorations of Singing Voice Synthesis using DDSP sing
15 Learning to Generate Music With Sentiment sing
16 crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder vc
17 MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames vc
18 Axial Residual Networks for CycleGAN-based Voice Conversion vc
19 IMPROVING ZERO-SHOT VOICE STYLE TRANSFER VIA DISENTANGLED REPRESENTATION LEARNING vc
20 GAN Vocoder: Multi-Resolution Discriminator Is All You Need vocoder
21 Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains vocoder
22 Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN vocoder

猜你喜欢

转载自blog.csdn.net/liyongqiang2420/article/details/115345063