2021年语音合成论文统计（1~3月）

论文统计每月第一周更新一次，主要跟踪语音合成的发展状况(很多文章都是在会议后才发出，但不影响统计。统计过程难免存在疏漏，因此统计结果仅供参考。读者有什么建议可以直接向我发消息，我将不断修改该统计。历年文章统计可访问 http://yqli.tech/page/tts_paper.html）。如有转载，请注明出处。欢迎关注微信公众号：低调奋进。

语音合成文章情况表（单位：篇）

		1月	2月	3月
前端	多音字，韵律，g2p等等。	1	0	0
声学模型	语言特征转声学特征，attention工作以及双重学习	1	7	5
声码器	波形生成	1	3	3
个性化	少数据，脏数据应用等	1	1	5
多语言	多语言多说话人模型	0	0	0
歌唱合成	歌唱和音乐合成	0	1	2
情感	风格和情感	2	2	0
多模态	talking head等等	2	1	1
声音转换	基于GAN方案和特征解耦方案	4	2	4
其它	基于EEG合成，数据，MOS评测以及语音合成的应用	1	1	0

文章列表：

1月

		类型
1	Supervised and Unsupervised Approaches for Controlling Narrow Lexical Focus in Sequence-to-Sequence Speech Synthesis	am
2	Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning	frontend
3	Generating coherent spontaneous speech and gesture from text	multimodality
4	Creating Song From Lip and Tongue Videos With a Convolutional Vocoder	multimodality
5	On Interfacing the Brain with Quantum Computers: An Approach to Listen to the Logic of the Mind	other
6	Whispered and Lombard Neural Speech Synthesis	expression
7	Expressive Neural Voice Cloning	expression/ personalization
8	High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion	vc
9	EmoCat: Language-agnostic Emotional Voice Conversion	vc
10	Hierarchical disentangled representation learning for singing voice conversion	vc
11	Adversarially learning disentangled speech representations for robust multi-factor voice conversion	vc
12	Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss	vocoder

2月

1	Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time Lpcnet	am
2	Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis	am
3	VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention	am
4	Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input	am
5	Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech	am
6	LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search	am
7	Data-Efficient Training Strategies for Neural TTS Systemsmatch	am
8	Model architectures to extrapolate emotional expressions in DNN-based text-to-speech	expression
9	Model architectures to extrapolate emotional expressions in DNN-based text-to-speech	expression
10	SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer	modal
11	MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network	other
12	Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning	personalization
13	Anyone GAN Sing	sing
14	Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram	vc
15	Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion	vc
16	Universal Neural Vocoding with Parallel WaveNet	vocoder
17	LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation	vocoder
18	High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion	vocoder

3月

1	Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken Language Learners	am
2	Text-to-speech for the hearing impaired	am
3	Continual Speaker Adaptation for Text-to-Speech Synthesis	am
4	Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling	am
5	PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS	am
6	Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system	expression
7	STYLER: Style Modeling with Rapidity and Robustness via SpeechDecomposition for Expressive and Controllable Neural Text to Speech	expression
8	What is Multimodality?	modal
9	CUHK-EE voice cloning system for ICASSP 2021 M2VoC challenge	personal
10	Real-time Timbre Transfer and Sound Synthesis using DDSP	personal
11	AdaSpeech: Adaptive Text to Speech for Custom Voice	personal
12	Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech	personal
13	A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music	personal/am
14	Latent Space Explorations of Singing Voice Synthesis using DDSP	sing
15	Learning to Generate Music With Sentiment	sing
16	crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder	vc
17	MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames	vc
18	Axial Residual Networks for CycleGAN-based Voice Conversion	vc
19	IMPROVING ZERO-SHOT VOICE STYLE TRANSFER VIA DISENTANGLED REPRESENTATION LEARNING	vc
20	GAN Vocoder: Multi-Resolution Discriminator Is All You Need	vocoder
21	Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains	vocoder
22	Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN	vocoder

2021年语音合成论文统计（1~3月）

personalization

personalization

猜你喜欢