Non-Autoregressive Neural Text-to-Speech

Category: Other · Published 2021-12-14 18:16:21

Authors: Kainan Peng, Wei Ping
Affiliation: Baidu Research (USA)
Venue: ICML 2020

Abstract

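The paper proposes ParaNet, a non-autoregressive, fully convolutional TTS model. It synthesizes 46.7x faster than Deep Voice 3 with comparable speech quality. Alignment between text and spectrogram is obtained by refining the attention iteratively, layer by layer. A parallel vocoder based on inverse autoregressive flow (IAF) is trained with a VAE-based method, so the whole TTS pipeline runs in a single feed-forward pass.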

1. Introduction

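The main contributions of the paper are:

It proposes ParaNet, a non-autoregressive, fully convolutional text-to-speech model that speeds up synthesis;
ParaNet distills attention from an autoregressive TTS model and then refines the alignment iteratively in a layer-by-layer manner. It is more stable than Deep Voice 3 because it avoids the mismatch between teacher-forced training and autoregressive inference.
It uses a parallel neural vocoder so that the whole pipeline runs in parallel. Parallel neural vocoders include the distilled IAF vocoder and WaveGlow. To train an IAF vocoder without distillation, the paper proposes WaveVAE, which is trained from scratch within the VAE framework instead of being distilled from a WaveNet teacher.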
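
To make the appeal of an IAF vocoder concrete, here is a minimal NumPy sketch of the IAF idea. The shift and log-scale functions (`0.5 * z[t-1]` and `0.1 * z[t-1]`) and the names `iaf_sample` and `iaf_invert` are toy assumptions chosen for illustration; the paper's vocoder uses learned networks instead. The point is only that sampling depends on the noise `z`, which is fully known in advance, so every output step can be computed in one pass, while mapping a given signal back to noise has to proceed step by step.

```python
import numpy as np

def shift_right(z):
    """Return z delayed by one step, so position t sees only z[t-1]."""
    return np.concatenate([[0.0], z[:-1]])

def iaf_sample(z):
    """Map noise z to a sample x in one vectorized pass.

    Toy shift and log-scale functions (assumptions for illustration):
    mu_t = 0.5 * z[t-1], log_sigma_t = 0.1 * z[t-1].
    """
    prev = shift_right(z)
    mu = 0.5 * prev
    log_sigma = 0.1 * prev
    return z * np.exp(log_sigma) + mu

def iaf_invert(x):
    """Recover z from x; this direction has to run step by step."""
    z = np.zeros_like(x)
    prev = 0.0
    for t in range(len(x)):
        z[t] = (x[t] - 0.5 * prev) / np.exp(0.1 * prev)
        prev = z[t]
    return z

rng = np.random.default_rng(0)
noise = rng.standard_normal(8)
sample = iaf_sample(noise)       # parallel over all time steps
recovered = iaf_invert(sample)   # sequential loop over time steps
```

The asymmetry is why IAF is attractive for synthesis: generation is parallel, and only the inverse direction is sequential.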

3. Text-to-spectrogram model

3.2. Non-autoregressive architecture

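Non-autoregressive decoder: without the constraint of autoregressive generation, the decoder drops causal convolutions and can use future context when generating the log-mel spectrogram. It also predicts the log-linear spectrogram, trained with an L1 loss. The 1×1 convolution at the start of the decoder is removed, because a non-autoregressive decoder no longer takes the log-mel spectrogram as input.
No converter: the non-autoregressive architecture removes the non-causal converter used in Deep Voice 3 (DV3). DV3 relies on the converter mainly to refine the decoder's predictions with the bidirectional context that non-causal convolutions provide; a decoder that is already non-causal has that context built in.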
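
The difference between causal and non-causal convolution can be seen with a unit impulse. The sketch below is a toy NumPy illustration, not the paper's convolution block: `conv1d`, the width-3 all-ones kernel, and the 7-step input are all assumptions made for the example. A causal layer pads only on the left, so an input at step 3 cannot influence any earlier output; a non-causal layer pads on both sides, so the output at step 2 already reflects the input at step 3.

```python
import numpy as np

def conv1d(x, kernel, causal):
    """1-D convolution (cross-correlation) whose output has the input's length."""
    k = len(kernel)
    if causal:
        # Pad only on the left: output t sees x[t-k+1 .. t].
        padded = np.concatenate([np.zeros(k - 1), x])
    else:
        # Pad on both sides: output t sees past and future inputs.
        left = (k - 1) // 2
        padded = np.concatenate([np.zeros(left), x, np.zeros(k - 1 - left)])
    return np.array([padded[t:t + k] @ kernel for t in range(len(x))])

x = np.zeros(7)
x[3] = 1.0  # unit impulse at time step 3
kernel = np.ones(3)

causal_out = conv1d(x, kernel, causal=True)      # nonzero at steps 3, 4, 5
noncausal_out = conv1d(x, kernel, causal=False)  # nonzero at steps 2, 3, 4
```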

3.3. Parallel attention mechanism

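Alignment mechanisms that have worked well in earlier TTS systems, such as location-sensitive attention, are all autoregressive: at each decoder step they rely on quantities accumulated over the previous decoder steps (for example, cumulative attention weights).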
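
As a rough illustration of why removing that dependence matters, the sketch below computes attention weights for all decoder steps at once. It uses plain scaled dot-product attention with random queries and keys, which is an assumption for the example and not the paper's attention block; the names `parallel_attention` and `stepwise_attention` are hypothetical. When a query does not depend on earlier decoder outputs, the per-step loop and the single matrix product give the same weights, so the loop can be dropped.

```python
import numpy as np

def softmax(scores):
    """Softmax over the last axis, with max subtraction for stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

def parallel_attention(queries, keys):
    """Attention weights for every decoder step in one matrix product."""
    d = queries.shape[-1]
    return softmax(queries @ keys.T / np.sqrt(d))

def stepwise_attention(queries, keys):
    """Same weights computed one decoder step at a time, for comparison."""
    d = queries.shape[-1]
    rows = [softmax(q @ keys.T / np.sqrt(d)) for q in queries]
    return np.stack(rows)

rng = np.random.default_rng(0)
keys = rng.standard_normal((5, 4))     # 5 encoder (text) positions
queries = rng.standard_normal((9, 4))  # 9 decoder (frame) positions

weights = parallel_attention(queries, keys)  # shape (9, 5), rows sum to 1
```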

Reposted from blog.csdn.net/qq_40168949/article/details/118737794
