The CBHG Module from TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS


Other 2018-11-20 03:18:42

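The authors drew their inspiration from the model in the paper Fully Character-Level Neural Machine Translation without Explicit Segmentation. The prototype is shown in the figure below:

[Figure: the prototype model from that paper]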

The CBHG module is shown in the figure below. It was first proposed in a Google paper: TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS

[Figure: the CBHG module]

Back to the CBHG module: it is good at extracting features from sequences. Its steps are as follows:

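1. The input sequence is first convolved with K sets of 1-D convolutional filters, where the k-th set contains filters of width k (k = 1, 2, ..., K). These filters effectively model both the current position and its surrounding context.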

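2. The convolution outputs are stacked together and max-pooled along the time axis to increase local invariance. The stride is set to 1 to preserve the time resolution.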

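3. The result is then passed to a few fixed-width 1-D convolutions, whose outputs are added to the original input sequence through a residual connection (as in ResNet). Batch normalization is used for all convolutional layers.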

4. The result is fed into a multi-layer highway network to extract higher-level features. For background on highway networks, see https://blog.csdn.net/l494926429/article/details/51737883

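5. Finally, a bidirectional GRU is stacked on top to extract sequential features from both the forward and the backward context.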
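The five steps above can be sketched in NumPy as a forward pass. This is only a minimal illustration under stated assumptions, not the Tacotron implementation: batch normalization is left out, the layer sizes are small made-up values, the weights are random, and all function and parameter names (`conv1d_same`, `max_pool_stride1`, `highway`, `gru`, `cbhg`, `init_params`) are hypothetical. The GRU uses one common gate formulation, which may differ in detail from the one in the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv1d_same(x, w, b):
    """'Same'-padded 1-D convolution. x: (T, C_in), w: (width, C_in, C_out)."""
    width, T = w.shape[0], x.shape[0]
    left = (width - 1) // 2
    xp = np.pad(x, ((left, width - 1 - left), (0, 0)))
    return np.stack([np.tensordot(xp[t:t + width], w, axes=([0, 1], [0, 1]))
                     for t in range(T)]) + b

def max_pool_stride1(x, width=2):
    """Max pooling along time with stride 1; -inf padding keeps length T."""
    T = x.shape[0]
    xp = np.pad(x, ((0, width - 1), (0, 0)), constant_values=-np.inf)
    return np.max(np.stack([xp[i:i + T] for i in range(width)]), axis=0)

def highway(x, w_h, b_h, w_t, b_t):
    """Highway layer: gate t mixes the transform h with the untouched input."""
    h = np.maximum(x @ w_h + b_h, 0.0)
    t = sigmoid(x @ w_t + b_t)
    return t * h + (1.0 - t) * x

def gru(x, w, u, b):
    """Unidirectional GRU. x: (T, C), w: (C, 3H), u: (H, 3H), b: (3H,)."""
    H = u.shape[0]
    h, out = np.zeros(H), []
    for x_t in x:
        gx, gh = x_t @ w + b, h @ u
        z = sigmoid(gx[:H] + gh[:H])                  # update gate
        r = sigmoid(gx[H:2 * H] + gh[H:2 * H])        # reset gate
        n = np.tanh(gx[2 * H:] + r * gh[2 * H:])      # candidate state
        h = (1.0 - z) * n + z * h
        out.append(h)
    return np.stack(out)

def cbhg(x, p):
    # Step 1: K filter sets; the k-th set has width k. Outputs are stacked on channels.
    y = np.concatenate([np.maximum(conv1d_same(x, w, b), 0.0) for w, b in p["bank"]], axis=1)
    # Step 2: max pooling along time, stride 1, so the length T is unchanged.
    y = max_pool_stride1(y)
    # Step 3: fixed-width convolutions, then the residual connection to the input.
    y = np.maximum(conv1d_same(y, *p["proj1"]), 0.0)
    y = conv1d_same(y, *p["proj2"]) + x
    # Step 4: highway layers.
    for layer in p["highway"]:
        y = highway(y, *layer)
    # Step 5: bidirectional GRU (the backward pass runs on the reversed sequence).
    fwd = gru(y, *p["gru_fwd"])
    bwd = gru(y[::-1], *p["gru_bwd"])[::-1]
    return np.concatenate([fwd, bwd], axis=1)

def init_params(rng, c=8, K=4, hidden=6, n_highway=2):
    """Random illustrative weights; sizes are assumptions, not the paper's."""
    def r(*shape):
        return 0.1 * rng.standard_normal(shape)
    def gru_params():
        return (r(c, 3 * hidden), r(hidden, 3 * hidden), np.zeros(3 * hidden))
    return {
        "bank": [(r(k, c, c), np.zeros(c)) for k in range(1, K + 1)],
        "proj1": (r(3, K * c, c), np.zeros(c)),
        "proj2": (r(3, c, c), np.zeros(c)),
        "highway": [(r(c, c), np.zeros(c), r(c, c), np.zeros(c)) for _ in range(n_highway)],
        "gru_fwd": gru_params(),
        "gru_bwd": gru_params(),
    }

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 8))      # 10 time steps, 8 channels
out = cbhg(x, init_params(rng))
print(out.shape)                      # (10, 12): same length, 2 * hidden channels
```

The output has as many time steps as the input, because every stage uses same-padding or stride 1. The last projection maps back to the input channel count so that the residual addition in step 3 is well defined.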

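The differences from the original model in Fully Character-Level Neural Machine Translation without Explicit Segmentation are the addition of batch normalization, residual connections, and max pooling with stride 1. The Tacotron paper reports that these modifications improved generalization.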
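To see why the stride matters, here is a small pure-Python sketch, with a hypothetical helper `max_pool_1d` and made-up frame values. With stride 1 the pooled sequence has one output per input frame, while a larger stride shortens it.

```python
def max_pool_1d(seq, width=2, stride=1):
    """Max pooling over a 1-D sequence, right-padded so no frame is dropped."""
    padded = list(seq) + [float("-inf")] * (width - 1)
    return [max(padded[t:t + width]) for t in range(0, len(seq), stride)]

frames = [0.1, 0.9, 0.3, 0.4, 0.8, 0.2]
same_rate = max_pool_1d(frames, width=2, stride=1)
half_rate = max_pool_1d(frames, width=2, stride=2)
print(len(same_rate), same_rate)  # 6 [0.9, 0.9, 0.4, 0.8, 0.8, 0.2]
print(len(half_rate), half_rate)  # 3 [0.9, 0.4, 0.8]
```

Keeping the length unchanged is also what allows the later residual addition, since the pooled sequence and the original input then have the same number of time steps.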


Reposted from blog.csdn.net/yang_daxia/article/details/83897119
