ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic

Posted: 2021-12-14 18:16:25

Authors: Yu Gu, Xiang Yin
Venue: ISCSLP 2021
Affiliation: ByteDance AI Lab

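Abstract

The system uses a Tacotron-like encoder-decoder structure as the acoustic model for singing voice synthesis and WaveRNN as the vocoder. An auxiliary duration prediction module is used to expand the input sequence to frame level.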

The proposed system

2.1 Overview

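Duration model: takes the text and musical information as input and predicts the duration of each phoneme; the prediction is then post-processed under the note-duration constraint.
Acoustic model: takes the frame-expanded features as input and predicts 80-dimensional mel-spectrograms; the decoder is autoregressive.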
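The frame expansion step mentioned above can be illustrated with a minimal sketch. It assumes each phoneme has a feature vector and an integer duration in frames, and simply repeats the vector once per frame. The function name `expand_to_frames` and the toy feature values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def expand_to_frames(
    phoneme_feats: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each phoneme-level feature vector once per frame of its duration.

    Illustrative only: the real system's features and frame shift are not
    specified in these notes.
    """
    if len(phoneme_feats) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[List[float]] = []
    for feat, n_frames in zip(phoneme_feats, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector so each frame owns its own list.
        frames.extend(list(feat) for _ in range(n_frames))
    return frames


# Two phonemes lasting 2 and 3 frames give a 5-frame sequence.
frames = expand_to_frames([[1.0, 0.0], [0.0, 1.0]], [2, 3])
print(len(frames))
```

After expansion, the encoder input and the target mel-spectrogram have the same number of frames, which is what makes the later alignment easy to learn.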

2.2 Feature representation

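The duration model's input is XD = [Ph, Tp, Du], which is phoneme-level information; Du is the theoretical duration of the note.
The acoustic model's input is XA = [Ph, Pi, Po], which is frame-level information. Po is a 3-dimensional position feature: the fraction of the current phoneme already elapsed at the current frame, the fraction remaining, and the position of the current phoneme within the sentence, all normalized to floating-point values.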
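One possible way to compute the 3-dimensional position feature Po is sketched below. The exact normalization is an assumption: here the elapsed fraction is `t / d` for frame index `t` in a phoneme of `d` frames, the remaining fraction is `1 - t / d`, and the phoneme position is its index divided by the number of phonemes. The paper may define these differently, and the function name `position_features` is hypothetical.

```python
from typing import List, Sequence, Tuple


def position_features(durations: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return one (elapsed, remaining, phoneme_position) triple per frame.

    Assumed definitions (not confirmed by the source):
      elapsed          = t / d      for frame t of a d-frame phoneme
      remaining        = 1 - elapsed
      phoneme_position = i / N      for phoneme i of N phonemes
    """
    n_phonemes = len(durations)
    feats: List[Tuple[float, float, float]] = []
    for i, d in enumerate(durations):
        for t in range(d):
            elapsed = t / d
            feats.append((elapsed, 1.0 - elapsed, i / n_phonemes))
    return feats


# Two phonemes lasting 2 and 4 frames.
po = position_features([2, 4])
for row in po:
    print(row)
```

All three values lie in [0, 1], so they can be concatenated with the other frame-level inputs without further scaling.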

2.3. Duration models

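The onset of a note and the onset of the singer's actual performance are offset from each other, but the ByteSing implementation ignores this offset. To make later mixing easier, the duration of each syllable is constrained to equal the note duration. In practice, keeping only the ratio between vowel and consonant within a syllable as a degree of freedom does not hurt the perceived quality.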
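The note-duration constraint can be sketched as a proportional rescaling: the predicted phoneme durations inside one syllable are scaled so that they sum exactly to the note's length in frames, which preserves the consonant-to-vowel ratio. This is an assumed form of the post-processing, not the authors' confirmed procedure; the function name `fit_to_note` and the numbers are hypothetical.

```python
from typing import List, Sequence


def fit_to_note(pred_durations: Sequence[float], note_frames: int) -> List[int]:
    """Rescale predicted phoneme durations so they sum to note_frames.

    Uses cumulative rounding of the phoneme boundaries, so the integer
    durations always add up exactly. A very short phoneme can round to
    0 frames; this sketch does not enforce a minimum.
    """
    total = sum(pred_durations)
    if total <= 0 or note_frames <= 0:
        raise ValueError("durations and note length must be positive")
    out: List[int] = []
    acc = 0.0
    prev_edge = 0
    last = len(pred_durations) - 1
    for i, p in enumerate(pred_durations):
        acc += p
        # Pin the final boundary to the note end to avoid float drift.
        edge = note_frames if i == last else round(acc / total * note_frames)
        out.append(edge - prev_edge)
        prev_edge = edge
    return out


# Consonant predicted at 6 frames, vowel at 18 frames, note lasts 32 frames.
print(fit_to_note([6.0, 18.0], 32))
```

In this example the 1:3 consonant-to-vowel ratio is kept while the syllable is stretched from 24 to 32 frames.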

2.4. Acoustic models

Because the encoder sequence has already been expanded to frame level, the attention converges easily and learns a monotonic alignment.

Experiments

Training set: 90 songs, all from the same female singer
Test set: 10 songs outside the training set

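Subjective evaluation: recorded songs and synthesized songs were compared directly, with listeners rating each on a 1-5 scale; no comparison against other systems was run.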

Reposted from blog.csdn.net/qq_40168949/article/details/105794559
