DeepSinger: Singing Voice Synthesis with Data Mined From the Web

Category: Other | Posted: 2021-12-14 18:16:26

Authors: Yi Ren, Xu Tan
Venue: SIGKDD 2020 (a top conference in data mining and knowledge discovery)
Date: July 2020

demo link
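The paper does not introduce any fundamentally new method; what stands out is that the training data is obtained entirely by web crawling. The samples sound reasonably good, but many details of the model deserve closer scrutiny.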

abstract

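Music data crawled from the web goes through accompaniment separation and forced alignment. A Transformer-based model is then trained to generate linear spectrograms from lyrics, and a Griffin-Lim (GL) vocoder reconstructs the waveform.
Advantages: (1) it is the first to train on web-crawled data; (2) forced alignment saves a large amount of manual labeling; (3) the model is simple, and a reference encoder learns the singer's timbre from noisy data; (4) the training data covers 89 singers and 92 hours in total, across three languages: Mandarin, Cantonese, and English. The system can generate songs in multiple languages for multiple singers.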

introduction

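Build a lyrics-to-singing alignment tool that first aligns at the sentence level and then at the phoneme level;
A multilingual, multi-singer singing synthesis system with a reference encoder that extracts the singer's timbre from noisy singing audio instead of using a singer ID.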

3.2 Lyrics-to-Singing Alignment

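First, train an alignment model on whole songs and their lyrics to obtain wav segments and lyrics split at the sentence level;
Continue training the model from the first step to obtain a phoneme-level alignment model;
Several training strategies are used, but none are particularly novel; they are all common practice in ASR.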
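As a rough illustration of what the phoneme-level alignment is used for, the sketch below collapses a frame-by-frame phoneme labeling into per-phoneme segments with durations. The function name `frames_to_segments`, the toy phoneme labels, and the 10 ms hop size are assumptions made for this example; they are not taken from the paper or from the notes above.

```python
from itertools import groupby


def frames_to_segments(frame_phonemes, hop_seconds=0.01):
    """Collapse per-frame phoneme labels into (phoneme, start, end, n_frames).

    Assumption: consecutive identical labels belong to one phoneme, so two
    adjacent occurrences of the same phoneme would be merged into one segment.
    """
    segments = []
    start_frame = 0
    for phoneme, run in groupby(frame_phonemes):
        n_frames = len(list(run))
        start = start_frame * hop_seconds
        end = (start_frame + n_frames) * hop_seconds
        segments.append((phoneme, start, end, n_frames))
        start_frame += n_frames
    return segments


# Hypothetical alignment output: one phoneme label per spectrogram frame.
frame_labels = ["n", "n", "i", "i", "i", "h", "ao", "ao", "ao", "ao"]
segments = frames_to_segments(frame_labels, hop_seconds=0.01)
for phoneme, start, end, n_frames in segments:
    print(f"{phoneme}\t{start:.2f}s\t{end:.2f}s\t{n_frames} frames")
```

The per-phoneme frame counts are the kind of duration information a FastSpeech-style model needs in order to expand phonemes to frame level.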

3.3 Singing Modeling

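Uses a FastSpeech-style architecture with three separate encoders: a lyrics encoder, a pitch encoder, and a reference encoder.
Predicts linear spectrograms directly, then recovers the waveform with GL.
Lyrics Encoder: phoneme embedding lookup, encoding, then expansion to frame level;
Pitch Encoder: takes the pitch extracted directly from the training set; judging from Figure 5, it appears to be converted to notes before being fed in;
RefEncoder: takes a linear spectrogram as input and collapses the time dimension at the end. Its advantage over a spk_id embedding is that the RefEncoder encodes a single utterance, so if a clean reference wav is used at inference time, the resulting embedding is clean as well; a spk-id embedding, by contrast, mixes speaker identity with audio quality. Caveat: the amount of disentanglement this achieves is limited, so it is not very reliable.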
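The three encoder inputs described above can be sketched with toy data as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names are hypothetical, the Hz-to-note mapping uses the standard MIDI formula (the notes above only guess that pitch is converted to notes), unvoiced frames are mapped to 0 by convention here, and the real reference encoder applies learned layers before pooling, which are omitted.

```python
import math


def expand_frames(phoneme_vectors, durations):
    """Lyrics side: repeat each phoneme vector for its number of frames."""
    frames = []
    for vector, n_frames in zip(phoneme_vectors, durations):
        frames.extend([vector] * n_frames)
    return frames


def hz_to_midi_note(f0_hz):
    """Pitch side: quantize F0 in Hz to the nearest MIDI note number.

    Assumption: frames with f0 <= 0 are unvoiced and are mapped to 0.
    """
    if f0_hz <= 0:
        return 0
    return int(round(69 + 12 * math.log2(f0_hz / 440.0)))


def pool_over_time(frames):
    """Reference side: average a (time, dim) sequence over the time axis."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


# Two phoneme vectors lasting 2 and 3 frames become 5 frame-level vectors.
phoneme_vectors = [[1.0, 0.0], [0.0, 1.0]]
durations = [2, 3]
lyric_frames = expand_frames(phoneme_vectors, durations)

# A toy F0 track in Hz, one value per frame.
f0_track = [0.0, 220.0, 261.63, 440.0, 445.0]
notes = [hz_to_midi_note(f) for f in f0_track]

# A toy 3-frame "spectrogram" with 2 bins collapses to one 2-dim vector.
reference_spec = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
timbre_vector = pool_over_time(reference_spec)

print(len(lyric_frames), notes, timbre_vector)
```

Because the pooled vector has no time axis, it can be computed from a reference clip of any length, which is what allows a clean reference wav to be swapped in at inference time.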

Reposted from blog.csdn.net/qq_40168949/article/details/118399402
