Author: Zen and the Art of Computer Programming

1. Introduction

GAN (Generative Adversarial Network), as a generative model in deep learning, has achieved strong results on image, audio, and other modalities in recent years. Its core idea, drawn from game theory, is adversarial training: two networks, a generator G and a discriminator D, compete with each other, and through this competition each network continuously improves, with G learning to model the data distribution. In this paper, the author applies a generative adversarial network to the speech synthesis task and builds a sequence-to-sequence model over subword units to solve the problem of spoken-language transcription.
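To make the adversarial training idea concrete, below is a minimal sketch in PyTorch. It assumes toy Gaussian "real" data and small MLP networks for G and D; the architecture, data, and hyperparameters are illustrative placeholders, not the configuration used in this paper.

```python
# Minimal GAN adversarial training loop (illustrative sketch, not the
# paper's model). "Real" data is a shifted Gaussian stand-in.
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 16

G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, data_dim) + 2.0   # stand-in for real samples
    z = torch.randn(64, latent_dim)
    fake = G(z)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    loss_d = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: update G so that D classifies G(z) as real.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

The alternating updates are the essence of the game: D is trained to tell real from generated samples, while G is trained against D's current judgment, so each network's improvement pressures the other.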
As a main research direction in artificial intelligence, NLP (Natural Language Processing) is one of the key technologies for understanding and automatically processing natural language. In the past few years, with the rise of applications such as machine translation, text summarization, and automatic question answering, research in NLP has again developed rapidly. For example, the GNMT (Google Neural Machine Translation) model behind Google's machine translation system is a neural-network-based deep learning model that achieves remarkable accuracy. Many traditional word segmentation methods have proven effective and highly accurate; after the emergence of a new generation of pre-trained models such as BERT and XLNet, word segmentation, an important basic function, has again become a research hotspot in NLP.
The focus of this paper is to apply generative adversarial networks to the speech synthesis task, that is, to convert input Chinese character strings into the corresponding pinyin phonemes. Subword units are an important concept in NLP: they represent character strings as smaller fragments that are easier to model and process. This article implements a sequence-to-sequence (seq2seq) model based on the Transformer architecture and uses subword units to construct the model, solving the problem of spoken-language transcription.
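As a concrete illustration of the seq2seq framing, here is a minimal PyTorch sketch of a Transformer that maps a character sequence to pinyin tokens. The tiny vocabularies and the single example pair ("你好" → "ni3 hao3") are hypothetical, meant only to show the model interface, not the subword vocabulary or training setup used in the paper.

```python
# Toy Transformer seq2seq for Chinese-to-pinyin transcription
# (illustrative sketch with hypothetical vocabularies).
import torch
import torch.nn as nn

src_vocab = {"<pad>": 0, "你": 1, "好": 2}                  # character/subword inputs
tgt_vocab = {"<pad>": 0, "<bos>": 1, "ni3": 2, "hao3": 3}  # pinyin outputs

d_model = 32
src_emb = nn.Embedding(len(src_vocab), d_model)
tgt_emb = nn.Embedding(len(tgt_vocab), d_model)
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       dim_feedforward=64, batch_first=True)
proj = nn.Linear(d_model, len(tgt_vocab))

src = torch.tensor([[1, 2]])     # "你 好"
tgt_in = torch.tensor([[1, 2]])  # "<bos> ni3" (teacher forcing)
# Causal mask so each decoder position only attends to earlier positions.
tgt_mask = model.generate_square_subsequent_mask(tgt_in.size(1))

out = model(src_emb(src), tgt_emb(tgt_in), tgt_mask=tgt_mask)
logits = proj(out)               # shape: (batch=1, tgt_len=2, |tgt_vocab|)
print(logits.shape)
```

In training, the logits would be compared against the shifted target sequence with cross-entropy; at inference, pinyin tokens would be decoded autoregressively from "<bos>".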

2. Related work
