Repost: Reading List | An Advanced Study Guide to Speech

Reference link: https://www.msra.cn/zh-cn/news/features/book-recommendation-speech

Reading List | An Advanced Study Guide to Speech

2019-03-22  | Author: Wang Xi

We invited Wang Xi, a senior speech scientist at the Microsoft (Asia) Internet Engineering Academy, to recommend a list of classic books for the speech field. The list spans speech signal processing, speech and language processing, and deep learning research. Alongside the deep learning algorithms and models that many people are focused on today, these books contain a large amount of foundational knowledge about speech technology, and building up these basics, concepts, and ideas is very valuable for an in-depth understanding of the field. We would also remind readers that the speech field moves very quickly, so while reading these books it is equally important to follow the newest and most influential papers in related areas.

Let's take a look at the list.

Signal Processing

1. Discrete-Time Signal Processing (3rd Edition)

Chinese version: "Discrete-Time Signal Processing (3rd Edition)"

Authors: Alan V. Oppenheim, Ronald W. Schafer

Audience: Beginner to intermediate (requires a background in signals and systems)

Recommendation: ★★★★

Summary: Written by Professor Alan V. Oppenheim of the Massachusetts Institute of Technology and Professor Ronald W. Schafer of Georgia Tech, this book systematically presents the basic theory and methods of discrete-time signal processing and is an internationally authoritative classic textbook in the field. Topics include discrete-time signals and systems, the z-transform, sampling of continuous-time signals, transform analysis of linear time-invariant systems, structures for discrete-time systems, filter design techniques, the discrete Fourier transform, computation of the discrete Fourier transform, Fourier analysis of signals using the discrete Fourier transform, parametric signal modeling, the discrete Hilbert transform, and cepstrum analysis and homomorphic deconvolution.

Why it's recommended: This is a classic DSP textbook. It teaches the basic theory and methods of discrete-time signal processing and provides the theoretical foundation for analyzing, transforming, and applying speech signals. It serves both as a textbook for undergraduate and graduate courses in communications and signal processing and as an authoritative reference for researchers working on speech signal processing technology.

2. Discrete-Time Speech Signal Processing: Principles and Practice

Chinese version: "Discrete-Time Speech Signal Processing: Principles and Applications"

Author: Thomas F. Quatieri

Audience: Intermediate (requires a background in signals and systems)

Recommendation: ★★★★★

Summary: Based on the "Digital Speech Processing" course taught by Professor Thomas F. Quatieri at the Massachusetts Institute of Technology, this book describes the main principles and important applications of speech signal processing and strikes a good balance between theory and application. It first lays out the theoretical foundations needed for a complete understanding of discrete-time speech signal processing, then introduces important advances in speech signal processing research, including sinusoidal speech processing, time-frequency analysis, and nonlinear acoustic models of speech production, and finally covers applications in depth, including speech coding, speech enhancement, speech synthesis, and speaker recognition.

Why it's recommended: The author spent many years on speech research and development projects at MIT Lincoln Laboratory and has accumulated considerable experience. The book covers almost all of speech signal processing theory and applications: the theoretical foundations, from speech production and perception and acoustic theory, through analysis and modeling (from all-pole models to analysis-by-synthesis), homomorphic signal processing, the short-time Fourier transform, filter banks, sinusoidal analysis and synthesis, and pitch estimation, plus a substantial section on applications. It gives readers a very comprehensive and solid foundation and is recommended as a reference for speech processing applications and research.

Speech and Language Processing

1. Fundamentals of Speech Recognition

Authors: Lawrence R. Rabiner, Biing-Hwang Juang

Audience: Beginner to intermediate (requires background in signal processing, speech, physiology, statistics, mathematics, etc.)

Recommendation: ★★★★

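Summary: Co-authored by two giants of the speech field, Professor Lawrence Rabiner, a former director at Bell Labs, and Professor Biing-Hwang Juang, a member of the US National Academy of Engineering, this book gives a complete account of the fundamental problems and ideas of modern speech recognition: speech production and perception, the acoustic and phonetic characteristics of speech signals, signal processing and analysis methods for speech recognition, pattern comparison, and the design and implementation of speech recognition systems. It covers in detail hidden Markov model (HMM) theory and implementation, isolated-word and connected-word models, large-vocabulary continuous speech recognition, and task-specific speech recognition.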

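Why it's recommended: The book is fluently written and treats the fundamental problems of speech recognition incisively and comprehensively, which is very helpful for understanding speech technology in depth. Rabiner is the author of the well-known formulation of the three HMM problems, so the book's explanation of HMMs is both penetrating and easy to follow, with many worked examples. Its coverage of speech perception, transformation, vector quantization, and dynamic programming is also classic. It suits engineers, scientists, linguists, and researchers interested in speech recognition.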
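
The three HMM problems Rabiner formulated are evaluation (how likely is an observation sequence under a model), decoding (which state sequence best explains it), and learning (how to estimate the model parameters). As a minimal sketch of the first, here is the forward algorithm for a discrete-observation HMM in standard-library Python. This is an illustration under our own assumptions, not code from the book: the function and argument names are hypothetical, and the model is given as plain nested lists.

```python
def forward(pi, A, B, obs):
    """Forward algorithm: return P(obs | model) for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    obs     : observation sequence as a list of symbol indices
    """
    n_states = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)
```

With a hypothetical two-state model `pi = [0.6, 0.4]`, `A = [[0.7, 0.3], [0.4, 0.6]]`, `B = [[0.5, 0.5], [0.1, 0.9]]`, `forward(pi, A, B, [0, 1])` evaluates to about 0.2156. Because the alpha values shrink geometrically with sequence length, practical implementations rescale them at each step or work in the log domain.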

2. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development

Authors: Xuedong Huang, Alex Acero, Hsiao-Wuen Hon

Audience: Beginner to intermediate

Recommendation: ★★★★★

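Summary: This book gives a comprehensive treatment of the theoretical and practical problems involved in spoken language processing. Spoken language processing draws on many levels of knowledge, including acoustics, phonology, phonetics, language, pragmatics, and discourse, and spans computer science, electrical engineering, mathematics, grammar, and psychology; its applications include speech recognition, speech synthesis, and spoken language understanding. The book first systematically introduces the theoretical foundations these applications require (probability and statistics, information theory, pattern recognition, speech signal processing, speech feature representation, and speech coding). It then covers, from a practical perspective, speech recognition systems (acoustic modeling, environmental robustness, language modeling, and search algorithms, especially large-vocabulary search) and speech synthesis (including data preparation and lexicons, structural features, text normalization, prosody, and synthesis methods), and finally spoken language understanding. The book addresses both the fundamental theory of spoken language processing and the practical problems that must be solved.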

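Why it's recommended: The authors bring together extensive knowledge and practical experience from both academia and industry. The content is detailed and practical, covering the vast majority of classic concepts and technical modules in spoken language applications. Even today, with deep neural networks in the mainstream, readers can still use the book to deepen their understanding of each module, for example how to construct the search space and search algorithms over acoustic and language models. With the rise of end-to-end approaches, some of the modules it describes may be replaced, but the book remains illuminating for understanding the concepts and problems of speech technology in depth.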

Advanced Deep Learning

1. Automatic Speech Recognition: A Deep Learning Approach

Chinese version: "Deep Learning Explained: Speech Recognition in Practice"

Authors: Dong Yu, Li Deng

Audience: Intermediate (requires some background in machine learning or speech recognition)

Recommendation: ★★★★

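Summary: This is the first monograph devoted to the technical details of deep learning for speech recognition. It begins with an overview of conventional speech recognition theory and the core algorithms of classic deep neural networks, then gives a comprehensive, in-depth account of how deep learning is applied to speech recognition, including training and optimization of hybrid deep neural network/hidden Markov models (DNN-HMM), feature representation learning, model fusion, adaptation, and several advanced deep learning techniques exemplified by recurrent neural networks. The book gives detailed references for every algorithm and technical detail, painting a full picture of deep learning in speech recognition.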

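Why it's recommended: The authors, Dong Yu and Li Deng, both helped drive and put into practice the combination of deep learning with conventional speech recognition and its breakthroughs in real applications, and the book is one of the few works devoted to deep learning for speech recognition. By reading it, readers can gain a comprehensive view of the background, development, theoretical basis, key techniques, and ways of thinking behind the introduction of deep learning into speech recognition in recent years. It suits students, researchers, and practitioners with some background in machine learning or speech recognition.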

2. Automatic Speech Recognition

Instructors: Steve Renals, Hiroshi Shimodaira

Course link: http://www.inf.ed.ac.uk/teaching/courses/asr/lectures-2019.html

Audience: Intermediate (requires some background in machine learning, signal processing, and phonetics)

Recommendation: ★★★★★

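Summary: This is the University of Edinburgh's latest speech recognition course. It covers background theory, speech signal analysis, HMM acoustic models, neural-network-based acoustic models, and related techniques (including decoding, alignment and weighted finite-state transducers, discriminative training, speaker recognition, and multilingual recognition). The course also introduces recent advances in speech recognition and related classic papers, and its coursework includes building a recognition system with the Kaldi toolkit, which helps learners gain hands-on experience.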

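Why it's recommended: The course is highly systematic and includes many relatively recent technical advances. By taking it, learners can gain a fairly comprehensive and in-depth understanding of the current state of speech recognition.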

3. Deep Learning for Computer Vision, Speech, and Language

Instructors: Liangliang Cao, Xiaodong Cui, Kapil Thadani

Course link: http://llcao.net/cu-deeplearning17/schedule.html

Audience: Intermediate (requires some background in machine learning, image processing, speech, and natural language processing)

Recommendation: ★★★★★

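Summary: This Columbia University course covers the three areas where deep learning is currently most popular and most successful: computer vision, speech, and language. It focuses on presenting a range of models and the latest advances in these areas, and includes substantial hands-on practice with open-source tools (such as Keras and Theano) as well as close readings of many classic papers. Topics also include the relevant mathematics and neural network fundamentals, deep learning for speech recognition, end-to-end speech recognition, language representations and language models, image recognition, Poker AI, and the WaveNet speech synthesis model.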

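Why it's recommended: The course spans deep learning's three flagship application areas (computer vision, speech, and natural language processing), and by presenting the latest representative models in each, it deepens and ties together an understanding of deep learning's impact on AI. As deep learning has advanced in recent years, these three areas have increasingly intersected. Because speech is a composite medium that runs from signal to language, studying vision and natural language applications can in turn strengthen learning in the speech field.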

Source: blog.csdn.net/qq_26369907/article/details/90287670