Acoustic Models Based on DFSMN-CTC and Joint CTC-CE Training
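The joint CTC-CE objective named in the title is commonly realized as an interpolation of the sequence-level CTC loss with a frame-level cross-entropy loss computed against aligned targets. A minimal pure-Python sketch, assuming the frame-level targets come from a forced alignment and using a hypothetical interpolation weight `lam` (not taken from the paper):

```python
import math

NEG_INF = float("-inf")

def log_add(*xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """-log p(labels | inputs) via the CTC forward algorithm [14].

    log_probs: T rows, each a list of V per-frame log-softmax scores.
    labels:    blank-free target label ids.
    """
    T = len(log_probs)
    # Extended label sequence with blanks interleaved: b, l1, b, l2, ..., b
    ext = [blank]
    for l in labels:
        ext.extend([l, blank])
    S = len(ext)
    alpha = [[NEG_INF] * S for _ in range(T)]  # log forward variables
    alpha[0][0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Allowed transitions: stay, advance by one, or skip the
            # blank between two distinct labels.
            cands = [alpha[t - 1][s]]
            if s > 0:
                cands.append(alpha[t - 1][s - 1])
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1][s - 2])
            alpha[t][s] = log_add(*cands) + log_probs[t][ext[s]]
    # Valid paths end in the last label or the trailing blank.
    return -log_add(alpha[T - 1][S - 1], alpha[T - 1][S - 2])

def frame_cross_entropy(log_probs, frame_targets):
    """Frame-level CE loss against aligned per-frame targets."""
    return -sum(log_probs[t][y] for t, y in enumerate(frame_targets)) / len(frame_targets)

def joint_ctc_ce_loss(log_probs, labels, frame_targets, lam=0.5):
    """Interpolated objective: lam * CTC + (1 - lam) * CE."""
    return (lam * ctc_neg_log_likelihood(log_probs, labels)
            + (1 - lam) * frame_cross_entropy(log_probs, frame_targets))
```

In practice both terms would be computed on the same DFSMN output layer and back-propagated jointly; the CE term supplies dense frame-level gradients that can stabilize the sparser CTC signal.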

References:

[1] A. R. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 1, pp. 14–22, 2012.

[2] G. E. Dahl, D. Yu, L. Deng, and A. Acero, “Context-dependent pre-trained deep neural networks for large vocabulary speech recognition,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 1, pp. 30–42, 2012.

[3] O. Abdel-Hamid, A. R. Mohamed, H. Jiang, and G. Penn, “Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, 2012, pp. 4277–4280.

[4] O. Abdel Hamid, A. R. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, “Convolutional neural networks for speech recognition,” Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 22, no. 10, pp. 1533–1545, 2014.

[5] J. L. Elman, “Finding structure in time,” Cognitive science, vol. 14, no. 2, pp. 179–211, 1990.

[6] A. Graves, A. Mohamed, and G. E. Hinton, “Speech recognition with deep recurrent neural networks,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 6645–6649.

[7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.


[8] K. Veselý, A. Ghoshal, L. Burget, and D. Povey, “Sequence discriminative training of deep neural networks,” in Interspeech, 2013, pp. 2345–2349.

[9] A. Graves and N. Jaitly, “Towards end-to-end speech recognition with recurrent neural networks,” in International Conference on Machine Learning, 2014, pp. 1764–1772.

[10] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates et al., “Deep speech: Scaling up end-to-end speech recognition,” arXiv preprint arXiv:1412.5567, 2014.

[11] H. Sak, A. Senior, K. Rao, O. Irsoy, A. Graves, F. Beaufays, and J. Schalkwyk, “Learning acoustic frame labeling for speech recognition with recurrent neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4280–4284.

[12] H. Sak, A. Senior, K. Rao, and F. Beaufays, “Fast and accurate recurrent neural network acoustic models for speech recognition,” arXiv preprint arXiv:1507.06947, 2015.

[13] Y. Miao, M. Gowayyed, and F. Metze, “Eesen: End-to-end speech recognition using deep rnn models and wfst-based decoding,” in Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on. IEEE, 2015, pp. 167–174.

[14] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” in Proceedings of the 23rd international conference on Machine learning. ACM, 2006, pp. 369–376.

[15] D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen et al., “Deep speech 2: End-to-end speech recognition in english and mandarin,” in International Conference on Machine Learning, 2016, pp. 173–182.

[16] J. Chorowski, D. Bahdanau, K. Cho, and Y. Bengio, “End-to-end continuous speech recognition using attention-based recurrent NN: First results,” arXiv preprint arXiv:1412.1602, 2014.

[17] J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, “Attention-based models for speech recognition,” in Advances in neural information processing systems, 2015, pp. 577–585.

[18] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, “End-to-end attention-based large vocabulary speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 4945–4949.

[19] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 4960–4964.

[20] C.-C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, K. Gonina et al., “State-of-the-art speech recognition with sequence-to-sequence models,” arXiv preprint arXiv:1712.01769, 2017.

[21] Y. Zhang, G. Chen, D. Yu, K. Yao, S. Khudanpur, and J. Glass, “Highway long short-term memory rnns for distant speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 5755–5759.

[22] S. Xue and Z. Yan, “Improving latency-controlled blstm acoustic models for online speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 5340–5344.

[23] V. Peddinti, D. Povey, and S. Khudanpur, “A time delay neural network architecture for efficient modeling of long temporal contexts,” in Proceedings of Interspeech, 2015.

[24] V. Peddinti, G. Chen, D. Povey, and S. Khudanpur, “Reverberation robust acoustic modeling using i-vectors with time delay neural networks,” in Proceedings of Interspeech, 2015.

[25] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang, “Phoneme recognition using time-delay neural networks,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 3, pp. 328–339, 1989.

[26] T. Sercu, C. Puhrsch, B. Kingsbury, and Y. LeCun, “Very deep multilingual convolutional neural networks for lvcsr,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 4955–4959.

[27] S. Zhang, H. Jiang, S. Xiong, S. Wei, and L. Dai, “Compact feedforward sequential memory networks for large vocabulary continuous speech recognition,” in INTERSPEECH, 2016, pp. 3389–3393.

[28] S. Zhang, C. Liu, H. Jiang, S. Wei, L. Dai, and Y. Hu, “Nonrecurrent neural structure for long-term dependence,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 4, pp. 871–884, 2017.

[29] S. Zhang, M. Lei, Z. Yan, and L. Dai, “Deep-fsmn for large vocabulary continuous speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2018 IEEE International Conference on. IEEE, 2018, pp. 5869–5873.

[30] H. Sak, F. de Chaumont Quitry, T. Sainath, K. Rao et al., “Acoustic modelling with cd-ctc-smbr lstm rnns,” in Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on. IEEE, 2015, pp. 604–609.

[31] K. Chen and Q. Huo, “Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 5880–5884.


Reposted from blog.csdn.net/jiangyupu/article/details/88747631