CTC Learning: Papers

History

ICML-2006. Graves et al. [1] introduced the connectionist temporal classification (CTC) objective function for phone recognition. 
ICML-2014. Graves [2] demonstrated that character-level speech transcription can be performed by a recurrent neural network with minimal preprocessing. 
2014/2015. Baidu released DeepSpeech [3] (2014) and DeepSpeech 2 [4] (2015). 
ASRU-2015. Yajie Miao et al. [5] presented the EESEN framework. 
ASRU-2015. Google [6] extended the application of context-dependent (CD) LSTMs trained with CTC and sMBR loss. 
ICASSP-2016. Google [7] presented a compact large-vocabulary speech recognition system that runs efficiently on mobile devices with high accuracy and low latency. 
NIPS-2016. Google [8] used whole words as acoustic units. 
2017. IBM [9] employed direct acoustics-to-word models.
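At the core of all the papers above is the CTC objective of [1], which sums the probabilities of every frame-level alignment that collapses to the target label sequence. It is computed with a forward (alpha) recursion over the blank-interleaved label sequence. A minimal pure-Python sketch (the function name and toy probabilities are my own illustration, not from any of the papers):

```python
def ctc_forward_prob(probs, labels, blank=0):
    """Compute P(labels | x) with the CTC forward (alpha) recursion.

    probs:  per-frame distributions, probs[t][k] = p(symbol k at frame t)
    labels: target label sequence (without blanks)
    """
    # Interleave blanks: l' = (-, l1, -, l2, -, ..., lU, -)
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # Initialization: a path may start with a blank or with the first label.
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]

    # Recursion: stay, advance by one, or skip a blank between distinct labels.
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s > 0:
                a += alpha[t - 1][s - 1]
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]

    # A valid path must end on the last label or the trailing blank.
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)


# Toy example: 2 frames, alphabet {blank=0, 'a'=1}, target "a".
# The three alignments (a,a), (a,-), (-,a) give 0.36 + 0.24 + 0.24 = 0.84.
probs = [[0.4, 0.6], [0.4, 0.6]]
print(ctc_forward_prob(probs, [1]))  # 0.84
```

Practical implementations (e.g. the losses trained in [3] and [4]) run this recursion in log space for numerical stability and backpropagate through it; the raw-probability version above is only meant to make the alignment-summing idea concrete.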

Reference

[1]. A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in Proc. ICML, 2006. 
[2]. A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proc. ICML, 2014, pp. 1764–1772. 
[3]. A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, et al., "Deep Speech: Scaling up end-to-end speech recognition," arXiv preprint arXiv:1412.5567, 2014. 
[4]. D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, et al., "Deep Speech 2: End-to-end speech recognition in English and Mandarin," arXiv preprint arXiv:1512.02595, 2015. 
[5]. Y. Miao, M. Gowayyed, and F. Metze, "EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding," in Proc. ASRU, 2015. 
[6]. A. Senior, H. Sak, F. de Chaumont Quitry, T. N. Sainath, and K. Rao, "Acoustic modelling with CD-CTC-sMBR LSTM RNNs," in Proc. ASRU, 2015. 
[7]. I. McGraw, R. Prabhavalkar, R. Alvarez, M. Gonzalez Arenas, K. Rao, D. Rybach, O. Alsharif, H. Sak, A. Gruenstein, F. Beaufays, and C. Parada, "Personalized speech recognition on mobile devices," in Proc. ICASSP, 2016. 
[8]. H. Soltau, H. Liao, and H. Sak, "Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition," arXiv preprint arXiv:1610.09975, 2016. 
[9]. K. Audhkhasi, B. Ramabhadran, G. Saon, M. Picheny, and D. Nahamoo, "Direct acoustics-to-word models for English conversational speech recognition," arXiv preprint arXiv:1703.07754, 2017.


See also: https://blog.csdn.net/xmdxcsj/article/details/70300591

Reposted from blog.csdn.net/xwei1226/article/details/80396550