"Connected Temporal Classification of Unlabeled Sequence Data Using Recurrent Neural Networks" paper reading

https://blog.csdn.net/u011239443/article/details/79973269

Paper address:
http://people.idsia.ch/~santiago/papers/icml2006.pdf

Abstract

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, acoustic signals are transcribed into text. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, their applicability has so far been limited, because they require pre-segmented training data and post-processing to convert their outputs into label sequences. This paper presents a novel method for training RNNs to label unsegmented sequences directly, solving both problems. Its advantage over a baseline HMM and a hybrid HMM-RNN is demonstrated on the TIMIT corpus.

Introduction

Labeling unsegmented sequence data is a ubiquitous problem in real-world sequence learning. It is especially common in perception tasks (e.g. handwriting recognition, speech recognition, gesture recognition), where a noisy, real-valued input stream is annotated with strings of discrete labels, such as letters or words.

Currently, graphical models such as hidden Markov models (HMMs), conditional random fields (CRFs) and their variants are the dominant frameworks for sequence labeling. While these methods have proved successful for many problems, they have several drawbacks: (1) they usually require a significant amount of task-specific knowledge, such as designing the state models for HMMs or choosing the input features for CRFs; (2) they require explicit (and often questionable) dependency assumptions to make inference tractable, e.g. the assumption that observations are independent given the state; (3) for standard HMMs, training is generative, even though sequence labeling is a discriminative task.

Recurrent neural networks (RNNs), on the other hand, require no prior knowledge of the data beyond the choice of input and output representation. They can be trained discriminatively, and their internal state provides a powerful, general mechanism for modeling time series. In addition, they tend to be robust to temporal and spatial noise.

So far, however, it has not been possible to apply RNNs directly to sequence labeling. The problem is that the standard neural network objective functions are defined separately for each point in the training sequence; in other words, RNNs can only be trained to make a series of independent label classifications. This means that the training data must be pre-segmented, and that the network outputs must be post-processed to give the final label sequence.

Currently, the most effective way of using RNNs for sequence labeling is the so-called hybrid approach, which combines them with hidden Markov models. Hybrid systems use the HMM to model the long-range sequential structure of the data, while the neural network provides localized classifications. The HMM component is able to segment the sequences automatically during training and to transform the network classifications into label sequences. However, because they inherit the above-mentioned shortcomings of HMMs, hybrid methods do not realize the full potential of RNNs for sequence modeling.

This paper proposes a new method for labeling sequence data with RNNs that requires no pre-segmentation of the training data, no post-processing of the outputs, and models the sequence within a single network architecture. The basic idea is to interpret the network outputs as a probability distribution over all possible label sequences, conditioned on the input sequence. Given this distribution, an objective function can be derived that directly maximizes the probability of the correct labeling. Since the objective function is differentiable, the network can be trained with standard backpropagation.
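To make this concrete, here is a minimal sketch of training an RNN with this kind of objective using PyTorch's nn.CTCLoss, a modern implementation of the loss described in the paper. The network, layer sizes, blank index and dummy data are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: an RNN trained with the CTC objective via PyTorch's nn.CTCLoss.
# Layer sizes, blank index and dummy data are illustrative assumptions.
import torch
import torch.nn as nn

T, N, C = 50, 4, 28          # input time steps, batch size, labels + 1 blank
FEAT, HIDDEN = 13, 64        # e.g. 13 acoustic features per frame (assumption)

rnn = nn.LSTM(FEAT, HIDDEN)  # sequence model over the unsegmented input
fc = nn.Linear(HIDDEN, C)    # per-frame scores over the labels plus the blank
ctc = nn.CTCLoss(blank=0)    # label index 0 is reserved for the blank

x = torch.randn(T, N, FEAT)                         # unsegmented input sequences
targets = torch.randint(1, C, (N, 10))              # target label sequences (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

h, _ = rnn(x)
log_probs = fc(h).log_softmax(dim=-1)               # (T, N, C) log-probabilities
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                     # differentiable end to end
```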

In the remainder of the paper, we refer to the task of labeling unsegmented data sequences as "temporal classification", and to our use of RNNs for this purpose as "connectionist temporal classification" (CTC). By contrast, we refer to the independent labeling of each time step, or frame, of the input sequence as "framewise classification".

The next section provides the mathematical formalism of temporal classification and defines the error measure used in this paper. Section 3 describes the RNN output representation used for temporal classification. Section 4 explains how CTC networks are trained. Section 5 compares CTC with hybrid and HMM systems on the TIMIT corpus. Section 6 discusses some key differences between CTC and other temporal classifiers and points the way for future work, and Section 7 concludes.

Temporal Classification

This section describes the formal setting of temporal classification and how such classifiers are evaluated. The training data S consists of pairs (x, z). Taking speech recognition as an example, x is the sequence of acoustic features and z is the target transcription.

The classifier is evaluated by its label error rate (LER):

LER(h, S') = \frac{1}{Z} \sum_{(x,z) \in S'} ED(h(x), z)

where h is the classifier, S' is the test set, Z is a normalizing constant (the total number of target labels in S'), and ED(h(x), z) is the edit distance between the classifier's output h(x) and the target z.
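As a concrete illustration, the sketch below computes this label error rate with a standard dynamic-programming edit distance. The function names and the toy data are assumptions for illustration, not code from the paper.

```python
# Label error rate (LER) sketch: total edit distance between predictions and
# targets, normalized by the total number of target labels.
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    d = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1,           # deletion
                                   d[j - 1] + 1,       # insertion
                                   prev + (ca != cb))  # substitution
    return d[len(b)]

def label_error_rate(hypotheses, targets):
    total_edits = sum(edit_distance(h, z) for h, z in zip(hypotheses, targets))
    total_labels = sum(len(z) for z in targets)
    return total_edits / total_labels

# Toy example: two utterances, labels given as character strings.
print(label_error_rate([list("kaet"), list("dog")],
                       [list("kat"),  list("dog")]))   # 1 edit / 6 labels ≈ 0.167
```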

Connectionist Temporal Classification

This section describes the output representation that allows a recurrent neural network to be used for CTC. The crucial step is to transform the network outputs into a conditional probability distribution over label sequences. The network can then be used as a classifier by selecting the most probable labeling for a given input sequence.
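As a rough illustration of how per-frame output distributions are turned into a single labeling, the sketch below performs best-path (greedy) decoding: take the most probable symbol at each frame, collapse consecutive repeats, and remove the blank. The array shapes and blank index are assumptions; the paper derives the full probability model and decoding in the following sections.

```python
# Best-path (greedy) decoding sketch: pick the most probable symbol per frame,
# collapse consecutive repeats, then drop blanks. Shapes and the blank index
# are illustrative assumptions.
import numpy as np

def best_path_decode(probs, blank=0):
    """probs: (T, C) per-frame distribution over C symbols; index `blank` is the blank."""
    best = probs.argmax(axis=1)            # most probable symbol at each frame
    labeling, prev = [], None
    for s in best:
        if s != prev and s != blank:       # collapse repeats, skip blanks
            labeling.append(int(s))
        prev = s
    return labeling

# Toy example with 3 labels plus a blank (column 0).
probs = np.array([[0.1, 0.8, 0.05, 0.05],   # label 1
                  [0.1, 0.8, 0.05, 0.05],   # label 1 (repeat, collapsed)
                  [0.7, 0.1, 0.1,  0.1 ],   # blank
                  [0.1, 0.1, 0.7,  0.1 ]])  # label 2
print(best_path_decode(probs))              # -> [1, 2]
```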
