Connectionist Temporal Classification: An algorithm for handling misaligned input and output in tasks such as text recognition and speech recognition

Connectionist Temporal Classification.

In tasks such as text recognition and speech recognition, the input and output are often not aligned, because writing habits and speaking speed differ from person to person.

Connectionist Temporal Classification (CTC) is an algorithm designed for this situation, where the alignment between input and output is not known.

For convenience, we use the following notation. The input (such as an audio signal) is represented by the sequence $X=[x_1,x_2,\ldots,x_T]$, and the corresponding output (such as the label text) is represented by the sequence $Y=[y_1,y_2,\ldots,y_U]$. To train on such data, we want to find an accurate mapping between the input $X$ and the output $Y$.

Input and output features:

  • Both $X$ and $Y$ are variable length;
  • The ratio between the lengths of $X$ and $Y$ also varies.
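
As a rough illustration of how frame-level sequences of different lengths can correspond to the same shorter output without a known alignment, the sketch below applies the collapsing rule from the standard CTC formulation: merge consecutive repeated symbols, then drop a special blank symbol. The function name `ctc_collapse`, the blank symbol `-`, and the example strings are assumptions made for this illustration and do not come from the original post.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol chosen for this sketch


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level path to an output string.

    Step 1: merge runs of consecutive identical symbols.
    Step 2: remove every blank symbol.
    """
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


# Two frame-level paths of different lengths (T = 10 and T = 6),
# e.g. a slow and a fast speaker saying the same word.
slow = "hh-e-ll-lo"
fast = "hel-lo"

print(len(slow), ctc_collapse(slow))
print(len(fast), ctc_collapse(fast))
```

Both paths collapse to the same five-character output, even though their lengths $T$ differ. The blank between the two `l` symbols is what keeps the repeated letter from being merged into one.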

Origin blog.csdn.net/universsky2015/article/details/131672313