OCR: R^2AM (Recursive Recurrent Nets with Attention Modeling for OCR in the Wild)

Writing down some ideas from a paper I read.

The model in Recursive Recurrent Nets with Attention Modeling for OCR in the Wild has three components: a recursive CNN, an RNN (recurrent neural network), and a soft attention model, as shown below. The recursive CNN encodes the image (extracts image features), the RNN models the language at the character level, and attention lets the model make better use of the image features.
The model is also lexicon-free: it does not rely on a dictionary.

[Figure: overview of the R^2AM architecture]
The model predicts a single word from a cropped image.

CNN layer

The paper lists three approaches to strengthen a CNN's ability to capture the context between characters:
The first is to use a larger kernel size, or a deeper network, to enlarge the receptive field.
The second is to make the CNN recursive.
The third is to make it recurrent.

The paper finds that the recursive variant performs better, perhaps because, compared with a recurrent CNN, a recursive CNN prevents the error signal from back-propagating directly.

The recursive CNN's weights are split into initial inter-layer feed-forward weights and intra-layer recursive weights that are shared across iterations.
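A minimal PyTorch sketch of such a recursive convolution block; the layer sizes, iteration count, and names here are illustrative assumptions, not values from the paper:

```python
import torch.nn as nn

class RecursiveConvBlock(nn.Module):
    """One recursive CNN block: a feed-forward conv followed by a
    recursive conv whose weights are reused across iterations."""

    def __init__(self, in_ch, out_ch, num_iters=2):
        super().__init__()
        # inter-layer feed-forward weights (applied once)
        self.feed_forward = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # intra-layer recursive weights (shared across iterations)
        self.recursive = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.num_iters = num_iters

    def forward(self, x):
        h = self.relu(self.feed_forward(x))
        for _ in range(self.num_iters):
            # the same weights are applied repeatedly, which deepens the
            # network and enlarges the receptive field without new parameters
            h = self.relu(self.recursive(h))
        return h
```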

RNN layer

The best design factorizes the RNN into two layers: the first RNN layer focuses on the character-level language model, while the second focuses on the joint statistics of language and image features. A single, unfactored RNN does not perform as well.

The model does not use LSTMs: the paper notes that the words to recognize are short (around eight characters), not long character sequences, so an LSTM would not add much.
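A sketch of this two-layer factorization, under the same assumptions as above (dimensions, names, and the plain tanh RNN cells are my illustrative choices, consistent with the paper's decision not to use LSTMs):

```python
import torch
import torch.nn as nn

class TwoLayerCharRNN(nn.Module):
    """Factored RNN decoder: layer 1 models the character-level
    language, layer 2 models the joint statistics of language
    and image features."""

    def __init__(self, vocab_size, emb_dim=128, hidden=256, feat_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn1 = nn.RNNCell(emb_dim, hidden)            # character-level language model
        self.rnn2 = nn.RNNCell(hidden + feat_dim, hidden)  # language + image features
        self.classifier = nn.Linear(hidden, vocab_size)

    def step(self, prev_char, img_feat, h1, h2):
        # prev_char: (B,) previous character ids; img_feat: (B, feat_dim)
        e = self.embed(prev_char)
        h1 = self.rnn1(e, h1)                                 # first RNN layer
        h2 = self.rnn2(torch.cat([h1, img_feat], dim=1), h2)  # second RNN layer
        return self.classifier(h2), h1, h2                    # logits over characters
```

At inference the `step` method would be called once per output character, starting from zero hidden states, feeding each predicted character back in as `prev_char`.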

Attention modeling

The attention mechanism lets the model focus on the most important parts of the input features.
Attention comes in two flavors: hard attention, which learns a sequence of discrete positions, and soft attention, which can be trained end to end with standard back-propagation.
Here, the attention module sits between the two RNN layers.
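A minimal soft-attention sketch in the same PyTorch style: it scores each location of the (flattened) CNN feature map against the first RNN layer's hidden state and feeds the weighted sum to the second layer. All dimensions and names are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    """Soft attention over CNN features, conditioned on the first
    RNN layer's state; fully differentiable, so it trains with
    standard back-propagation."""

    def __init__(self, feat_dim, hidden_dim, attn_dim=256):
        super().__init__()
        self.proj_feat = nn.Linear(feat_dim, attn_dim)
        self.proj_hidden = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, h1):
        # feats: (B, N, feat_dim) flattened feature map; h1: (B, hidden_dim)
        e = self.score(torch.tanh(self.proj_feat(feats)
                                  + self.proj_hidden(h1).unsqueeze(1)))
        alpha = F.softmax(e, dim=1)            # weights over the N locations
        context = (alpha * feats).sum(dim=1)   # (B, feat_dim), fed to RNN layer 2
        return context, alpha
```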

To summarize: this model does not use CTC, so it only recognizes individual words. It could be combined with CTC to recognize long character sequences spanning more than one word. The model is similar to CRNN, which is CNN + RNN (LSTM) + CTC.



Origin: blog.csdn.net/zephyr_wang/article/details/104770565