Introduction to the principles of CTPN

Connectionist Text Proposal Network (CTPN). This article is essentially my translated summary of the original paper, "Detecting Text in Natural Image with Connectionist Text Proposal Network".

2.1.1. Comparison with RPN

RPN works well for generic object detection but not for text. For generic objects the localization does not need to be very precise as long as the object category can be determined, whereas text detection requires very precise localization.

2.1.2. CTPN features

1. It detects a sequence of small, fine-scale text boxes, gives each one a text/non-text score, and then connects them for the final prediction.
2. In-network recurrence: a recurrent network is applied directly to the image features extracted by the convolutional neural network.
3. It handles text at multiple scales and in multiple languages within a single image.
4. Both speed and accuracy are high: 0.88 F-measure versus the previous 0.834 on ICDAR 2013, and 0.61 F-measure versus the previous 0.54 on ICDAR 2015, at 0.14 s/image.
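The "detect small boxes, score them, then connect them" idea in point 1 can be sketched roughly as follows. This is a simplified greedy grouping, not the paper's exact pairing procedure. The thresholds (score 0.7, horizontal distance 50 pixels, vertical overlap 0.7) are assumptions based on my reading of the paper, and the names `vertical_iou` and `connect_proposals` are made up for illustration.

```python
from typing import List, Tuple

Proposal = Tuple[int, int, int, int, float]  # (x1, y1, x2, y2, text score)
Box = Tuple[int, int, int, int]              # (x1, y1, x2, y2)


def vertical_iou(a: Proposal, b: Proposal) -> float:
    """Intersection over union of the two y-ranges (an assumed overlap measure)."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    union = max(a[3], b[3]) - min(a[1], b[1])
    return max(inter, 0) / union if union > 0 else 0.0


def connect_proposals(proposals: List[Proposal],
                      min_score: float = 0.7,
                      max_gap: int = 50,
                      min_overlap: float = 0.7) -> List[Box]:
    """Greedily chain high-scoring proposals into text lines, left to right."""
    # Drop low-scoring proposals, then sort the rest by their left edge.
    kept = sorted((p for p in proposals if p[4] > min_score), key=lambda p: p[0])
    lines: List[List[Proposal]] = []
    for p in kept:
        for line in lines:
            last = line[-1]
            # Attach to a line whose last box is close and vertically aligned.
            if p[0] - last[0] < max_gap and vertical_iou(last, p) > min_overlap:
                line.append(p)
                break
        else:
            lines.append([p])  # no matching line: start a new one
    # Each text line is the bounding rectangle of its proposals.
    return [(min(p[0] for p in line), min(p[1] for p in line),
             max(p[2] for p in line), max(p[3] for p in line))
            for line in lines]
```

Proposals that score below the threshold are dropped, and a proposal that is too far away or at a different height starts a new text line.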

2.1.3. Previous methods

1. Text detection: earlier work generally falls into two groups, connected-components (CCs) based approaches and sliding-window based methods.
CCs based approaches use only a few low-level features, such as intensity, color, and gradient.
Sliding-window based methods rely on many windows of different sizes to find text candidates, and in turn rely on a pre-trained classifier to identify them.
2. Object detection: the best-known model is the RPN combined with Fast R-CNN (Faster R-CNN).

2.1.4. CTPN details

[Figure: CTPN architecture]

1. Fixed 16-pixel width. The image is analyzed in vertical strips with a fixed width of 16 pixels instead of detecting each character, because detecting individual characters makes it ambiguous whether a region is part of a single character or a combination of several characters. This approach works for different character sizes. VGG16 is used to extract the features, although other models can also be used. The prediction focuses on the vertical (y) coordinates.
2. BLSTM. A recurrent network (RNN) links the consecutive 16-pixel-wide boxes so that the context between neighboring pieces of text is taken into account. The figure below shows the difference between results with and without the RNN. The paper uses a bi-directional LSTM as the RNN.
[Figure: detection results with and without the RNN]
3. Boundary refinement (side-refinement). After all the 16-pixel-wide strips have been combined, the horizontal extent of the text line still has to be determined accurately. A prediction for the horizontal direction (x, width) is therefore added to refine the left and right boundaries of the text line.

In the original figure, the difference is visible between detections with boundary refinement (red boxes) and without it (yellow dashed boxes).
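To make the fixed-width anchors and the vertical and boundary predictions more concrete, here is a minimal sketch. It assumes the setup as I read it in the paper: k = 10 anchor heights from 11 to 273 pixels, each obtained by dividing the previous one by 0.7, with targets defined relative to the anchor's center and size. All function names are hypothetical and only for illustration.

```python
import math

ANCHOR_WIDTH = 16  # fixed strip width in pixels, as described above


def anchor_heights(k=10, smallest=11.0, ratio=0.7):
    """Return k anchor heights; each is the previous one divided by `ratio`.

    k, smallest and ratio are assumed defaults, not values from this article.
    """
    return [round(smallest / ratio ** i) for i in range(k)]


def encode_vertical(cy, h, anchor_cy, anchor_h):
    """Vertical targets: relative center offset and log height ratio."""
    v_c = (cy - anchor_cy) / anchor_h
    v_h = math.log(h / anchor_h)
    return v_c, v_h


def decode_vertical(v_c, v_h, anchor_cy, anchor_h):
    """Invert encode_vertical to recover the box center and height."""
    cy = v_c * anchor_h + anchor_cy
    h = math.exp(v_h) * anchor_h
    return cy, h


def encode_side(x_side, anchor_cx, anchor_w=ANCHOR_WIDTH):
    """Boundary target: offset of the true text-line edge from the anchor center."""
    return (x_side - anchor_cx) / anchor_w
```

Because the width is fixed, each anchor only needs two vertical values (center and height), which is where the 2k vertical coordinates come from. The boundary offset adds one more value per anchor.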

2.1.5. Loss function
L_s^cl, L_v^re and L_o^re measure the error of the text/non-text score, the error of the vertical coordinates, and the error of the boundary offset, respectively. They correspond to the three outputs in the last box of the CTPN figure: 2k text/non-text scores, 2k vertical coordinates, and k boundary offsets.

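Written out, the three terms combine into a single multi-task loss. The form below is how I read it in the paper: s, v and o are the predicted score, vertical coordinates and boundary offset, starred symbols are the ground truth, N_s, N_v and N_o are normalization counts, and lambda_1 and lambda_2 are weights that balance the terms.

```latex
L(s_i, v_j, o_k) =
    \frac{1}{N_s} \sum_i L_s^{cl}(s_i, s_i^*)
  + \frac{\lambda_1}{N_v} \sum_j L_v^{re}(v_j, v_j^*)
  + \frac{\lambda_2}{N_o} \sum_k L_o^{re}(o_k, o_k^*)
```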

2.1.6. Preparing CTPN training data

My blog post https://blog.csdn.net/zephyr_wang/article/details/104200499 describes how to install and run CTPN. The following describes how to prepare the training data.
1. Initial data
For example, for the image img_1133, the label file gt_img_1133 holds the coordinates I annotated (marked in red in the original screenshot): x1, y1, x2, y2, x3, y3, x4, y4 are 224,109,704,109,704,259,224,259.

2. Generate the label data that CTPN requires from gt_img_1133. CTPN needs boxes with a width of 16 pixels.
Run python ./utils/prepare/split_label.py to produce the labels that are finally needed.
Because in my case y1 - y2 = 0 and y3 - y4 = 0, i.e. the top and bottom edges are each horizontal, this rectangle (x1, y1, x2, y2, x3, y3, x4, y4) is simply split along the x axis into a row of 16-pixel-wide anchors. The final labels are shown below.

[Figure: the final generated labels]
3. split_label.py mainly calls shrink_poly from the utils directory to generate the final labels. The original post showed an excerpt of that code here as a screenshot.
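As a rough illustration of the splitting step for the axis-aligned case above, here is a minimal sketch. It is not the repository's shrink_poly, which may also align strips to a 16-pixel grid and handle tilted quadrilaterals. The function names `parse_label` and `split_rectangle` are hypothetical.

```python
from typing import List, Tuple


def parse_label(line: str) -> List[int]:
    """Parse 'x1,y1,x2,y2,x3,y3,x4,y4' into a list of eight integers."""
    values = [int(v) for v in line.strip().split(",")[:8]]
    if len(values) != 8:
        raise ValueError("expected eight comma-separated coordinates")
    return values


def split_rectangle(coords: List[int], width: int = 16) -> List[Tuple[int, int, int, int]]:
    """Split an axis-aligned rectangle into strips of `width` pixels along x.

    Returns (x_min, y_min, x_max, y_max) for each strip, left to right.
    """
    x1, y1, x2, y2, x3, y3, x4, y4 = coords
    if y1 != y2 or y3 != y4:
        raise ValueError("this sketch only handles horizontal top and bottom edges")
    left, right = min(x1, x4), max(x2, x3)
    top, bottom = min(y1, y3), max(y1, y3)
    strips = []
    x = left
    while x < right:
        # The last strip is clipped so that it does not pass the right edge.
        strips.append((x, top, min(x + width, right), bottom))
        x += width
    return strips


strips = split_rectangle(parse_label("224,109,704,109,704,259,224,259"))
```

For the example label, the rectangle is 704 - 224 = 480 pixels wide, so it is cut into 30 strips, from (224, 109, 240, 259) to (688, 109, 704, 259).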

Origin blog.csdn.net/zephyr_wang/article/details/104332212