OCR text detection model: CTPN

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. If you reproduce it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/wsp_1138886114/article/details/100041204

I. CTPN overview

A simple character recognition process is as follows:

Step 1. Capture the text to be recognized as an input image, using a device such as a mobile phone camera or an image scanner;
Step 2. Preprocess the image: resize it, adjust its brightness, remove noise, and so on;
Step 3. Detect the regions of the image that contain a single character or several consecutive characters;
Step 4. Crop the text regions out of the image according to the detection result, then feed them into a text recognition model to obtain the character information in the image.

CTPN detects text well both in natural scenes and in printed documents.

  1. Characteristics of text layout
    Before looking at text detection, first consider how text is typically laid out. Whether in a printed document or in a natural scene, text usually appears as a horizontal run of consecutive characters whose length varies but whose height is roughly constant. Because the width is variable and not known in advance, detection works on a fixed height: find the regions of the image that form a continuous band of the same height and whose edges have the line-like features of text, then box those regions out.

  2. CTPN
    CTPN comes from the paper "Detecting Text in Natural Image with Connectionist Text Proposal Network" (text detection based on a connectionist network of proposals). The model's main job is to locate text lines in an image accurately. Its basic approach is to generate text proposals (candidate boxes) of various sizes directly on the feature map produced by convolution, and to detect text lines from them. The idea is easiest to see in the figure below (note: CTPN actually generates its proposals on the feature map, not on the original image; the diagram is only a schematic):
    [Figure: schematic of CTPN text proposals]

  3. CTPN technical principle
    CTPN combines an RNN and a CNN seamlessly to improve detection accuracy. The CNN extracts deep features and the RNN recognizes sequence features; integrating the two gives better detection performance. Specifically:

    • CNN (using VGG16)
      CTPN generates a series of proposals (candidate boxes) on the feature map output by the VGG16 convolutional layers and detects text from them.

    • RNN
      Text is a sequence made up of "character parts, characters, and multiple characters", so a text detection target is not an independent, closed object but is related to what comes before and after it. CTPN therefore uses an RNN (recurrent neural network) to predict text positions from the context on both sides.

    • The CTPN network structure is shown below:
      [Figure: CTPN network structure]
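
To make the note about proposals living on the feature map more concrete, here is a minimal Python sketch of how fixed-width proposals could be laid out per feature-map position and mapped back to image coordinates. The stride of 16 follows from conv5_3 in VGG16 sitting behind four 2 × 2 poolings, and the anchor width of 16 is the default mentioned in the detection steps. The number of anchors per position (`K = 10`), the height ladder (starting at 11 px and dividing by 0.7 each step) and the helper names are assumptions made for illustration, not details given in this article.

```python
STRIDE = 16          # conv5_3 of VGG16 is downsampled 16x relative to the image
ANCHOR_WIDTH = 16    # fixed anchor width (the article's default)
K = 10               # assumption: number of anchor heights per position


def anchor_heights(k=K, smallest=11.0, ratio=0.7):
    """Assumed height ladder: start at `smallest` px, divide by `ratio` each step."""
    return [smallest / (ratio ** i) for i in range(k)]


def anchors_for_cell(row, col, heights):
    """Return image-space (x1, y1, x2, y2) anchors for feature-map cell (row, col)."""
    x1 = col * STRIDE
    x2 = x1 + ANCHOR_WIDTH
    cy = row * STRIDE + STRIDE / 2.0   # vertical centre of the cell in the image
    return [(x1, cy - h / 2.0, x2, cy + h / 2.0) for h in heights]


heights = anchor_heights()
boxes = anchors_for_cell(row=10, col=20, heights=heights)
for box in boxes:
    print(box)
```

Every anchor at a given position shares the same horizontal extent and differs only in height, which is why the network only has to regress vertical coordinates for each one.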

II. CTPN detection process

The whole process is divided into six steps:

  • Step 1: Input a 3 × 600 (h) × 900 (w) image and extract features with VGG16, taking conv5_3 (the third convolutional layer of VGG's fifth block) as the feature map, of size 512 × 38 × 57;

  • Step 2: Slide a 3 × 3 window over this feature map, so that 512 × 38 × 57 becomes 4608 × 38 × 57 (the 512 channels are expanded by the 3 × 3 window: 512 × 9 = 4608);

  • Step 3: Feed the features of all windows in each row into an RNN (BLSTM, bidirectional LSTM). Each LSTM direction has a 128-unit hidden layer, so 57 × 38 × 4608 becomes 57 × 38 × 128; the reverse LSTM likewise yields 57 × 38 × 128, and combining the two gives a final result of 256 × 38 × 57;

  • Step 4: Feed the RNN result into an FC (fully connected) layer, whose parameters form a 256 × 512 matrix, giving a result of 512 × 38 × 57;

  • Step 5: Feed the FC features into three classification or regression layers. The first (2k vertical coordinates) and the third (k side-refinement values) regress the position of the k anchors (an anchor can be understood simply as a small rectangle that marks a character position, like the small red boxes in the schematic above; its width is fixed, 16 by default), and the second (2k scores) gives the class of each of the k anchors (text or non-text);

  • Step 6: Use the text-line construction algorithm to merge the resulting thin rectangular boxes into text-sequence boxes. The main idea of the algorithm is as follows: every two neighbouring candidate regions form a pair, and different pairs are merged until no further merging is possible.
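
The sizes quoted in steps 1 to 5 can be re-derived with a few lines of arithmetic. The sketch below is plain Python rather than a real network. The function names are hypothetical, the default `k=10` is an assumption (the article leaves k unspecified), and rounding the feature-map size up is also an assumption, chosen because it reproduces the 38 × 57 quoted above.

```python
import math


def ctpn_shapes(height=600, width=900, channels=512, window=3,
                lstm_hidden=128, fc_out=512, k=10, stride=16):
    """Return the (channels, rows, cols) size after each stage of the pipeline."""
    # Assumption: round up, so 600 / 16 = 37.5 -> 38 and 900 / 16 = 56.25 -> 57.
    rows = math.ceil(height / stride)
    cols = math.ceil(width / stride)
    return {
        "conv5_3": (channels, rows, cols),                           # step 1
        "sliding_window": (channels * window * window, rows, cols),  # step 2
        "blstm": (2 * lstm_hidden, rows, cols),                      # step 3
        "fc": (fc_out, rows, cols),                                  # step 4
        "vertical_coords": (2 * k, rows, cols),                      # step 5
        "scores": (2 * k, rows, cols),                               # step 5
        "side_refinement": (k, rows, cols),                          # step 5
    }


def fc_parameter_shape(lstm_hidden=128, fc_out=512):
    """Shape of the FC weight matrix that maps BLSTM output to FC features."""
    return (2 * lstm_hidden, fc_out)


shapes = ctpn_shapes()
for name, shape in shapes.items():
    print(name, shape)
print("fc weights", fc_parameter_shape())
```

Note that the spatial size (38 × 57) never changes after conv5_3; only the channel dimension is reshaped from stage to stage.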

The above covers the main principles of CTPN. Applying the CTPN model to text in a natural scene gives the result shown below:
[Figure: CTPN detection result on a natural scene image]

III. Summary

The biggest highlight of CTPN is that it brings an RNN into detection.
It first obtains deep features with a CNN, then uses fixed-width anchors (thin rectangular boxes of fixed width) to detect text regions. The features of the anchors in the same row are strung into a sequence and fed into the RNN, a fully connected layer then performs classification or regression, and finally the small candidate boxes are merged to give the complete text region. This seamless combination of RNN and CNN effectively improves detection accuracy.
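
The final merging of small candidate boxes can be illustrated with a simplified sketch. It is a greedy left-to-right chaining, not the exact pair-and-merge procedure of CTPN: a box joins a line when it is horizontally close to the line's last box and overlaps it vertically. The thresholds (`max_gap=50` pixels, `min_overlap=0.7`) and all function names are assumptions chosen for illustration.

```python
def vertical_overlap(a, b):
    """Vertical overlap of two (x1, y1, x2, y2) boxes as a fraction of their joint height."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    joint = max(a[3], b[3]) - min(a[1], b[1])
    return max(0.0, inter) / joint if joint > 0 else 0.0


def build_text_lines(boxes, max_gap=50, min_overlap=0.7):
    """Greedily chain fixed-width boxes into text lines and return one box per line."""
    lines = []  # each line is a list of boxes ordered left to right
    for box in sorted(boxes):
        for line in lines:
            last = line[-1]
            close = box[0] - last[2] <= max_gap
            if close and vertical_overlap(last, box) >= min_overlap:
                line.append(box)
                break
        else:
            lines.append([box])  # no existing line accepts this box
    return [
        (min(b[0] for b in line), min(b[1] for b in line),
         max(b[2] for b in line), max(b[3] for b in line))
        for line in lines
    ]


proposals = [
    (0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 30),   # one text line
    (200, 10, 216, 30), (216, 10, 232, 30),                # same row, far away
    (16, 100, 32, 120),                                    # a different row
]
text_lines = build_text_lines(proposals)
for line_box in text_lines:
    print(line_box)
```

In this toy input the three adjacent boxes on the top row merge into one line, the two boxes 150 px further right form a second line, and the box on the lower row stays on its own.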
