OCR text detection model: SegLink

Disclaimer: This is an original article by the blogger, licensed under the CC 4.0 BY-SA copyright agreement. If you reproduce it, please attach a link to the original source and this statement.
This link: https://blog.csdn.net/wsp_1138886114/article/details/100042703


In natural scenes such as light-box billboards, product packaging and trademarks, text detection has to cope with many complex situations, such as tilted angles and deformation, so deep-learning-based methods are needed to detect where the text is.
The CTPN method introduced earlier can detect text in natural scenes, but its detection is based on the horizontal direction and works poorly on non-horizontal text. In natural scenes, however, a lot of text appears at some rotation angle, for example a street sign photographed with a mobile phone, as in the figure below. If the detection only produces a horizontal box without angle information, the result is the red box in the figure, whereas the green box is the ideal detection target. The detection error is clearly large.

img

So how can text at arbitrary angles be detected flexibly? One of the most straightforward ideas is to have the model output not only the box location (x, y, w, h) but also a rotation angle θ for the text box. The SegLink text detection model introduced in this article adopts this idea, so SegLink can detect rotated text, as shown below:
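To make the five-parameter box (x, y, w, h, θ) concrete, here is a minimal sketch that turns it into four corner points. It assumes (x, y) is the box center and θ is given in degrees; the function name `rotated_box_corners` is hypothetical. In image coordinates (y pointing down), a positive θ appears as a clockwise rotation.

```python
import math


def rotated_box_corners(x, y, w, h, theta_deg):
    """Return the 4 corners of a w-by-h box centered at (x, y), rotated by theta_deg.

    Corners start at the unrotated top-left (image coordinates, y down)
    and go around the box.
    """
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    # Offsets of the corners from the center before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then shift it back to the box center.
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in offsets]
```

For example, `rotated_box_corners(10, 5, 4, 2, 0)` returns `[(8.0, 4.0), (12.0, 4.0), (12.0, 6.0), (8.0, 6.0)]`, the ordinary axis-aligned box; a horizontal-only detector such as CTPN can only produce this θ = 0 case.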

img

I. The main idea of the SegLink model

The main detection process of the SegLink model is as follows:

img

1. First, detect segments (slices), shown as the yellow boxes in the figure above. A segment is part of a text line (or word); it may be one character, one word, or a few characters.

2. Connect the segments that belong to the same text line (or word) with links, shown as the green lines in the figure above. A link connects the center points of two overlapping segments, as in the figure below.

img

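3. Use a merging algorithm to combine these segments and links into complete text lines, which gives the position and rotation angle of each complete text line's detection box.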
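Step 3 first has to decide which segments belong together. One natural reading, assumed here rather than stated in the article, is to treat segments as graph nodes and positively scored links as edges, so that each text line is a connected component. The sketch below finds the components with union-find; the function name `group_segments`, the `(i, j, positive_score)` link format and the 0.5 threshold are hypothetical choices for illustration.

```python
def group_segments(num_segments, links, link_threshold=0.5):
    """Group segment indices into text lines (connected components).

    links: iterable of (i, j, positive_score) tuples, one per detected link.
    A link joins two segments only if its positive score exceeds link_threshold.
    """
    parent = list(range(num_segments))

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, score in links:
        if score > link_threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj  # merge the two components

    groups = {}
    for i in range(num_segments):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


lines = group_segments(5, [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.2), (3, 4, 0.7)])
# → [[0, 1, 2], [3, 4]]  (the weak 2-3 link is ignored)
```

Each returned group is then handed to the merging algorithm to produce one rotated text box.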

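Here, **segments and links** are the innovation of the SegLink model: the model learns not only the location of each segment but also the link relations between segments, which indicate whether two segments belong to the same text line (or word).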

II. Network structure of the SegLink model

The network structure of the SegLink model is shown below:

img

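The model uses VGG16 as its backbone, replaces the fully connected layers (fc6, fc7) with convolutional layers (conv6, conv7), and appends 4 more convolutional layers (conv8, conv9, conv10, conv11). The feature maps of 6 layers, conv4_3, conv7, conv8_2, conv9_2, conv10_2 and conv11, are taken out and convolved to produce segments and links. These 6 feature maps have different sizes: each is half the size of the previous one. Obtaining segments and links from these 6 differently sized layers lets the model detect text lines of different sizes (large feature maps are good at detecting small objects, and small feature maps are good at detecting large objects).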
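The halving rule can be made concrete with a small sketch. It assumes a 512×512 input and that conv4_3 sits at stride 8 (in VGG16, conv4_3 comes after three 2×2 poolings); the input size and the names `SEGLINK_LAYERS` and `layer_grid` are illustrative assumptions, not taken from the article.

```python
# Output layers listed in the article, from the largest to the smallest feature map.
SEGLINK_LAYERS = ["conv4_3", "conv7", "conv8_2", "conv9_2", "conv10_2", "conv11"]


def layer_grid(input_size=512, first_stride=8):
    """Return {layer_name: (feature_map_side, stride)} for a square input.

    Assumes conv4_3 has stride first_stride and every following layer
    halves the feature map, i.e. doubles the stride.
    """
    grid = {}
    stride = first_stride
    for name in SEGLINK_LAYERS:
        grid[name] = (input_size // stride, stride)
        stride *= 2  # each layer is half the size of the previous one
    return grid


for name, (side, stride) in layer_grid().items():
    print(f"{name:9s} {side:3d}x{side:<3d} stride {stride}")
```

Under these assumptions the sides run 64, 32, 16, 8, 4, 2. One cell of conv11 corresponds to a 256-pixel step in the image versus 8 pixels for conv4_3, which is why the small maps are the ones that pick up large text.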

1. Segment detection

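The overall architecture follows the idea of SSD. Segment detection works much like SSD detection: results are regressed relative to preset default boxes. After convolution, each feature map outputs 7 channels at every location. Two of them are confidence scores in (0, 1) for whether the segment is text, and the remaining five are the segment's five offsets relative to the default box at the corresponding location. Each segment is represented as: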

img
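Since the representation is only given as a figure, here is a sketch of how the 7 channels at one location could be decoded into a segment (x_s, y_s, w_s, h_s, θ_s). It follows SSD-style decoding as I understand the SegLink paper: center offsets are scaled by the default box size a_l, width and height offsets go through exp, and the angle offset is used directly. The channel order, the default box center at the middle of each cell, the scale a_l = γ · w_I / w_l with γ = 1.5, and the name `decode_segment` are all assumptions to check against the paper.

```python
import math


def decode_segment(channels, x, y, map_size, image_size, gamma=1.5):
    """Decode one 7-channel prediction at grid cell (x, y) of a square feature map.

    channels: [score_text, score_nontext, dx, dy, dw, dh, dtheta] (assumed order).
    Returns (text_confidence, (xs, ys, ws, hs, theta_s)).
    """
    s_pos, s_neg, dx, dy, dw, dh, dtheta = channels
    # Softmax over the two scores gives a text confidence in (0, 1).
    m = max(s_pos, s_neg)
    e_pos, e_neg = math.exp(s_pos - m), math.exp(s_neg - m)
    confidence = e_pos / (e_pos + e_neg)

    step = image_size / map_size                  # image pixels per feature-map cell
    xa, ya = step * (x + 0.5), step * (y + 0.5)   # default box center (assumed)
    a = gamma * step                              # default box size a_l (assumed form)

    xs = a * dx + xa          # shift the center by a multiple of the box size
    ys = a * dy + ya
    ws = a * math.exp(dw)     # exp keeps width and height positive
    hs = a * math.exp(dh)
    theta_s = dtheta          # angle offset used as-is
    return confidence, (xs, ys, ws, hs, theta_s)
```

With all-zero channels at cell (0, 0) of a 64×64 map on a 512-pixel image, this returns confidence 0.5 and the default box itself, (4.0, 4.0, 12.0, 12.0, 0).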

2. Link detection

Links between segments come in two kinds: within-layer links and cross-layer links, as shown below:

img

Within-layer link detection works on a single feature map: for each segment, it detects the link status between that segment and each segment in its 8-neighborhood. Each link has two scores, a positive score and a negative score. The positive score indicates that the two segments belong to the same text (they should be connected); the negative score indicates that they belong to different texts (they should be disconnected). Cross-layer link detection mainly addresses the problem that segments of the same text are detected on different layers, which produces duplicate, redundant detections. On two adjacent feature maps, a segment on the later layer has neighbors on the preceding layer in addition to its neighbors in its own layer, while segments on the preceding layer do not look for neighbors on the later layer. The merging algorithm then eliminates this redundancy.
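A sketch of how the link neighbors could be enumerated on the grid. The 8-neighborhood comes from the text above. The cross-layer rule, where cell (x, y) on layer l links to the 2×2 block starting at (2x, 2y) on the preceding layer, which is twice as large, is my reading of the SegLink paper and should be treated as an assumption, as should the function names. With two scores per link, this would give 8 × 2 = 16 within-layer and 4 × 2 = 8 cross-layer link channels per location.

```python
def within_layer_neighbors(x, y, map_w, map_h):
    """Cells in the 8-neighborhood of (x, y) that lie inside a map_w x map_h grid."""
    out = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # a segment is not its own neighbor
            nx, ny = x + dx, y + dy
            if 0 <= nx < map_w and 0 <= ny < map_h:
                out.append((nx, ny))
    return out


def cross_layer_neighbors(x, y, prev_w, prev_h):
    """Cells on the preceding (twice as large) layer covering the same region as (x, y).

    Assumed rule: the 2x2 block whose top-left cell is (2x, 2y).
    """
    candidates = [(2 * x, 2 * y), (2 * x + 1, 2 * y), (2 * x, 2 * y + 1), (2 * x + 1, 2 * y + 1)]
    return [(nx, ny) for nx, ny in candidates if nx < prev_w and ny < prev_h]
```

A corner cell such as (0, 0) has only 3 within-layer neighbors, while interior cells have all 8; the first layer (conv4_3) has no preceding layer and therefore no cross-layer neighbors.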

3. Merging algorithm

The idea of the merging algorithm is as follows:

  • Take out the segments that belong to the same text line.
  • Fit a straight line through the center points of these segments by least-squares linear regression.
  • Project the center point of each segment perpendicularly onto this line.
  • Among all projected points, find the two that are farthest apart, denoted (xp, yp) and (xq, yq).
  • The final merged text box then has (1) center ((xp + xq) / 2, (yp + yq) / 2); (2) width equal to the distance between the two farthest points (xp, yp) and (xq, yq) plus half the widths of their segments (Wp / 2 + Wq / 2); (3) height equal to the average height of all segments.
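The merging steps above can be sketched in plain Python. It follows the bullet list: ordinary least squares y = kx + b through the segment centers, perpendicular projection, the two extreme projections, then center, width and height. Two details are assumptions not stated in the article: the box angle θ is taken as the direction of the fitted line, and a vertical line is used when all centers share the same x (where y = kx + b is undefined). The `Segment` class and `merge_segments` name are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    x: float  # center x
    y: float  # center y
    w: float  # width
    h: float  # height


def merge_segments(segments):
    """Merge the segments of one text line into a box (cx, cy, w, h, theta_deg)."""
    n = len(segments)
    if n == 0:
        raise ValueError("need at least one segment")
    # The least-squares line always passes through the mean of the centers.
    mx = sum(s.x for s in segments) / n
    my = sum(s.y for s in segments) / n
    sxx = sum((s.x - mx) ** 2 for s in segments)
    sxy = sum((s.x - mx) * (s.y - my) for s in segments)
    if sxx == 0.0:
        dx, dy = 0.0, 1.0  # all centers share one x: use a vertical line (assumption)
    else:
        k = sxy / sxx  # least-squares slope of y = k*x + b
        norm = math.hypot(1.0, k)
        dx, dy = 1.0 / norm, k / norm  # unit direction of the fitted line
    # Signed position of each center's perpendicular projection along the line.
    ts = [(s.x - mx) * dx + (s.y - my) * dy for s in segments]
    # On a line, the two farthest-apart projections are the min and max positions.
    ip = min(range(n), key=ts.__getitem__)
    iq = max(range(n), key=ts.__getitem__)
    xp, yp = mx + ts[ip] * dx, my + ts[ip] * dy
    xq, yq = mx + ts[iq] * dx, my + ts[iq] * dy
    cx, cy = (xp + xq) / 2, (yp + yq) / 2
    width = math.hypot(xq - xp, yq - yp) + segments[ip].w / 2 + segments[iq].w / 2
    height = sum(s.h for s in segments) / n
    theta = math.degrees(math.atan2(dy, dx))  # angle of the fitted line (assumption)
    return cx, cy, width, height, theta
```

For three segments centered at (0, 0), (10, 10) and (20, 20), each 10 wide, this gives center (10, 10), width 20√2 + 10 ≈ 38.28 and θ = 45°. Ordinary least squares fits y as a function of x, so it is unreliable for near-vertical lines; that is a limitation of this sketch.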

As shown below, the orange line in the middle is the least-squares regression line, the red dots are the segment center points, the yellow dots are the perpendicular projections of the red dots onto the line, and the green box is the complete text box produced by the merging algorithm above.

img

III. Summary

Because it also predicts angles, SegLink is robust when detecting text at various angles, whereas CTPN is mainly suited to detecting horizontal text lines, as shown below:

img

However, the model also has shortcomings. For example, it cannot detect text lines with wide character spacing: since links only connect adjacent segments, they fail when the text is too far apart. It also cannot detect deformed or curved text, because the merging algorithm fits the segment centers with linear regression, which can only fit a straight line, not a curve. The merging algorithm could, however, be modified to enable detection of such curved text.
