Sliding Line Point Regression for Shape Robust Scene Text Detection: paper translation and study notes

Sliding Line Point Regression for Shape Robust Scene Text Detection

Shape-robust scene text detection

Abstract:

Traditional text detection methods mostly focus on quadrilateral text. To detect text of arbitrary shape in natural scenes, we propose a new method, sliding line point regression (SLPR). SLPR regresses multiple points on the edge of a text line and then uses these points to sketch the outline of the text. The proposed SLPR can be adapted to many object detection architectures, such as Faster R-CNN and R-FCN. Specifically, we first generate the smallest rectangular box containing the text with a region proposal network (RPN), and then regress the points on the edge of the text along equally spaced vertical and horizontal sliding lines. To make full use of the available information and reduce redundancy, we calculate the x-coordinate or y-coordinate of each target point from the position of the rectangular box and regress only the remaining y-coordinate or x-coordinate. This reduces the number of parameters of the system and also constrains the points so that they generate more regular polygons. Our approach achieves competitive results on the traditional ICDAR2015 Incidental Scene Text benchmark and on the curved text detection dataset CTW1500.

I. Introduction

Text detection is important in our daily lives because it can be used in many applications, such as text digitization and text translation. Some previous methods [1] [2] based on Faster R-CNN [3] or SSD [4] have achieved good results on many scene text datasets. Some methods [5] [6] [7] [8] [9] [10] [11] also attempt to detect text in arbitrary directions. [9] and [11] first regress a horizontal rectangle and then regress a quadrilateral. [12] aims to generate irregular polygons starting from rectangles. The methods above mainly treat a text line as a quadrilateral, which can be fully represented by four points. However, in natural scenes there are many text lines with shapes other than quadrilaterals. Recent studies [13] [14] have therefore begun to explore curved text detection. This paper addresses the detection of curved text in arbitrary directions. Our approach, sliding line point regression (SLPR), is based on a two-step object detection method such as Faster R-CNN or R-FCN. It first generates rectangular regions of interest with a region proposal network (RPN) and then regresses points on the edge of the text. We introduce rules to determine which points should be regressed, so that the points are correlated with each other. Unlike [13], which directly regresses the x- and y-coordinates of fixed annotated points and uses an RNN [15], [16] to learn their correlation, we introduce rules that slide lines vertically and horizontally across the text line and regress the intersections of the sliding lines with the text line boundary, as shown in the figure. As a result, we only need to regress the x-coordinate or the y-coordinate of each point and can calculate the other coordinate from the position of the rectangle, which reduces unnecessary computation and improves performance.

Contributions are as follows:

1. This paper regresses multiple points on the boundary of a text line and attempts to handle curved text detection in arbitrary directions based on Faster R-CNN and R-FCN.

2. The sliding line method is introduced to determine the ground truth of the regression points and to exploit the correlation between points to generate more regular polygons.

II. Related work

In recent years, scene text detection and recognition have attracted more and more attention. However, because of the complexity of text layout and backgrounds in scenes, detection remains a challenging problem. Existing methods can be divided into three categories: character-based methods, word-based methods, and segmentation-based methods. Character-based methods typically require synthetic datasets, because annotating the characters within a text line takes extra work. However, there is a large gap between synthetic data and real data, so models trained this way cannot achieve state-of-the-art results on real datasets such as the popular ICDAR2015 Incidental Scene Text benchmark. To address this, [17] uses a semi-supervised approach to train on real data and achieves good results.

Segmentation has also been applied to text detection. [18] trains a fully convolutional network (FCN) [19], [20] to predict a saliency map of text regions and then combines the saliency map with character components to trace text lines. [21] adds a border class to separate text from its neighbors. [10] and [8] predict the text map and, at the same time, regress the size and angle of the corresponding quadrilateral or the coordinates of its four vertices. Compared with traditional segmentation methods, they made a great breakthrough on the ICDAR2015 Incidental Scene Text benchmark.

Many text detection methods build on object detection methods such as Faster R-CNN [22], SSD [4], R-FCN [23], and YOLO [24]. [2] uses irregular 1 × 5 convolution filters instead of the standard 3 × 3 filters to make the network better suited to long text. [25] uses an attention map to remove background noise. In recent years, a growing number of researchers have proposed two-step methods based on Fast R-CNN or R-FCN. [11] first generates axis-aligned bounding boxes and then regresses text quadrilaterals, using multi-scale pooling in the RoI pooling layer. [9] attempts to segment and detect text simultaneously. Considering the particular nature of text lines, [26] adds anchors with different angles to handle text lines in arbitrary directions. Recently, [14] considered the polygon case and annotated a new curved text dataset. [13] also built a curved text dataset, CTW1500, and proposed a new architecture called the curve text detector (CTD) to address curved text detection.

III. Methods

Our model can be applied to any two-step object detection framework, such as Faster R-CNN and R-FCN. In addition to the coordinates of the smallest rectangle containing the text line, it regresses the coordinates of specific points on the boundary of the text line. More specifically, taking Faster R-CNN as an example, we first use the RPN to generate regions of interest, then regress not only the rectangle but also the coordinates of points on the edge of the text line, and finally obtain a text region of arbitrary shape.

Introduction

  1. Type of text detection model (the first two paragraphs of a paper usually introduce this)

      A detection model based on object proposal boxes

  2. Problems with existing models

      Missed detections occur

  3. What this paper addresses

      1. It regresses multiple points on the boundary of a text line and attempts to handle curved text detection in arbitrary directions based on Faster R-CNN and R-FCN.

      2. It introduces the sliding line method to determine the ground truth of the regression points and makes full use of the correlation between points to generate more regular polygons.

 

First, the model

  1. Main innovation of the model (brief description)

Rules are introduced to slide straight lines vertically and horizontally across the text line (equidistant sliding is used in the experiments), and the intersections of the sliding lines with the text line boundary are then regressed.

A. Which points should be regressed?

Clearly, how to determine the set of points used to recover the polygon is very important. We believe that the simpler the rule, the easier it is for the neural network to learn. Because shapes and angles vary greatly in natural scenes, it is difficult to fix an order for all the characteristic points of a shape, so we do not regress fixed points such as the vertices of a polygon. For a quadrilateral, we could regress the four vertices and restore it perfectly, but determining the order of the four vertices requires a complex set of rules that is difficult for a neural network to learn. Instead, as shown in Fig. 2, we introduce rules that slide straight lines vertically and horizontally across the text line (equidistant sliding is used in our experiments) and regress the intersections of the sliding lines with the text line boundary. Because of the constraint imposed by the sliding lines, the coordinates of the different intersection points are correlated. At the same time, we do not need to regress both the x- and y-coordinates of every point. If a line slides horizontally (that is, it is a vertical line placed at a known x position), the x-coordinates of its points on the text boundary can be calculated from the coordinates of the rectangle, so we only need to regress the y-coordinates of these points. Similarly, if a line slides vertically, we only need to regress the x-coordinates of its points. This not only reduces the computational complexity of the network but also constrains the regressed points as prior knowledge, which prevents oddly shaped polygons and further improves accuracy. As for the number of sliding lines, we observed that quadrilateral text lines are insensitive to this parameter. To properly recover text lines of other shapes, however, after balancing performance against network complexity we use 7 sliding lines in each of the vertical and horizontal directions. This gives 14 sliding lines in total, with 28 intersection points.
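The rule in this section can be sketched in code. The following minimal Python sketch is an illustration, not the authors' implementation: it builds the regression targets for one annotated polygon by taking the bounding rectangle, placing n equally spaced sliding lines in each direction, and recording where each line meets the polygon boundary. The function names, the use of integer pixel coordinates, the spacing of (max - min)/(n + 1) with a floor, and keeping only the smallest and largest crossing on each line are all assumptions made for this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def sliding_positions(lo: float, hi: float, n: int = 7) -> List[int]:
    """n equally spaced line positions strictly between lo and hi.

    Assumption: spacing (hi - lo) / (n + 1) with a floor; lo and hi are
    integer pixel coordinates, so every position stays inside [lo, hi].
    """
    return [math.floor(lo + (hi - lo) * k / (n + 1)) for k in range(1, n + 1)]


def crossings(polygon: List[Point], axis: int, value: float) -> List[float]:
    """Other-axis coordinates where the line `coordinate[axis] == value`
    meets the edges of the closed polygon."""
    other = 1 - axis
    hits: List[float] = []
    for p, q in zip(polygon, polygon[1:] + polygon[:1]):
        a, b = p[axis], q[axis]
        if a == b:
            # Edge parallel to the sliding line: it counts only if it lies on it.
            if a == value:
                hits += [p[other], q[other]]
            continue
        if min(a, b) <= value <= max(a, b):
            t = (value - a) / (b - a)
            hits.append(p[other] + t * (q[other] - p[other]))
    return hits


def slpr_targets(polygon: List[Point], n: int = 7):
    """Return the bounding rectangle plus (min, max) targets for each line."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)

    # Lines that slide horizontally sit at known x positions: regress y only.
    y_targets = []
    for x in sliding_positions(x_min, x_max, n):
        hits = crossings(polygon, 0, x)
        y_targets.append((min(hits), max(hits)))

    # Lines that slide vertically sit at known y positions: regress x only.
    x_targets = []
    for y in sliding_positions(y_min, y_max, n):
        hits = crossings(polygon, 1, y)
        x_targets.append((min(hits), max(hits)))

    return (x_min, y_min, x_max, y_max), y_targets, x_targets
```

With n = 7 this yields 7 pairs of y-values and 7 pairs of x-values, that is, the 28 regressed coordinates described above.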

B. Multi-task learning

To optimize the parameters of the neural network shown in Fig. 2, we use multi-task learning. The loss function L is defined as:
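The loss equations are missing from this copy of the text. Reading the term definitions in this section, one consistent reconstruction is given below. This is an assumption inferred from those definitions (a proposal-stage loss plus a second-stage loss, with three weights), not a verified transcription of the paper's equations.

```latex
\begin{align}
L &= L_{RPN} + L_{SLPR} \\
L_{RPN} &= L_{Rcls} + \lambda_{R}\, L_{Rb} \\
L_{SLPR} &= L_{cls} + \lambda_{B}\, L_{b} + \lambda_{S}\, L_{SLPRb}
\end{align}
```

With all three weights set to 1, as stated for this study, the total loss under this reading is simply the sum of the five component losses.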

where L_RPN is the region proposal loss, L_Rcls is the region proposal classification loss, and L_Rb is the box regression loss. L_SLPR is the loss of the second step, after the RPN. Similarly, its first two terms, L_cls and L_b, are the classification loss and the box regression loss. λ_R, λ_B and λ_S are weights that balance the terms and are all set to 1 in this study. L_SLPRb is the regression loss of the proposed SLPR:

L_reg, as in the box regression task, is the smooth L1 loss:
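As a concrete illustration, the sketch below implements the standard smooth L1 function used for box regression in Fast R-CNN and applies it to the regressed SLPR coordinates. The function names are hypothetical, and averaging over all regressed values is an assumption, since the exact normalization of the SLPR loss is not reproduced in this text. In a real detector the targets would usually also be normalized relative to the proposal, which this sketch leaves out.

```python
from typing import Sequence


def smooth_l1(diff: float) -> float:
    """Standard smooth L1: quadratic near zero, linear for |diff| >= 1."""
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def slpr_regression_loss(
    pred_x: Sequence[float],
    true_x: Sequence[float],
    pred_y: Sequence[float],
    true_y: Sequence[float],
) -> float:
    """Smooth L1 over the regressed x values (vertically sliding lines)
    and y values (horizontally sliding lines).

    Assumption: the terms are averaged over all regressed values.
    """
    terms = [smooth_l1(p - t) for p, t in zip(pred_x, true_x)]
    terms += [smooth_l1(p - t) for p, t in zip(pred_y, true_y)]
    return sum(terms) / len(terms)
```

With n = 7 sliding lines per direction, pred_x and pred_y would each hold 14 values.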

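In Eq. (4), n denotes the number of sliding lines in one direction; we set n = 7 in our experiments. Normally each line has two points of intersection with the border of the text line. If there are more than two intersections, we take the ones with the smallest and the largest coordinates. x_vj is the x-coordinate of the intersection v_j of a vertically sliding line with the text line boundary, and y_hi is the y-coordinate of the intersection h_i of a horizontally sliding line with the text line boundary. x*_vj and y*_hi are the corresponding estimates output by the neural network. For horizontally sliding lines we regress only the y-coordinates of their intersections, and for vertically sliding lines only the x-coordinates. The other coordinates can be recovered from the coordinates of the rectangle:

 

x_min and y_min denote the minimum x- and y-coordinates of the rectangle boundary, and x_max and y_max denote the maximum x- and y-coordinates. ⌊·⌋ is the floor function. In summary, regressing the coordinates of a polygon requires 32 parameters: 4 parameters for the rectangle and 28 parameters for the x- or y-coordinates of the intersection points on the text line boundary.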
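To make the parameter count concrete, the sketch below decodes the 32 regressed values (4 for the rectangle, 14 y-values and 14 x-values) into 28 points by computing the fixed coordinate of each point from the rectangle. The helper names and the exact spacing formula, floor(min + k(max - min)/(n + 1)), are assumptions; the recovery equation itself is not reproduced in this text.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Pair = Tuple[float, float]


def sliding_positions(lo: float, hi: float, n: int) -> List[int]:
    """Assumed placement: n equally spaced positions strictly inside [lo, hi]."""
    return [math.floor(lo + (hi - lo) * k / (n + 1)) for k in range(1, n + 1)]


def decode_points(
    box: Tuple[float, float, float, float],
    y_pairs: Sequence[Pair],
    x_pairs: Sequence[Pair],
) -> Tuple[List[Point], List[Point]]:
    """Turn regressed values into (x, y) points.

    box     -- (x_min, y_min, x_max, y_max), the 4 rectangle parameters
    y_pairs -- two regressed y values per horizontally sliding line
    x_pairs -- two regressed x values per vertically sliding line
    """
    x_min, y_min, x_max, y_max = box

    h_points: List[Point] = []
    for x, (y_a, y_b) in zip(sliding_positions(x_min, x_max, len(y_pairs)), y_pairs):
        h_points += [(x, y_a), (x, y_b)]  # x comes from the rectangle

    v_points: List[Point] = []
    for y, (x_a, x_b) in zip(sliding_positions(y_min, y_max, len(x_pairs)), x_pairs):
        v_points += [(x_a, y), (x_b, y)]  # y comes from the rectangle

    return h_points, v_points
```

For n = 7 the call returns 14 points per direction, so 4 + 14 + 14 = 32 regressed numbers describe 28 boundary points.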

 

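C. Polygon recovery

With the SLPR method above, we obtain multiple points from the output of the neural network. To recover the final quadrilateral or polygon, we compare the following two methods:

1) Using points on the long side only (PLS): A text line always extends along its long side, and lines sliding along the long side better reflect the shape of the text. In practice, we can recover the polygon by scanning along the long side, as shown in Fig. 3. Specifically, we first use the regressed rectangle to decide whether the text line is horizontal or vertical, and then recover the polygon from the points in the corresponding direction. Taking the vertical direction in Fig. 3 as an example, because we do not regress the intersections on the rectangle boundary, we first extend the four line segments nearest the boundary to find four points of intersection with the rectangle, and then connect these four new points with the other intersection points to generate the polygon.

2) Using both horizontal and vertical points (BHVP): If we use both the horizontal and the vertical points to recover the polygon, we can use the method in [27] to roughly compute the polygon or quadrilateral passing through these points, as shown in Fig. 4. This gives sufficiently dense points in both the horizontal and vertical directions without computing intersections with the rectangle as in PLS. However, we found that BHVP works less well than PLS for polygons. We therefore use it only on the quadrilateral dataset (ICDAR2015 Incidental Scene Text).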
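The PLS recovery can be sketched for a horizontal text line as follows. This is one reading of the description, not the authors' code: the outermost segment on each side is extended linearly until it reaches the left or right edge of the rectangle, and the four new points are joined with the regressed points. The function names are hypothetical, and clamping the extended points to the vertical extent of the rectangle is a simplification added here. A vertical text line would be handled the same way with the axes swapped.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def extend_to_x(p: Point, q: Point, x: float) -> Point:
    """Point at abscissa x on the straight line through p and q.

    p and q must have different x-coordinates.
    """
    slope = (q[1] - p[1]) / (q[0] - p[0])
    return (x, p[1] + slope * (x - p[0]))


def recover_polygon_pls(
    box: Tuple[float, float, float, float],
    top: List[Point],
    bottom: List[Point],
) -> List[Point]:
    """Build a polygon from the upper and lower regressed points.

    top and bottom are ordered left to right and hold at least two points each.
    """
    x_min, y_min, x_max, y_max = box

    def clamp(pt: Point) -> Point:
        # Simplification: keep the extended point inside the rectangle.
        return (pt[0], min(max(pt[1], y_min), y_max))

    top_left = clamp(extend_to_x(top[0], top[1], x_min))
    top_right = clamp(extend_to_x(top[-2], top[-1], x_max))
    bottom_right = clamp(extend_to_x(bottom[-2], bottom[-1], x_max))
    bottom_left = clamp(extend_to_x(bottom[0], bottom[1], x_min))

    # Walk along the top from left to right, then back along the bottom.
    return [top_left, *top, top_right, bottom_right, *reversed(bottom), bottom_left]
```

With 7 sliding lines the result has 7 + 7 + 4 = 18 vertices.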

 

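D. Polygonal non-maximum suppression

Non-maximum suppression (NMS) is a basic method commonly used in object detection to remove duplicate boxes. Traditional NMS is based on rectangular boxes, which is not the best choice for other shapes. Other NMS methods have been studied in recent years, such as locality-aware NMS [10], inclined NMS [11], Mask-NMS [9], and polygonal NMS (PNMS) [13]. Since we deal with polygons in this study, we compare NMS and PNMS in our experiments.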
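The difference between the two variants comes down to how overlap is measured. The sketch below shows greedy NMS with the overlap function passed in as a parameter: with the rectangle IoU given here it behaves like traditional NMS, and supplying a polygon IoU instead would correspond to the polygonal variant. The function names and the default threshold are assumptions for illustration, and polygon IoU itself is not implemented here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Shape = TypeVar("Shape")
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def rect_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(
    shapes: Sequence[Shape],
    scores: Sequence[float],
    overlap: Callable[[Shape, Shape], float],
    threshold: float = 0.3,
) -> List[int]:
    """Return indices of kept detections, highest score first.

    A detection is dropped when its overlap with an already kept,
    higher-scoring detection exceeds the threshold.
    """
    order = sorted(range(len(shapes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(overlap(shapes[i], shapes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep
```

Keeping the overlap measure separate from the suppression loop means the same loop serves both rectangles and polygons.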

 

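Results:

1) ICDAR2015 Incidental Scene Text: Table I shows the results of the SLPR system under different settings. First, for recovering the quadrilateral of a text region, BHVP, which uses all points, works better than PLS, which uses only the long-side points. Second, even though our goal on this dataset is to detect quadrilaterals, PNMS still outperforms NMS. Finally, using multiple scales is one way to improve detection across different object sizes. We also tested our system with multi-scale (850, 1000) inputs and obtained an absolute improvement of about 1% in Hmean. Fig. 5 shows several challenging detection examples from the ICDAR2015 Incidental Scene Text dataset. Table II compares SLPR with state-of-the-art results on ICDAR2015 Incidental Scene Text. Our method achieves competitive results on this dataset.

2) CTW1500: Table III shows the results of our method under different NMS settings. Unlike what we observed on ICDAR2015 Incidental Scene Text, our method achieves its best result with NMS0.3, that is, traditional NMS with an intersection-over-union (IoU) threshold of 0.3. Table IV compares our method with CTD and CTD+TLOC. Our base network is the network of [13] with TLOC removed, which is the same as CTD. Compared with CTD, our SLPR method improves Hmean by 5.3%, which demonstrates the effectiveness of setting regression points with simple rules. Even compared with CTD+TLOC, which adds an LSTM network, SLPR still improves Hmean by 1.4%. Fig. 6 shows several examples of detection results from CTD, CTD+TLOC, and SLPR. Compared with CTD, our method produces smoother regions and better detection results, which indicates that the proposed SLPR handles arbitrary directions better, owing to the new design of horizontally and vertically symmetric scanning with sliding lines.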
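For readers unfamiliar with the metric, Hmean is the harmonic mean of precision and recall. A minimal sketch follows; the function name is hypothetical and the numbers in the usage note are made up for illustration, not taken from the paper's tables.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a detector with precision 0.8 and recall 0.6 has an Hmean of about 0.686, which is lower than the arithmetic mean of 0.7 because the harmonic mean penalizes imbalance between the two.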

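In this paper, we propose SLPR, a new method for detecting text of arbitrary shape. Compared with the text detection method CTD+TLOC [13], SLPR is more concise, since it does not use an LSTM, and achieves better performance. On the traditional quadrilateral dataset (ICDAR2015 Incidental Scene Text), SLPR also achieves state-of-the-art performance.

Summary

Without connecting the points, only horizontal text can be handled; connecting them allows multiple directions, but the result may be a distorted, irregular shape.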

 


Origin blog.csdn.net/zx_good_night/article/details/88812379