[Paper Reading] Geometry Normalization Networks for Accurate Scene Text Detection

Original post: Geometry Normalization Networks for Accurate Scene Text Detection

https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/103545402

Idea:
The paper's starting point is that a CNN model can only cover a limited amount of geometry variance in text detection boxes (a detector trained on a limited-variance distribution performs best). The authors first verify this claim experimentally, and then propose adding several different branches (each a combination of a Scale Normalization Unit and an Orientation Normalization Unit) that act as sub-detectors. Each sub-detector only has to handle a small variance, but together they cover a large geometry variance and therefore many kinds of boxes. To match this design, the authors also change how training images are fed in, so that every branch is adequately trained.

Problem statement:

Premise: from the distribution shown by the orange curve in figure (a) of the paper, most boxes in ICDAR15 are nearly horizontal (the angle roughly follows a normal distribution with mean 0 and small variance). The authors' idea: if the range of the angle variance is enlarged, the algorithm's coverage of geometry variance can be observed through how its performance changes.

Method: the authors expand the geometry variance of the ICDAR15 boxes by randomly rotating the samples, then run an ablation in which the geometry variance is expanded in the training set and in the test set. The conclusion: even when the training set has large geometry variance, the resulting model does not perform well on large geometry variance at test time. This exposes a bottleneck of the CNN network, namely its limited capacity for large geometry variance.
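As a rough illustration of this variance-expansion step, here is a minimal sketch (my own, not the authors' code) of randomly rotating a sample; the same rotation would of course also have to be applied to the ground-truth box annotations.

```python
import random

import cv2


def randomly_rotate(image, max_angle=90.0):
    """Rotate a sample by a random angle so the dataset's angle distribution
    is no longer concentrated around 0. max_angle is an illustrative value."""
    angle = random.uniform(-max_angle, max_angle)
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    rotated = cv2.warpAffine(image, m, (w, h))
    return rotated, angle  # apply the same rotation to the box angles/vertices
```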

Model proposal:

The authors first propose three sample-selection strategies on ICDAR15, train on each generated sample set, and evaluate on boxes whose short side lies in [20, 40] pixels and whose angle lies in [-π/12, π/12]:
The first is GSS (Geometry Specific Sampling): the image is rescaled and rotated using one of its text boxes as a guide, so that this box falls into the tested geometry range above.

The second is GVS (Geometry Variance Sampling): similar to GSS, but the target range of the box is widened to [0, 90] for the short side and [-π/2, π/2] for the angle.

The third is LGSS (Limited Geometry Variance Sampling): it differs from GSS in that not all samples are used.

From the results, GSS gives the best training outcome, so limiting the variance that the CNN has to predict is very important; at the same time, the number of samples also matters.
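A minimal sketch of the GSS idea as I read it (the function name and the uniformly sampled target are my assumptions): pick one text box as the guide and compute the scale and rotation that map it into the limited geometry range. For GVS one would simply widen the two target ranges, and LGSS would drop out-of-range samples rather than transform them.

```python
import math
import random

TARGET_SHORT_SIDE = (20, 40)                   # pixels, as in the evaluation above
TARGET_ANGLE = (-math.pi / 12, math.pi / 12)   # radians


def gss_transform_params(boxes):
    """Compute (scale, rotation) that maps one randomly chosen guide box
    into the target geometry range; the whole image (and all of its boxes)
    is then rescaled and rotated with these parameters."""
    guide = random.choice(boxes)               # each box: {"short_side": px, "angle": rad}
    target_s = random.uniform(*TARGET_SHORT_SIDE)
    target_a = random.uniform(*TARGET_ANGLE)
    scale = target_s / max(guide["short_side"], 1e-6)
    rotation = target_a - guide["angle"]
    return scale, rotation
```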

Model design:


The authors' view is that a large geometry variance can be cut into many sub-blocks, each with a small variance, with one branch responsible for each. A particularly important point: rotating and scaling the input images for every branch during training and testing would be too costly, so feature-map transformations are used to do this instead; these are the two modules proposed in the network.

F_i denotes the transformation serving the i-th branch; it is built from two kinds of units, F_s for scale and F_o for orientation, and the branch transformation is their composition.

Scale Normalization Unit
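The post does not spell this unit out, so the following is only a sketch of how a Scale Normalization Unit could look based on my reading: keep the original-scale feature and add a 1/2-scale feature obtained by pooling, so larger text is pulled back into the scale range the shared head handles well. The layer choice is an assumption, not the released implementation.

```python
import torch.nn as nn


class ScaleNormalizationUnit(nn.Module):
    """Produce the identity-scale feature S and a half-resolution feature S_1/2."""

    def __init__(self):
        super().__init__()
        self.down = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, feat):
        return {"S": feat, "S_1/2": self.down(feat)}
```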


Orientation Normalization Unit
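Similarly, a hedged sketch of an Orientation Normalization Unit: generate rotated/flipped views of the feature map so that each branch only has to cover a narrow angle range. The exact operation set (identity, flip, 90° rotation, rotation + flip) is how I understand the paper; treat it as an assumption.

```python
import torch


def orientation_normalization_unit(feat):
    """Return orientation-normalized views of an (N, C, H, W) feature map."""
    o = feat                                     # identity
    o_f = torch.flip(feat, dims=[3])             # horizontal flip
    o_r = torch.rot90(feat, k=1, dims=(2, 3))    # 90-degree rotation
    o_rf = torch.flip(o_r, dims=[3])             # rotation followed by flip
    return {"O": o, "O_f": o_f, "O_r": o_r, "O_rf": o_rf}
```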

A combination of both
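Putting the two together (again a sketch under the same assumptions, not official code): each branch's transformation is F_i = F_o ∘ F_s, and all normalized features are fed through one shared detection head.

```python
import torch
import torch.nn as nn


class GeometryNormalizationModule(nn.Module):
    """Apply every scale x orientation combination to the backbone feature and
    run the shared detection head on each normalized view."""

    def __init__(self, shared_head: nn.Module):
        super().__init__()
        self.down = nn.MaxPool2d(2, 2)
        self.head = shared_head

    def forward(self, feat):                      # feat: (N, C, H, W)
        outputs = []
        for s in (feat, self.down(feat)):         # F_s: identity and 1/2 scale
            views = (
                s,
                torch.flip(s, dims=[3]),
                torch.rot90(s, k=1, dims=(2, 3)),
                torch.flip(torch.rot90(s, k=1, dims=(2, 3)), dims=[3]),
            )
            outputs.extend(self.head(v) for v in views)  # F_o ∘ F_s, then shared head
        return outputs
```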


Data augmentation
As analyzed in the earlier experiments, the number of samples also affects the performance of the network. Once the work is split across branches, some branches are doomed to be under-trained without data augmentation: during training, each branch ignores boxes that do not fall into its own geometry range, so a branch with a narrow range sees few samples. The authors therefore scale and rotate each sample accordingly, so that every sample becomes visible to every branch and every branch is fully trained; this also pushes the distribution each branch sees closer to the overall distribution.
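To make the "each branch ignores out-of-range boxes" point concrete, here is a small sketch; the branch ranges listed are illustrative assumptions, not the paper's actual bucketing.

```python
import math

BRANCH_RANGES = [
    {"short_side": (20, 40), "angle": (-math.pi / 12, math.pi / 12)},
    {"short_side": (40, 80), "angle": (-math.pi / 12, math.pi / 12)},
    {"short_side": (20, 40), "angle": (math.pi / 12, math.pi / 4)},
]


def boxes_for_branch(boxes, branch_id):
    """Keep only the boxes inside this branch's geometry range; the rest are
    ignored when computing this branch's loss."""
    r = BRANCH_RANGES[branch_id]
    return [
        b for b in boxes
        if r["short_side"][0] <= b["short_side"] <= r["short_side"][1]
        and r["angle"][0] <= b["angle"] <= r["angle"][1]
    ]
```

Without the rotate-and-scale augmentation, a narrow-range branch would rarely see a box survive this filter; with it, every sample eventually contributes to every branch.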

Thinking
Earlier work also applied data augmentation to individual branches, but did not pay attention to many of the details. Among the strategies in this paper, the points worth exploring are:


Letting each branch ignore boxes outside its own range lowers the capability demanded of that branch and makes it easier to train. When we did data augmentation before, we ignored the carrying capacity of the network itself, so sometimes exposing the network to more varied data actually makes the results worse.
Papers like this, which analyze a problem, draw a conclusion, and then propose a solution, are very elegant.
 
