One of the OCR recognition series ----- scene text recognition

Another method that is widely used is the deep learning method. The deep learning method divides OCR recognition into text detection and text recognition.

, and this is where deep learning techniques can be fully effective. The widely used network structure is Differentiable Binarization+ CRNN.

Differentiable Binarization, referred to as DB, is a text detection algorithm based on segmentation. In the text detection algorithm, the segmentation-based detection algorithm can better deal with curved and other irregularly shaped text, so it can often achieve better detection results. However, the process of converting the segmentation result into a detection frame in the post-processing step of the segmentation method is very complicated and time-consuming. Therefore, someone proposed a differentiable binarization module (Differentiable Binarization), which can perform the binarization process in the segmentation network . The binarization threshold is added to the training learning, which converts the probability map generated by the segmentation method into a bounding box/region of the text. The segmentation network is optimized in combination with the DB module, and the binarization threshold can be set adaptively, which not only simplifies post-processing, but also improves the performance of text detection. A more accurate detection boundary can be obtained, thereby simplifying the process of post-processing. The backbone network uses ResNet-18.

As shown in Figure 2 (shown by the blue arrow): first, a fixed threshold is set to convert the probability map generated by the segmentation network into a binary image;

Then, the pixels are grouped into text instances using some heuristic techniques such as pixel clustering. Alternatively, our pipeline (indicated by the red arrow in Figure 2) aims to insert the binarization operation into the segmentation network for joint optimization. With this approach, a threshold can be adaptively predicted for each location in the image to sufficiently distinguish between foreground and background pixels. However, the standard binarization function is not differentiable, and we propose an approximate binarization function, called Differentiable Binarization (DB), which is fully differentiable when it is trained with a segmentation network.

By combining a simple semantic segmentation network and a DB module, a robust and fast scene text detector is obtained.

Guess you like

Origin blog.csdn.net/wangmengmeng99/article/details/129994349