"OCR in-depth practice" OCR study notes (1): Introduction



Preface

Computer character recognition, commonly known as optical character recognition (Optical Character Recognition, OCR), uses optical and computer technology to read text printed or written on paper and convert it into a format that computers can accept and people can understand. OCR is a key technology for high-speed text entry.

1. Introduction

1.1 General process of traditional OCR method

Essentially, character recognition can be framed as a sequence labeling problem: the goal is to learn a mapping from the image of a text string to the content of that string. This is similar to NLP, but OCR has some unique properties:

  1. Locality: a part of the text string is reflected directly in a region of the whole recognition target;
  2. Combinatoriality: the content of a text string is an ever-changing combination of characters.

Step 1 Image input: images come in different formats and compression schemes, so different decoding methods must be used;

Step 2 Image preprocessing: mainly includes binarization, noise removal, tilt correction, etc.;

Step 3 Layout analysis: the process of dividing the document image into paragraphs and lines;

Due to the diversity and complexity of real documents, there is currently no fixed, unified segmentation model.

Step 4 Character segmentation: since each character must be recognized individually, the formatted text must be cut into single characters for the subsequent recognition classifier;

Step 5 Character recognition: methods evolved from early template matching to later feature extraction;

Step 6 Layout recovery: restore the recognized text to the layout of the original document image and output it to Word or PDF documents with paragraphs, positions, and order unchanged;

Step 7 Post-processing: apply a language model to perform semantic correction on the recognition results.
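The seven steps above can be sketched as a chain of functions. The toy sketch below models a "scanned page" as a list of text rows purely to show the data flow; every stage name is an illustrative stand-in, not a real imaging routine:

```python
# Toy stand-ins for each stage, operating on a "page" modeled as text rows.
def preprocess(page):            # Step 2: normalize (here: strip whitespace)
    return [row.strip() for row in page]

def analyze_layout(page):        # Step 3: drop empty rows -> text lines
    return [row for row in page if row]

def cut_characters(line):        # Step 4: one cell per character
    return list(line)

def recognize(ch):               # Step 5: identity "classifier"
    return ch

def ocr_pipeline(page):          # Steps 2-6 chained together
    lines = analyze_layout(preprocess(page))
    return ["".join(recognize(c) for c in cut_characters(l)) for l in lines]
```

Real systems replace each stub with image operations, but the shape of the pipeline is the same.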

1.2 Traditional correction algorithm

Before an image is fed into detection, a correction operation is first applied to the original image to ensure the text is horizontal, which improves text-detection accuracy. There are generally two image-correction methods: horizontal correction and perspective correction.

  • If the original image has no severe perspective distortion but has been rotated, the horizontal correction method is recommended;
  • If the image has perspective distortion, the perspective correction method is recommended.

1.2.1 Horizontal correction

Most images to be recognized are cards, bills, and forms. Some of this data has clear contour edges and some does not; document-style images, for example, have a white background. For such images, Hough line detection can be performed on the edge contours of the text, followed by angle estimation. For cards with an obvious rectangular boundary contour, the largest contour can be detected first and the image rotated directly by the contour's angle.

1.2.1.1 Hough line correction

First, convert the image to a single-channel grayscale view and apply the Canny operator for edge detection so that the edge information stands out. Then use the Hough transform to detect all candidate straight lines. Next, compute the angle each line makes with the horizontal direction and take the average. Finally, rotate the original image around its center point by the average angle to correct it.
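The angle-averaging idea can be shown in a minimal runnable sketch. Here PCA over edge points is used as a simple stand-in for averaging the angles of Hough-detected lines; the function and synthetic data are illustrative:

```python
import numpy as np

def estimate_skew_angle(edge_points):
    """Estimate the dominant text-line angle in degrees from edge points.
    PCA on the point cloud stands in for averaging Hough-line angles."""
    pts = np.asarray(edge_points, dtype=float)
    pts = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    dx, dy = vt[0]                            # principal direction
    ang = np.degrees(np.arctan2(dy, dx))
    return (ang + 90.0) % 180.0 - 90.0        # fold into (-90, 90]

# synthetic "edge points" along a line tilted 5 degrees from horizontal
xs = np.linspace(0.0, 100.0, 50)
ys = np.tan(np.radians(5.0)) * xs
angle = estimate_skew_angle(np.stack([xs, ys], axis=1))
```

The recovered `angle` would then drive a rotation around the image center (e.g. via a rotation matrix or cv2.warpAffine).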

1.2.1.2 Contour correction

First, the image is grayscaled and denoised, then adaptively binarized; a morphological closing operation then connects broken strokes into blocks; finally, the angle is found through contour detection. The key operation comprises the following two approaches:

  1. Find the minimum-area bounding rectangle of the largest contour and return its angle;
  2. Find the minimum bounding rectangles of all contours and return their average angle.

(Figure: the specific flow of maximum-contour correction.)
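The angle of the minimum-area bounding rectangle can be sketched without OpenCV by brute-force search over rotations. `min_area_rect_angle` below is an illustrative stand-in for the angle returned by cv2.minAreaRect:

```python
import numpy as np

def min_area_rect_angle(points, step=0.5):
    """Return the rotation angle (degrees) at which the axis-aligned
    bounding box of the points has minimal area -- a tiny stand-in for
    the angle cv2.minAreaRect reports."""
    pts = np.asarray(points, dtype=float)
    best_area, best_deg = np.inf, 0.0
    for deg in np.arange(0.0, 90.0, step):
        t = np.radians(deg)
        rot = np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])
        r = pts @ rot.T
        w, h = r.max(axis=0) - r.min(axis=0)
        if w * h < best_area:
            best_area, best_deg = w * h, deg
    return best_deg

# corners of a 40x10 "card" rotated by -10 degrees
t = np.radians(-10.0)
rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
corners = np.array([[0, 0], [40, 0], [40, 10], [0, 10]], dtype=float) @ rot.T
angle = min_area_rect_angle(corners)
```

Rotating the image back by this angle completes the correction.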

1.2.2 Perspective correction

1.2.2.1 Background

Now that mobile devices are ubiquitous, the share of scanned printed images is declining, and most input images are captured with mobile devices. The quality of such images is easily affected by the lighting and angle at capture time: some images have a degree of perspective distortion, and running text detection on them directly yields text boxes with missing parts. To reduce missed and truncated text boxes, a perspective transform should be applied to correct the image before text detection.

1.2.2.2 Correction principle

  1. First, grayscale the image, remove noise, and slightly dilate the contour information to make it more prominent while reconnecting broken contours; then run an edge-detection algorithm to extract the edges;
  2. From the edge contours, find the one with the largest area, since the largest contour is generally the region of interest;
  3. With the largest contour in hand, fit a quadrilateral to find the corner points of the region of interest, order the corners by their top-left/bottom-right positional relationship, and finally apply a perspective transform based on the mapping between the corner points and the target points to obtain the corrected result.
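The corner-to-target mapping in step 3 boils down to solving for a 3x3 homography. Below is a minimal NumPy sketch of the linear algebra behind cv2.getPerspectiveTransform (the quad coordinates are made up for illustration):

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve the 3x3 homography mapping four src corners to four dst
    corners -- the linear system behind cv2.getPerspectiveTransform."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, p):
    """Apply homography H to a 2-D point, including the homogeneous divide."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([x / w, y / w])

# detected quad corners (top-left, top-right, bottom-right, bottom-left)
quad = [(12, 8), (96, 15), (103, 97), (5, 90)]
rect = [(0, 0), (100, 0), (100, 100), (0, 100)]
H = perspective_matrix(quad, rect)
```

In practice the resulting matrix is passed to cv2.warpPerspective to resample the whole image.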

(Figure: the perspective-correction results.)

1.3 Traditional text detection algorithm

1.3.1 Connected domain detection text

The traditional graphics-based approach to locating text generally has two parts: extracting connected components, and deciding which components are text. After binarizing the image, a connected-component extraction algorithm finds all connected components as a candidate set, and heuristic rules then decide whether each extracted component is text. The process is shown in the figure:

Of course, there will be a series of preprocessing operations before text detection: binarization, layout analysis, table background and text area morphological separation, etc. The preprocessing results are shown in the following figure:
Based on the preprocessed text blocks, contour detection is performed, and the largest bounding rectangle of each contour is taken as the detected text area of that block.
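The connected-component candidate step can be sketched in pure Python/NumPy without OpenCV. `connected_boxes` and its minimum-area rule below are illustrative stand-ins for the heuristic filtering described above:

```python
import numpy as np
from collections import deque

def connected_boxes(binary, min_area=4):
    """Label 4-connected foreground regions via BFS flood fill and keep the
    bounding boxes of regions passing a simple heuristic (minimum area)."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy, sx] or seen[sy, sx]:
                continue
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y); xs.append(x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(ys) >= min_area:                 # heuristic: drop tiny noise
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

img = np.zeros((10, 10), dtype=bool)
img[2:5, 2:5] = True     # a character-sized blob
img[8, 8] = True         # a lone noise pixel, rejected by the area rule
```

Real systems add more heuristics (aspect ratio, stroke width, density) on top of the area rule.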

1.3.2 MSER detection text

The process of MSER:

Binarize a grayscale image with a threshold that increases from 0 to 255. The rising threshold is like water rising over a landscape: as the level climbs, lower-lying areas are gradually submerged, and seen from above the scene divides into land and expanding water. During this "flooding" process, some connected regions in the image change little or not at all; such a region is called a maximally stable extremal region. In an image containing text, the color (gray value) within a text region is uniform, so as the water level (threshold) keeps rising, the text is not submerged until the threshold reaches the gray value of the text itself. This algorithm can therefore be used to roughly locate text regions in an image.
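The stability idea can be demonstrated on a toy image: as the threshold sweeps upward, the area of the region containing a text pixel barely changes until the glyph's own gray level is passed. The flood-fill sketch below illustrates that one property only; it is not the real MSER algorithm (which OpenCV exposes as cv2.MSER):

```python
import numpy as np
from collections import deque

def region_area(binary, seed):
    """Area of the 4-connected region containing `seed` (BFS flood fill)."""
    h, w = binary.shape
    seen = np.zeros_like(binary, dtype=bool)
    queue = deque([seed])
    seen[seed] = True
    area = 0
    while queue:
        y, x = queue.popleft()
        area += 1
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                seen[ny, nx] = True
                queue.append((ny, nx))
    return area

# toy "character": a dark glyph (gray 40) on a light background (gray 200)
img = np.full((10, 10), 200, dtype=np.uint8)
img[3:7, 3:7] = 40
# area of the region containing glyph pixel (4, 4) as the threshold rises
areas = [region_area(img <= t, (4, 4)) for t in range(50, 190, 10)]
```

The region's area is identical across the whole threshold sweep, which is exactly the "maximally stable" signal MSER looks for.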

1.4 Traditional character cutting algorithm

Before deep-learning sequence models emerged, traditional text recognition could not recognize text lines directly: the combinations of characters into words and of words into phrases cannot be enumerated, so classifying whole phrases directly is essentially impossible. Individual characters, by contrast, can be enumerated; if text recognition is treated as a combination of single-character recognitions, the task becomes much simpler. Traditional text recognition is therefore based on single-character recognition. The traditional character-segmentation process is roughly as follows:

The text detection algorithms above first detect a text line, and then use connected components, vertical projection, or other algorithms to segment the line into single characters.

1.4.1 Connected domain contour cutting

In the local image of each text-line slice, a single character is an independent object. If the contour of each independent object can be found, its minimum bounding rectangle can be obtained through OpenCV functions. The general connected-component-based process works as follows:
First, binarize the text slice and use OpenCV's findContours to find candidate single-character contours; then filter out noise using rules of thumb; finally, apply NMS to the contours' bounding rectangles to remove duplicate boxes and obtain the final single-character boxes.
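The NMS step mentioned above can be sketched as greedy suppression by IoU. This is a generic implementation, not OpenCV's; the candidate boxes are made up for illustration:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, score) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping duplicates."""
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    for b in boxes:
        if all(iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept

# two near-duplicate character boxes plus one distinct box
cands = [(0, 0, 10, 10, 0.9), (1, 1, 10, 10, 0.8), (20, 20, 30, 30, 0.7)]
kept = nms(cands)
```

The duplicate of the first box is suppressed, leaving one box per character.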

1.4.2 Vertical projection cutting

The local image of a text line contains background pixels as well as text pixels, and the pixel distribution inside a single character's region differs from that of its surroundings: there are generally few foreground pixels between characters and many inside a character.

Based on this rule, we binarize the text-line slice into white characters on a black background, count the white pixels in each column to obtain their column-wise distribution, find the segmentation points between characters from that distribution, and finally split the text line into single characters at those points. This technique is called vertical-projection character segmentation. The process is shown in the following figure:
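A minimal NumPy sketch of vertical-projection segmentation; the toy "text line" below is synthetic (1 = white text pixel on a black background):

```python
import numpy as np

def split_columns(binary_line):
    """Vertical-projection segmentation: columns with zero white pixels are
    treated as gaps; each run of non-empty columns is one character span."""
    proj = binary_line.sum(axis=0)       # white-pixel count per column
    spans, start = [], None
    for x, count in enumerate(proj):
        if count > 0 and start is None:
            start = x                    # character run begins
        elif count == 0 and start is not None:
            spans.append((start, x))     # character run ends
            start = None
    if start is not None:
        spans.append((start, len(proj)))
    return spans

# toy line: two "characters" separated by an empty column band
line = np.zeros((8, 12), dtype=np.uint8)
line[:, 1:4] = 1
line[:, 7:10] = 1
```

Each returned span `(start, end)` is a column range holding one character slice.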

Segmentation-point repair: the segmentation process can suffer from character adhesion and character breakage. Adhesion happens when the number of white pixels at the junction of two touching characters is noticeably higher than at true segmentation points, so the merged region is treated as a single character; to split it correctly, additional information such as the average character width is needed. Breakage happens because some Chinese characters consist of separate radicals, or because excessive denoising or erosion splits one character into two or more pieces; the average character width can likewise be used to merge them correctly.
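The adhesion-repair heuristic can be sketched as splitting any over-wide span into equal parts by the average character width. `repair_cuts` is an illustrative simplification (real systems also merge narrow broken spans):

```python
def repair_cuts(cuts, avg_width=None):
    """Split spans much wider than the average character width into equal
    parts -- a minimal sketch of the adhesion-repair heuristic."""
    widths = [b - a for a, b in cuts]
    avg = avg_width or (sum(widths) / len(widths))
    repaired = []
    for a, b in cuts:
        n = max(1, round((b - a) / avg))   # estimated character count in span
        step = (b - a) / n
        repaired += [(a + round(i * step), a + round((i + 1) * step))
                     for i in range(n)]
    return repaired

# last span is twice the average width: two stuck characters
cuts = [(0, 10), (10, 20), (20, 40)]
fixed = repair_cuts(cuts, avg_width=10)
```

The over-wide span `(20, 40)` is split into two character-width spans.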

1.5 Traditional text recognition algorithm

In traditional OCR, recognizing the characters of a text line is treated as learning a multi-class classification task over single characters.

  • Before classification, the character slices are first normalized to a uniform size, 28x28, following the classic handwritten-character setup, and then features are extracted from the resized image with common algorithms such as HOG or SIFT;
  • Finally, a classifier such as a support vector machine, logistic regression, or a decision tree is trained; the trained model can then be used for prediction and recognition. The general process is as follows:
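A toy end-to-end sketch of the normalize → extract features → classify pipeline, substituting simple zoning (grid-density) features for HOG/SIFT and a nearest-centroid classifier for the SVM; all names and data here are illustrative:

```python
import numpy as np

def grid_features(img, grid=4):
    """Zoning features: mean ink density over a grid x grid partition of a
    normalized 28x28 slice -- a crude stand-in for HOG/SIFT descriptors."""
    h, w = img.shape
    cells = img.reshape(grid, h // grid, grid, w // grid)
    return cells.mean(axis=(1, 3)).ravel()

class NearestCentroid:
    """Minimal classifier standing in for the SVM / logistic-regression step."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = np.array(
            [np.mean([x for x, t in zip(X, y) if t == c], axis=0)
             for c in self.labels])
        return self

    def predict(self, feats):
        dists = np.linalg.norm(self.centroids - feats, axis=1)
        return self.labels[int(np.argmin(dists))]

# two synthetic 28x28 "characters": a vertical bar and a horizontal bar
bar_v = np.zeros((28, 28)); bar_v[:, 12:16] = 1.0
bar_h = np.zeros((28, 28)); bar_h[12:16, :] = 1.0
clf = NearestCentroid().fit([grid_features(bar_v), grid_features(bar_h)],
                            ["vertical", "horizontal"])
```

Swapping in real HOG features and an SVM follows the same fit/predict shape.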

1.6 General process of deep learning OCR method

Traditional OCR solutions have the following shortcomings:

  1. Generating text lines through layout analysis (binarization, connected-component analysis) requires a highly regular layout with separable foreground and background (e.g., document images, license plates), and cannot handle arbitrary text with a complex foreground/background (such as scene text, menus, advertising text).
  2. Character recognition models are trained on hand-designed edge-direction features (such as HOG); the generalization ability of such a single feature drops rapidly when fonts change, images blur, or the background interferes.
  3. Over-reliance on character segmentation results: segmentation errors propagate, which is especially severe for distorted, touching, or noisy characters.

The deep-learning-based OCR approach condenses the cumbersome traditional pipeline into two main steps:

  1. Text detection (mainly used to locate the position of the text)
  2. Text recognition (mainly used to identify the specific content of the text)

1.6.1 Text detection

As the name implies, text detection detects the region of the image where text lies; its core is distinguishing text from background.
For commonly used detection methods, see my text detection column.

1.6.2 Text Recognition

After the text regions in an image have been located by text detection, the text inside them still needs to be recognized.
For commonly used recognition methods, see my text recognition column.



Origin blog.csdn.net/libo1004/article/details/111898098