OCR Recognition Series, Part 1: Technical Implementation

If the image fed into the system is a page of text, the first thing to do is determine the orientation of the text on the page. The page we receive is rarely perfect: it may be skewed or stained. So the first step is to preprocess the image by correcting its angle and removing noise.

Next we analyze the document layout and cut the page into individual lines of text. Each line is then split column by column into single characters, and each character is sent to the trained OCR model for recognition, which produces the result.

However, the model's recognition results are often inaccurate, so they need to be corrected and refined. For example, we can design a grammar detector that checks whether a combination of characters is plausible. Take the word Because: if the recognition model reads it as 8ecause, the grammar detector can catch this spelling error and replace 8 with B, which completes the correction. With that, the entire OCR process is finished. Summarized as large modules, an OCR pipeline can be divided into:

Preprocessing -> Layout Analysis -> Row and Column Cutting -> Character Recognition -> Post-processing Correction
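The post-processing correction step, such as the 8ecause example above, can be sketched in a few lines. This is only a minimal illustration, not the author's detector: the confusion table, the tiny vocabulary, and the function name correct_word are all assumptions made for the example.

```python
# Hypothetical confusion table: characters an OCR model might mix up.
CONFUSIONS = {"8": "B", "0": "O", "1": "l", "5": "S"}

# Hypothetical tiny vocabulary; a real system would load a full word list.
VOCABULARY = {"because", "book", "solid", "so"}


def correct_word(word, vocabulary=VOCABULARY, confusions=CONFUSIONS):
    """Return a dictionary word reachable by swapping confusable characters."""
    if word.lower() in vocabulary:
        return word
    # Build every spelling obtained by keeping or swapping each confusable character.
    candidates = [""]
    for ch in word:
        options = [ch]
        if ch in confusions:
            options.append(confusions[ch])
        candidates = [prefix + opt for prefix in candidates for opt in options]
    for candidate in candidates:
        if candidate.lower() in vocabulary:
            return candidate
    return word  # no dictionary match found, leave the word unchanged


print(correct_word("8ecause"))  # prints: Because
```

The candidate list doubles with every confusable character, so this brute-force search is only reasonable for short words.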

The implementation of OCR can generally be divided into five steps: preprocessing the image, cutting characters, recognizing characters, restoring the layout, and post-processing the text.

Preprocessing the image and post-processing the text are the hardest parts.

Cutting characters, recognizing characters, and restoring the layout are the core steps of character recognition.

1. Preprocess the image

(1) For skewed text, find the minimum-area rectangle (minAreaRect) that encloses the text, then rotate the image by that rectangle's angle so the rectangle is upright. This corrects the angle.

If the text inside the rectangle is still skewed, consider using the Hough line transform (HoughLinesP). The Hough line transform finds straight lines in an image: points that lie along a common line are grouped together and reported as a line.

The words in each line of text should lie along a straight line, so the detected lines follow the direction of the text.

The skew is then corrected by rotating the image according to the angle of those lines.
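As a rough sketch of correction by straight lines, assume a line detector such as HoughLinesP has already returned segments as (x1, y1, x2, y2) tuples. The helper names estimate_skew_degrees and rotate_point, the median rule, and the 45-degree cutoff are assumptions for this example, not part of the original method.

```python
import math
from statistics import median


def estimate_skew_degrees(segments, max_abs_angle=45.0):
    """Median angle in degrees of the near-horizontal segments (x1, y1, x2, y2)."""
    angles = []
    for x1, y1, x2, y2 in segments:
        if (x1, y1) == (x2, y2):
            continue  # a zero-length segment has no direction
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # Fold the direction so a segment and its reverse give the same angle.
        if angle > 90:
            angle -= 180
        elif angle <= -90:
            angle += 180
        # Ignore steep segments such as table borders or character strokes.
        if abs(angle) <= max_abs_angle:
            angles.append(angle)
    return median(angles) if angles else 0.0


def rotate_point(x, y, degrees, cx=0.0, cy=0.0):
    """Rotate (x, y) about (cx, cy) by the given angle in degrees."""
    rad = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(rad) - dy * math.sin(rad),
            cy + dx * math.sin(rad) + dy * math.cos(rad))


# Three text baselines with slope 0.1 and one vertical border line.
segments = [(0, 0, 100, 10), (100, 60, 0, 50), (0, 100, 100, 110), (10, 0, 10, 80)]
skew = estimate_skew_degrees(segments)
# Rotating by the negative skew brings the end of the first baseline back to y = 0.
print(round(skew, 2), rotate_point(100, 10, -skew))
```

Using the median rather than the mean keeps a few stray lines from pulling the estimate away from the dominant text direction.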

(2) Distorted text is common in photographed documents. For example, shooting the page at an angle introduces perspective distortion in the text.

Processing steps:

1. Input the original image

2. Grayscale processing

3. Binarization

4. Dilation

5. Erosion, to thin the edges

6. Edge detection

7. Rectangular frame detection

8. Correct the distorted rectangle

9. Correction completed
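Steps 2 to 5 above can be sketched on plain nested lists, without any image library. This is an illustration under stated assumptions: the 128 threshold, the 3x3 neighbourhood, and all function names are choices made for the example, and edge detection, rectangle detection, and the final perspective correction (steps 6 to 8) are not covered.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to gray levels 0-255 (BT.601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def binarize(gray, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (background) as 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def _morph(binary, combine):
    """Apply `combine` (any or all) to the 3x3 neighbourhood of every pixel."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Pixels outside the image count as background.
            window = [binary[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0
                      for ny in (y - 1, y, y + 1)
                      for nx in (x - 1, x, x + 1)]
            out[y][x] = 1 if combine(window) else 0
    return out


def dilate(binary):
    """Grow ink regions: a pixel becomes ink if any neighbour is ink."""
    return _morph(binary, any)


def erode(binary):
    """Shrink ink regions: a pixel stays ink only if all neighbours are ink."""
    return _morph(binary, all)


# A white 5x5 image with one black pixel in the middle.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [[BLACK if (x, y) == (2, 2) else WHITE for x in range(5)] for y in range(5)]
binary = binarize(to_gray(image))
closed = erode(dilate(binary))  # dilation followed by erosion
```

Dilation followed by erosion closes small gaps in strokes while keeping their overall thickness, which tends to make the later edge and rectangle detection more stable.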

For some source data the image quality is already good, and preprocessing can be skipped.

2. Cut characters

Once preprocessing has standardized the image, character cutting splits it into individual characters. OCR ultimately recognizes one character at a time (for example, to recognize you, it actually recognizes y, o, and u in turn). In addition, each character must be tagged with its position when it is cut out, so that the text can later be restored from the relative positions of the characters.

Method of cutting characters

(1) Projection method

Just as every object casts a shadow, so does every character. The projection method uses these shadows for both row segmentation and column segmentation. Note that rows must be cut first, and columns second.

Row cutting: project the pixels horizontally, as if pushing every black pixel in a row to the far right so that they pile up there. Rows that contain no black pixels appear as gaps in this profile and mark the boundaries between lines of text.

Column cutting: project vertically. Working from the result of row cutting, compute a separate column projection for each line block that was cut out.

Finally, the gaps between the projections tell us where to cut out each character.
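The row-then-column projection can be sketched as follows, assuming the image is already a binary grid in which 1 marks a black (ink) pixel. The function names find_runs and segment_characters and the (top, bottom, left, right) box format are assumptions for the example.

```python
def find_runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(binary):
    """Cut a binary image (1 = ink) into character boxes: rows first, then columns."""
    boxes = []
    row_profile = [sum(row) for row in binary]          # horizontal projection
    for top, bottom in find_runs(row_profile):
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]  # vertical projection of this line
        for left, right in find_runs(col_profile):
            boxes.append((top, bottom, left, right))    # position kept for later restoration
    return boxes


page = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
print(segment_characters(page))
# prints: [(0, 2, 0, 2), (0, 2, 3, 4), (3, 4, 1, 3), (3, 4, 4, 5)]
```

Any gap of at least one empty row or column triggers a cut here. Characters that touch each other, or characters made of disconnected parts, would need an extra merging or splitting rule.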

To make recognition easier, we convert the cut characters into black-and-white images. In terms of pixel values, 0 represents black and 255 represents white.

3. Use a neural network to recognize characters

What the network mainly has to learn is the features of each character. Even an image as small as 32*32 pixels is enough to learn them well.

We only need to build a neural network and feed it the character images for training, and it will learn the features of each character on its own. This process is relatively simple.
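To make the idea concrete, here is a deliberately tiny sketch: a single-layer softmax network trained by gradient descent on three hand-made 3x3 glyphs. Everything in it is an assumption for illustration (the glyphs, the learning rate, the number of epochs, the function names). A practical recognizer would use 32*32 inputs, a deeper network such as a CNN, and far more training data.

```python
import math

# Hypothetical 3x3 training glyphs (1 = ink), flattened row by row.
GLYPHS = {
    "I": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],
    "-": [0, 0, 0,
          1, 1, 1,
          0, 0, 0],
    "L": [1, 0, 0,
          1, 0, 0,
          1, 1, 1],
}
LABELS = list(GLYPHS)


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(weights, biases, pixels):
    scores = [sum(w * p for w, p in zip(row, pixels)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train(samples, epochs=200, lr=0.1):
    """Fit one weight row per class with per-sample gradient descent."""
    n_inputs = len(samples[0][0])
    weights = [[0.0] * n_inputs for _ in LABELS]
    biases = [0.0] * len(LABELS)
    for _ in range(epochs):
        for pixels, target in samples:
            probs = predict_probs(weights, biases, pixels)
            for c in range(len(LABELS)):
                # Gradient of the cross-entropy loss with respect to the class score.
                error = probs[c] - (1.0 if c == target else 0.0)
                biases[c] -= lr * error
                for i, p in enumerate(pixels):
                    weights[c][i] -= lr * error * p
    return weights, biases


def recognize(weights, biases, pixels):
    probs = predict_probs(weights, biases, pixels)
    return LABELS[probs.index(max(probs))]


samples = [(GLYPHS[label], index) for index, label in enumerate(LABELS)]
weights, biases = train(samples)
noisy_i = [0, 1, 1, 0, 1, 0, 0, 1, 0]  # an "I" with one stray pixel in the corner
print(recognize(weights, biases, noisy_i))
```

The stray pixel never appears in the training glyphs, so its weights stay at zero and the noisy input gets the same scores as a clean "I".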

4. Text post-processing

Restoration:

After the characters are recognized, the original text layout has to be restored. This step is very important. It uses the recognition results together with the character position information recorded earlier.

From the position information we judge whether characters belong to the same row or the same column.

For example, to judge whether two words are on the same line, look at how much their extents overlap on the Y axis. If the overlap reaches a certain proportion, the two can be treated as belonging to the same row. In the same way, the overlap of two pieces of text on the X axis indicates whether they are stacked vertically in the same column.
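The overlap test can be sketched like this. The (left, top, right, bottom) box format, the 0.5 threshold, and the choice to divide by the shorter of the two extents are assumptions for the example; the right proportion depends on the fonts and layout involved.

```python
def overlap_ratio(a_start, a_end, b_start, b_end):
    """Overlap length divided by the length of the shorter interval."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    shorter = min(a_end - a_start, b_end - b_start)
    if overlap <= 0 or shorter <= 0:
        return 0.0
    return overlap / shorter


def same_row(box_a, box_b, min_ratio=0.5):
    """Boxes are (left, top, right, bottom); compare their extents on the Y axis."""
    return overlap_ratio(box_a[1], box_a[3], box_b[1], box_b[3]) >= min_ratio


def same_column(box_a, box_b, min_ratio=0.5):
    """Compare extents on the X axis to see whether two boxes are stacked vertically."""
    return overlap_ratio(box_a[0], box_a[2], box_b[0], box_b[2]) >= min_ratio


word_a = (10, 100, 40, 130)
word_b = (50, 105, 80, 135)   # to the right of word_a, slightly lower
word_c = (12, 200, 38, 230)   # directly below word_a
print(same_row(word_a, word_b), same_column(word_a, word_c))  # prints: True True
```

Dividing by the shorter extent lets a small character such as a comma still count as being on the same row as a tall neighbour.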

Correction:

To get more accurate text, the results also need to be corrected. Corrections can use the surrounding context, with the help of automatic correction tools, or rely on known constraints such as fixed coding rules that a field must follow.
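A fixed coding rule can drive correction directly. The sketch below assumes a made-up rule (an order number is two capital letters, a dash, and six digits) and a made-up table of look-alike characters. Both are hypothetical and would be replaced by the rules of the actual documents.

```python
import re

# Hypothetical look-alike table for fields that must contain only digits.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1",
                                  "S": "5", "B": "8", "Z": "2"})

# Hypothetical coding rule: two capital letters, a dash, then six digits.
ORDER_NUMBER = re.compile(r"[A-Z]{2}-\d{6}")


def correct_order_number(text):
    """Fix digit look-alikes after the dash; return None if the rule still fails."""
    prefix, dash, digits = text.partition("-")
    fixed = prefix + dash + digits.translate(DIGIT_LOOKALIKES)
    return fixed if ORDER_NUMBER.fullmatch(fixed) else None


print(correct_order_number("AB-12O4S6"))  # prints: AB-120456
```

Returning None instead of a best guess lets results that still break the rule be flagged for manual review.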

5. Summary

I think the key to OCR is the data. The amount of data largely determines the recognition rate. With the same algorithm, a model trained on a large amount of data clearly generalizes better than one trained on a small amount. When there is too little data, even small changes in the input can seriously hurt the recognition rate. Machine learning is like a person in this respect: if you have only ever learned to read Chinese characters and are then asked to recognize English, you will not know what the English characters are. So data diversity is very important.


 


Origin blog.csdn.net/wangmengmeng99/article/details/129947017