OCR Recognition Series, Part 1: Document Character Recognition

When the input to the system is an image of a page of text, the first step is to determine the orientation of the text on the page. The pages we receive are rarely perfect: they may be skewed or stained. So we begin by preprocessing the image, correcting the skew angle and removing noise.
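As a rough illustration of the denoising half of this step, here is a minimal sketch in plain Python. The choice of a 3x3 median filter is an assumption made for the example (the text does not name a specific denoising method), and the function name `median_filter_3x3` and the tiny test image are hypothetical.

```python
def median_filter_3x3(img):
    """Return a copy of a grayscale image (list of rows) after a 3x3 median filter."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            # Collect the 3x3 neighbourhood, clamping coordinates at the borders.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            # The median of 9 values is the 5th smallest.
            out[y][x] = sorted(window)[4]
    return out


# Assumed toy input: a white page (255) with one isolated dark speck (0).
noisy = [
    [255, 255, 255, 255],
    [255,   0, 255, 255],
    [255, 255, 255, 255],
]
clean = median_filter_3x3(noisy)
print(clean[1][1])
```

A median filter suits isolated specks because a single outlier never becomes the median of its neighbourhood. Skew correction is a separate step and is not covered by this sketch.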

Next we analyze the document layout and cut the page into individual lines of text. Each line is then cut column-wise into individual characters, and each character is sent to a trained OCR model for recognition, which yields the result.
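The row and column cutting described above can be sketched with projection profiles: sum the ink in each row to find text lines, then sum the ink in each column of a line to find characters. This is one common way to do it, assumed here for illustration; the function names `runs` and `segment` and the toy page are hypothetical, and the image is assumed to be already binarized (1 = ink, 0 = background).

```python
def runs(profile):
    """Return (start, end) spans of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(img):
    """Split a binary page into lines, then characters.

    Returns one list per text line, each holding (top, bottom, left, right) boxes.
    """
    row_profile = [sum(row) for row in img]            # ink per row
    lines = []
    for top, bottom in runs(row_profile):
        band = img[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]  # ink per column in this line
        lines.append([(top, bottom, left, right) for left, right in runs(col_profile)])
    return lines


# Assumed toy page: two text lines, the first with two "characters".
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = segment(page)
print(boxes)
```

This only works when lines and characters are separated by empty rows and columns, which is why skew correction and denoising come first. Touching or overlapping characters need a more careful method.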

However, the model's output is often inaccurate, so the recognition results need to be corrected and refined. For example, we can design a grammar detector that checks whether a combination of characters is plausible. Take the word "Because": if the recognition model reads it as "8ecause", the grammar detector can catch this spelling error and replace the 8 with B, completing the correction. At that point the whole OCR process is finished. Summarized as large modules, an OCR pipeline can be divided into:

Layout Analysis -> Preprocessing -> Row and Column Cutting -> Character Recognition -> Post-processing Recognition and Correction
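To make the post-processing correction step concrete, here is a minimal sketch of a dictionary-based corrector that handles the "8ecause" case. It is a simplified stand-in for the grammar detector mentioned above, not a full implementation: the confusion table `CONFUSIONS`, the vocabulary `VOCAB`, and the function `correct_word` are all hypothetical names and data invented for the example.

```python
from itertools import product

# Hypothetical table: character produced by OCR -> characters it may really be.
CONFUSIONS = {"8": ["B"], "0": ["O"], "1": ["l"], "5": ["S"]}

# Hypothetical vocabulary; a real system would use a full dictionary or a language model.
VOCAB = {"because", "book", "so"}


def correct_word(word, vocab=VOCAB, confusions=CONFUSIONS):
    """Return a vocabulary word reachable by swapping confusable characters, else the input."""
    if word.lower() in vocab:
        return word
    # At each position, try the original character and then its confusable alternatives.
    options = [[ch] + confusions.get(ch, []) for ch in word]
    for candidate in product(*options):
        candidate = "".join(candidate)
        if candidate.lower() in vocab:
            return candidate
    return word  # nothing plausible found, leave the OCR output unchanged


print(correct_word("8ecause"))
```

The search tries every combination of substitutions, so its cost grows quickly for long words with many confusable characters. That is acceptable for a sketch but would need pruning in practice.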

As an example, consider reading the digits on an electricity meter. The meter shows few distinct characters (perhaps only Arabic numerals), the font is uniform, and the image is sharp, so recognition is not difficult. For such a simple scenario, the first strategy to consider is of course the simplest and most brute-force one: template matching. We first define a set of digit templates (0 to 9) and then slide the templates over the meter image to match its characters. This strategy is simple but quite effective.
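The sliding template match can be sketched in plain Python as follows. Everything concrete here is an assumption for illustration: the 3x5 binary glyphs in `TEMPLATES`, the pixel-agreement score, the 0.9 threshold, and the helper names `render`, `score` and `read_digits`. A real meter reader would use templates cropped from the meter's own display and a grayscale similarity measure.

```python
# Hypothetical 3x5 binary digit templates ("1" = ink).
TEMPLATES = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}
TEMPLATE_W, TEMPLATE_H = 3, 5


def render(digits, gap=1):
    """Build a synthetic binary strip by placing templates side by side."""
    return [("0" * gap).join(TEMPLATES[d][r] for d in digits) for r in range(TEMPLATE_H)]


def score(window, template):
    """Fraction of pixels on which the window and the template agree."""
    same = sum(w == t for wr, tr in zip(window, template) for w, t in zip(wr, tr))
    return same / (TEMPLATE_W * TEMPLATE_H)


def read_digits(strip, threshold=0.9):
    """Slide all templates across the strip and keep matches scoring at least threshold."""
    width, x, found = len(strip[0]), 0, []
    while x <= width - TEMPLATE_W:
        window = [row[x:x + TEMPLATE_W] for row in strip]
        best = max(TEMPLATES, key=lambda d: score(window, TEMPLATES[d]))
        if score(window, TEMPLATES[best]) >= threshold:
            found.append(best)
            x += TEMPLATE_W  # skip past the matched character
        else:
            x += 1           # no match here, slide one column to the right
    return "".join(found)


strip = render("207")
print(read_digits(strip))
```

Skipping ahead by one template width after a match is a simple way to avoid reporting the same character twice. The threshold trades off tolerance to noise against false matches between similar digits such as 0 and 8.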

Origin blog.csdn.net/wangmengmeng99/article/details/129724117