OCR Recognition Series, Part 1: Basic Introduction

I have recently finished several OCR projects, and I want to use the spare time to organize what I know about OCR systematically. This first post introduces the basics of OCR.

1. Definition of OCR

OCR stands for Optical Character Recognition. It uses optical and computer techniques to extract the text that appears on a target, such as a scanned page or a photo. OCR is one of the research fields of computer vision, and its everyday applications are already fairly mature: ID card recognition, license plate recognition, and apps that search for answers from a photo of a question.

2. Classification of OCR

OCR is currently divided into two main categories according to the kind of text being recognized: printed text recognition and handwriting recognition.

Printed text recognition is the simpler of the two: the fonts are regular, and only a few dozen font styles need to be handled. It still runs into difficulties when the printed ink is broken, when characters stick together, or when part of the text is occluded. Generally speaking, recognition of printed characters is already quite good, but it has not yet reached 100% accuracy.

Handwriting has always been a challenge that the OCR industry wants to overcome, mainly because handwriting is far more varied and carries individual characteristics. A doctor's handwritten prescription is a typical example: if it is hard for the human eye to make out what is written, it is hard for a machine as well.

3. OCR recognition methods

At present, the main approaches are the following:

  • Tesseract, Google's open source OCR engine. It was not developed with Chinese as a primary target, so its Chinese recognition is not good, but it works well on English text and digits.
  • The OCR API developed by Baidu, which can be called from a Python script to perform text recognition. It is not entirely free: a small number of calls costs nothing, but a large number of calls is charged. It works well for Chinese character recognition.
  • The traditional method: extract features from the characters, then feed them to a classifier to obtain the OCR model. Before deep learning took off, this was the usual approach for complex scenes. The first step is feature design and extraction: features that distinguish the characters must be designed by hand to prepare for classification. Typical choices are structural features, such as character endpoints, intersections, number of loops, and number of strokes, or hand-designed descriptors such as HOG. The second step sends these features to a classifier (for example an SVM), which produces the recognition result. The disadvantage is that feature design takes a lot of time, and the recognition model is only as good as the hand-designed features it is trained on. Once the characters are deformed or blurred, or the background interferes, the model's generalization ability drops quickly. The method also relies heavily on the result of character segmentation, and segmentation errors are especially prominent when characters are distorted, stuck together, or noisy. This is why deep learning is now the usual choice for OCR.
  • Brute-force character template matching, which is usually used in simple recognition scenes with uniform fonts, high image clarity, and simple characters.
  • Character recognition based on deep learning (text detection + text recognition). The approach that currently works best is to use a neural network to locate the text regions and then perform character recognition on each located region.
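
To make the template matching idea from the list above concrete, here is a minimal sketch in plain Python. It is only an illustration, not code from any of the projects mentioned: the 3x5 binary glyphs in TEMPLATES and the helper names to_grid, match_score, and recognize are all made up for this example. It assumes each character has already been segmented and scaled to the template size, which is exactly the "uniform font, clean image" condition this method depends on.

```python
# Hypothetical 3x5 binary glyph templates (1 = ink, 0 = background).
# Real systems would use larger images cropped from the actual font.
TEMPLATES = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
}


def to_grid(rows):
    """Convert a list of '0'/'1' strings into a 2D list of ints."""
    return [[int(ch) for ch in row] for row in rows]


def match_score(glyph, template):
    """Fraction of pixels on which glyph and template agree (1.0 = identical)."""
    total = 0
    same = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            same += int(g == t)
    return same / total


def recognize(glyph, templates=TEMPLATES):
    """Compare the glyph against every template and return the best label and score."""
    scores = {
        label: match_score(glyph, to_grid(rows))
        for label, rows in templates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# A "7" with one flipped pixel, standing in for a slightly noisy scan.
noisy_seven = to_grid(["111", "001", "010", "011", "010"])
label, score = recognize(noisy_seven)
print(label, round(score, 2))  # prints: 7 0.93
```

Because every input is compared against every template pixel by pixel, the approach is simple but fragile: a different font, a shifted crop, or heavy noise lowers all scores at once, which is why it is limited to the simple scenes described above.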

Origin blog.csdn.net/wangmengmeng99/article/details/129796845