Hey, get to know OCR text recognition!

Welcome to Tencent Cloud+ Community, where you can find more of Tencent's hands-on technical practice articles.

This article was published by the Cloud+ Community operations team on Tencent Cloud+ Community.


Foreword

On March 27, 2018, Tencent Cloud+ Community and the Tencent Cloud Intelligent Image team jointly held the "Tencent Cloud OCR Text Recognition" Intelligent Image sharing event in the customer group. During the event, users listened attentively to the guest speakers and raised related questions, which the scientists and engineers of the Intelligent Image team patiently answered. The full content of the event follows.

Main text

In daily work we inevitably run into problems like these: a document we worked hard to write has been printed, but the source file is lost. We have collected a stack of business cards, but entering the information one by one is tedious. A courier company's business keeps growing, but logging and recording waybills every day takes a lot of time and is very inefficient.

So, is there a technology that can help us solve these problems? Yes: OCR text recognition. Today we invited AI scientist Ji Yongnan, Florali and Chen Yingtian, product manager at the Tencent Cloud Big Data AI Product Center, and senior engineer Xiao Xihua to share Tencent Cloud's exploration in this field in recent years.

What is OCR?

OCR locates and recognizes all text in an image efficiently and in real time, returning the position of each text box and its content. It supports multi-scene, full-image text recognition in any layout, covering Chinese and English, letters, and numbers. In layman's terms, it intelligently converts the text in a picture into editable text.


What is the technical principle of OCR?

The essence of OCR is image recognition, and the principle is basically the same as for other image recognition problems. It involves two key technologies: text detection and text recognition. First, features are extracted from the image and the target regions are detected; then the characters in those regions are segmented and classified.

Take the rise of deep learning as the dividing point. Until about five years ago, the traditional OCR framework was still the most widely used in the industry. With the rise of deep learning, OCR frameworks based on it took a new approach that quickly broke through the original technical bottlenecks (such as text localization, binarization, and text segmentation) and have since been widely adopted in the industry.

The pipeline proceeds in stages: text localization first, then correction of slanted text, then segmentation into single characters and recognition of each character, and finally semantic error correction based on statistical models such as hidden Markov models (HMM).
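The segmentation stage can be illustrated with a classic projection-profile method. This is a minimal sketch for intuition only, not the method used by Tencent Cloud OCR: the function names `binarize` and `segment_columns`, the fixed threshold of 128, and the toy image are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 (black) to 255 (white)


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary: Image) -> List[Tuple[int, int]]:
    """Split one text line into character spans using a vertical projection.

    Returns half-open (start, end) column ranges that contain ink.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Count ink pixels in every column.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                     # entering a character
        elif count == 0 and start is not None:
            spans.append((start, x))      # leaving a character
            start = None
    if start is not None:
        spans.append((start, width))      # character touches the right edge
    return spans


# Toy 3x7 "image" (assumed data): two dark blobs separated by a white gap.
toy = [
    [255, 0, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 255],
    [255, 0, 0, 255, 255, 0, 0],
]
spans = segment_columns(binarize(toy))
print(spans)  # [(1, 3), (5, 7)]
```

A projection profile fails as soon as neighbouring characters touch or overlap, which is one reason binarization and segmentation are listed above among the bottlenecks of the traditional framework.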

What are the difficulties of OCR technology?

The difficulties include complex backgrounds, artistic fonts, low resolution, non-uniform lighting, image degradation, character deformation, mixed languages, complex layout of text lines, and characters missing from the detection box.

How to overcome these difficulties?

We start from several aspects: one is the usage scenario, the other is technical improvement. Tencent Youtu Lab has deeply optimized its text detection technology and proposed Compact Inception, which improves text detection and extraction at various scales through a well-designed network structure. At the same time, an RNN multi-layer adaptive network and a Refinement structure are introduced to improve detection completeness and accuracy.


What functions does Tencent Cloud OCR currently support?

Based on the world-leading deep learning technology of Tencent Youtu Lab, we currently support ID card recognition, bank card recognition, business card recognition, business license recognition, driving license recognition, license plate recognition, general print recognition, and handwriting recognition.

You can scan the mini program QR code at the top of the article to try it out.

Technical difficulties and usage scenarios of general print recognition

We know that ID card recognition is widely used in the financial industry. During identity authentication it reduces the information users have to type in, which improves both efficiency and user experience. Business license recognition removes tedious manual entry entirely and saves enterprises a lot of human resource costs. These scenarios are familiar to everyone.

For general print recognition, Tencent Youtu Lab independently designed a complete multi-scale text recognition engine that copes with blurred, out-of-focus, perspective-distorted, and partially occluded text. Its recognition accuracy reaches 90%, which is at the leading level in the industry. Because it recognizes text in images of any layout, it can be widely applied to printed documents, advertising graphics, medical care, logistics, and other industries.

Are there any good examples of general print recognition?

For example, in this advertisement the content uses multiple fonts, Chinese and English are mixed with numbers, and the background is fairly random. Through perspective correction, deblurring, and similar steps, our OCR restores the image to a large extent, which greatly improves the robustness of the algorithm.


Another example is recognizing posters with dense text, small line spacing, and perspective distortion. Reading them manually is time-consuming and hard on the naked eye. Tencent Cloud OCR, however, uses a small and precise feature extraction network combined with advanced preprocessing, and reaches a recognition accuracy of 93%.


Sometimes the recognition rate is not ideal. How can the recognition accuracy be improved?

First, we identify what in the current scenario is causing the low accuracy. We then evaluate the room for improvement and design the corresponding changes, including preprocessing and so on.

Are there any cases of Tencent Cloud handwriting recognition?

Tencent is the first service provider in China to apply handwriting recognition in complex scenarios. The accuracy of digit recognition is over 90%, single-character recognition takes less than 15 ms, and the accuracy for complex Chinese characters exceeds 80%.

Tencent Cloud handwriting OCR has been applied to waybill recognition, where it addresses the huge volume of manual entry for daily express orders in the logistics industry, work that is extremely error-prone and very inefficient.


What is the difference between waybill recognition and traditional manual recognition?

If traditional manual entry takes 3 minutes per order, 1000 orders a day require 6.25 people, so a lot of manpower is needed to keep waybills timely. Cutting labor cost then hurts the timeliness of the waybills, which makes it difficult to balance cost and service.
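The 6.25 figure can be reproduced with a few lines of arithmetic. The text does not state the length of a working day; the sketch below assumes an 8-hour shift, which is the value that yields 6.25.

```python
MINUTES_PER_ORDER = 3    # from the text
ORDERS_PER_DAY = 1000    # from the text
HOURS_PER_SHIFT = 8      # assumption: one person works 8 hours a day

# Total manual effort for one day's orders, in hours.
total_hours = MINUTES_PER_ORDER * ORDERS_PER_DAY / 60

# Number of people needed to finish that effort within one shift.
people_needed = total_hours / HOURS_PER_SHIFT

print(total_hours, people_needed)  # 50.0 6.25
```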

Our waybill recognition takes milliseconds per order and runs as a 24-hour service. When business grows, we only need to add server resources for computing, which is more flexible.

Compared with traditional manual entry, it not only reduces cost and improves accuracy but also lowers the risk of leaking users' private information.

OCR currently has a wide range of application scenarios. What are the advantages of Tencent Cloud OCR?

Our OCR text recognition technology currently supports more than 10,000 labels (character classes) across simplified and traditional Chinese, English, numbers, and punctuation, covering hundreds of fonts. The rare-character version supports more than 20,000 labels.

We also have many customers using it in production across the industry, right?

The new version of mobile QQ uses our technology. It supports extracting text from pictures at three entry points: Scan, the chat window, and the large-image preview of Qzone pictures.

This makes it easy for users to read, edit, and save the text in a picture, and to translate or search the extracted text. Across many scenarios, it greatly improves the efficiency of reading and recording text in pictures.


Our OCR technology is also used for business card recognition in Enterprise WeChat. Users only need to take a photo or select a picture of a business card, and the text is recognized quickly and accurately and extracted automatically into the corresponding fields. This greatly simplifies business card entry and avoids the errors that manual entry can introduce.


Interactive Q&A

After the sharing session, users raised a lot of questions. Let's take a look at what they asked.

Q: Hello, does OCR recognition support H5 development?

A: Yes. The interface is based on the HTTP protocol, so it can be used anywhere HTTP is supported.

Q: Is there any way to improve the remaining 10% of general print recognition?

A: The overall approach still comes back to our three major engines, which we optimize one by one:

  1. Background recognition
  2. Location engine
  3. Field recognition engine

Q: Does print recognition currently segment first and then recognize? Does OCR support offline recognition?

A: Yes, the approach is to segment first and then recognize. Our OCR supports offline recognition.

Q: What happens when OCR cannot segment the text or segments it incorrectly?

A: Cases that cannot be segmented are very rare. A segmentation error will certainly affect the final result, but our technique can segment correctly even when characters overlap.

Q: When it is used for waybill recognition, can the customer's address be corrected intelligently? For example, Shenzhen written as Shentuchuan.

A: We will combine NLP technology and context for intelligent error correction.
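As a rough illustration of dictionary-based correction, the sketch below snaps a misread place name to the closest entry in a known list. It uses plain string similarity from Python's standard `difflib` module, not the NLP and context-based approach described in the answer. The gazetteer `KNOWN_CITIES`, the helper name `correct_city`, and the 0.6 cutoff are assumptions for illustration.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical gazetteer; a real system would load a full list of place names.
KNOWN_CITIES = ("Shenzhen", "Shanghai", "Guangzhou", "Chengdu", "Beijing")


def correct_city(ocr_text: str,
                 candidates: Sequence[str] = KNOWN_CITIES,
                 cutoff: float = 0.6) -> Optional[str]:
    """Return the known city closest to the OCR output, or None if nothing is close."""
    # Compare case-insensitively, but return the canonical spelling.
    by_lower = {name.lower(): name for name in candidates}
    matches = difflib.get_close_matches(
        ocr_text.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


print(correct_city("Shentuchuan"))  # Shenzhen
```

Pure string similarity becomes ambiguous when several place names are equally close to the misread text, which is where context (province, district, postal code) becomes necessary.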

Q: Related services are already available on the market. What is different or better about ours?

A: We have accumulated a lot of relevant experience in OCR, and we are also the first service provider in China to apply handwriting recognition in complex scenarios.

Q: Which languages does the Tencent Cloud OCR service support? Are there any requirements for text size, font, and so on?

A: Chinese (Simplified and Traditional), English and numbers.

Q: You mentioned earlier that the total number of classes is as high as 20,000+. How is such a large classification model trained?

A: We train it in tiers and in batches.

Q: Automatic license plate recognition is already used in everyday scenarios. What are its technical difficulties?

A: Compared with input images of certificates, license plate images are constrained by the surveillance cameras installed on site and by random changes in vehicle position. This leads to all kinds of extreme angles and lighting conditions, so image quality varies far more than it does for certificate-type images.

Q: The pictures you just showed were all flat. Paper documents photographed with a mobile phone often have some curvature, for example when folded paper has not been flattened. Can this be handled?

A: We can handle slight curvature, but severe deformation is relatively difficult.

Q: Besides standard license plates, do we support recognition of new energy vehicle plates?

A: Yes. We currently support blue plates, yellow plates, military plates, police plates, coach plates, new energy plates, and so on. The recognition rate currently reaches 97%-98%.

Q: Photos taken by logistics drivers on their phones have three problems: the waybill is not flat, the lighting is poor, and the camera angle is tilted. Is there a technical solution for these cases?

A: The camera angle can be corrected with geometric algorithms. Lighting problems can be mitigated by normalizing the image. As for an uneven surface, it depends on how uneven it is.
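One of the simplest forms of image normalization is linear contrast stretching, sketched below in pure Python. It is a generic technique shown under assumptions, not Tencent Cloud's actual preprocessing; the function name `stretch_contrast` and the sample pixel values are made up for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0 to 255


def stretch_contrast(gray: Image) -> Image:
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    flat = [px for row in gray for px in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in gray]  # uniform image, nothing to stretch
    return [[round((px - lo) * 255 / (hi - lo)) for px in row] for row in gray]


# A dim, low-contrast patch (assumed data).
dim = [[100, 110], [120, 150]]
print(stretch_contrast(dim))  # [[0, 51], [102, 255]]
```

A global stretch like this cannot fix uneven lighting within one photo; local methods that normalize each region separately are usually needed for shadows that cover only part of a waybill.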

Q: Does your 80% accuracy rate refer to pictures taken while the vehicle is moving or while it is stopped?

A: We do it frame by frame.

Q: Can it be trained through accumulated data, error correction, and so on, so that it recognizes handwritten characters?

A: We have already implemented handwriting recognition.

Q: How high is OCR's recognition rate for motion-blurred scenes?

A: The degree of blur varies greatly, so it is hard to give standardized statistics. When image quality is poor, the most direct remedy is image enhancement.

Q: Does your company have related papers that can be consulted?

A:

https://cloud.tencent.com/developer/article/1007166

https://cloud.tencent.com/developer/article/1008463

https://cloud.tencent.com/developer/article/1029969

You can take a look at the articles in our community. Many of them distill the best of the experience inside Tencent (the "goose factory").

Q: The picture is a bit blurry. Can you name a specific algorithm that works well? The answer above is too general.

A: There are many filters that can handle images with varying degrees of blur. There are also ways to deal with it using neural networks.
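One classic filter of this kind is unsharp masking, which adds back the difference between an image and a blurred copy of itself. The pure-Python sketch below is for illustration only: the answer does not name a specific algorithm, and the 3x3 box blur, the `amount` parameter, and the function names are assumptions.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0 to 255


def box_blur(gray: Image) -> List[List[float]]:
    """3x3 mean filter; border pixels reuse the nearest valid neighbour."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += gray[yy][xx]
            row.append(total / 9)
        out.append(row)
    return out


def unsharp_mask(gray: Image, amount: float = 1.0) -> Image:
    """Sharpen edges: result = original + amount * (original - blurred)."""
    blurred = box_blur(gray)
    return [
        [min(255, max(0, round(px + amount * (px - b))))
         for px, b in zip(row, blurred_row)]
        for row, blurred_row in zip(gray, blurred)
    ]


# A vertical edge between a dark and a bright region (assumed data).
edge = [[100, 100, 200, 200] for _ in range(3)]
sharpened = unsharp_mask(edge)
print(sharpened[1])  # [100, 67, 233, 200]
```

The filter exaggerates the step at the edge, which makes mildly blurred strokes crisper. It also amplifies noise and cannot recover detail lost to heavy motion blur, which is where the neural network approaches mentioned in the answer come in.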

As we have seen, Tencent Cloud OCR can handle both complex text recognition scenarios and mini program applications. If you have any questions about this article, you can post them in the Tencent Cloud Q&A community ( https://cloud.tencent.com/developer/ask ), where invited product team members will answer them.

Thank you for supporting Tencent Cloud+ Community and Tencent Cloud Intelligent Image. To learn more about Tencent Cloud OCR, see https://cloud.tencent.com/product/ocr . For the Tencent Cloud OCR access process, see https://cloud.tencent.com/document/product/641/12412 . To try more Tencent Cloud AI products, scan the mini program code below. For Tencent Cloud AI cooperation, contact [email protected] or join the Intelligent Image QQ group: 188257726. If you missed this event, you can check the chat history or wait for the staff to share an article summarizing it. For more articles, follow Tencent Cloud+ Community ( https://cloud.tencent.com/developer ).

Tencent Cloud OCR Access Process

Step 1: Register an account. After registering and passing real-name authentication, you can log in to the [Tencent Cloud Console] (link: https://console.cloud.tencent.com/ai ) to use the service. If you do not have an account, please refer to the [Account Registration Tutorial] (link: https://cloud.tencent.com/document/product/378/9603 ).

Step 2: Create a secret key. After completing registration, create a secret key in [Access Management] (link: https://console.cloud.tencent.com/cam/capi ). The AppID, SecretID, and SecretKey are the only credentials for your application development, so keep them safe.

Step 3: Generate a signature. The signature is used to verify the legitimacy of a request. You generate it from the AppID, SecretID, and SecretKey. For the specific generation method, please refer to [Signature Authentication] (link: https://cloud.tencent.com/document/product/641/12409 ).

Step 4: Call the API. We provide a variety of API interfaces; you can view and call the [OCR] (link: https://cloud.tencent.com/document/product/641/12407 ) service.

Step 5: Check the calls. You can log in to the [Tencent Cloud Console] (link: https://console.cloud.tencent.com/ai/ocr/namecard ) to check the call status of each OCR service.
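Step 3 can be sketched roughly as follows. This only illustrates HMAC-based request signing in general. The plaintext layout (`a=`, `k=`, `e=`, `t=`, `r=`), the choice of HMAC-SHA1, the function name `make_signature`, and the sample credentials are all assumptions, not the documented Tencent Cloud format. The authoritative rules are in the Signature Authentication document linked in Step 3.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional


def make_signature(app_id: str, secret_id: str, secret_key: str,
                   ttl_seconds: int = 3600,
                   now: Optional[int] = None,
                   nonce: Optional[int] = None) -> str:
    """Build a time-limited signature from the three credentials (illustrative)."""
    now = int(time.time()) if now is None else now
    nonce = secrets.randbelow(2**31) if nonce is None else nonce

    # Hypothetical plaintext: app id, secret id, expiry time, current time, nonce.
    plain = f"a={app_id}&k={secret_id}&e={now + ttl_seconds}&t={now}&r={nonce}"

    # Key the hash with SecretKey so only its holder can produce a valid digest.
    digest = hmac.new(secret_key.encode(), plain.encode(), hashlib.sha1).digest()

    # Ship the digest together with the plaintext so the server can recompute it.
    return base64.b64encode(digest + plain.encode()).decode()


# Placeholder credentials, fixed time and nonce so the output is reproducible.
sig = make_signature("1250000000", "AKIDexample", "secretexample",
                     now=1500000000, nonce=42)
print(sig)
```

The SecretKey itself never travels with the request; only the digest derived from it does. This is why Step 2 asks you to keep the credentials safe.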

This article has been authorized by the author for publication on Tencent Cloud+ Community. Please credit the source when reprinting.

Original link: https://cloud.tencent.com/developer/article/1080576?fromSource=waitui
