Application of Deep Learning in Document Correction

1. Document scanning

Document scanning has become part of daily life. Tools such as the built-in scanner in the iOS Notes app and CamScanner make it convenient to digitize ID cards, bank cards, paper forms, and more. The main purpose of scanning a document is to make its text easier to recognize. In practice, however, scans suffer from problems such as shadows, wrinkles, and deformation. This article focuses on document deformation.

2. Traditional methods to solve document deformation

2.1. Deformed documents

When we scan a document in everyday use, we usually cannot capture the whole page as a clean, upright rectangle, so the scanned image ends up deformed in some way. Typical deformations include bending, folding, perspective distortion, and rotation. A computer handles a canonical, front-facing document easily, but even a simple rotation introduces problems, and more complex deformations make scanning harder still.

The picture on the left below is a case that a computer handles easily; for the pictures on the right, the results are less satisfactory.

Insert image description here

To make recognition easier, we correct the deformed image. Below we compare traditional correction methods with correction methods based on deep learning.

2.2. Traditional methods

Before deep learning became popular, there were already techniques for dealing with document deformation. For example, the playing cards in the picture below are rotated and seen in perspective, and we would like each card to be displayed on its own as a near-perfect rectangle.

Suppose Xiao Wang takes on this task. He first finds the coordinates of the four corners of the card (the document), using image processing techniques such as image gradients and edge detection. He then estimates the card's width and height. The restoration process can then be understood as in the following figure:

From the four corner coordinates of the red box on the left and the four corner coordinates of the red box on the right, we can compute a transformation matrix. Because four point pairs are involved, this is a perspective (projective) transformation rather than a purely affine one. Applying it to the original image yields the corrected image, and running text recognition on the corrected image gives more accurate results.

In the above example, the same transformation is applied to every pixel, which solves the perspective problem very well. For more complex deformations, such as bending or folding, this processing has to be adapted further.
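The four-point correction described above can be sketched in plain Python. This is only an illustrative sketch, not the code behind the article's figures: the corner coordinates and the function names `solve_linear`, `perspective_matrix`, and `warp_point` are assumptions made for this example. It solves for the 3x3 matrix that maps four detected corners onto an upright rectangle and then applies that one matrix to a point.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def perspective_matrix(src, dst):
    """Return the 3x3 matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the perspective matrix h to the point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


# Hypothetical corner coordinates of a tilted card, clockwise from top-left.
src = [(30, 20), (210, 40), (190, 300), (10, 260)]
# Target: an upright 200 x 300 rectangle.
dst = [(0, 0), (200, 0), (200, 300), (0, 300)]

H = perspective_matrix(src, dst)
print(warp_point(H, 30, 20))  # lands (up to rounding) on the corner (0, 0)
```

In practice an image library would apply the matrix to every pixel and interpolate the result; the sketch only shows how the single matrix is obtained and used.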

Traditional methods can restore images to some extent, but real-world cases are much more complicated than the example above, and traditional methods run into various problems.

3. A method based on offset fields

Deep learning provides a new approach to correcting image deformation. The idea is similar to the matrix transformation in Section 2.2, except that the transformation is predicted by a neural network and is no longer a single matrix shared by all pixels. We call this transformation the "offset field".

3.1. Offset field

An offset field assigns a direction and a magnitude to every position in the image, much like an image gradient. The figure below is an example of an offset field.

The offset field has the same height and width as the image, and each arrow in it is a vector carrying direction and magnitude information: the direction in which the corresponding image position should be moved, and by how much.

In practice, a neural network is trained to take the deformed image as input and output the offset field, as shown below:
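As a rough illustration of this data structure, an offset field can be stored as a grid of `(dx, dy)` pairs with the same height and width as the image. The grid size, the sample vector, and the helper name `magnitude_and_direction` below are assumptions chosen for the example.

```python
import math

height, width = 3, 4

# offset_field[y][x] = (dx, dy): how far the pixel at (x, y) should move.
offset_field = [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]
offset_field[1][2] = (3.0, -4.0)  # one hypothetical displacement


def magnitude_and_direction(vec):
    """Return the length of the offset and its angle in degrees."""
    dx, dy = vec
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


mag, angle = magnitude_and_direction(offset_field[1][2])
print(mag, angle)
```

Every arrow in the figure corresponds to one such pair; a zero vector means the pixel stays where it is.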
Insert image description here

After obtaining the offset field, we can correct the image.

3.2. Document Correction

We can combine the original image with the offset field in a way similar to the matrix transformation in Section 2.2: each pixel of the original image is moved by its own offset, which gives the corrected image. The offset operation is illustrated below:

Insert image description here

In the above example, the original image is deformed only locally, which is difficult for traditional methods but easy to handle with an offset field. Unlike the traditional transformation, which applies one matrix to the whole image, the offset field can move each pixel differently, so it adapts far more flexibly. Problems such as wrinkles and bends can be handled well.

The corrected document sometimes contains gaps, so a complete correction pipeline also adds a filling step. There are many ways to fill the gaps, one of which is to use an inpainting network. The specific steps are as follows:
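The per-pixel offset operation can be sketched as follows. This is a minimal sketch under simplifying assumptions: the image is a small grid of numbers, offsets are rounded to whole pixels, and each pixel is pushed forward to its new position. The function name `apply_offset_field` and the sample data are hypothetical; a real system would interpolate between pixels.

```python
def apply_offset_field(image, offset_field):
    """Move every pixel by its own (dx, dy); unfilled targets stay None."""
    height, width = len(image), len(image[0])
    corrected = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = offset_field[y][x]
            nx, ny = x + round(dx), y + round(dy)
            if 0 <= nx < width and 0 <= ny < height:
                corrected[ny][nx] = image[y][x]
    return corrected


# The middle row of this tiny "image" is shifted one pixel to the right.
image = [
    [1, 2, 3],
    [0, 4, 5],
    [7, 8, 9],
]
# Only the middle row needs correcting: move it one pixel to the left.
offset_field = [
    [(0, 0), (0, 0), (0, 0)],
    [(-1, 0), (-1, 0), (-1, 0)],
    [(0, 0), (0, 0), (0, 0)],
]

corrected = apply_offset_field(image, offset_field)
print(corrected)  # [[1, 2, 3], [4, 5, None], [7, 8, 9]]
```

Only the middle row changes, which a single global matrix could not express. The `None` entry marks a position that no source pixel landed on.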

Insert image description here
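The filling step can be illustrated with a deliberately simple stand-in. The sketch below is not an inpainting network: it only replaces each gap with the average of its known neighbours, to show where filling sits in the pipeline. The function name `fill_gaps` and the sample data are assumptions made for this example.

```python
def fill_gaps(image):
    """Fill None entries with the mean of their known 4-neighbours."""
    height, width = len(image), len(image[0])
    filled = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] is None:
                neighbors = [
                    image[ny][nx]
                    for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                    if 0 <= nx < width and 0 <= ny < height
                    and image[ny][nx] is not None
                ]
                if neighbors:
                    filled[y][x] = sum(neighbors) / len(neighbors)
    return filled


# A corrected image with one gap (None) left behind by the offset step.
with_gap = [
    [1, 2, 3],
    [4, 5, None],
    [7, 8, 9],
]
filled = fill_gaps(with_gap)
print(filled[1][2])  # mean of the neighbours 5, 3 and 9
```

A learned inpainting model would instead predict the missing content from the surrounding texture, which matters for text and fine detail.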

Document scanning is now quite capable and can recognize many kinds of complex documents, including handwritten manuscripts, word cloud images, and tables. Below we use TextIn, the intelligent text recognition service platform from Hehe Information, to try out document scanning.

4. Hands-on experience

4.1. Standard images and documents

We can try the text recognition functions in TextIn. Let's start with a relatively standard image; a table image is used here.

Insert image description here

The left side is the image used for recognition, and the right side is the recognition result. The content is perfectly recognized, and the content on the right can be copied directly.

Vehicle department Hehe information Transporttime car time May 20, 2020 Number ofpassengers 14 people
Destination Shibei, Jing'an District, Shanghai Cloud Cube
Contact Contact He Xiaohe contactnumbercontact number 18888888888 Driver drive safely and on time Driver drive safely and on time
Car reason (reason for using the car): official travel drive safely pick-up on time (Check after the car is finished by the rider. Check after the car is finished by the rider)
License plate number License plate number Shanghai M888888 Driver's name He Xiaoan contact number021-88888888 Pick-Up Locations Pick-Up Location No. 88, Shanghai Industrial Park
Person incharge audit vehicle department manager Liu Yang AdministrativemanagerHead of administrative department Yang Zhou

4.2. Photographed documents

Document scanning usually works on photographed images. Next, we test with a photo we took ourselves, deliberately making it harder: shadows and wrinkles were created on purpose to increase the recognition difficulty. The left side is the input image and the right side is the recognition result. The overall content is still recognized correctly.

Insert image description here

Here is part of it:

The following five parts:

(1) Sampling hole: enable digital equipment to observe specific image elements without being affected by other parts of the image.

(2) Image scanning mechanism: make the sampling hole move on the image in a predefined way, so as to observe each

One pixel.

(3) Optical sensor: detect the brightness of each pixel of the image by sampling, usually using a CCD array.

(4) Quantizer: Convert the continuous quantity output by the light sensor into an integer value. A typical quantizer is an A/D converter

circuit, which produces a value proportional to the input voltage or current.

4.3. Word cloud image

In addition to the two conventional kinds of images above, TextIn can also scan complex images such as ID photos, resumes, real estate certificates, and word clouds. The following is a word cloud example:

Insert image description here

Compared with the previous examples, a word cloud is more complicated: the text is in multiple languages and at multiple orientations, which makes it very hard to recognize. TextIn still recognizes it well. The results are:

HelloT.

Hello.

hello

That's it.

Greetings.

Hello.

Hello

Salam

While the text is recognized, it is also displayed in the corresponding language.

4.4. Moiré pattern removal

When we photograph the screen of an electronic device, strange textures appear in the picture. This texture is moiré. Removing moiré improves the clarity of images and text and makes them easier to recognize. Moiré removal can also be implemented with deep learning by training a dedicated moiré removal network. An online demo for moiré removal is available. The following compares an image with moiré patterns and the same image after removal:

Insert image description here

After removal, the text content can be clearly seen.

4.5. PS intelligent detection

Besides the document scanning and text recognition tasks above, TextIn can also perform intelligent PS detection, that is, detecting whether an image has been edited in Photoshop (PS), which is very effective in preventing fraud. PS techniques are now very mature, and many edited images cannot be told apart by the human eye. PS can be used to forge reproduction records, academic certificates, paper certification documents, and so on. Intelligent PS detection identifies such fake images well. Here we manually edit a normal image and then test it in TextIn.

Insert image description here

On the left is an image that has been edited. It is hard to tell by eye that it has been altered. The right side is the detection result, which shows not only whether the image was tampered with but also the tampered area.

4.6. Remove watermark

Removing watermarks is another function we often need. Downloaded images sometimes carry automatically added watermarks that block part of the content. TextIn provides a watermark removal function, which can be tried in TextIn. The following is an example of the actual effect:

Insert image description here

The left side is the processed result and the right side is the original image with the watermark. The watermark is removed cleanly, and the area where it used to be shows no blurring.

We can also try something interesting: manually add a watermark to a document, remove it with TextIn, and then submit the result to the PS intelligent detection mentioned above to see whether it is flagged as tampered. The outcome is quite interesting; you can test it yourself.

4.7. Automatically erase handwritten text

Another interesting function in TextIn is automatically erasing handwritten text, which is very useful when scanning test papers. This function can be tried in TextIn. Here are the test results:

Insert image description here

What we are testing is a test paper that has been written and corrected. The test paper includes handwritten English, manual box selection, check marks, etc. After removal, the handwritten portions were removed, while the content of the test paper itself was retained. In addition, the removal result also enhances the original image, making it easier to view.

4.8. Seal detection and identification

Some enterprises need seal detection and recognition. The text on a seal is usually curved, and general text recognition programs do not handle it well. TextIn provides seal detection and recognition, which covers both locating the seals in an image and recognizing the text inside them. The following is an example:

Insert image description here

The picture on the left is the input image, which contains multiple seals. The picture on the right is the detection result: each seal was detected and its text content was recognized. The above functions can be tried in TextIn.

4.9. Other functions

In addition to the above functions, TextIn also offers QR code recognition, bill recognition, vehicle-related recognition, and personal ID document recognition, as well as a document conversion function. Here are some of the available interfaces:

Insert image description here

The above functions can be tried directly online, or you can use the API provided by TextIn to integrate them into your own application. For the API documentation, please refer to https://www.textin.com/document/index.

For example, the following is Python sample code for general text recognition:

import requests


def get_file_content(file_path):
    # Read the image file as raw bytes.
    with open(file_path, 'rb') as fp:
        return fp.read()


class CommonOcr(object):
    def __init__(self, img_path):
        # Log in and go to "Workbench - Account Settings - Developer Information" to find your x-ti-app-id.
        # The x-ti-app-id in this sample code is not real data.
        self._app_id = 'c81f*************************e9ff'
        # Log in and go to "Workbench - Account Settings - Developer Information" to find your x-ti-secret-code.
        # The x-ti-secret-code in this sample code is not real data.
        self._secret_code = '5508***********************1c17'
        self._img_path = img_path

    def recognize(self):
        # General text recognition endpoint
        url = 'https://api.textin.com/ai/service/v2/recognize'
        # Credentials are passed as request headers.
        head = {
            'x-ti-app-id': self._app_id,
            'x-ti-secret-code': self._secret_code,
        }
        try:
            # The image bytes are sent as the request body.
            image = get_file_content(self._img_path)
            result = requests.post(url, data=image, headers=head)
            return result.text
        except Exception as e:
            # Return the error message as text instead of raising.
            return str(e)


if __name__ == "__main__":
    response = CommonOcr(r'example.jpg')
    print(response.recognize())

Using it is simple: fill in your own x-ti-app-id and x-ti-secret-code and change the image path passed to CommonOcr. For more functions, please refer to https://www.textin.com/
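The string returned by `recognize()` can then be decoded before further use. The sketch below assumes the service replies with a JSON string; the helper name `parse_ocr_response` and the sample payload are hypothetical, and the real response fields should be taken from the TextIn API documentation.

```python
import json


def parse_ocr_response(response_text):
    """Return the decoded JSON object, or None if the text is not valid JSON."""
    try:
        return json.loads(response_text)
    except (TypeError, ValueError):
        # Covers malformed JSON as well as non-string inputs.
        return None


# Hypothetical payload for illustration only, not real API output.
sample = '{"code": 200, "message": "success"}'
parsed = parse_ocr_response(sample)
print(parsed)
```

Returning `None` on failure lets the caller distinguish a usable result from an error message or an unexpected reply.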

Origin blog.csdn.net/ZackSock/article/details/127570572