[New Knowledge Evaluation Lab] Scanner Almighty King, the problem-solving champion of scanning: the "Smart HD Filter" black technology

Scanning technology is widely used in office work (documents, business cards, invoices), study (notes, test papers), personal life (certificates, photos), business (receipts, invoices), law (contracts, evidence), and many other fields. Real-world photos, however, often suffer from blurring, poor lighting, wrinkles, stains, glare, and show-through text. Take the very common note photo below: the wrinkles and page curvature seriously degrade its quality and readability, and even after processing with traditional tools such as printers and flatbed scanners it remains hard to use.

Recently, Scanner Almighty King from Hehe Information launched a new piece of black technology called the "Smart HD Filter". With it, you no longer need to worry about shooting angle, light source, background, or wrinkles: the app automatically chooses an image-optimization strategy and fully handles interference factors such as blur, darkness, and stray fingers. This article takes a closer look, unveiling the mystery through both underlying principles and hands-on comparisons.

1. Principle Analysis of "Smart HD Filter"

1.1. Breaking down the AI-Scan engine

The "Smart HD Filter" is built on the intelligent scanning engine AI-Scan. Working along the three dimensions of image processing, text recognition, and layout restoration, and moving from perception through cognition to decision-making, the engine uses AI to automatically "inspect" image quality, locate problems, and match them with the corresponding optimization. This makes image processing more intelligent, text recognition more accurate, and layout restoration truly "what you shoot is what you get".

The intelligent scanning engine AI-Scan consists of two main parts: image perception and scenario-based decision-making.

1.1.1. Image perception

Image perception is mainly general-purpose image processing: at this stage the application uses deep learning models to recognize and understand image content, perceiving characteristics such as lighting, shadows, colors, and tilt angle. For example, it can remove an occluding finger, adjust brightness and contrast for images that are too dark or too bright, and automatically deskew tilted documents. This step often relies on histogram equalization and perspective transformation.

Histogram equalization is a method for enhancing image contrast. Its main idea is to reshape the image's histogram into an approximately uniform distribution, redistributing the gray levels to improve the visual contrast of the image. Specifically, histogram equalization proceeds as follows:

  1. Compute the grayscale histogram: count the gray value of every pixel to obtain the frequency of occurrence of each gray level.
  2. Compute the cumulative distribution function (CDF) of the gray levels: the CDF gives, for each gray level, the proportion of pixels whose gray value is less than or equal to that level, i.e. the cumulative probability of pixels up to that level. Mathematically, CDF(k) = Σ_{i=0}^{k} p(i), where k is the gray level and p(i) is the fraction of pixels with gray value i.
  3. Compute the gray-level mapping function: linearly map the CDF onto the 0-255 gray range to obtain a gray-level lookup table.
  4. Apply the lookup table to the original image: replace each pixel's original gray value with its mapped gray value.
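The four steps above can be sketched in a few lines of NumPy (a minimal illustration using a made-up low-contrast test image, not the app's actual implementation):

```python
import numpy as np

def equalize_hist(img):
    """Histogram equalization for an 8-bit grayscale image, per the four steps above."""
    # Step 1: gray-level histogram -- how often each of the 256 levels occurs.
    hist = np.bincount(img.ravel(), minlength=256)
    # Step 2: cumulative distribution function CDF(k) = sum_{i<=k} p(i).
    cdf = hist.cumsum() / img.size
    # Step 3: mapping table -- linearly map the CDF onto the 0-255 range.
    lut = np.round(cdf * 255).astype(np.uint8)
    # Step 4: apply the lookup table to every pixel.
    return lut[img]

# Toy low-contrast image with gray values squeezed into [100, 130].
rng = np.random.default_rng(0)
img = rng.integers(100, 131, size=(64, 64), dtype=np.uint8)
out = equalize_hist(img)
```

After equalization the gray values of the toy image are stretched across nearly the full 0-255 range, which is exactly the contrast boost this step provides.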

Below is a piece of OpenCV code that equalizes an image's histogram and plots the gray-level distributions before and after.

import cv2
from matplotlib import pyplot as plt

# Read the image as 8-bit grayscale (flag 0).
img = cv2.imread("test.jpg", 0)

# Histogram equalization: redistribute the gray levels via the CDF mapping.
equalized = cv2.equalizeHist(img)

# Plot the gray-level histograms before and after equalization.
plt.hist(img.ravel(), 256, [0, 256], alpha=0.5, label="original")
plt.hist(equalized.ravel(), 256, [0, 256], alpha=0.5, label="equalized")
plt.legend()

# dpi: dots per inch
# bbox_inches: if "tight", crop to the tight bounding box of the figure
# pad_inches: padding around the figure when bbox_inches is "tight"
plt.savefig("result.jpg", dpi=300, bbox_inches="tight", pad_inches=0)
plt.show()

Histogram equalization spreads the pixel distribution as widely as possible and improves contrast. Perspective transformation, in turn, projects an image from one viewpoint to another, changing its geometry and viewing angle; it is commonly used for image rectification, camera calibration, and similar tasks. The figure below shows an experimental example: the perspective transformation not only corrects the viewing angle of the image but also flattens the curved page into a flat document.

The steps of perspective transformation are as follows:

  1. Determine the coordinates of at least four corresponding points in the original image and the target image. The four points in the original image (no three of them collinear) and their four counterparts in the target image define the mapping between the two planes.
  2. Use these corresponding points to compute the perspective transformation matrix. This is a 3x3 matrix obtained by solving a system of linear equations; a common approach is to call the OpenCV function cv2.getPerspectiveTransform().
  3. Transform the original image with the matrix, for example via cv2.warpPerspective(). This function maps each pixel of the original image to its position in the target image according to the transformation matrix, applying interpolation for a smoother result.

1.1.2. Scenario-based decision-making

Scenario-based decision-making draws generalized, scenario-specific judgments from the results of image perception. Built on the AI-Scan engine, Scanner Almighty King can intelligently decide how to optimize a document image and apply scene-specific processing.

For example, when an office-document scene is recognized, element detection and identification are carried out with layout restoration and machine learning: the document image is divided into elements such as text, titles, and tables, and the text content is extracted. Element aggregation and layout restoration then automatically identify and classify the document content, improving document-management efficiency. When a PPT-shooting scene is recognized, the app adjusts the angle and contrast of the screen image, removes reflective spots, automatically eliminates screen moiré (as shown in the picture above), and recognizes the on-screen text. When a test paper is photographed, the app can erase handwriting traces with one click (as shown in the picture below) and automatically suppress background texture and reflection interference on the paper, improving image quality and readability.

1.2. Layout Restoration and Recognition Technology Analysis

During the evaluation I paid close attention to how Scanner Almighty King handles document layout. Layout restoration and recognition is an important branch of image processing and computer vision, and a complex, highly specialized task: it must not only locate the various elements accurately, but also analyze their content and the relationships between them. Scanner Almighty King combines self-developed technology with cutting-edge theory into a complete layout-processing pipeline that achieves accurate layout restoration.

1.2.1. Element detection and identification

Scanner Almighty King uses a layout-analysis framework, Layout-engine, to perform preliminary element detection and identification on documents. While analyzing the layout of a document image it also performs instance-level object segmentation, accurately separating objects of different semantic categories and distinguishing text, pictures, tables, and other content from the background, so that this information can be better understood and processed.

It locates and identifies elements in documents, such as text, charts, and pictures, with deep learning models similar to convolutional neural networks (CNNs) and Faster R-CNN (as shown below). The processing covers paragraph detection, table detection, header and footer identification, and more.

The figure above shows the framework of Mask R-CNN. The model first feeds the preprocessed input picture into a feature-extraction network to obtain a feature map. A fixed number of anchors are then placed at each pixel position, and the resulting ROI regions are sent to the RPN network for binary classification (foreground vs. background) and coordinate regression, yielding refined ROI regions.

Next, the ROIAlign operation proposed in the paper is applied to these ROI regions. It has two main parts: first, aligning pixels at corresponding positions of the original image and the feature map; second, aligning the feature map with the fixed-size output features.

Finally, the ROI regions that survive this screening, matching, and adjustment undergo multi-class classification and bounding-box regression, and an FCN is introduced to generate the masks that complete the actual segmentation task.

1.2.2. Element aggregation

After the elements have been accurately detected, they need to be aggregated sensibly. Element aggregation means combining the recognized text, pictures, tables, and other elements according to their positions in the layout so as to restore the structure and arrangement of the original page: text belonging to the same paragraph is merged into a complete paragraph, and the row and column cells of a table are merged into a complete table. Here, Scanner Almighty King uses a method similar to a graph neural network (GNN) to build a graphical model describing the relationships between elements, achieving effective aggregation.

A Graph Neural Network (GNN) can effectively process non-Euclidean data with complex connection relationships. Its core idea is to propagate and aggregate information over the graph structure, computing and updating node states until information has spread across the whole graph. The learning process can be divided into two key steps: message passing and node update.

  1. Message-passing phase: each node receives information from its neighbor nodes and aggregates and transforms it. This can be realized with a graph convolutional layer, in which each node's features are aggregated and combined with those of its neighbors; the aggregation can be a weighted average, max pooling, and so on.
  2. Node-update phase: the aggregated information is fused with the node's own features to produce a new node representation, typically via fully connected layers and activation functions. Through multiple layers of updates, a GNN gradually fuses and propagates information across the whole graph.
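A toy version of this two-step scheme, with a hypothetical four-element "layout graph" and mean aggregation (purely illustrative values and weights, not the product's actual model):

```python
import numpy as np

# Four detected layout elements (nodes) with 2-d features,
# e.g. normalized (x, y) positions -- illustrative values only.
X = np.array([[0.10, 0.20],
              [0.15, 0.25],
              [0.80, 0.10],
              [0.82, 0.90]])

# Adjacency matrix: 1 where two elements are considered neighbors on the page.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Message passing: each node averages the features of its neighbors.
deg = A.sum(axis=1, keepdims=True)
messages = (A @ X) / np.maximum(deg, 1)

# Node update: fuse the aggregated message with the node's own features
# through a random (hypothetical) weight matrix and a ReLU activation.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))  # maps [own_feature, message] -> new 2-d feature
H = np.maximum(np.concatenate([X, messages], axis=1) @ W, 0)
```

Stacking several such layers lets information from the whole page flow into each element's representation, which is what makes neighborhood-based paragraph and table aggregation possible.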

1.2.3. Layout recognition

Once element aggregation is complete, the content of the elements must be recognized: reading text, extracting information from forms, parsing data from barcodes and QR codes, and more. For this step, Scanner Almighty King uses a structure similar to the Transformer network model.

The Transformer is a neural network model based on the self-attention mechanism. Originally proposed by the Google Brain team in 2017, it has been widely used in natural language processing, computer vision, and other fields. It handles long-distance dependencies effectively and supports parallel computation, which is a significant benefit for large-scale document processing and gives Scanner Almighty King both efficiency and accuracy on such tasks.

As shown in the figure above, the Transformer consists of two main parts: the Encoder and the Decoder. The Encoder receives the input sequence and converts it into a series of vector representations; the Decoder generates the target sequence from these representations. Both are stacks of self-attention layers and feed-forward neural network layers. The input sequence is first turned into embeddings that are easy for the computer to process; the Encoder maps these to hidden-layer features, the Decoder combines the Encoder output with the previously generated outputs, and a softmax finally yields the probabilities of the next tokens in the sequence.
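The heart of that self-attention mechanism can be sketched in NumPy (minimal scaled dot-product attention on random toy embeddings, not the engine's actual network):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position attends to every other position, so long-range
    dependencies are captured in a single matrix multiplication."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ V, weights

# Toy sequence of 5 token embeddings of dimension 8 (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
out, attn = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
```

Each row of the attention matrix sums to 1, i.e. every token distributes its "focus" across the whole sequence at once, which is what enables the parallelism noted above.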

2. In-depth evaluation - "Smart HD filter" function

Having analyzed the principles in depth, we now evaluate the "Smart HD Filter" of Scanner Almighty King. For a more intuitive result, we selected two other products on the market, Product A and Product B, and conducted a horizontal comparison.

2.1. Image processing

First, we chose the book and document scans that are common in everyday life. We took a photo with a tilted shooting angle and curved book pages, and scanned it three times under the same device environment with the same selection area. The results are as follows:

All three products can correct the tilted camera angle, but only Scanner Almighty King automatically corrects the curvature of the page: thanks to the perspective-transformation principle it incorporates, it reprojects visually curved text onto a flat plane for easy reading and file management. Product A and Product B clearly cannot correct the page bending, and Product B additionally suffers from serious defects such as blurred and distorted text.

The image perspective-correction technology backed by AI-Scan gives Scanner Almighty King a unique advantage in complex scanning tasks: it can scan not only flat documents but also documents with curved surfaces, and even documents shot at extreme angles.

2.2. Moiré removal

Second, we evaluated moiré removal when scanning a screen. We photographed a PPT slide on a screen, saved it as a picture, and fed the same picture to Scanner Almighty King and the other two products under review.

Scanner Almighty King essentially eliminates the on-screen moiré completely, and neither the text nor the layout is affected by the removal step. Product A, by contrast, still shows severe moiré such as stripes and light spots after scanning, which hurts both image clarity and the reading experience. Product B has been optimized for moiré removal, but its fonts come out blurred and a slight moiré pattern remains.

The moiré-removal technology in Scanner Almighty King's "Smart HD Filter" eliminates the interference of screen grain, improving image and text clarity and thus overall image quality and user experience.

2.3. Function expansion

During the evaluation we found that, beyond the "Smart HD Filter", Scanner Almighty King provides a series of other practical functions.

The test-paper capture function photographs test papers quickly and accurately through automatic recognition and correction, and can erase handwriting with one click, making it easy to re-archive papers for review. The one-tap book-scanning function shoots double pages in one step and removes show-through print, covering fingers, and similar defects, ensuring scanning accuracy and clarity while improving efficiency. The writing-tablet and whiteboard capture functions apply artificial intelligence to reflective areas, eliminating or weakening glare. Scanner Almighty King can remove reflections and fingerprints on the screen, extract features from the handwriting style, and encode the handwritten text, which helps recover the text in reflective areas of a document image and restore strokes as close to the real writing as possible.

3. Experiencing the Smart HD Filter: "What you shoot is what you get"

After trying the "Smart HD Filter" of Scanner Almighty King, our first impression is exactly "what you shoot is what you get": it meets the demand for high-definition, lossless scanning.

Smart HD filter video demo

At its core is the intelligent scanning engine AI-Scan, which perceives the type of document image, automatically detects and fixes problems, recognizes multiple languages for precise processing, and, through built-in scenario-based decision-making, clears fingers, moiré, shadows, and other common interference factors with one click, restoring high-definition texture to the image.

In office scenarios, the Smart HD Filter enables fast document digitization. Users only need to photograph a document with a smartphone; the filter automatically recognizes the content, optimizes the image quality, and generates a high-quality electronic copy, supporting paperless offices and improving the efficiency of document transmission, storage, and management.

In education, the Smart HD Filter can scan and organize student notes. With no need to worry about cluttered backgrounds or reflections, it optimizes image quality with one click, recognizes the text, and produces high-definition digital notes, improving students' learning efficiency and convenience.

In law and accounting, the Smart HD Filter can scan legal documents and financial statements, restoring documents and evidence materials in high definition while optimizing image quality and recognizing text, thereby improving the efficiency and accuracy of legal and accounting work.

Overall, the "Smart HD Filter" launched by Scanner Almighty King combines more intelligent image processing, smarter scene decision-making, and stronger layout sharpening and restoration, bringing users an accurate, high-definition, and convenient experience that improves efficiency in both life and work.


Origin blog.csdn.net/air__Heaven/article/details/132358325