Image processing black technology: bending correction, moiré removal, image cropping enhancement, and PS detection

0 Preface

Hehe Information is an industry-leading artificial intelligence and big data technology company that has focused on text recognition for 16 years. It holds a domestic leading position in the core fields of intelligent text recognition and commercial big data, providing innovative digital and intelligent services to enterprises and individual users worldwide. The text- and image-processing "black technology" provided by Hehe Information has quietly influenced many aspects of our lives. Let us introduce these technologies, which you may not have noticed but which bring us great convenience.


1 Bending correction

Modern neuroscience shows that the main job of the primary visual cortex of the mammalian brain is to build dictionary representations of images, because vision is the most important human sense: according to incomplete statistics, at least 80% of external information is obtained visually. However, when a computer acquires an image, it represents the three-dimensional objective world with a two-dimensional plane. The lost dimension is depth. Just as we cannot grasp the meaning of four- or five-dimensional spaces, this loss of dimensionality causes difficulties in image processing.


Because camera hardware cannot live up to the theoretical assumption of an infinitely small pinhole in the perspective camera model, real images suffer from significant radial distortion: straight lines in the scene appear as curves in the image. There are two types of radial distortion, barrel distortion and pincushion distortion. In addition, during camera assembly the lens cannot be kept strictly parallel to the imaging plane, which introduces tangential distortion (a small sketch of classic lens-distortion correction follows the method list below). Moreover, the viewing direction when photographing a document is generally not perpendicular to the document plane, producing perspective distortion in the document image. For example, when a fairly thick book is opened flat, the text areas on both sides of the spine bend inward. Clearly, the deformation of a warped document is more complicated than that of a flat one, and its analysis and correction are correspondingly harder. The industry's main correction methods for warped document images are:

  • Methods based on the cylindrical model. The core idea is to treat the document as a cylinder: first locate all the text baselines, then fit the bending function of the cylinder model to these baselines, and finally use the model to rectify the original document image;
  • Hardware-based methods. These methods usually scan the three-dimensional shape of the paper with a special hardware device. For example, a structured-light source scans the document to obtain its exact depth information, and the document image is then corrected according to that depth;
  • Methods based on image segmentation. The core idea is to divide the warped document image into multiple approximately planar regions, rectify them independently, and finally stitch all the rectified regions together to recover the front view.
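
For the lens-distortion part of the problem, correction is standard once the camera is calibrated. A minimal OpenCV sketch, assuming the intrinsic matrix and the radial/tangential coefficients are already known (the numbers below are illustrative, not real calibration results):

```python
import cv2
import numpy as np

# Hypothetical calibration results; in practice these come from
# cv2.calibrateCamera() on a checkerboard sequence.
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
# Distortion coefficients: k1, k2 (radial), p1, p2 (tangential), k3.
# Negative k1 models barrel distortion; positive k1 models pincushion.
dist = np.array([-0.25, 0.08, 0.001, -0.0005, 0.0])

img = cv2.imread("document.jpg")
h, w = img.shape[:2]

# Refine the camera matrix so the undistorted image keeps only valid pixels.
new_K, roi = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), alpha=0)
undistorted = cv2.undistort(img, K, dist, None, new_K)

x, y, rw, rh = roi
cv2.imwrite("document_undistorted.jpg", undistorted[y:y + rh, x:x + rw])
```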

However, each of these methods has shortcomings. Cylinder-model methods suffer from broken baselines when locating text baselines, and may locate wrong text regions in severely warped areas; in documents containing images there are fewer usable text baselines, so the accuracy of the cylinder model also drops. Hardware-based methods are accurate but limited by hardware cost and are not universal in practice: users cannot be expected to own such expensive equipment. Segmentation-based methods are only suitable for plain-text correction and cannot handle documents with formulas or images.
Hehe Information adopts an offset-field-based learning method that greatly mitigates the above defects. The offset field is predicted by a stacked U-Net network with intermediate supervision, which directly estimates the forward mapping from the warped image to the rectified image. High-quality synthetic training data are created by warping undistorted images, and this data-driven, learned approach covers a wide variety of real-world conditions, raising the model's generalization to a commercial level. Because the network is trained end to end, no hand-crafted low-level features are used, so given large-scale training data it can handle many document types, overcoming the aforementioned inability to handle formulas and images, and it can be deployed as an effective method in real-world applications.
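
The trained network itself is not public, so as a sketch of the mechanism only: the snippet below shows how a predicted dense offset field could be applied to rectify a warped page, assuming the model outputs one (dx, dy) displacement per output pixel.

```python
import cv2
import numpy as np

def rectify_with_offset_field(warped, offset_field):
    """Apply a predicted dense offset field to rectify a warped document.

    `offset_field` has shape (H, W, 2): for every output pixel it stores
    the (dx, dy) displacement back into the warped input image. In the
    real system this field is predicted by the stacked U-Net; here it is
    treated as a given array (an assumption for illustration).
    """
    h, w = warped.shape[:2]
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + offset_field[..., 0]).astype(np.float32)
    map_y = (ys + offset_field[..., 1]).astype(np.float32)
    # Bilinear resampling of the warped image at the mapped coordinates.
    return cv2.remap(warped, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# warped = cv2.imread("curved_page.jpg")
# offsets = model(warped)            # hypothetical network inference
# flat = rectify_with_offset_field(warped, offsets)
```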

2 Moiré removal

When a camera photographs an electronic screen, the light-emitting dot matrix of the display and the camera's sensor array alias against each other, producing the moiré phenomenon. Screen-image moiré appears as stripes superimposed on the image, variable in color and shape. Moiré mixes with the original image signal over a wide range in both the spatial and frequency domains, usually covering the entire image. The moiré pattern not only differs between images but also changes in color and shape with spatial position within a single image; a slight change of shooting distance or angle can alter it dramatically. It also depends on the camera type, screen type, and image content. Since moiré seriously degrades the visual quality of the captured image, post-processing technology is needed to remove it. The figure below shows examples of image moiré: it can be striped, grid-like, or corrugated, with varying texture direction, curvature, and color, which makes research on moiré-removal technology quite challenging.

A common way to suppress moiré today is preprocessing before imaging, for example placing an anti-aliasing filter in front of the camera lens and applying a precise interpolation algorithm to the output of the color filter array (CFA). In the field of professional photography, the most effective current removal method is post-processing with professional image-editing software, Adobe Photoshop being among the most widely used. Photoshop, abbreviated "PS", is image-processing software developed and released by Adobe Systems; it mainly processes digital images composed of pixels and offers many editing and drawing tools. However, preprocessing cannot help once a moiré-contaminated image, especially a screen photo, has already been captured, and software retouching requires users to master professional tools.
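
For reference, the classical signal-processing route can also be sketched directly: moiré often appears as off-center spikes in the Fourier spectrum, which hand-placed notch (band-reject) masks can attenuate. Because real moiré spreads over a wide frequency range, as noted above, such fixed filters blur detail and generalize poorly, which motivates the learned methods below. The peak positions and radius here are illustrative assumptions.

```python
import numpy as np

def notch_suppress(gray, peaks, radius=8.0):
    """Attenuate moiré spikes with Gaussian notch (band-reject) masks.

    `peaks` lists (x, y) coordinates of moiré spikes in the *shifted*
    spectrum; conjugate-symmetric partners must be listed too. Finding
    them is a manual, image-specific step, which is the key weakness.
    """
    h, w = gray.shape
    spec = np.fft.fftshift(np.fft.fft2(gray.astype(np.float32)))
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.ones((h, w), np.float32)
    for px, py in peaks:
        d2 = (xs - px) ** 2 + (ys - py) ** 2
        mask *= 1.0 - np.exp(-d2 / (2.0 * radius ** 2))
    out = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    return np.clip(out, 0, 255).astype(np.uint8)

# gray = cv2.imread("screen_photo.jpg", cv2.IMREAD_GRAYSCALE)
# clean = notch_suppress(gray, peaks=[(300, 180), (340, 460)])
```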

[Figure: examples of moiré patterns (striped, grid-like, and corrugated)]
In recent years, deep learning has been leading a revolution in computer vision and image processing. For example:

  • Encoder-decoder networks. The network consists of multiple convolutional and deconvolutional layers and learns an end-to-end mapping from degraded, noisy images to the originals. The convolutional layers act as feature extractors, capturing high-level image features while removing noise; the deconvolutional layers restore image detail. Skip connections between symmetric convolutional and deconvolutional layers speed up training convergence and help the model reach a better local optimum: they let signals back-propagate directly from higher layers to lower layers, alleviating vanishing gradients, making deep networks easier to train, and improving performance. Skip connections also pass image details straight from convolutional to deconvolutional layers, helping restore the original image. Such networks can be used for restoration tasks like denoising and super-resolution;
  • Feed-forward convolutional neural networks. This approach learns the mapping from noisy observations to clean images end to end, using residual learning and batch normalization to speed up training and improve denoising performance; it can perform Gaussian denoising at unknown noise levels (a sketch of this residual-learning idea follows this list);
  • Super-resolution reconstruction. A convolutional network takes the interpolated low-resolution image as input and learns the mapping from the interpolated image to the high-resolution image. Conceptually the mapping comprises three operations: first, image patches are extracted and represented as high-dimensional vectors; then these vectors are non-linearly mapped to high-resolution high-dimensional vectors; finally, a high-resolution image is reconstructed from the learned mapping.
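
As a concrete illustration of the second bullet, here is a minimal PyTorch sketch of the residual-learning idea: predict the noise (or moiré) layer and subtract it from the input. The depth, width, and channel counts are illustrative assumptions, not Hehe Information's production model.

```python
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    """DnCNN-style residual denoiser: the network predicts the noise
    layer, which is subtracted from the input to yield the clean image."""

    def __init__(self, channels=3, features=64, depth=8):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(True)]
        for _ in range(depth - 2):
            layers += [
                nn.Conv2d(features, features, 3, padding=1, bias=False),
                nn.BatchNorm2d(features),  # speeds up and stabilizes training
                nn.ReLU(True),
            ]
        layers.append(nn.Conv2d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Residual learning: clean estimate = input - predicted noise.
        return x - self.body(x)

# denoised = ResidualDenoiser()(torch.rand(1, 3, 128, 128))
```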

As shown in the figure below, Hehe Information's de-moiré technology is quite mature.

[Figure: moiré removal results from Hehe Information]

3 Image cropping enhancement

Image cropping enhancement is a sub-topic of image segmentation and an important task in image processing and computer vision; in recent years it has also been a research hotspot in the field. The essence of image segmentation is to extract the parts of an image that people care about, so that the useful information can be analyzed and processed further. Current methods can be grouped by segmentation principle as follows:

  • Edge-based segmentation algorithms. These methods assume that pixel gray values change sharply across the edges between different regions, and use the maxima of the image's first derivative or the zero crossings of its second derivative as the basic evidence for judging edge points; curve-fitting techniques can then produce continuous curves along the boundaries between regions;

  • Threshold-based segmentation algorithms. Thresholding is a common way to segment an image directly, and it depends on the gray values of the image's pixels. The basic principle is to set one or more feature thresholds that divide the pixels into target regions and background regions of different gray levels;

  • Region-based segmentation methods. These techniques search for regions directly; specific algorithms include region growing and region split-and-merge. Region growing starts from one or more seed pixels and grows outward until the whole region is obtained, thereby extracting the target. Split-and-merge can be regarded as the reverse of region growing: it starts from the whole image, splits repeatedly into sub-regions, and then merges the foreground regions to obtain the foreground target. Region-based algorithms segment fairly uniform connected objects well, but they require manually chosen seeds and are sensitive to noise, which can leave holes in the regions;

  • Clustering-based segmentation algorithms. The pixel sample set is divided into disjoint regions such that pixels within a region are highly similar while similarity between regions is low. The early K-means algorithm initializes a certain number of cluster centers, measures similarity by the distance from each pixel to the cluster centers, and iteratively clusters pixels with high similarity (a small clustering example follows this list); fuzzy clustering algorithms introduce a membership function, achieving more accurate clustering at a higher computational cost.
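
As a concrete example of the clustering family, a minimal OpenCV sketch that segments an image by K-means over pixel colors; K = 2 and the file names are illustrative assumptions (K is image-dependent).

```python
import cv2
import numpy as np

# K-means clustering segmentation: group pixels by color similarity.
img = cv2.imread("page_photo.jpg")
pixels = img.reshape(-1, 3).astype(np.float32)

# Stop after 20 iterations or when centers move less than 1.0.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, 2, None, criteria,
                                attempts=5, flags=cv2.KMEANS_RANDOM_CENTERS)

# Paint every pixel with its cluster center to visualize the regions.
segmented = centers.astype(np.uint8)[labels.ravel()].reshape(img.shape)
cv2.imwrite("segmented.jpg", segmented)
```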


4 PS detection

With the continuous development of Photoshop and the steady improvement of users' skills, PS can perform all kinds of operations on an image, such as beautification, segmentation, and matting, and the edited result often shows no obvious traces. In an era of rapidly developing digital information, however, information security is a problem we cannot ignore: if PS is used to maliciously copy, modify, delete, or add content to images, the consequences can be very serious. High-quality PS images are hard to identify with the naked eye, so suitable methods are needed to identify tampered images. Methods based on image feature detection and matching are a common technique in the field of PS detection.

The most traditional and classic feature-point detection method is the Scale-Invariant Feature Transform (SIFT). Its core principles are: based on the Laplacian pyramid, find and localize robust keypoints across scale space; based on a histogram of oriented gradients (HOG), achieve rotation invariance and generate feature descriptors. The main steps are as follows:

  • Scale-space keypoint detection and localization
    To make the feature description scale-invariant, SIFT searches for key feature points in scale space. Since image features tend to have high-frequency characteristics, the Laplacian pyramid is used to construct the image's scale space. The Laplacian pyramid is composed of the local extrema of each scale's image, and SIFT further detects the pyramid's local extrema as coarse keypoints. Because these coarse keypoints are obtained in the Laplacian scale space, which has strong edge responses alongside the feature responses, a feature point that falls on an image edge brings instability. This is because: ① it is hard to localize a specific pixel along an image edge, which is ambiguous; ② edge points are easily disturbed by noise. It is therefore also necessary to suppress the edge responses among the coarse keypoints above. SIFT borrows the idea of Harris corner detection to eliminate edge responses. Let the Hessian matrix at each detected point be

$$H=\begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}$$

The second-order partial derivatives are obtained by the finite-difference method. To exclude edge points, the ratio between the two principal curvatures must stay small, i.e., the point should respond like a corner rather than an edge; in SIFT the test is

$$\frac{\mathrm{tr}(H)^2}{\det(H)} < \frac{(\gamma+1)^2}{\gamma}$$

So far, all the scale-invariant feature points of the image are obtained.
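
To make the test concrete, here is a minimal NumPy sketch of the edge-response check on one scale layer; the finite differences implement the Hessian entries above, and γ = 10 follows the threshold recommended in Lowe's original SIFT paper.

```python
import numpy as np

def passes_edge_test(D, y, x, gamma=10.0):
    """SIFT-style edge-response test at candidate keypoint (y, x).

    D is one scale layer (e.g., a difference-of-Gaussians image) as a
    float array; (y, x) must lie at least one pixel inside the border.
    """
    # Hessian entries f_xx, f_yy, f_xy via finite differences.
    fxx = D[y, x + 1] + D[y, x - 1] - 2.0 * D[y, x]
    fyy = D[y + 1, x] + D[y - 1, x] - 2.0 * D[y, x]
    fxy = (D[y + 1, x + 1] - D[y + 1, x - 1]
           - D[y - 1, x + 1] + D[y - 1, x - 1]) / 4.0
    tr = fxx + fyy
    det = fxx * fyy - fxy * fxy
    # Reject saddle points (det <= 0) and edge-like points whose
    # principal-curvature ratio exceeds the threshold.
    return det > 0 and tr * tr / det < (gamma + 1.0) ** 2 / gamma
```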

  • Feature direction assignment
    SIFT assigns a feature direction $\theta$ to each feature point so that, across different pictures, the same feature can be rotated to a common reference angle for comparison, realizing rotation invariance of the feature description, as shown in the figure. The specific steps are as follows. First, take the feature point $X=(x, y, \sigma)^T$ as the center, select a circular neighborhood of a given radius, and compute the gradient at each pixel. Then, centered on the feature point, a Gaussian kernel weights the region to aggregate each pixel's contribution to the feature point's gradient (the farther the pixel, the smaller its contribution).

[Figure: gradient neighborhood around a feature point for direction assignment]

After the weighting is completed, an orientation histogram over 0–360° is constructed with one bin every 10° (36 bins), and the gradient information of the region is accumulated. Specifically, for pixels whose gradient direction falls in 0–9°, the weighted gradient magnitudes are summed to form the height of the first bin, and likewise for the rest. After the histogram is built, the angular range of the main peak contains the feature point's main direction $\theta_{main}$, which is estimated precisely by quadratic curve interpolation starting from the angular extent of the $i$-th bin. Besides the mandatory main direction, each feature point may also have one or more auxiliary directions $\theta_{else}$, defined as follows: when another bin's height exceeds 80% of the main peak's height, the direction represented by that side peak is taken as an auxiliary direction. Auxiliary directions enhance the robustness of image matching. The main and auxiliary directions together form the feature direction $\theta = [\theta_{main}, \theta_{else}]^T$. The feature-point representation is thus enhanced to $X = [x\ y\ \sigma\ \theta]^T$, which is both scale-invariant and rotation-invariant.

  • Generate SIFT feature descriptors
    For a feature point $X = [x\ y\ \sigma\ \theta]^T$, take $(x, y)$ as the center and divide its neighborhood into $d \times d$ cells, each with side length $3\sigma$. To guarantee rotation invariance, the neighborhood must first be rotated to align with the feature direction; the rotation radius is half the diagonal length of the neighborhood, so the actual size of the rotated neighborhood is $(2r+1) \times (2r+1)$. After rotation, the neighborhood is resampled into a 16 × 16-pixel digital window, and the gradient of every pixel inside the window is computed. To prevent abrupt changes at the window boundary, or in a pixel's gradient direction due to noise, SIFT applies Gaussian weighting to the gradient magnitudes centered on the feature point. It should be noted that the HOG algorithm has no such digital window, so no filtering step is needed there.
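
In practice the full SIFT pipeline is available off the shelf; here is a minimal OpenCV sketch (the SIFT patent expired in 2020, so `cv2.SIFT_create()` ships with opencv-python 4.4+). Matching two images this way underlies the feature-based tamper checks discussed above; the file names are placeholders.

```python
import cv2

img1 = cv2.imread("original.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("suspect.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test: keep matches whose best distance is clearly
# smaller than the second best, filtering ambiguous correspondences.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} confident matches")
```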

Although feature-description algorithms such as SIFT and SURF perform very well, the memory occupied by their descriptors and the time spent on feature matching cannot be ignored. For example, a 128-dimensional SIFT descriptor of floats requires 512 B of memory, while a 128-bit binary descriptor needs only 16 B, and two binary strings can be matched quickly via the Hamming distance. Binary feature detection and matching algorithms are therefore of great significance; the BRIEF algorithm is briefly introduced below.

The main algorithm steps are as follows:

  • Detect and localize image feature points with FAST, SIFT, etc.;
  • Centered on a feature point, take a patch of fixed size and apply Gaussian filtering (e.g., a 9×9 Gaussian kernel) to denoise it;
  • Randomly pick $N$ pixel-point pairs inside the patch following a Gaussian distribution, so that pixels closer to the feature point contribute more to the feature description;
  • Binary-encode the $N$ pixel pairs with the following feature-description mapping to generate the descriptor

$$T(M; X, Y) = \begin{cases} 1, & f(X) < f(Y)\\ 0, & \text{else} \end{cases}$$

where $f(\cdot)$ is the pixel intensity.
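
A minimal NumPy sketch of this binary test and the cheap Hamming matching it enables; the patch size, sampling spread, and N = 256 are illustrative assumptions, and the input is assumed to have been Gaussian-smoothed as described above.

```python
import numpy as np

rng = np.random.default_rng(0)
PATCH = 31   # patch side length (assumed)
N = 256      # number of point pairs (assumed)
# Gaussian-distributed sampling pattern, shared by all descriptors so
# that the same comparisons are made across images.
pairs = np.clip(rng.normal(0, PATCH / 5, (N, 4)).astype(int),
                -(PATCH // 2), PATCH // 2)

def brief_descriptor(gray, x, y):
    """Binary descriptor: one bit per intensity test T(M; X, Y).

    (x, y) must lie at least PATCH // 2 pixels from the image border.
    """
    xa, ya, xb, yb = pairs.T
    return (gray[y + ya, x + xa] < gray[y + yb, x + xb]).astype(np.uint8)

def hamming(d1, d2):
    """Hamming distance: number of differing bits, very cheap to compute."""
    return int(np.count_nonzero(d1 != d2))
```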

The description formed by the BRIEF algorithm is very sensitive to rotation: as the rotation angle of the target image increases, the matching performance of the BRIEF descriptor drops sharply, as shown in the figure (the vertical axis is matching performance, the horizontal axis is rotation angle).

[Figure: BRIEF matching performance (vertical axis) versus image rotation angle (horizontal axis)]

5 Summary

Having introduced all this black technology, you should now have some understanding of the field of intelligent text processing. The purpose of Hehe Information's intelligent text recognition applications is to make the world more efficient. Hehe Information has worked deeply in artificial intelligence for 16 years, accumulating 2.3 billion user downloads worldwide, holding 113 domestic and international invention patents, winning 15 world championships in top AI competitions, and providing intelligent solutions for 30 industries. Hehe Information offers efficiency tools loved by users around the world, such as its consumer products CamCard, CamScanner, and Qixinbao. With its deep work in pattern recognition, deep learning, image processing, and natural language processing, Hehe Information will surely benefit more people with its technical solutions.

Reprinted from: blog.csdn.net/FRIGIDWINTER/article/details/127506936