Rare practical insights: the optimization journey of Alipay's 2D code scanning technology

This article is reposted from the "Ant Financial Technology" public account; the original was shared by the Alipay technical team. This version has been edited.

1. Introduction

I first encountered 2D code scanning in 2011, when the mobile Internet was in its infancy. Everyone felt smartphones could do much more, but it was not yet clear which functions to explore and build. 2D code scanning was one of those innovations.

Of course, 2D code scanning is ultimately image recognition. Most companies cannot build this technology themselves, so the most commonly used 2D code scanning library is ZXing. Many readers will know it well, and anyone who has used it will remember the picture below (ZXing's logo).

▲ Logo of the ZXing project

A prerequisite for using this library was a phone camera with autofocus. Phones were not as cheap as they are now, and not all of them had autofocus, which limited 2D code scanning on lower-end phones and held back the adoption of the scan feature.

Later, as everyone knows, WeChat's social IM became more and more popular. Its "Scan" feature could scan QR codes to add friends, which made 2D code scanning almost a standard feature of IM software.

 

▲ "Scan" function in early WeChat

Nowadays, WeChat can not only scan QR codes to add friends, but also scan QR codes for payment and perform a growing range of image recognition tasks. 2D code scanning has gradually evolved from a single image recognition technology into an entry point of the mobile Internet.

Since last year, WeChat has upgraded its 2D code scanning: the UI now shows users the center point of each scanned 2D code, and it can recognize up to 3 2D codes at the same time, which is quite powerful.

 

▲ The current WeChat can recognize up to 3 2D codes at the same time (note the green dot)

As shown in the figure above, the 2D code scanning we have long taken for granted can be made this user-friendly.

I recently came across this article from Alipay on optimizing 2D code scanning. Almost no articles on the market really share this technology. Scanning a 2D code to add a friend is one of the most common functions in IM, and although developers do not need to build it from scratch themselves, it is still worth understanding how it works.

This article summarizes Alipay's technical practice in improving the recognition rate and recognition speed of 2D code scanning under harsh conditions such as incomplete, deformed, and discolored codes. We hope it inspires you.

Study and exchange:

- IM/push technology development and exchange group 5: 215477170 [recommended]

- Introduction to mobile IM development: "One entry is enough for novices: Develop mobile IM from scratch"

- Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK

(This article was published simultaneously at: http://www.52im.net/thread-3150-1-1.html )

2. Technical background

As Alipay's offline scenarios keep expanding, products such as payment collection codes, Koubei (word-of-mouth), shared bicycles, power banks, and parking payment have made our lives more and more convenient.

Because of its low cost and good compatibility, the QR code has become the most important tool for connecting offline scenes to online services, and it therefore faces many new challenges.

Because a QR code encodes information as a dot matrix, any visual defects, bends, or lighting effects can greatly reduce the recognition success rate. If recognition is difficult, users may give up, which hurts the payment experience and erodes users' trust in the feature.

The most critical factors in the user's scanning experience are:

  • 1) Recognition rate: the basic metric of the scanning service, which directly reflects recognition capability. If it cannot be improved, a large number of users will be unable to use these more convenient services;
  • 2) Recognition latency: including app startup time and image recognition time, this measures the time from when a user taps the app to when the content is correctly recognized. Every additional second causes a considerable number of users to give up waiting and leave;
  • 3) Accurate feedback: the result must be returned to the user promptly and must also be accurate, especially in today's offline scenes where several QR codes often appear together, so that the user does not need a second operation.

Along these three dimensions, this article shares how Alipay's QR code scanning team built a fast, accurate, and stable scanning experience for users.

We analyzed a large amount of user feedback and found that most recognition failures were caused by non-standard QR codes. Unfortunately, when we tested the recognition rate of our early scanning version, it was only 60%. The following sections therefore start with improving the recognition rate.

3. Recognition rate strategy 1: Optimize the aspect ratio tolerance of the finder pattern search

The previous scanning algorithm allowed a 40% difference when checking the aspect ratio. However, because the error was measured in one direction only (a "forward" error), the result depended on the order in which length and width were compared. Some codes whose aspect ratio was slightly out of tolerance could not be scanned in one orientation, yet could be scanned after rotating the phone 90 degrees (^OMG^).

Summary of optimization strategy:

  • 1) The aspect ratio rule was changed so that the result no longer depends on the order of the two sides;
  • 2) For a known length, the revised rule widens the acceptable width range, increasing the tolerance of the aspect ratio check.

On our comparison test set, the recognition rate increased by about 1%.
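The difference between the two rules can be shown with a small sketch. The article only states the 40% tolerance, so both formulas are assumptions: the old check is modeled as a one-sided relative error that always divides by the second side, and the new check divides by the larger side. That makes the result order-independent and, for a fixed height h, widens the accepted width range from [0.6h, 1.4h] to roughly [0.6h, 1.67h].

```python
# Hypothetical reconstruction of the aspect-ratio check; only the 40% tolerance comes from the article.
TOLERANCE = 0.4


def old_ratio_ok(width: float, height: float, tol: float = TOLERANCE) -> bool:
    # One-sided ("forward") error: height is always the reference,
    # so swapping width and height can flip the result.
    return abs(width - height) <= tol * height


def new_ratio_ok(width: float, height: float, tol: float = TOLERANCE) -> bool:
    # Symmetric error: the larger side is the reference, so argument order
    # no longer matters and the accepted range for a fixed side grows.
    return abs(width - height) <= tol * max(width, height)


# A 15 x 10 px candidate: rejected one way round, accepted after a 90-degree turn.
print(old_ratio_ok(15, 10), old_ratio_ok(10, 15))  # False True
print(new_ratio_ok(15, 10), new_ratio_ok(10, 15))  # True True
```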

4. Recognition rate strategy 2: Add a 1:5:1 finder pattern recognition mode

To find a QR code in an image, the key is to locate its characteristic positioning features.

The square-within-a-square patterns in three of the corners are the QR code's finder patterns.

Along a line through the middle of a finder pattern, the black and white runs are in the ratio 1:1:3:1:1.

In the previous scanning algorithm, finder pattern detection worked as follows: a state machine searched each scan line for the 1:1:3:1:1 pattern and took its middle as the x position (at this point the scan line is at the first row where the 1:1:3:1:1 ratio appears). It then searched vertically at x for the 1:1:3:1:1 pattern to determine y, and finally searched horizontally at (x, y) for the 1:1:3:1:1 ratio again to correct x.
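The three-step search just described can be sketched as follows. This is an illustrative reconstruction, not Alipay's code: it assumes an already binarized image (1 = black, 0 = white), estimates the module width from each five-run window, and accepts each run within half of its expected size, a tolerance similar in spirit to the ratio check in ZXing's finder-pattern finder. The names `runs_of`, `ratio_ok`, `find_11311`, and `locate_finder` are hypothetical.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int, int]  # (color, start index, length); 1 = black, 0 = white


def runs_of(line: List[int]) -> List[Run]:
    """Run-length encode a binarized scan line."""
    runs: List[Run] = []
    start = 0
    for i in range(1, len(line) + 1):
        if i == len(line) or line[i] != line[start]:
            runs.append((line[start], start, i - start))
            start = i
    return runs


def ratio_ok(lengths: List[int], ratio: List[int], tol: float = 0.5) -> bool:
    """Each run must lie within tol * its expected size, using the module width implied by the total."""
    module = sum(lengths) / sum(ratio)
    return all(abs(n - r * module) < r * module * tol for n, r in zip(lengths, ratio))


def find_11311(line: List[int]) -> Optional[float]:
    """Centre of the first black-white-black-white-black window matching 1:1:3:1:1, else None."""
    runs = runs_of(line)
    for k in range(len(runs) - 4):
        window = runs[k:k + 5]
        if window[0][0] == 1 and ratio_ok([r[2] for r in window], [1, 1, 3, 1, 1]):
            _, start, length = window[2]
            return start + length / 2
    return None


def locate_finder(image: List[List[int]]) -> Optional[Tuple[float, float]]:
    """Horizontal search -> vertical check at x -> horizontal re-check at y."""
    for row in image:
        x = find_11311(row)
        if x is None:
            continue
        column = [r[int(x)] for r in image]
        y = find_11311(column)
        if y is None:
            continue
        x_refined = find_11311(image[int(y)])
        if x_refined is not None:
            return x_refined, y
    return None
```

Every one of the three searches must succeed, which is why a single stray pixel in any of them can sink the whole detection.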

This approach copes poorly with stained finder patterns. If interference is encountered during any of these 1:1:3:1:1 searches, even a single pixel of salt-and-pepper noise can make the finder pattern search fail. (Alipay-blue finder patterns produce a lot of noise in the blue areas, which leads to a low recognition rate.)

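To address this, we added a new finder pattern recognition method: when the state machine encounters a 1:5:1 pattern, it starts trying to confirm a finder pattern. (At this point the scan line is at the first row where the 1:5:1 ratio appears, which in a clean finder pattern is the row of five white modules just inside the top black border.)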
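A 1:5:1 trigger can reuse the same run-length idea with a different expected ratio. The sketch below is a hypothetical illustration (the helper names and the half-module tolerance are assumptions): it finds a black-white-black window in the ratio 1:5:1 and returns the middle of the white run. How Alipay confirms the finder pattern after this trigger is not described in the article and is not shown.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int, int]  # (color, start index, length); 1 = black, 0 = white


def runs_of(line: List[int]) -> List[Run]:
    """Run-length encode a binarized scan line."""
    runs: List[Run] = []
    start = 0
    for i in range(1, len(line) + 1):
        if i == len(line) or line[i] != line[start]:
            runs.append((line[start], start, i - start))
            start = i
    return runs


def ratio_ok(lengths: List[int], ratio: List[int], tol: float = 0.5) -> bool:
    """Each run must lie within tol * its expected size, using the module width implied by the total."""
    module = sum(lengths) / sum(ratio)
    return all(abs(n - r * module) < r * module * tol for n, r in zip(lengths, ratio))


def find_151(line: List[int]) -> Optional[float]:
    """Centre of the first black-white-black window in ratio 1:5:1, else None."""
    runs = runs_of(line)
    for k in range(len(runs) - 2):
        window = runs[k:k + 3]
        if window[0][0] == 1 and ratio_ok([r[2] for r in window], [1, 5, 1]):
            _, start, length = window[1]
            return start + length / 2
    return None


# Second row of a finder pattern with 2 px modules and a 2 px margin.
print(find_151([0, 0, 1, 1] + [0] * 10 + [1, 1, 0, 0]))  # 9.0
```

Because this row lies above the 3x3 dark core, a stain in the core does not affect it.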

Optimization effect:

  • 1) The new search method is no longer affected by stains at the center or edge of the finder pattern, and the recognition rate of codes with Alipay-blue finder patterns improved significantly;
  • 2) After the change, the overall recognition rate rose by nearly 1%, but the time spent on failed recognitions increased.

5. Recognition rate strategy 3: Add a diagonal filtering rule

Before enumerating all possible combinations of finder patterns, which costs O(N^3), we apply a diagonal check to all candidate finder patterns. Since a line along the diagonal of a finder pattern should also satisfy the 1:1:3:1:1 pattern, filtering candidates with this rule effectively reduces the amount of computation and shortens the time of both successful and failed recognitions.
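A minimal sketch of the filter, assuming a binarized image and candidates given as (x, y) pixel centers: a candidate survives only if the top-left to bottom-right diagonal through it contains a 1:1:3:1:1 window whose dark center run covers the candidate. Helper names and the tolerance are hypothetical; only survivors are passed to the O(N^3) enumeration of triples.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[int, int]     # (x, y) pixel coordinates
Run = Tuple[int, int, int]  # (color, start index, length); 1 = black, 0 = white


def runs_of(line: List[int]) -> List[Run]:
    runs: List[Run] = []
    start = 0
    for i in range(1, len(line) + 1):
        if i == len(line) or line[i] != line[start]:
            runs.append((line[start], start, i - start))
            start = i
    return runs


def ratio_ok(lengths: List[int], ratio: List[int], tol: float = 0.5) -> bool:
    module = sum(lengths) / sum(ratio)
    return all(abs(n - r * module) < r * module * tol for n, r in zip(lengths, ratio))


def diagonal_through(image: List[List[int]], x: int, y: int) -> Tuple[List[int], int]:
    """Pixels on the diagonal through (x, y), plus the index of (x, y) within that list."""
    h, w = len(image), len(image[0])
    back = min(x, y)
    forward = min(w - 1 - x, h - 1 - y)
    return [image[y + t][x + t] for t in range(-back, forward + 1)], back


def passes_diagonal(image: List[List[int]], candidate: Point) -> bool:
    line, idx = diagonal_through(image, *candidate)
    runs = runs_of(line)
    for k in range(len(runs) - 4):
        window = runs[k:k + 5]
        if window[0][0] == 1 and ratio_ok([r[2] for r in window], [1, 1, 3, 1, 1]):
            _, start, length = window[2]
            if start <= idx < start + length:  # candidate sits on the dark core
                return True
    return False


def candidate_triples(image: List[List[int]], candidates: List[Point]) -> List[Tuple[Point, ...]]:
    """Filter first, then enumerate the (now much smaller) set of triples."""
    kept = [c for c in candidates if passes_diagonal(image, c)]
    return list(combinations(kept, 3))
```

With N candidates the enumeration costs C(N, 3) checks, so discarding even a few false candidates up front saves a disproportionate amount of work.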

6. Recognition rate strategy 4: A two-dimensional code classifier based on logistic regression

In the previous scanning algorithm, after three finder patterns were found, they were checked using the included angle, the length deviation, and the unit (module) length, and a simple formula with a threshold decided whether they could form a QR code. The probability of misjudgment was relatively high.

To address this, we introduced a logistic regression model from machine learning.

Trained on Alipay's rich QR code dataset, the logistic regression model serves as a QR code classifier. It significantly reduces the probability of misjudgment and also significantly shortens the time spent on failed recognitions when no QR code is present.
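To make the idea concrete, here is a hypothetical sketch of such a classifier. The three features mirror the quantities named above (included angle, length deviation, module size), but their exact definitions are assumptions, and the weights are hand-picked placeholders, not parameters trained on Alipay's data. The model outputs sigmoid(bias + w · x), interpreted as the probability that the triple forms a QR code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def triple_features(pts: Sequence[Point], module_sizes: Sequence[float]) -> Tuple[float, float, float]:
    """Angle, side-length and module-size deviations for three candidate finder patterns."""
    a, b, c = pts
    # The corner finder pattern is the vertex opposite the longest side.
    options = [(math.dist(b, c), a, b, c), (math.dist(a, c), b, a, c), (math.dist(a, b), c, a, b)]
    _, corner, p, q = max(options, key=lambda o: o[0])
    v1 = (p[0] - corner[0], p[1] - corner[1])
    v2 = (q[0] - corner[0], q[1] - corner[1])
    leg1, leg2 = math.hypot(*v1), math.hypot(*v2)
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (leg1 * leg2)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    angle_dev = abs(angle - 90.0) / 90.0             # 0 for a perfect right angle
    length_dev = abs(leg1 - leg2) / max(leg1, leg2)  # 0 for equal legs
    mean_module = sum(module_sizes) / len(module_sizes)
    module_dev = (max(module_sizes) - min(module_sizes)) / mean_module
    return angle_dev, length_dev, module_dev


# Placeholder parameters chosen by hand for illustration; NOT trained values.
WEIGHTS = (-20.0, -15.0, -10.0)
BIAS = 4.0


def qr_probability(pts: Sequence[Point], module_sizes: Sequence[float]) -> float:
    """Logistic regression score, computed in a numerically stable way."""
    x = triple_features(pts, module_sizes)
    z = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

In practice the weights would come from fitting labeled triples; the value of the learned model over a hand-tuned formula is that it weighs the features jointly rather than thresholding each one separately.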

7. Recognition rate strategy 5: Change the skip-scan interval

Because camera frames have high resolution, with many pixels and a heavy computational load, the previous scanning algorithm sampled only every few rows and columns. In practice, however, too many columns were skipped, so some of the 1-module runs in the 1:1:3:1:1 pattern were missed and the finder pattern search failed.

We made the number of skipped lines a configurable item and determined the most suitable skipping strategy through online A/B grayscale tests. After this skipping strategy was rolled out across the board, the recognition rate improved significantly.
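The sketch below shows why too large a skip breaks the search. It subsamples a scan line with a hypothetical `col_skip` setting before running the same kind of 1:1:3:1:1 check. With 2 px modules, sampling every second pixel still finds the pattern, but sampling every third pixel jumps over a 1-module white run. The skip values Alipay settled on through A/B testing are not given in the article.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int, int]  # (color, start index, length); 1 = black, 0 = white


def runs_of(line: List[int]) -> List[Run]:
    runs: List[Run] = []
    start = 0
    for i in range(1, len(line) + 1):
        if i == len(line) or line[i] != line[start]:
            runs.append((line[start], start, i - start))
            start = i
    return runs


def ratio_ok(lengths: List[int], ratio: List[int], tol: float = 0.5) -> bool:
    module = sum(lengths) / sum(ratio)
    return all(abs(n - r * module) < r * module * tol for n, r in zip(lengths, ratio))


def find_11311(line: List[int]) -> Optional[float]:
    runs = runs_of(line)
    for k in range(len(runs) - 4):
        window = runs[k:k + 5]
        if window[0][0] == 1 and ratio_ok([r[2] for r in window], [1, 1, 3, 1, 1]):
            _, start, length = window[2]
            return start + length / 2
    return None


def scan_with_skip(row: List[int], col_skip: int) -> Optional[float]:
    """Look at every col_skip-th pixel only; map the result back to full-resolution x."""
    center = find_11311(row[::col_skip])
    return None if center is None else center * col_skip


row = [0, 0, 1, 1, 0, 0] + [1] * 6 + [0, 0, 1, 1, 0, 0]  # 2 px modules
for skip in (1, 2, 3):
    print(skip, scan_with_skip(row, skip))  # 1 9.0 / 2 9.0 / 3 None
```

Making `col_skip` a runtime setting, as the article describes, lets the trade-off between speed and missed patterns be tuned without shipping new code.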

In summary, on a test set of 7,744 images, the optimizations above raised the core recognition capability of code scanning by 6.95 percentage points.

8. Optimizations for special scenarios

In addition to the general scanning optimizations above, we also improved scanning in special scenarios.

8.1 Distortion? No problem!

Offline scenes are complex and varied: deformed QR codes on the bodies of beverage bottles, QR codes on supermarket receipts with curled corners, uneven or even folded QR codes at roadside stalls... Such distorted QR codes easily make recognition harder and can even cause it to fail.

The anti-distortion strategy of the previous scanning algorithm first used a perspective transformation to establish the mapping between the sampling coordinate system and the QR code.

The advantage: good adaptability, covering most application scenarios.

The drawbacks are also obvious. For Version 1 codes, which have no alignment pattern, the mapping degenerates into an affine transformation and works poorly, so the phone must be held parallel to the code plane for recognition to succeed. It also works poorly when the material surface is not flat.

Optimization Strategy:

  • 1) Assume the mapping from the sampling coordinate system to the QR code coordinate system follows a more complex relationship; provided the curl of the material surface is small, this mapping can be well fitted by a quadratic function;
  • 2) QR codes on real invoices are generally Version 7 or higher, and higher versions have multiple alignment patterns (auxiliary positioning points), which makes it easier to build the quadratic mapping table;
  • 3) Based on these observations, the new mapping replaces the old perspective transformation for more accurate sampling.
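A minimal sketch of fitting such a quadratic mapping, assuming we already have correspondences between module coordinates (u, v), whose positions are fixed by the QR code version, and detected image positions (x, y) of the finder and alignment pattern centers. Each image axis is fitted independently by least squares over the basis 1, u, v, u², uv, v², so at least six well-spread points are required. The function names are hypothetical; the article does not describe Alipay's fitting procedure.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def basis(u: float, v: float) -> List[float]:
    # Terms of a full quadratic in the code's module coordinates (u, v).
    return [1.0, u, v, u * u, u * v, v * v]


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def fit_axis(module_pts: Sequence[Pair], targets: Sequence[float]) -> List[float]:
    """Least-squares fit of one image axis via the normal equations."""
    rows = [basis(u, v) for u, v in module_pts]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(6)]
    return solve(ata, atb)


def fit_quadratic_map(module_pts: Sequence[Pair], image_pts: Sequence[Pair]):
    return fit_axis(module_pts, [p[0] for p in image_pts]), fit_axis(module_pts, [p[1] for p in image_pts])


def sample_point(coeffs, u: float, v: float) -> Pair:
    """Image position at which to sample the module at (u, v)."""
    cx, cy = coeffs
    b = basis(u, v)
    return sum(c * t for c, t in zip(cx, b)), sum(c * t for c, t in zip(cy, b))
```

Unlike a perspective transform, the u², uv, and v² terms let the grid bend, which is what a gently curled receipt or bottle surface needs.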

With the new strategy, 2D code recognition in invoice scenarios has improved significantly.

▲ Note: because of the enhanced algorithm, align the QR code and wait a moment

Sample test results:

8.2 Improved fault-tolerant recognition capabilities

After generating a QR code, merchants or suppliers often paste a logo over the middle of it, which may cause errors when the QR code is decoded.

Optimization Strategy:

For the BitMatrix obtained after sampling, we change the values of the points in a rectangular area in the middle using certain strategies, so that the code can pass the error-correction check. Two strategies are currently used: the first inverts the values, and the second assigns a random value to each point. The rectangular area currently used spans one quarter of the matrix's width and height.

After this optimization, the error tolerance of code scanning also improved significantly.
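The two strategies can be sketched as follows. Several details are assumptions: the rectangle is read as a centered box whose sides are one quarter of the matrix height and width, the decoder is a hypothetical callable that returns text or None, and the number of random attempts is a placeholder, since the article does not say how many are made.

```python
import random
from typing import Callable, List, Optional, Tuple

BitMatrix = List[List[int]]  # sampled modules, 1 = dark, 0 = light


def center_box(rows: int, cols: int) -> Tuple[int, int, int, int]:
    """Centred rectangle with sides 1/4 of the matrix height and width: (top, left, h, w)."""
    h, w = max(1, rows // 4), max(1, cols // 4)
    return (rows - h) // 2, (cols - w) // 2, h, w


def with_center_inverted(m: BitMatrix) -> BitMatrix:
    out = [row[:] for row in m]  # never modify the caller's matrix
    top, left, h, w = center_box(len(m), len(m[0]))
    for y in range(top, top + h):
        for x in range(left, left + w):
            out[y][x] ^= 1
    return out


def with_center_randomized(m: BitMatrix, rng: random.Random) -> BitMatrix:
    out = [row[:] for row in m]
    top, left, h, w = center_box(len(m), len(m[0]))
    for y in range(top, top + h):
        for x in range(left, left + w):
            out[y][x] = rng.randint(0, 1)
    return out


def decode_with_logo_fallback(
    m: BitMatrix,
    decode: Callable[[BitMatrix], Optional[str]],  # hypothetical decoder
    rng: Optional[random.Random] = None,
    random_tries: int = 3,  # placeholder, not from the article
) -> Optional[str]:
    """Try the matrix as sampled, then with the centre inverted, then with random centres."""
    rng = rng or random.Random(0)
    attempts = [m, with_center_inverted(m)]
    attempts += [with_center_randomized(m, rng) for _ in range(random_tries)]
    for candidate in attempts:
        result = decode(candidate)
        if result is not None:
            return result
    return None
```

Both strategies work by shifting how many wrong codewords the logo region contributes; if the change brings the total back within the Reed-Solomon correction capacity, decoding succeeds.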

9. Reducing recognition time

To improve recognition efficiency, we compute binarization on the GPU, which reduces the time needed to recognize a single frame.

Image binarization sets the gray value of every pixel to either 0 or 255, so that the whole image shows a clear black-and-white visual effect. In the figure below, the left side is the original image and the right side is the binarized image.

Binarization is computed before the scanning algorithm decodes. Binarizing the image greatly reduces the amount of data and weakens interference from blur, weak color contrast, overly strong or weak lighting, smearing, and similar noise, which makes detection and recognition easier.
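The article does not say which binarization algorithm Alipay uses. As a stand-in, the sketch below uses Otsu's method, a standard global threshold that picks the gray level maximizing the between-class variance of dark and light pixels. The final per-pixel mapping is the part that parallelizes trivially, since each output depends only on its own input and the threshold, which is the kind of work that suits a GPU kernel.

```python
from typing import List

Gray = List[List[int]]  # 8-bit grayscale image, values 0..255


def otsu_threshold(gray: Gray) -> int:
    """Otsu's method: choose t maximizing between-class variance of {p <= t} vs {p > t}."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(gray: Gray) -> Gray:
    t = otsu_threshold(gray)
    # Per-pixel step: independent for every pixel, hence easy to run in parallel.
    return [[0 if p <= t else 255 for p in row] for row in gray]


print(binarize([[30, 30, 200, 200], [30, 30, 200, 200]]))
# [[0, 0, 255, 255], [0, 0, 255, 255]]
```

A single global threshold struggles with uneven lighting across the code; production scanners often use locally adaptive thresholds instead, which are equally parallel per pixel.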

Traditional algorithms binarize on the CPU, which consumes considerable CPU resources. GPUs, however, are better at large-scale parallel computation, so we chose to binarize on the GPU, using RenderScript on Android and Metal on iOS, both of which are very low-level frameworks.

1) iOS results: with environmental variables such as battery, angle, and lighting held constant, we tested five core camera binarization algorithms on an iPhone 6.

The performance is as follows: 

 

Metal shows a clear advantage in image binarization: compared with the original CPU-only processing, it is nearly 150% faster while using nearly 50% less CPU.

2) Android results: because there are so many Android models, we pulled online data, which shows that running binarization on the GPU reduced single-frame processing time by more than 30%.

10. Algorithm tiering, scene classification, and scheduling

Some unfavorable scenes, such as QR codes that are occluded, defaced, blurred, or captured at a very poor angle, require time-consuming but more powerful algorithms, while ordinary cases do not need them.

Therefore, we assign priorities to the recognition algorithms and schedule them through time delays, frame-skipping triggers, and similar mechanisms.

Priorities:

  • 1) High priority: runs on every frame;
  • 2) Medium priority: runs at a reduced frame rate;
  • 3) Low priority: runs at an even lower frame rate.

The execution timing of each priority level is configurable, and so is the priority assigned to each function.
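The tiered scheduling can be sketched with two configuration tables that mirror the two configurable aspects just described: how often each priority level runs, and which algorithm belongs to which level. The frame intervals and algorithm names below are hypothetical placeholders; the article gives no concrete values.

```python
from enum import Enum
from typing import Dict, List


class Priority(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical config: run each priority level once every N frames.
FRAME_INTERVAL: Dict[Priority, int] = {Priority.HIGH: 1, Priority.MEDIUM: 3, Priority.LOW: 10}

# Hypothetical assignment of algorithms to levels (also configurable).
ALGORITHM_PRIORITY: Dict[str, Priority] = {
    "standard_qr": Priority.HIGH,
    "inverse_color_qr": Priority.MEDIUM,
    "damaged_finder_qr": Priority.LOW,
    "barcode": Priority.MEDIUM,
}


def algorithms_for_frame(
    frame_index: int,
    algo_priority: Dict[str, Priority] = ALGORITHM_PRIORITY,
    intervals: Dict[Priority, int] = FRAME_INTERVAL,
) -> List[str]:
    """Algorithms whose priority level is due on this frame, in configuration order."""
    return [name for name, prio in algo_priority.items() if frame_index % intervals[prio] == 0]


for i in range(4):
    print(i, algorithms_for_frame(i))
```

Because both tables are plain data, they can be delivered as remote configuration and tuned without changing the scanning code.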

Algorithms for special scenarios:

  • 1) Recognizing inverted-color codes;
  • 2) Recognizing codes at the limit of error correction;
  • 3) Recognizing codes with damaged finder patterns, etc.;
  • 4) Barcode recognition.



Origin blog.csdn.net/hellojackjiang2011/article/details/108796013