Many sites ask users to enter a verification code (CAPTCHA) before certain operations, in order to resist crawlers and ***. This article shows how to recognize some common verification codes programmatically, both to explore the image recognition process and to learn how to avoid generating codes that are easy to recognize.
Theory
Image recognition process
- Collect samples
- Clean and label the samples
- Extract the sample features
- Extract the target's features and compare them with the samples
Procedure
Java has a rich set of image processing classes, so this walkthrough uses Java.
First, collect verification code samples from the target site. Find the address from which the web page requests the verification code image, then fetch codes in batches via HTTP requests and store them locally.
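The collection step can be sketched roughly as follows. This is a minimal sketch, not the original article's code: the class name CaptchaSampler, the method download, the numbered file names and the .jpg extension are all assumptions made for illustration, and the captcha address has to be taken from the target page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: names and file layout are illustrative assumptions.
class CaptchaSampler {

    /** Requests captchaUrl `count` times and stores each response as <index>.jpg under dir. */
    static void download(String captchaUrl, Path dir, int count) throws IOException {
        Files.createDirectories(dir);
        for (int i = 0; i < count; i++) {
            // Assumption: every request to the address returns a newly generated image.
            try (InputStream in = URI.create(captchaUrl).toURL().openStream()) {
                Files.copy(in, dir.resolve(i + ".jpg"), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

A real site may also require cookies or request headers, which this sketch leaves out.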
Second, label the samples. Read each downloaded image by hand and rename the image file to the code it shows.
Third, clean and cut the samples, then extract their features. Image recognition requires separating the image into fine-grained feature regions. Looking at the CAPTCHA images, we can observe the following:
● The background of the verification code contains many interference lines.
● Each digit is clear, and the digits occupy almost equal widths.
● The digits are relatively dark, while the interference is lighter in color.
We can try to remove the interference by color depth. First convert the image to grayscale, then replace lighter pixels with white points and darker pixels with black points.
By repeatedly adjusting the grayscale threshold grayValue, we try to remove the interference until we get a clean code image.
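The thresholding described above might look roughly like this. It is a sketch under assumptions: only grayValue comes from the text, while the class name Binarizer, the method binarize and the use of a plain average of the red, green and blue channels as the gray level are illustrative choices.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: only the grayValue threshold comes from the article.
class Binarizer {
    static final int BLACK = 0xFF000000;
    static final int WHITE = 0xFFFFFFFF;

    /** Returns a copy of src in which pixels darker than grayValue are black and all others white. */
    static BufferedImage binarize(BufferedImage src, int grayValue) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Assumption: a simple channel average is a good enough gray level here.
                int gray = (r + g + b) / 3;
                out.setRGB(x, y, gray < grayValue ? BLACK : WHITE);
            }
        }
        return out;
    }
}
```

A workable grayValue is found by trial: raise it if parts of the digits disappear, lower it if interference lines survive.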
Next, locate the black points in the image, using a trainData() method.
Cut a rectangle around each group of black points to obtain a sample of a single digit.
This gives us a training set of feature samples for the digits.
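One possible way to cut the cleaned image into single digits is sketched below. This is not the article's trainData() method, whose body is not shown. The class DigitCutter and its methods are hypothetical, and the sketch assumes the digits are separated by at least one column without black points.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a column-scan cutter, assuming digits do not touch each other.
class DigitCutter {
    static boolean isBlack(BufferedImage img, int x, int y) {
        return (img.getRGB(x, y) & 0xFFFFFF) == 0;
    }

    static boolean columnHasBlack(BufferedImage img, int x) {
        for (int y = 0; y < img.getHeight(); y++) {
            if (isBlack(img, x, y)) return true;
        }
        return false;
    }

    /** Checks row y for black points between columns x0 (inclusive) and x1 (exclusive). */
    static boolean rowHasBlack(BufferedImage img, int y, int x0, int x1) {
        for (int x = x0; x < x1; x++) {
            if (isBlack(img, x, y)) return true;
        }
        return false;
    }

    /** Returns one tightly cropped sub-image per run of columns that contain black points. */
    static List<BufferedImage> cut(BufferedImage img) {
        List<BufferedImage> glyphs = new ArrayList<>();
        int x = 0;
        while (x < img.getWidth()) {
            if (!columnHasBlack(img, x)) { x++; continue; }
            int start = x;
            while (x < img.getWidth() && columnHasBlack(img, x)) x++;
            // Trim empty rows above and below; the run contains at least one black point.
            int top = 0;
            int bottom = img.getHeight() - 1;
            while (!rowHasBlack(img, top, start, x)) top++;
            while (!rowHasBlack(img, bottom, start, x)) bottom--;
            glyphs.add(img.getSubimage(start, top, x - start, bottom - top + 1));
        }
        return glyphs;
    }
}
```

Saving each cut image under the name of the digit it shows would then form the training set.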
Fourth, extract the features of the target verification code and compare them with the training set to recognize the target image.
After the first three steps, we have a set of sample features. We then apply step 3 to the target verification code as well and compare the extracted target features with the sample features. If most of the pixels have the same color in both, the target can be considered to show the same content as the sample. The sample's file name then gives the content of the target verification code. This comparison is the core of the recognition code.
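The pixel comparison can be sketched as follows. The class DigitMatcher, the similarity score and the map from label to sample image are assumptions for illustration. In particular, scoring matching pixels over the overlapping area and dividing by the larger of the two sizes is just one simple way to handle images of different dimensions.

```java
import java.awt.image.BufferedImage;
import java.util.Map;

// Hypothetical helper: labels are assumed to come from the sample file names.
class DigitMatcher {
    static boolean isBlack(BufferedImage img, int x, int y) {
        return (img.getRGB(x, y) & 0xFFFFFF) == 0;
    }

    /** Fraction of positions whose color agrees, relative to the larger of the two image sizes. */
    static double similarity(BufferedImage a, BufferedImage b) {
        int w = Math.min(a.getWidth(), b.getWidth());
        int h = Math.min(a.getHeight(), b.getHeight());
        int same = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (isBlack(a, x, y) == isBlack(b, x, y)) same++;
            }
        }
        int total = Math.max(a.getWidth(), b.getWidth())
                * Math.max(a.getHeight(), b.getHeight());
        return (double) same / total;
    }

    /** Returns the label of the most similar sample, or null if there are no samples. */
    static String recognize(BufferedImage glyph, Map<String, BufferedImage> samples) {
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, BufferedImage> e : samples.entrySet()) {
            double score = similarity(glyph, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Running recognize on each digit cut from the target image and joining the returned labels gives the full code.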
Summary
With the four steps above, we can already recognize the verification codes of some sites. The method used here removes the interference by color depth, then extracts sample features and compares them. For other kinds of verification codes, we need to look for patterns in the images we collect and flexibly apply other algorithms to remove the interference and extract the sample features.
Likewise, when generating verification codes, we should avoid interference that is easy to remove. As far as possible without affecting human recognition, the characters should overlap or occlude one another so that they cannot be cut apart and classified.
This article comes from the official account: Rui Jiang cloud computing
Rui Jiang Yunguan website link: https://www.eflycloud.com/home?from=RJ0024