Use deep learning to identify captcha verification codes

Google's image captcha has become useless in the face of AI, so Google announced that it will retire its captcha service. Why? The following article may explain.

This article uses Keras to build a deep convolutional neural network to recognize captcha verification codes. A graphics card is recommended for running the project.

The visualization code below is all written for Jupyter Notebook; if you want a plain Python script, it will run with slight modifications, or you can simply remove the visualization code. Keras version: 1.2.2.

GitHub address: https://github.com/ypwhs/captcha_break

captcha
captcha is a Python library for generating verification codes. It supports both image and audio captchas; we will use its image captcha generation.
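
If the library is not installed, it is available from PyPI under the package name captcha:

pip install captcha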

First, we set the captcha alphabet to digits and uppercase letters, and try generating a captcha string:

from captcha.image import ImageCaptcha
import matplotlib.pyplot as plt
import numpy as np
import random

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import string
characters = string.digits + string.ascii_uppercase
print(characters)

width, height, n_len, n_class = 170, 80, 4, len(characters)

generator = ImageCaptcha(width=width, height=height)
random_str = ''.join([random.choice(characters) for j in range(4)])
img = generator.generate_image(random_str)

plt.imshow(img)
plt.title(random_str)

Data generator
When training the model, we can generate training data in two ways: one is to generate tens of thousands of images up front and then start training; the other is to define a data generator and train with the fit_generator function.

The advantage of the first method is higher GPU utilization during training: if you need to tune hyperparameters frequently, you can generate the data once and reuse it many times (a sketch of this follows below). The advantages of the second method are that you don't need to generate a huge dataset up front, the CPU can generate data while training runs, and you can produce an unlimited amount of data.
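
As a sketch of the first approach (this is not in the original post), you could draw one large batch from the gen() generator defined in the next section and train on it directly once the model has been built; the sample count is an arbitrary choice:

# A minimal sketch of the "generate once, reuse many times" approach.
# Assumes the gen() generator defined below and the model built later;
# 10000 samples is an arbitrary choice (~400MB of uint8 images).
X_big, y_big = next(gen(10000))
model.fit(X_big, y_big, batch_size=32, nb_epoch=5)  # Keras 1.x argument names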

Our data format is as follows:


The shape of X is (batch_size, height, width, 3). For example, a batch of 32 samples with image width 170 and height 80 has shape (32, 80, 170, 3); the first image is simply X[0].


y is a list of four arrays of shape (batch_size, n_class); converted to a numpy array its shape is (n_len, batch_size, n_class). For example, a batch of 32 samples with 36 possible characters and 4 characters per captcha gives four arrays of shape (32, 36), or equivalently (4, 32, 36). The decoding function appears in a later code block.

def gen(batch_size=32):
    X = np.zeros((batch_size, height, width, 3), dtype=np.uint8)
    y = [np.zeros((batch_size, n_class), dtype=np.uint8) for i in range(n_len)]
    generator = ImageCaptcha(width=width, height=height)
    while True:
        for i in range(batch_size):
            random_str = ''.join([random.choice(characters) for j in range(4)])
            X[i] = generator.generate_image(random_str)
            for j, ch in enumerate(random_str):
                y[j][i, :] = 0
                y[j][i, characters.find(ch)] = 1
        yield X, y

The above is a generator that produces unlimited data; we will use it to train our model.

Using the generator

Using the generator is simple: just call next on it. The example below draws a batch from the generator and displays the first sample. We also decode the one-hot encoded labels here: first convert them to a numpy array, then take, for each character, the index of the largest of the 36 probabilities output by the network, and finally map the four most probable indices back to a string.

def decode(y):
    y = np.argmax(np.array(y), axis=2)[:, 0]
    return ''.join([characters[x] for x in y])

X, y = next(gen(1))
plt.imshow(X[0])
plt.title(decode(y))
Build a deep convolutional neural network
from keras.models import *
from keras.layers import *

input_tensor = Input((height, width, 3))
x = input_tensor
for i in range(4):
    # two 3x3 convolutions followed by 2x2 pooling; filters double each block
    x = Convolution2D(32 * 2 ** i, 3, 3, activation='relu')(x)
    x = Convolution2D(32 * 2 ** i, 3, 3, activation='relu')(x)
    x = MaxPooling2D((2, 2))(x)

x = Flatten()(x)
x = Dropout(0.25)(x)
# four independent softmax classifiers, one per captcha character
x = [Dense(n_class, activation='softmax', name='c%d' % (i + 1))(x) for i in range(4)]
model = Model(input=input_tensor, output=x)

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

The model structure is simple. The feature extraction part uses four blocks of two convolutions plus one pooling layer, a structure borrowed from VGG16, doubling the number of filters in each block. After that we flatten the features, add Dropout to mitigate overfitting, and finally attach four classifiers, each with 36 neurons outputting the probabilities of the 36 characters.

Model visualization
Thanks to Keras's built-in visualization support, we can visualize the model's structure with a few lines of code:

from keras.utils.visualize_util import plot
from IPython.display import Image

plot(model, to_file="model.png", show_shapes=True)
Image('model.png')
This requires the pydot and graphviz libraries. On macOS they can be installed as follows:

brew install graphviz
pip install pydot-ng

We can see that the output shape of the last convolutional layer is (1, 6, 256); the spatial dimensions are exhausted, so no more convolutional layers can be added.
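
As a sanity check (not from the original post), the output shape can be traced by hand: each valid 3x3 convolution trims two pixels from each spatial dimension, and each 2x2 pooling halves them:

# A small sketch tracing the feature-map shapes of the model above.
h, w = 80, 170
for i in range(4):
    h, w = h - 4, w - 4    # two valid 3x3 convolutions, -2 pixels each
    h, w = h // 2, w // 2  # 2x2 max pooling
    print('%d x %d x %d' % (h, w, 32 * 2 ** i))
# 38 x 83 x 32, 17 x 39 x 64, 6 x 17 x 128, 1 x 6 x 256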

Training the model

Training is the easiest step of all: just call model.fit_generator. The validation set here uses the same generator; since the generator produces data randomly, we don't need to worry about repeated samples. Note that this code may take an afternoon on a laptop. If you want a more accurate model, change nb_epoch to 10 or 20, at the cost of proportionally more time. Also note the small trick here: passing nb_worker=2 lets Keras generate data in multiple processes, sidestepping the inefficiency of single-threaded Python. Without it an epoch took 120 seconds; with it, only 80.

model.fit_generator(gen(), samples_per_epoch=51200, nb_epoch=5,
                    nb_worker=2, pickle_safe=True,
                    validation_data=gen(), nb_val_samples=1280)
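
Training takes long enough that you will probably want to keep the result. A minimal sketch using Keras's standard weight saving (the filename is my own choice, not from the original post):

# Persist the trained weights so the run need not be repeated.
# 'cnn_captcha.h5' is an assumed filename; requires h5py.
model.save_weights('cnn_captcha.h5')
# later, after rebuilding the same architecture:
# model.load_weights('cnn_captcha.h5')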
Testing the model
When training is finished, we can try recognizing a captcha:

X, y = next(gen(1))
y_pred = model.predict(X)
plt.title('real: %s\npred:%s' % (decode(y), decode(y_pred)))
plt.imshow(X[0], cmap='gray')

Calculating the overall accuracy
During training the model only reports per-character accuracy. To compute the overall accuracy of the model, we can write the following function:

from tqdm import tqdm
def evaluate(model, batch_num=20):
    batch_acc = 0
    generator = gen()
    for i in tqdm(range(batch_num)):
        X, y = next(generator)
        y_pred = model.predict(X)
        y_pred = np.argmax(y_pred, axis=2).T
        y_true = np.argmax(y, axis=2).T
        # a sample counts as correct only if all four characters match
        batch_acc += np.mean(list(map(np.array_equal, y_true, y_pred)))
    return batch_acc / batch_num

evaluate(model)

Here we use tqdm, a progress-bar library, so we can get real-time progress feedback, plus some numpy operations to compute the accuracy. The rule is strict: a sample counts as correct only if every character is right. After training for five epochs, the model's overall accuracy reaches 90%, and continued training can reach even higher accuracy.

Model summary

The model is 16MB and recognizes 1000 captchas in 20 seconds on my laptop; a GPU is of course faster. For captcha recognition, even a 10% accuracy rate can be called a crack: if cracking a target takes one hour at a 100% recognition rate, it takes only ten hours at 10%, which is still perfectly affordable. Our recognition rate is 90%, so this type of captcha can be considered completely broken.
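
For reference, a throughput figure like the one above could be measured as follows (a sketch, not the original benchmark code; the batch size is an assumption):

import time

# Generate 1000 captchas with the generator above, then time prediction.
X_bench, _ = next(gen(1000))
start = time.time()
model.predict(X_bench, batch_size=32)
print('%.1f seconds for 1000 captchas' % (time.time() - start))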

Improvement

For text written as a sequence like this, there is another applicable method: using a recurrent neural network to recognize the sequence. Let's look at how to recognize this type of captcha with a recurrent neural network.


CTC (Connectionist Temporal Classification) Loss is a particularly magical loss function: it lets a model converge when only the order of the characters is known, not their exact positions. Baidu appears to have done good work in this area, using it to recognize audio signals (warp-ctc).

CTC Loss is built into Keras, so we can simply define the following function to compute it. Since we are using a recurrent neural network, the first two time steps of the output are discarded by default, because they are usually meaningless and would degrade the model's output.

y_pred is the output of the model: the probabilities of 37 characters at each time step, in order. Because we use a recurrent network with CTC, we need the concept of a blank character, hence 37 classes;
labels is the captcha, here four class indices;
input_length is the length of y_pred, here 15;
label_length is the length of labels, here 4.
from keras import backend as K

def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    y_pred = y_pred[:, 2:, :]  # drop the first two time steps
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
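
To make the role of the blank character concrete, here is a small illustration of greedy CTC decoding (my own sketch, not from the original post): take the argmax at each time step, collapse consecutive repeats, then drop the blanks. Keras's CTC functions treat the last class index as the blank:

# A minimal sketch of greedy CTC decoding.
# The blank index follows the Keras convention: the last class.
def greedy_ctc_decode(argmax_per_step, blank):
    out, prev = [], None
    for idx in argmax_per_step:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out

# With 37 classes the blank index is 36:
# [3, 3, 36, 3, 7, 7, 36] collapses to [3, 3, 7]
print(greedy_ctc_decode([3, 3, 36, 3, 7, 7, 36], blank=36))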
Model structure

Our model structure is designed as follows: first a convolutional network extracts features, then a fully connected layer reduces the dimensionality, and the result is fed, in order along the horizontal direction, into a special recurrent network called a GRU, which has some special properties. Why a GRU instead of an LSTM? Generally speaking it performs as well as or better than an LSTM, so we use it.

from keras.models import *
from keras.layers import *
rnn_size = 128
n_class = len(characters) + 1  # 36 characters plus the CTC blank, 37 in total

input_tensor = Input((width, height, 3))
x = input_tensor
for i in range(3):
    x = Convolution2D(32, 3, 3, activation='relu')(x)
    x = Convolution2D(32, 3, 3, activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

conv_shape = x.get_shape()
x = Reshape(target_shape=(int(conv_shape[1]), int(conv_shape[2] * conv_shape[3])))(x)

x = Dense(32, activation='relu')(x)

gru_1 = GRU(rnn_size, return_sequences=True, init='he_normal', name='gru1')(x)
gru_1b = GRU(rnn_size, return_sequences=True, go_backwards=True,
             init='he_normal', name='gru1_b')(x)
gru1_merged = merge([gru_1, gru_1b], mode='sum')

gru_2 = GRU(rnn_size, return_sequences=True, init='he_normal', name='gru2')(gru1_merged)
gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True,
             init='he_normal', name='gru2_b')(gru1_merged)
x = merge([gru_2, gru_2b], mode='concat')
x = Dropout(0.25)(x)
x = Dense(n_class, init='he_normal', activation='softmax')(x)
base_model = Model(input=input_tensor, output=x)

labels = Input(name='the_labels', shape=[n_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
loss_out = Lambda(ctc_lambda_func, output_shape=(1,),
                  name='ctc')([x, labels, input_length, label_length])

model = Model(input=[input_tensor, labels, input_length, label_length], output=[loss_out])
# the Lambda layer above already computes the CTC loss,
# so the compiled loss simply passes its output through
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adadelta')
Model visualization
The visualization code is the same as above, so it is not repeated here.

The model looks much more complicated than the previous one, but it is large mainly because it has more inputs. One more point worth noting: the image is transposed on input. This is because we want to feed it in along the horizontal direction, while a numpy image defaults to the shape (height, width, 3); we therefore use the transpose function to convert images to (width, height, 3). After the convolutions and the dimensionality reduction, the features become (17, 32): 17 vertical strips of the image from left to right, each represented by a vector of length 32. We then split into two paths, feeding one GRU from left to right and the other from right to left, and sum their outputs. The second GRU pair again runs one forward and one backward, but this time we concatenate their outputs and pass them through a fully connected layer that outputs the probability of each character.
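
A quick illustration (not in the original post) of what the transpose does to the array axes:

# Axis order before and after the transpose described above.
img = np.array(generator.generate_image('ABCD'))   # 'ABCD' is an arbitrary sample
print(img.shape)                     # (80, 170, 3) -> (height, width, 3)
print(img.transpose(1, 0, 2).shape)  # (170, 80, 3) -> (width, height, 3)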
Data generator
def gen(batch_size=128):
    X = np.zeros((batch_size, width, height, 3), dtype=np.uint8)
    y = np.zeros((batch_size, n_len), dtype=np.uint8)
    while True:
        generator = ImageCaptcha(width=width, height=height)
        for i in range(batch_size):
            random_str = ''.join([random.choice(characters) for j in range(4)])
            X[i] = np.array(generator.generate_image(random_str)).transpose(1, 0, 2)
            y[i] = [characters.find(x) for x in random_str]
        yield [X, y, np.ones(batch_size) * int(conv_shape[1] - 2),
               np.ones(batch_size) * n_len], np.ones(batch_size)
Evaluating the model
def evaluate(model, batch_num=10):
    batch_acc = 0
    generator = gen()
    for i in range(batch_num):
        [X_test, y_test, _, _], _ = next(generator)
        y_pred = base_model.predict(X_test)
        shape = y_pred[:, 2:, :].shape
        ctc_decode = K.ctc_decode(y_pred[:, 2:, :],
                                  input_length=np.ones(shape[0]) * shape[1])[0][0]
        out = K.get_value(ctc_decode)[:, :4]
        if out.shape[1] == 4:
            batch_acc += ((y_test == out).sum(axis=1) == 4).mean()
    return batch_acc / batch_num
We will use this function to evaluate our model, with the same criterion as before: a sample counts as correct only if every character is correct. There is one pitfall: early in training, the model will not necessarily output four characters. If we get fewer than four, we skip the batch, which is equivalent to adding 0; if we get more than four, only the first four are kept.

Evaluation callback
Because Keras has no option to compute accuracy for this kind of output, we need to define a custom callback that computes the model's accuracy at the end of each epoch.

from keras.callbacks import *

class Evaluate(Callback):
    def __init__(self):
        self.accs = []

    def on_epoch_end(self, epoch, logs=None):
        acc = evaluate(base_model) * 100
        self.accs.append(acc)
        print('')
        print('acc: %f%%' % acc)

evaluator = Evaluate()
Training the model

Since CTC loss converges very slowly, we need to set a fairly large number of epochs. Here we set 100 epochs, and add an early-stopping callback along with the callback defined above. My first run stopped after only 37 epochs with a test accuracy of just 95%; continuing training from there, it stopped after another 25 epochs and reached 98% accuracy, so 62 epochs in total.

model.fit_generator(gen(), samples_per_epoch=51200, nb_epoch=100,
                    callbacks=[evaluator],
                    nb_worker=2, pickle_safe=True)
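
The paragraph above mentions an early-stopping callback that the snippet does not show. A sketch of wiring in Keras 1's built-in EarlyStopping (the monitored quantity and patience are my assumptions):

from keras.callbacks import EarlyStopping

# monitor and patience are assumptions, not from the original post
early_stopping = EarlyStopping(monitor='loss', patience=5)
model.fit_generator(gen(), samples_per_epoch=51200, nb_epoch=100,
                    callbacks=[evaluator, early_stopping],
                    nb_worker=2, pickle_safe=True)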

Testing the model
characters2 = characters + ' '
[X_test, y_test, _, _], _ = next(gen(1))
y_pred = base_model.predict(X_test)
y_pred = y_pred[:, 2:, :]
out = K.get_value(K.ctc_decode(y_pred, input_length=np.ones(y_pred.shape[0]) * y_pred.shape[1])[0][0])[:, :4]
out = ''.join([characters[x] for x in out[0]])
y_true = ''.join([characters[x] for x in y_test[0]])

plt.imshow(X_test[0].transpose(1, 0, 2))
plt.title('pred:' + str(out) + '\ntrue: ' + str(y_true))

argmax = np.argmax(y_pred, axis=2)[0]
list(zip(argmax, ''.join([characters2[x] for x in argmax])))
The randomly drawn captcha this time happened to be a nasty one, O0OP, and even more impressively, the model recognized it correctly.

An interesting question
I tested the previous model on an extreme captcha like O0O0, and it could occasionally recognize it correctly, which surprised me. Can it really tell the difference between O and 0, or is it just guessing? It's hard to say.

generator = ImageCaptcha(width=width, height=height)
random_str = 'O0O0'
X = generator.generate_image(random_str)
X = np.expand_dims(X, 0)

y_pred = model.predict(X)
plt.title('real: %s\npred:%s' % (random_str, decode(y_pred)))
plt.imshow(X[0], cmap='gray')

Summary

This model is 4.7MB and recognizes 1000 captchas in 14 seconds on my laptop, about 71 per second on average, probably faster than the captchas could even be downloaded. As for whether deep learning can tell look-alike twins such as O and 0 apart, I believe you already have the answer.

Google's image captcha is no longer effective in the face of AI, which is why Google announced it would retire its captcha service. So once all of these image captchas have been cracked:

"Tencent Waterproof Wall sliding puzzle captcha"
"Baidu rotating picture captcha"
"NetEase Easy Shield sliding jigsaw captcha"
"Top Image area click captcha"
"Top Image sliding puzzle captcha"
"Extreme sliding puzzle captcha"
"Using deep learning to crack captcha verification codes"

Are there better protection methods?
The next-generation hidden authentication security products developed by Xinxin Technology. Reasons to choose the enterprise SMS firewall developed by Xinxin Technology:
1. AI three-dimensional defense technology requires no graphic verification at all, completely resolving the tension between "security" and "user experience": internet products can focus on the user experience without compromising on security.
2. Rich visual dashboards give a panoramic view of defense and interception data, with real-time access to the day's details and recent risk trends.
3. Extremely fast SaaS access, local deployment and operation, and millisecond-level response. The transaction risk-control engine is condensed into a 10M installation package, collects basic data extremely quickly, and matches multi-dimensional risk characteristics, avoiding the network-latency problems of "cloud mode".


Origin: blog.csdn.net/weixin_46641057/article/details/113585653