[AI in Practice] How to Implement a Text Recognition Model (Introduction: Verification Code Recognition)

Text recognition has many important real-world applications. It consists of two key steps: text detection and content recognition. Earlier articles on this blog introduced the principles of the classic models for both steps (see the articles on the classic text detection model CTPN and the classic text recognition model CRNN). This article focuses on how to implement a text recognition model in practice.

A previous article already covered one hands-on text recognition task: handwritten digit recognition on the MNIST dataset (see the article: Train your first AI model: MNIST handwritten digit recognition model), which is relatively simple. Today I will introduce another classic application of text recognition, verification code (CAPTCHA) recognition, as a practical introduction to the topic.

 

Verification codes are very common in mobile apps and websites. They mainly serve to block abusive behavior such as malicious logins, vote manipulation, spam posting, and crawlers, and sometimes also to relieve load on the backend (for example, requiring a verification code during flash sales or ticket rushes). This article covers the recognition of text-type verification codes. These are composed of random digits, uppercase and lowercase English letters, or even Chinese characters, which are then deformed and distorted and overlaid with interference lines and background noise, mainly so that optical character recognition (OCR) and similar programs fail to read the text in the image automatically, as shown below:

Because of this strong interference, applying conventional OCR directly gives very poor results, whereas a trained AI model can recognize such complex images well. AI open platforms such as Baidu currently offer open interfaces for verification code recognition. However, because every app and website can generate verification codes according to its own rules, these interfaces work well in some scenarios and may fail in others. For a specific scenario, training our own recognition model can solve the problem well for that scenario.

 

Let's now build a verification code recognition model with TensorFlow. The main steps are as follows:

  • step 1. Get the verification code picture
  • step 2. Image annotation
  • step 3. Train the model
  • step 4. Model application

 

1. Get the verification code picture

(1) For practice, you can simply generate captcha images yourself as the base data set. The captcha library for Python generates them quickly; install it with pip install captcha, or download the captcha-0.3-py3-none-any.whl file and install it manually. (Note: captcha cannot be installed directly with conda install in Anaconda, but you can use pip inside Anaconda to install it.) The core code is as follows:

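import itertools
import os
import random

from captcha.image import ImageCaptcha

# Character set used to generate the verification codes
CHAR_SET = ['0','1','2','3','4','5','6','7','8','9']
CHAR_SET_LEN = len(CHAR_SET)

# Length of each verification code
CAPTCHA_LEN  = 4

# Output directory; each file name doubles as the label of its image
OUTPUT_DIR = '/tmp/mydata/'
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Create the generator once (its default image size is 160x60)
image = ImageCaptcha()

# Enumerate every 4-digit combination, 0000 through 9999
for chars in itertools.product(CHAR_SET, repeat=CAPTCHA_LEN):
    captcha_text = ''.join(chars)
    image.write(captcha_text, os.path.join(OUTPUT_DIR, captcha_text + '.jpg'))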

The generated images look like this:

(2) If you want to recognize the verification codes of a particular website, you can use a tool to download them. A typical website login page looks like this:

On such a page you can usually click the verification code image itself, or a "change one" button next to it, to load a new image. You can therefore use mouse-automation software such as "Key Wizard" to record a script that right-clicks the verification code image to save it, clicks the image to load a new one, and repeats. In this way a large set of captcha images from the site can be downloaded for training the model. As for the download script itself, so as not to lead anyone astray, 500 words are omitted here, hehe~

 

2. Image annotation

If you generated the verification code images yourself in step 1, the file name of each saved image is already its text content, so no annotation is needed.

If you downloaded the captcha images from a website in step 1, you need to label the text of each image by hand for the subsequent model training. Look at each image and record its text either in the file name (by renaming the file) or in a separate text file.
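
If the labels are kept in a separate text file rather than in the file names, they can be read back into a dictionary before training. The sketch below assumes a hypothetical file labels.txt with one file_name<TAB>text pair per line; the file name, the format, and the helper load_labels are illustrative assumptions, not part of the original project.

```python
import csv
import os
import tempfile

def load_labels(label_file):
    """Read 'file_name<TAB>text' lines into a {file_name: text} dict."""
    labels = {}
    with open(label_file, newline='', encoding='utf-8') as f:
        for row in csv.reader(f, delimiter='\t'):
            if len(row) != 2:
                continue  # skip blank or malformed lines
            file_name, text = row
            labels[file_name] = text.strip()
    return labels

# Usage example with a temporary, hand-written label file (made-up data)
with tempfile.TemporaryDirectory() as tmp_dir:
    label_path = os.path.join(tmp_dir, 'labels.txt')
    with open(label_path, 'w', encoding='utf-8') as f:
        f.write('img_0001.jpg\t1086\nimg_0002.jpg\t0042\n')
    labels = load_labels(label_path)
    print(labels)
```

Keeping the text as a string (rather than converting it to a number) preserves leading zeros such as in "0042".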

 

3. Train the model

(1) Label one-hot encoding

To feed the text of a captcha image into the convolutional neural network for training, the text must first be vectorized. Here we use one-hot encoding, that is, the text is represented with 0/1 values. In this project the verification code is 4 characters long and consists of the digits 0 to 9, so each character gets its own group of 10 positions. For example, for the text "1086", the position matching each digit is set to 1 and all others are 0, as shown below

After one-hot encoding, "1086" becomes [0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0], with the 1s at indices 1, 10, 28, and 36. The core code for one-hot encoding the verification code text is as follows:

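import numpy as np

# Convert verification code text into a flat one-hot vector
def text2label(text):
    label = np.zeros(CAPTCHA_LEN * CHAR_SET_LEN)
    for i in range(len(text)):
        # Character i occupies the i-th group of CHAR_SET_LEN positions
        idx = i * CHAR_SET_LEN + CHAR_SET.index(text[i])
        label[idx] = 1
    return label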
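
For checking labels and, later, model outputs, it helps to have the inverse mapping as well. The following is a minimal pure-Python sketch (plain lists instead of NumPy arrays); one_hot and label2text are hypothetical helper names that do not appear in the original code.

```python
CHAR_SET = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
CHAR_SET_LEN = len(CHAR_SET)
CAPTCHA_LEN = 4

def one_hot(text):
    """Same layout as text2label, but returns a plain list of 0/1 values."""
    label = [0] * (CAPTCHA_LEN * CHAR_SET_LEN)
    for i, ch in enumerate(text):
        label[i * CHAR_SET_LEN + CHAR_SET.index(ch)] = 1
    return label

def label2text(label):
    """Invert the encoding: take the position of the largest value in each group."""
    chars = []
    for i in range(CAPTCHA_LEN):
        group = label[i * CHAR_SET_LEN:(i + 1) * CHAR_SET_LEN]
        chars.append(CHAR_SET[group.index(max(group))])
    return ''.join(chars)

encoded = one_hot('1086')
print(encoded[0:10])        # the group for the first character, '1'
print(label2text(encoded))  # round trip back to the text
```

label2text expects a plain list; a NumPy array such as the one returned by text2label would need .tolist() first.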

(2) Read the picture file

Read the verification code images and their text content (stored in the file names), and write a function that returns the next batch of data. The main functions are as follows:


import os

# Collect the image paths and the labels encoded in their file names
def get_image_file_name(img_path):
    img_files = []
    img_labels = []
    for root, dirs, files in os.walk(img_path):
        for file in files:
            name, ext = os.path.splitext(file)
            if ext == '.jpg':
                img_files.append(os.path.join(root, file))
                img_labels.append(text2label(name))
    return img_files,img_labels

import cv2

# Draw a random batch of images and labels
def get_next_batch(img_files,img_labels,batch_size):
    batch_x = np.zeros([batch_size, IMAGE_WIDTH*IMAGE_HEIGHT])
    batch_y = np.zeros([batch_size, CAPTCHA_LEN * CHAR_SET_LEN])

    for i in range(batch_size):
        idx = random.randint(0, len(img_files) - 1)
        file_path = img_files[idx]
        # Read as a single-channel image; the network input has one channel
        image = cv2.imread(file_path, cv2.IMREAD_GRAYSCALE)
        image = cv2.resize(image, (IMAGE_WIDTH, IMAGE_HEIGHT))
        image = image.astype(np.float32)
        image = np.multiply(image, 1.0 / 255.0)  # scale pixel values to [0, 1]
        batch_x[i, :] = image.flatten()  # flatten to IMAGE_WIDTH*IMAGE_HEIGHT values
        batch_y[i, :] = img_labels[idx]

    return batch_x,batch_y

(3) Build a CNN model

Since verification code recognition is relatively simple, the CNN model is based on the LeNet network structure and consists of 3 convolutional layers and 1 fully connected layer, followed by the output layer. The network structure diagram is as follows:

The core code is as follows:

import tensorflow as tf  # this code uses the TensorFlow 1.x graph API

# Image size
IMAGE_HEIGHT = 60
IMAGE_WIDTH = 160

# Network inputs
X = tf.placeholder(tf.float32, [None, IMAGE_HEIGHT * IMAGE_WIDTH])
Y = tf.placeholder(tf.float32, [None, CAPTCHA_LEN * CHAR_SET_LEN])
keep_prob = tf.placeholder(tf.float32)  # dropout keep probability

# CNN for verification code recognition
def crack_captcha_cnn_network(w_alpha=0.01, b_alpha=0.1):
    x = tf.reshape(X, shape=[-1, IMAGE_HEIGHT, IMAGE_WIDTH, 1])

    w_c1 = tf.Variable(w_alpha * tf.random_normal([3, 3, 1, 32]))
    b_c1 = tf.Variable(b_alpha * tf.random_normal([32]))
    conv1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, w_c1, strides=[1, 1, 1, 1], padding='SAME'), b_c1))
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv1 = tf.nn.dropout(conv1, keep_prob)

    w_c2 = tf.Variable(w_alpha * tf.random_normal([3, 3, 32, 64]))
    b_c2 = tf.Variable(b_alpha * tf.random_normal([64]))
    conv2 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv1, w_c2, strides=[1, 1, 1, 1], padding='SAME'), b_c2))
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv2 = tf.nn.dropout(conv2, keep_prob)

    w_c3 = tf.Variable(w_alpha * tf.random_normal([3, 3, 64, 64]))
    b_c3 = tf.Variable(b_alpha * tf.random_normal([64]))
    conv3 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv2, w_c3, strides=[1, 1, 1, 1], padding='SAME'), b_c3))
    conv3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv3 = tf.nn.dropout(conv3, keep_prob)

    # Three 2x2 poolings with SAME padding shrink 60x160 to 8x20, with 64 channels
    w_d = tf.Variable(w_alpha * tf.random_normal([8 * 20 * 64, 1024]))
    b_d = tf.Variable(b_alpha * tf.random_normal([1024]))
    dense = tf.reshape(conv3, [-1, w_d.get_shape().as_list()[0]])
    dense = tf.nn.relu(tf.add(tf.matmul(dense, w_d), b_d))
    dense = tf.nn.dropout(dense, keep_prob)

    w_out = tf.Variable(w_alpha * tf.random_normal([1024, CAPTCHA_LEN * CHAR_SET_LEN]))
    b_out = tf.Variable(b_alpha * tf.random_normal([CAPTCHA_LEN * CHAR_SET_LEN]))
    out = tf.add(tf.matmul(dense, w_out), b_out)
    return out
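
The 8 * 20 * 64 in the fully connected layer is the flattened size of the last feature map. The convolutions use stride 1 with padding='SAME' and keep the spatial size, while each stride-2 pooling with padding='SAME' maps a dimension of size n to ceil(n / 2). The number can therefore be recomputed if the image size changes. A small sketch of that arithmetic (pooled_size is a hypothetical helper, not a TensorFlow function):

```python
import math

def pooled_size(size, num_pools, stride=2):
    """Size of one dimension after repeated SAME-padded pooling with the given stride."""
    for _ in range(num_pools):
        size = math.ceil(size / stride)
    return size

IMAGE_HEIGHT, IMAGE_WIDTH = 60, 160
out_h = pooled_size(IMAGE_HEIGHT, 3)   # 60 -> 30 -> 15 -> 8
out_w = pooled_size(IMAGE_WIDTH, 3)    # 160 -> 80 -> 40 -> 20
flat_size = out_h * out_w * 64         # 64 channels in the last convolutional layer
print(out_h, out_w, flat_size)
```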

(4) Train the model

Set the number of training iterations, the batch size, the learning rate, and other parameters; read the verification code image set and randomly split it into a training set and a test set; then load the network defined above and train it. Every 100 steps, the accuracy is evaluated and the model file is saved. The core code is as follows:

from sklearn.model_selection import train_test_split

# Model parameters
step_cnt = 200000  # number of training iterations
batch_size = 16  # number of samples per batch
learning_rate = 0.0001  # learning rate

# Read the verification code image set
img_path = '/tmp/mydata/'
img_files, img_labels = get_image_file_name(img_path)

# Split into a training set and a test set
x_train,x_test,y_train,y_test=train_test_split(img_files,img_labels,test_size=0.2,random_state=33)

# Build the network
output = crack_captcha_cnn_network()

# Loss function and optimizer
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=output, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

# Accuracy, averaged over individual characters
predict = tf.reshape(output, [-1, CAPTCHA_LEN, CHAR_SET_LEN])
max_idx_p = tf.argmax(predict, 2)
max_idx_l = tf.argmax(tf.reshape(Y, [-1, CAPTCHA_LEN, CHAR_SET_LEN]), 2)
correct_pred = tf.equal(max_idx_p, max_idx_l)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver(tf.global_variables(), max_to_keep=5)

    for step in range(step_cnt):
        # Train on one batch
        batch_x, batch_y = get_next_batch(x_train, y_train,batch_size)
        _, loss_ = sess.run([optimizer, loss], feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.75})
        print('step:',step, 'loss:',loss_)

        # Evaluate the accuracy every 100 steps
        if step % 100 == 0:
            batch_x_test, batch_y_test = get_next_batch(x_test, y_test,batch_size)
            acc = sess.run(accuracy, feed_dict={X: batch_x_test, Y: batch_y_test, keep_prob: 1.})
            print('step:',step,'acc:',acc)

            # Save the model
            saver.save(sess, '/tmp/mymodel/crack_captcha.ctpk', global_step=step)
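
The loss treats each of the 40 outputs as an independent 0/1 decision. For a logit x and a target z, tf.nn.sigmoid_cross_entropy_with_logits is documented as computing max(x, 0) - x * z + log(1 + exp(-|x|)), a numerically stable form of the usual binary cross-entropy. A plain-Python sketch of the same formula, for intuition only (both function names are hypothetical):

```python
import math

def sigmoid_cross_entropy(logit, target):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def naive_cross_entropy(logit, target):
    """Textbook form, used here only to check the stable version."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

# Compare the two forms on a few (logit, target) pairs
for logit, target in [(2.0, 1.0), (2.0, 0.0), (-3.0, 1.0), (0.0, 0.0)]:
    print(logit, target,
          sigmoid_cross_entropy(logit, target),
          naive_cross_entropy(logit, target))
```

The stable form avoids computing exp of a large positive number, which the textbook form would hit for strongly negative logits.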

The training process is shown in the following figure:

After training for a while, the evaluation accuracy can exceed 99%, and the verification codes are recognized very accurately.
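
Note that the accuracy tensor in the training code averages over individual characters, so a verification code with three of its four characters right still contributes 0.75. Whole-code accuracy, where every character must match, is stricter. The sketch below computes both from decoded strings; the function names and the sample data are made up for illustration.

```python
def char_accuracy(predicted, expected):
    """Fraction of individual characters that match."""
    total = sum(len(want) for want in expected)
    hits = sum(p == w
               for got, want in zip(predicted, expected)
               for p, w in zip(got, want))
    return hits / total

def captcha_accuracy(predicted, expected):
    """Fraction of verification codes with every character correct."""
    hits = sum(got == want for got, want in zip(predicted, expected))
    return hits / len(expected)

expected = ['1086', '0042', '7310', '9999']
predicted = ['1086', '0043', '7310', '9999']   # one wrong character in the second code
print(char_accuracy(predicted, expected))
print(captcha_accuracy(predicted, expected))
```

With one wrong character out of sixteen, the per-character figure stays high while the whole-code figure drops by a full quarter, which is worth keeping in mind when reading the training log.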

 

4. Model application

Load the trained model file, then feed in an image to recognize its verification code. The core code is as follows:

# Build the network
output = crack_captcha_cnn_network()

saver = tf.train.Saver()
with tf.Session() as sess:
    model_path = '/tmp/mymodel/'
    saver.restore(sess, tf.train.latest_checkpoint(model_path))

    output_rate=tf.reshape(output, [-1, CAPTCHA_LEN, CHAR_SET_LEN])
    predict = tf.argmax(output_rate, 2)
    # captcha_image is the image to recognize, preprocessed the same way as in
    # get_next_batch (flattened and scaled to [0, 1])
    text_list,rate_list = sess.run([predict,output_rate], feed_dict={X: [captcha_image], keep_prob: 1})

    # Map the predicted indices back to characters
    tmptext = text_list[0].tolist()
    text = ''.join(CHAR_SET[idx] for idx in tmptext)

    print('Recognition result:',text)
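
rate_list holds raw network outputs (logits) rather than probabilities. Since the model was trained with a sigmoid loss, passing each winning logit through a sigmoid gives a rough per-character confidence, which can be used to flag doubtful predictions. A pure-Python sketch under that assumption; decode_with_confidence and the sample logits are hypothetical:

```python
import math

CHAR_SET = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def decode_with_confidence(logits):
    """logits: one list of len(CHAR_SET) scores per character position."""
    text, confidences = '', []
    for scores in logits:
        best = max(range(len(scores)), key=scores.__getitem__)
        text += CHAR_SET[best]
        confidences.append(sigmoid(scores[best]))
    return text, confidences

# Made-up logits for a 2-character code: a clear '1' and a doubtful '0'
sample = [
    [-4.0, 3.0, -5.0, -4.0, -6.0, -5.0, -4.0, -5.0, -3.0, -4.0],
    [0.2, -3.0, -4.0, -5.0, -4.0, -3.0, -4.0, -5.0, 0.1, -4.0],
]
text, confidences = decode_with_confidence(sample)
print(text, confidences)
```

With the real model, rate_list[0].tolist() would be the value to pass in. How well these scores reflect actual error rates is not something the original article measures, so treat any threshold as something to tune.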

That concludes this entry-level hands-on example of text recognition: recognizing the text in verification code images. Working through it should give you an understanding of how simple text recognition is implemented.

 

Follow my official account "Big Data and Artificial Intelligence Lab" (BigdataAILab) and reply with the keyword "code" to get the complete source code.

 
