Machine learning - recognition of verification code based on convolutional neural network (python implementation)

You can follow the blogger to obtain the code. The response time is usually 2-3 hours, and within one day at the latest.

Table of contents

1 System introduction

2 Requirements analysis

3. Basic principles

3.1 Generate images

3.2 Image processing

3.3 Convolutional Neural Network

3.3.1 Composition of convolutional neural network

3.3.2 The specific working process of convolutional neural network

(1) Data normalization

(2) Convolution operation (Convolution)

(3) Activation

(4) Pooling

(5) Full connection

4 Scheme design

5 Detailed design

1. Automatically generate and classify verification codes

2. Verification code image processing

3. Convolutional neural network model training

4. Convolutional neural network testing and verification

6 System implementation

7 Test and evaluation results

8 Summary

9.Project code

1.train.py

2. Get the rest of the codes through private chat with bloggers (follow first)


1 System introduction

When we visit websites and online resources, we often have to enter a verification code after the account name and password. We therefore built a machine learning model that recognizes verification codes. The model achieves good accuracy on mixed verification code images containing digits 0-9, lowercase letters a-z, and uppercase letters A-Z. After training, we evaluate the model on a validation set of similar images, and its accuracy meets our requirements.

2 Requirements analysis

Design requirements: the system recognizes mixed verification code images containing digits 0-9, lowercase letters a-z, and uppercase letters A-Z.

Performance: the recognition accuracy on the generated verification code images must be high (above 70%) without overfitting.

Requirements for software and hardware platforms:

Hardware: GPU RTX 2060

Software: Python 3.7.9

IDE: PyCharm

CUDA 10.0

OpenCV

TensorFlow

Windows 10/11

Among these, OpenCV is used for image processing and TensorFlow is the deep learning library. The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.

3. Basic principles

3.1 Generate images

Call the captcha library to automatically generate verification code images. Each image is 60*100 pixels and contains 4 characters drawn from digits and uppercase and lowercase letters.
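The generation step can be sketched as follows. This is a minimal sketch, not the project's gen_sample_by_captcha.py: the helper names `random_label` and `save_captcha` are hypothetical, the third-party `captcha` package is assumed to be installed, and the file name pattern `<label>_<timestamp>.png` is an assumption chosen so that the label can be recovered as the part of the file name before the first underscore, which is how train.py reads it.

```python
import os
import random
import string
import time

# 62 classes: digits, lowercase letters and uppercase letters.
CHAR_SET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def random_label(length=4, rng=random):
    """Draw a random label of `length` characters from CHAR_SET."""
    return "".join(rng.choice(CHAR_SET) for _ in range(length))


def save_captcha(label, out_dir, width=100, height=60):
    """Render `label` as a captcha image saved as '<label>_<timestamp>.png'."""
    # Third-party dependency (pip install captcha), imported lazily so that
    # random_label() can be used without it.
    from captcha.image import ImageCaptcha

    os.makedirs(out_dir, exist_ok=True)
    file_name = "{}_{}.png".format(label, int(time.time() * 1000))
    path = os.path.join(out_dir, file_name)
    ImageCaptcha(width=width, height=height).write(label, path)
    return path
```

Calling `save_captcha(random_label(), "sample/origin")` in a loop would fill a folder with labeled samples; the folder name here is only an illustration.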

3.2 Image processing

1. Resize: standardize the size and format of the generated verification code images with Python, so that every image has the same size and can be turned into a standard vector for the machine learning model.

2. Convert to grayscale: reduce the three-channel RGB verification code image to a single-channel grayscale image.

3. Binarization: convert the grayscale image into an image that contains only pure black and pure white pixels.

4. Flattening: transform the image matrix into a one-dimensional vector.
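Steps 2 to 4 can be sketched with NumPy. This is a minimal sketch under stated assumptions: the luma weights and the threshold of 127 are common defaults rather than values taken from the project, the function names are hypothetical, and resizing (step 1) is left out because it depends on the image library in use.

```python
import numpy as np


def to_gray(image):
    """Collapse an H x W x 3 RGB array to H x W using common luma weights."""
    if image.ndim == 2:  # already single-channel
        return image.astype(np.float64)
    r, g, b = image[:, :, 0], image[:, :, 1], image[:, :, 2]
    return 0.2989 * r + 0.5870 * g + 0.1140 * b


def binarize(gray, threshold=127):
    """Map every pixel to 0 (black) or 255 (white)."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


def preprocess(image, threshold=127):
    """Grayscale -> binarize -> flatten, scaled to the range [0, 1]."""
    return binarize(to_gray(image), threshold).flatten() / 255
```

For a 60*100 RGB image, `preprocess` returns a vector of 6000 values, each either 0.0 or 1.0.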

3.3 Convolutional Neural Network

3.3.1 Composition of convolutional neural network

Like other neural networks, a CNN consists of an input layer, hidden layers, and an output layer. The main operation process of the convolutional neural network is shown in the figure.

Convolution layer: The convolution layer consists of multiple convolution units, and the parameters of each convolution unit are optimized through the back propagation algorithm. The convolution operation is mainly used to extract image features. With the increase of convolution layers, multi-layer networks can extract more complex image features.

Linear rectification: the activation step, which applies the ReLU (rectified linear unit) function.

Pooling layer: After convolution, the image still has many dimensional features. Divide the feature matrix into several individual blocks and take the maximum or average value, which plays a role in dimensionality reduction.

Fully connected layer: Combine all local features and the feature matrix of each channel into a vector representation, and calculate the final score of each category.

3.3.2 The specific working process of convolutional neural network

(1) Data normalization

A color image input is usually decomposed into three RGB channels, with each value ranging from 0 to 255. In train.py these values are divided by 255, which scales them to the range from 0 to 1.

(2) Convolution operation (Convolution)

As mentioned earlier, ordinary neural networks connect the input layer and the hidden layers fully for feature extraction, so even slightly larger images cause a huge amount of computation and make processing very slow. The convolution operation solves this problem: each hidden unit connects to only part of the input units. We can understand it as a feature extraction method.

First, let’s clarify a few basic concepts: depth, stride, zero-padding, and convolution kernel.

Depth: the depth of the output volume, that is, the number of neurons (filters) connected to the same region of the input.

Stride: used to describe the step size of the convolution kernel movement.

Zero-padding: Fill the edges of the image with zeros to control the spatial size of the output unit.

Convolution kernel: the weight function that defines each pixel of the output image as a weighted sum of the pixels in a small region of the input image. There can be multiple convolution kernels, and the kernel parameters are trained through error backpropagation.

Figure 4-25 shows the convolution calculation with stride = 1: the convolution kernel moves to the right one step at a time and performs the convolution operation at each position to obtain the corresponding result.

For image calculations, the edges can be zero-padded. This changes the size of the image that enters the computation, as shown in the figure above.

The process of convolution operation is actually very simple. The process is described in Figure 4-27 and can be summarized as formula (4.3.6). Among them, B represents the result after convolution, K is the convolution kernel, and A is the input matrix of the image.

As shown in the figure above, it can be seen that the convolution kernel K is a 2*2 convolution kernel. The detailed operation process is as follows.

All image convolution operations can be performed through formulas.
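With A, K, and B as defined above, each output element can be written as B[i, j] = sum over m, n of K[m, n] * A[i*s + m, j*s + n], where s is the stride. The sketch below implements exactly this sum with NumPy; the function name `conv2d` and the example matrices are illustrative, not taken from the figures. As is usual in CNNs, the kernel is not flipped.

```python
import numpy as np


def conv2d(a, k, stride=1, padding=0):
    """Slide kernel k over input a and return the matrix B of weighted sums."""
    if padding > 0:
        # Zero-padding: surround the input with `padding` rows/columns of zeros.
        a = np.pad(a, padding, mode="constant", constant_values=0)
    kh, kw = k.shape
    out_h = (a.shape[0] - kh) // stride + 1
    out_w = (a.shape[1] - kw) // stride + 1
    b = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = a[top:top + kh, left:left + kw]
            b[i, j] = np.sum(window * k)
    return b


A = np.arange(1, 10).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
K = np.array([[1, 0], [0, 1]])      # a 2*2 kernel
B = conv2d(A, K)                    # [[6, 8], [12, 14]]
```

With `padding=1` the same 3*3 input yields a 4*4 output, which illustrates how zero-padding controls the output size.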

(3) Activation

After convolution, the CNN applies an activation function. The most commonly used activation function today is ReLU, whose main features have been discussed in previous chapters. Its graph shows three characteristics: one-sided suppression, a relatively wide excitation boundary, and sparse activation.
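A minimal sketch of ReLU in plain Python (the function name is illustrative): negative inputs are suppressed to zero and positive inputs pass through unchanged, which is where the one-sided suppression and the sparse activations come from.

```python
def relu(values):
    """Apply max(0, x) to every element."""
    return [max(0.0, v) for v in values]


activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```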

(4) Pooling

The purpose of pooling is to extract features and reduce the amount of data passed to the next stage. The pooling operation is applied independently to each depth slice, and the pooling window is generally 2*2 pixels. The pooling layer generally uses one of the following operations:

Max Pooling: Take the maximum value of 4 point values. This is the most commonly used pooling algorithm.

Mean Pooling: Take the mean of 4 point values.

Gauss Pooling: According to the Gaussian blur method.

As shown in Figure 4-28, the calculation method of maximum pooling is described.
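Max pooling and mean pooling over 2*2 windows can be sketched as follows. The function name and the example matrix are illustrative and do not reproduce Figure 4-28; the sketch uses no padding, so rows or columns that do not fill a whole window are dropped.

```python
import numpy as np


def pool2d(x, size=2, stride=2, mode="max"):
    """Reduce each size*size window of x to its maximum or mean."""
    reduce_fn = np.max if mode == "max" else np.mean
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            out[i, j] = reduce_fn(x[top:top + size, left:left + size])
    return out


X = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
max_pooled = pool2d(X, mode="max")    # [[6, 8], [3, 4]]
mean_pooled = pool2d(X, mode="mean")  # [[3.75, 5.25], [2.0, 2.0]]
```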

(5) Full connection

The fully connected layer generally appears in the last few steps and plays the role of the "classifier" in the convolutional neural network. If the convolutional layers, pooling layers, and activation functions map the original data to the hidden-layer feature space, the fully connected layer maps the learned "distributed feature representation" to the sample label space. The fully connected process flattens the matrix. It can also be understood as a convolution of the output matrix with a 1*1 convolution kernel, whose result is finally expanded into a 1*n vector.

In convolutional neural networks, the fully connected layer generally uses the Softmax function for classification. The Softmax function is suitable for data classification and is used to ensure that the sum of the probabilities of each classification is 1.
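A minimal sketch of the Softmax function in plain Python, with illustrative scores: subtracting the maximum score before exponentiating is a standard trick for numerical stability and does not change the result.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # avoids overflow in exp() for large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
```

The largest score receives the largest probability, and the probabilities add up to 1.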

Although the calculation process of the convolutional neural network is tedious to explain, working through it is very helpful for a deep understanding of the algorithm. After nearly 30 years of development, convolutional neural networks have split into multiple branches and continue to develop at a rapid pace. These include VGG16 and VGG19 with deeper networks, the NIN network with an enhanced convolution module, and R-CNN, which moves from classification tasks to object detection tasks. The following figure shows the different development branches of convolutional neural networks.

4 Scheme design

Convolutional neural networks (CNNs) work very well on images such as verification codes, and the images need little preprocessing to give good results, so we use a CNN as the model for recognizing verification codes.

Here are the reasons why we use CNN:

1. Able to effectively reduce the dimensionality of images with large amounts of data into small amounts of data (without affecting the results)

2. Ability to retain the characteristics of images, similar to human visual principles

3. The principle is easier to understand and can be expanded and transformed.

The following is the general framework of our entire project centered on CNN and the general idea of the implementation.

Automatically generate verification codes: we use the existing captcha library to generate them automatically.

Classify verification codes: choose the split proportions sensibly so that the model performs well.

Image processing: following the conventional process, images go through size standardization, conversion from RGB to single-channel grayscale, and binarization of the grayscale image. We use OpenCV directly for this.

CNN model training and hardware considerations: we call the existing TensorFlow library and use its existing convolutional layers, activation functions, pooling, loss functions, and so on. Our team trains on an RTX 2060 with only 6GB of video memory, so we settled on 300 images per training batch, which occupied 5.7GB of video memory.

Model testing: load the checkpoint file of the trained model through TensorFlow's built-in functions and run recognition. We recognize 100 images at a time and calculate the accuracy and the recognition time to evaluate the model's predictive power and speed.

5 Detailed design

1. Automatically generate and classify verification codes

This step calls the library directly: we set the image and content parameters, and the generated images are saved automatically to the configured folder.

After the verification codes are generated, they are divided into categories and stored in different folders.
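The split into training and test sets can be sketched as follows. This is a minimal sketch rather than the project's verify_and_split_data.py: the function name, the 5% test ratio, and the file names are assumptions for illustration.

```python
import random


def split_samples(file_names, test_ratio=0.05, seed=None):
    """Shuffle the file names and return (train_names, test_names)."""
    names = list(file_names)
    random.Random(seed).shuffle(names)
    n_test = int(len(names) * test_ratio)
    return names[n_test:], names[:n_test]


# Hypothetical file names in the '<label>_<id>.png' pattern.
samples = ["ab{:02d}_{}.png".format(i % 100, i) for i in range(1000)]
train_names, test_names = split_samples(samples, test_ratio=0.05, seed=1)
```

Each list would then be moved or copied into its own folder.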

2. Verification code image processing

This section performs image preprocessing. Traditional image processing and machine learning algorithms often require operations such as preprocessing, segmentation, cropping, filtering and noise reduction, color separation, and rotation. These methods demand a stronger mathematical foundation and more programming skill from the user, and they generalize poorly. We use a convolutional neural network (CNN), which does not need much image processing. Simple preprocessing (conversion to grayscale and binarization) is enough to recognize most static character verification codes, often with higher accuracy.

3. Convolutional neural network model training

 

We use three layers of convolution and two layers of full connection. The specific information is as follows:

The input image size is 60×100×1.

The first layer of convolution sets 32 filters, each filter has a size of 3×3×1 and a stride of 1. Output is 60×100×32.

The first layer of pooling has a size of 2×2 and a stride of 2. Output is 30×50×32.

The second layer of convolution sets 64 filters, each filter has a size of 3×3×32 and a stride of 1. Output is 30×50×64.

The second layer of pooling has a size of 2×2 and a stride of 2. Output is 15×25×64.

The third layer of convolution sets 128 filters, each filter has a size of 3×3×64 and a stride of 1. Output is 15×25×128.

The third layer of pooling has a size of 2×2 and a stride of 2. Output is 8×13×128.

The pooled output is connected to a fully connected layer with 1024 neurons, which is followed by a final fully connected output layer.
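The layer sizes can be checked with a few lines of arithmetic. This is a minimal sketch, assuming SAME padding (a stride-1 convolution keeps the spatial size, and a stride-2 pooling layer rounds the halved size up, which is what takes a height of 15 to 8) and the 60×100 input stated above; the helper names are hypothetical.

```python
import math


def same_out(n, stride):
    """Output length along one axis under SAME padding."""
    return math.ceil(n / stride)


def trace_shapes(height, width, filters=(32, 64, 128)):
    """Return (layer, height, width, channels) after every conv and pool layer."""
    shapes = []
    h, w = height, width
    for f in filters:
        h, w = same_out(h, 1), same_out(w, 1)  # 3x3 convolution, stride 1
        shapes.append(("conv", h, w, f))
        h, w = same_out(h, 2), same_out(w, 2)  # 2x2 max pooling, stride 2
        shapes.append(("pool", h, w, f))
    return shapes


shapes = trace_shapes(60, 100)
_, last_h, last_w, last_c = shapes[-1]
flat_size = last_h * last_w * last_c  # inputs to the 1024-neuron layer
output_units = 4 * 62                 # 4 characters, 62 classes each
```

Under these assumptions the last pooling layer outputs 8×13×128, so the first fully connected layer receives 13312 values and the output layer has 248 units.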

We train the convolution kernels with a gradient-descent-based optimizer (train.py uses the Adam optimizer).

The learning rate is set to 0.0001. A large learning rate can overshoot the optimal solution, and since training on our hardware is fast, we choose a smaller learning rate and trade training time for training quality.

The pooling layer adopts the maximum pooling method, and the window size is: [1, 2, 2, 1].

The activation function is ReLU, as described in section 3.3.

We use the TensorFlow library function tf.nn.sigmoid_cross_entropy_with_logits to calculate the loss, and average it with tf.reduce_mean (see train.py in section 9).
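train.py in section 9 computes the loss with tf.nn.sigmoid_cross_entropy_with_logits and averages it. What that function calculates for each element can be sketched in plain Python using the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)), where x is the logit and z is the 0/1 label. The function names below are illustrative and this is not the TensorFlow implementation itself.

```python
import math


def sigmoid_cross_entropy(logit, label):
    """Per-element loss in the stable form max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_loss(logits, labels):
    """Average the per-element losses, as tf.reduce_mean does in train.py."""
    losses = [sigmoid_cross_entropy(x, z) for x, z in zip(logits, labels)]
    return sum(losses) / len(losses)


undecided = sigmoid_cross_entropy(0.0, 1.0)   # log(2): the model is unsure
confident_right = sigmoid_cross_entropy(10.0, 1.0)
confident_wrong = sigmoid_cross_entropy(10.0, 0.0)
```

A confident correct prediction gives a loss close to 0, while a confident wrong prediction gives a loss close to the magnitude of the logit.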

4. Convolutional neural network testing and verification

      

Load the checkpoint file of the trained model through TensorFlow's built-in functions and run recognition. We recognize 100 images at a time and calculate the accuracy and the recognition time to evaluate the model's predictive power and speed.
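The two accuracy figures that train.py reports, per character and per image, can be sketched in plain Python. The function names and the sample predictions below are made up for illustration; an image counts as correct only if all of its characters are correct.

```python
def char_accuracy(predictions, labels):
    """Fraction of individual characters that were predicted correctly."""
    pairs = [(p, t) for pred, true in zip(predictions, labels)
             for p, t in zip(pred, true)]
    return sum(p == t for p, t in pairs) / len(pairs)


def image_accuracy(predictions, labels):
    """Fraction of images whose whole text was predicted correctly."""
    return sum(pred == true for pred, true in zip(predictions, labels)) / len(labels)


# Illustrative data: the third prediction gets one character wrong.
truth = ["qgnx", "luwz", "cnkg"]
preds = ["qgnx", "luwz", "cnkq"]
acc_char = char_accuracy(preds, truth)    # 11 of 12 characters
acc_image = image_accuracy(preds, truth)  # 2 of 3 images
```

Because one wrong character fails the whole image, the image accuracy is never higher than the character accuracy.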

6 System implementation

Development environment

Pycharm

Hidden

Python 3.7.9

CUDA 10.0

Opencv

RTX 2060

Tensorflow 1.15

Code source files and their descriptions

change_size.py: changes the image size

gen_sample_by_captcha.py: generates sample verification codes

t_batch.py: validates the neural network

train_model.py: trains the neural network

verify_and_split_data.py: divides the sample verification codes into a training set and a test set

captcha_config.json: configuration file for generating sample verification codes

sample_config.json: configuration file for the training parameters

network.py: CNN neural network module

The calling relationship between codes is shown in the figure below:

The structural relationship between codes is mainly written using the existing library.
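The keys that train.py reads from conf/sample_config.json can be checked before training starts. This is a minimal sketch: the key names come from the `main()` function of train.py, while the function name `check_config` and every value in the example (paths, step counts, thresholds) are illustrative assumptions, not the project's actual settings.

```python
import json

# Keys read by main() in train.py.
REQUIRED_KEYS = [
    "train_image_dir", "test_image_dir", "model_save_dir",
    "cycle_stop", "acc_stop", "cycle_save", "enable_gpu",
    "image_suffix", "use_labels_json_file",
    "train_batch_size", "test_batch_size",
]


def check_config(conf):
    """Return the list of required keys that are missing from conf."""
    missing = [key for key in REQUIRED_KEYS if key not in conf]
    # char_set is only read when the labels file is not used.
    if not conf.get("use_labels_json_file") and "char_set" not in conf:
        missing.append("char_set")
    return missing


# Hypothetical configuration; all values are placeholders.
example = json.loads("""
{
  "train_image_dir": "sample/train/",
  "test_image_dir": "sample/test/",
  "model_save_dir": "model/",
  "cycle_stop": 20000,
  "acc_stop": 0.99,
  "cycle_save": 500,
  "enable_gpu": true,
  "image_suffix": "png",
  "use_labels_json_file": false,
  "train_batch_size": 300,
  "test_batch_size": 100,
  "char_set": "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
}
""")
missing_keys = check_config(example)  # empty list when nothing is missing
```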

7 Test and evaluation results

At first, our team trained the lowercase-letter scheme and found that a training set of 50,000 images was enough to achieve good results. However, once uppercase letters were added, the accuracy with a training set of the same size was only about 50%, so we added another 50,000 training images to reach the same accuracy as the lowercase model. This shows that predictions over more complex features require more training data.

For testing, we considered two cases: with and without uppercase letters. We tested a batch of 100 images for each case. With uppercase letters, the accuracy was 70/100; without uppercase letters, the accuracy was 73/100. Overall, the test results are good.

Partial image verification:

Group 1: qgnx

Group 2: luwz

Group 3: cnkg

However, when a crawler program uses the model to recognize verification codes from the Internet, the results are not very good. Our analysis of the reason: the data set needs to be rebuilt by collecting the various types of verification codes found on the Internet.

8 Summary

In this course project, the biggest difficulties were configuring the environment and tuning the parameters of our CNN model code. At the beginning we did not take the TensorFlow package updates into account: functions from earlier versions do not run properly under the current TensorFlow 2.0, so we searched online for solutions. For parameter tuning, we settled on the convolution kernel size and achieved better results after several tests.

At first, tensorflow-cpu was too slow, so we added GPU support to speed up model training. To use the computer's GPU, we downloaded and learned to set up CUDA and cuDNN.

We also spent a lot of effort on understanding the principles of neural networks in depth, comparing different neural network models, and analyzing their characteristics.

There is still room for improvement in recognizing verification codes from the Internet. Because such verification codes vary widely, the model must be retrained on them to relearn the features in order to achieve better results.

9.Project code

1.train.py

import json

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import time
from PIL import Image
import random
import os
from cnnlib.network import CNN


class TrainError(Exception):
    pass


class TrainModel(CNN):
    def __init__(self, train_img_path, verify_img_path, char_set, model_save_dir, cycle_stop, acc_stop, cycle_save,
                 image_suffix, train_batch_size, test_batch_size, verify=False):
        # Training parameters
        self.cycle_stop = cycle_stop
        self.acc_stop = acc_stop
        self.cycle_save = cycle_save
        self.train_batch_size = train_batch_size
        self.test_batch_size = test_batch_size

        self.image_suffix = image_suffix
        char_set = [str(i) for i in char_set]

        # Shuffle the file order and validate the image suffix
        self.train_img_path = train_img_path
        self.train_images_list = os.listdir(train_img_path)
        # Validate the suffix
        if verify:
            self.confirm_image_suffix()
        # Shuffle the file order
        random.seed(time.time())
        random.shuffle(self.train_images_list)

        # Validation set files
        self.verify_img_path = verify_img_path
        self.verify_images_list = os.listdir(verify_img_path)

        # Read the image height, width, and label length from the first sample
        label, captcha_array = self.gen_captcha_text_image(train_img_path, self.train_images_list[0])

        captcha_shape = captcha_array.shape
        captcha_shape_len = len(captcha_shape)
        if captcha_shape_len == 3:
            image_height, image_width, channel = captcha_shape
            self.channel = channel
        elif captcha_shape_len == 2:
            image_height, image_width = captcha_shape
        else:
            raise TrainError("Failed to convert the image to a matrix; please check the image format")

        # Initialize the parent CNN class
        super(TrainModel, self).__init__(image_height, image_width, len(label), char_set, model_save_dir)

        # Print basic information
        print("--> Image size: {} X {}".format(image_height, image_width))
        print("--> Captcha length: {}".format(self.max_captcha))
        print("--> {} character classes: {}".format(self.char_set_len, char_set))
        print("--> Training set: {}".format(train_img_path))
        print("--> Validation set: {}".format(verify_img_path))

        # test model input and output
        print(">>> Start model test")
        batch_x, batch_y = self.get_batch(0, size=100)
        print(">>> input batch images shape: {}".format(batch_x.shape))
        print(">>> input batch labels shape: {}".format(batch_y.shape))

    @staticmethod
    def gen_captcha_text_image(img_path, img_name):
        """
        Return the label string of a captcha and the image as an array
        :return: tuple (str, numpy.array)
        """
        # Label: the part of the file name before the first underscore
        label = img_name.split("_")[0]
        # Image file
        img_file = os.path.join(img_path, img_name)
        captcha_image = Image.open(img_file)
        captcha_array = np.array(captcha_image)  # convert to an array
        return label, captcha_array

    def get_batch(self, n, size=300):
        batch_x = np.zeros([size, self.image_height * self.image_width])  # initialize
        batch_y = np.zeros([size, self.max_captcha * self.char_set_len])  # initialize

        max_batch = int(len(self.train_images_list) / size)
        # print(max_batch)
        if max_batch - 1 < 0:
            raise TrainError("The training set must contain at least as many images as one batch")
        if n > max_batch - 1:
            n = n % max_batch
        s = n * size
        e = (n + 1) * size
        this_batch = self.train_images_list[s:e]
        # print("{}:{}".format(s, e))

        for i, img_name in enumerate(this_batch):
            label, image_array = self.gen_captcha_text_image(self.train_img_path, img_name)
            image_array = self.convert2gray(image_array)  # convert to grayscale
            batch_x[i, :] = image_array.flatten() / 255  # flatten to 1-D and scale to [0, 1]
            batch_y[i, :] = self.text2vec(label)  # one-hot encode the label
        return batch_x, batch_y

    def get_verify_batch(self, size=100):
        batch_x = np.zeros([size, self.image_height * self.image_width])  # initialize
        batch_y = np.zeros([size, self.max_captcha * self.char_set_len])  # initialize

        verify_images = []
        for i in range(size):
            verify_images.append(random.choice(self.verify_images_list))

        for i, img_name in enumerate(verify_images):
            label, image_array = self.gen_captcha_text_image(self.verify_img_path, img_name)
            image_array = self.convert2gray(image_array)  # convert to grayscale
            batch_x[i, :] = image_array.flatten() / 255  # flatten to 1-D and scale to [0, 1]
            batch_y[i, :] = self.text2vec(label)  # one-hot encode the label
        return batch_x, batch_y

    def confirm_image_suffix(self):
        # Validate the suffix of every file before training
        print("Checking the suffix of all images")
        for index, img_name in enumerate(self.train_images_list):
            print("{} image pass".format(index), end='\r')
            if not img_name.endswith(self.image_suffix):
                raise TrainError('confirm images suffix:you request [.{}] file but get file [{}]'
                                 .format(self.image_suffix, img_name))
        print("All image suffixes passed the check")

    def train_cnn(self):
        y_predict = self.model()
        print(">>> input batch predict shape: {}".format(y_predict.shape))
        print(">>> End model test")
        # Loss
        with tf.name_scope('cost'):
            cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_predict, labels=self.Y))
        # Optimizer (Adam)
        with tf.name_scope('train'):
            optimizer = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(cost)
        # Predicted and true character indices
        predict = tf.reshape(y_predict, [-1, self.max_captcha, self.char_set_len])  # predictions
        max_idx_p = tf.argmax(predict, 2)  # predicted class of each character
        max_idx_l = tf.argmax(tf.reshape(self.Y, [-1, self.max_captcha, self.char_set_len]), 2)  # labels
        # Accuracy
        correct_pred = tf.equal(max_idx_p, max_idx_l)
        with tf.name_scope('char_acc'):
            accuracy_char_count = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
        with tf.name_scope('image_acc'):
            accuracy_image_count = tf.reduce_mean(tf.reduce_min(tf.cast(correct_pred, tf.float32), axis=1))
        # Saver for model checkpoints
        saver = tf.train.Saver()
        with tf.Session() as sess:
            init = tf.global_variables_initializer()
            sess.run(init)
            # Restore an existing model
            if os.path.exists(self.model_save_dir):
                try:
                    saver.restore(sess, self.model_save_dir)
                # Raised when the model folder contains no model file
                except ValueError:
                    print("The model folder is empty; a new model will be created")
            else:
                pass
            # Write the graph to the log directory
            tf.summary.FileWriter("logs/", sess.graph)

            step = 1
            for i in range(self.cycle_stop):
                batch_x, batch_y = self.get_batch(i, size=self.train_batch_size)
                # One training step
                _, cost_ = sess.run([optimizer, cost],
                                    feed_dict={self.X: batch_x, self.Y: batch_y, self.keep_prob: 0.6})
                if step % 10 == 0:
                    # Evaluate on the training set
                    batch_x_test, batch_y_test = self.get_batch(i, size=self.train_batch_size)
                    acc_char = sess.run(accuracy_char_count, feed_dict={self.X: batch_x_test, self.Y: batch_y_test, self.keep_prob: 1.})
                    acc_image = sess.run(accuracy_image_count, feed_dict={self.X: batch_x_test, self.Y: batch_y_test, self.keep_prob: 1.})
                    print("Training step {} >>> ".format(step))
                    print("[Training set] char accuracy {:.5f} image accuracy {:.5f} >>> loss {:.10f}".format(acc_char, acc_image, cost_))

                    # with open("loss_train.csv", "a+") as f:
                    #     f.write("{},{},{},{}\n".format(step, acc_char, acc_image, cost_))

                    # Evaluate on the validation set
                    batch_x_verify, batch_y_verify = self.get_verify_batch(size=self.test_batch_size)
                    acc_char = sess.run(accuracy_char_count, feed_dict={self.X: batch_x_verify, self.Y: batch_y_verify, self.keep_prob: 1.})
                    acc_image = sess.run(accuracy_image_count, feed_dict={self.X: batch_x_verify, self.Y: batch_y_verify, self.keep_prob: 1.})
                    print("[Validation set] char accuracy {:.5f} image accuracy {:.5f} >>> loss {:.10f}".format(acc_char, acc_image, cost_))

                    # with open("loss_test.csv", "a+") as f:
                    #     f.write("{}, {},{},{}\n".format(step, acc_char, acc_image, cost_))

                    # Save and stop once the validation image accuracy exceeds acc_stop
                    if acc_image > self.acc_stop:
                        saver.save(sess, self.model_save_dir)
                        print("Validation accuracy exceeded acc_stop; model saved")
                        break
                # Save the model every cycle_save iterations
                if i % self.cycle_save == 0:
                    saver.save(sess, self.model_save_dir)
                    print("Periodic model save succeeded")
                step += 1
            saver.save(sess, self.model_save_dir)

    def recognize_captcha(self):
        label, captcha_array = self.gen_captcha_text_image(self.train_img_path, random.choice(self.train_images_list))

        f = plt.figure()
        ax = f.add_subplot(111)
        ax.text(0.1, 0.9, "origin:" + label, ha='center', va='center', transform=ax.transAxes)
        plt.imshow(captcha_array)
        # Predict the image
        image = self.convert2gray(captcha_array)
        image = image.flatten() / 255

        y_predict = self.model()

        saver = tf.train.Saver()
        with tf.Session() as sess:
            saver.restore(sess, self.model_save_dir)
            predict = tf.argmax(tf.reshape(y_predict, [-1, self.max_captcha, self.char_set_len]), 2)
            text_list = sess.run(predict, feed_dict={self.X: [image], self.keep_prob: 1.})
            predict_text = text_list[0].tolist()

        print("Correct: {}  Predicted: {}".format(label, predict_text))
        # Show the image and the predicted text
        p_text = ""
        for p in predict_text:
            p_text += str(self.char_set[p])
        print(p_text)
        plt.text(20, 1, 'predict:{}'.format(p_text))
        plt.show()


def main():
    with open("conf/sample_config.json", "r") as f:
        sample_conf = json.load(f)

    train_image_dir = sample_conf["train_image_dir"]
    verify_image_dir = sample_conf["test_image_dir"]
    model_save_dir = sample_conf["model_save_dir"]
    cycle_stop = sample_conf["cycle_stop"]
    acc_stop = sample_conf["acc_stop"]
    cycle_save = sample_conf["cycle_save"]
    enable_gpu = sample_conf["enable_gpu"]
    image_suffix = sample_conf['image_suffix']
    use_labels_json_file = sample_conf['use_labels_json_file']
    train_batch_size = sample_conf['train_batch_size']
    test_batch_size = sample_conf['test_batch_size']

    if use_labels_json_file:
        with open("tools/labels.json", "r") as f:
            char_set = f.read().strip()
    else:
        char_set = sample_conf["char_set"]

    if not enable_gpu:
        # Setting these environment variables forces CPU-only execution
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

    tm = TrainModel(train_image_dir, verify_image_dir, char_set, model_save_dir, cycle_stop, acc_stop, cycle_save,
                    image_suffix, train_batch_size, test_batch_size, verify=False)
    tm.train_cnn()  # start training the model
    # tm.recognize_captcha()  # example: recognize a single image


if __name__ == '__main__':
    main()
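train.py calls `self.text2vec(label)` from the CNN base class in network.py, which is not listed here. The sketch below shows one plausible way such a one-hot encoding could work, together with its inverse. It is an assumption for illustration, not the project's actual implementation, and the standalone function signatures are hypothetical.

```python
import string

CHAR_SET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def text2vec(label, char_set=CHAR_SET, max_captcha=4):
    """Encode a label as max_captcha one-hot blocks of len(char_set) entries."""
    if len(label) > max_captcha:
        raise ValueError("label is longer than max_captcha")
    n = len(char_set)
    vector = [0] * (max_captcha * n)
    for position, char in enumerate(label):
        vector[position * n + char_set.index(char)] = 1
    return vector


def vec2text(vector, char_set=CHAR_SET, max_captcha=4):
    """Decode by taking the index of the largest entry in each block."""
    n = len(char_set)
    chars = []
    for position in range(max_captcha):
        block = vector[position * n:(position + 1) * n]
        chars.append(char_set[block.index(max(block))])
    return "".join(chars)


encoded = text2vec("aB3x")
decoded = vec2text(encoded)
```

With 4 characters and 62 classes, the encoded vector has 248 entries, which matches the width of `batch_y` in `get_batch`. The decoding step mirrors the `tf.argmax` over the reshaped prediction in `train_cnn`.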

2. Get the remaining codes through private chat with bloggers (Follow first)

See the beginning of the article for reply time

Origin blog.csdn.net/weixin_53284122/article/details/124073487