[Deep Learning] Captcha Recognition Based on Convolutional Neural Network

Event address: CSDN 21-day Learning Challenge

Foreword

The environment setup is not repeated here. It is the same as in the earlier article "[Deep Learning] Weather Recognition Training Text Based on Convolutional Neural Networks". If your configuration still fails, see the detailed package configuration at the end of this article.

Understanding the captcha dataset

The dataset contains 1070 handwritten captcha images, and each image file is named after the captcha text it shows. As a result, the training set and the validation set have to be split manually later, and the captcha labels have to be extracted from the image file names.

Download the captcha dataset

You can send me a private message to get the dataset (it has already been uploaded to CSDN, so it cannot be uploaded again).

Whether to use CPU training or GPU training

Generally speaking, if you have a good graphics card (GPU), train on the GPU because it is much faster; in that case you need to install the tensorflow-gpu package. If your graphics card is weak, or a good one is out of your budget, you can train on the CPU instead.

The difference

(1) The CPU is mainly designed for serial operations, whereas the GPU is designed for massively parallel operations. Because deep learning involves huge numbers of samples and parameters, the GPU is used to accelerate the network computations.

(2) A neural network can also be trained on the CPU, and the resulting model works just as well in practice, but training is much slower. GPUs are mainly fast at matrix multiplication and convolution; for other, logic-heavy operations they are not as fast as CPUs.

Train with CPU

# Train on the CPU: hide all GPUs from TensorFlow
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

The CPU model is not displayed when training on the CPU.

Training with GPU

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")

if gpus:
    gpu0 = gpus[0]  # If there are several GPUs, use only the first one
    tf.config.experimental.set_memory_growth(gpu0, True)  # Allocate GPU memory on demand
    tf.config.set_visible_devices([gpu0], "GPU")

When training on the GPU, the corresponding GPU model is displayed.

Chinese font support

Matplotlib's plotting interface is imported with import matplotlib.pyplot as plt. Its default configuration is fixed, but sometimes the configuration parameters need to be changed to suit the plot.
They can be changed with plt.rcParams['parameter name'] = value, where rcParams stands for run configuration parameters.

plt.rcParams['font.sans-serif'] = ['SimHei']  # Set the sans-serif font to SimHei so that Chinese characters render
plt.rcParams['axes.unicode_minus'] = False  # Display the minus sign on the axes correctly
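
SimHei is not installed on every system, and when the configured font is missing Matplotlib falls back to a default font that may not contain Chinese glyphs. The sketch below checks which fonts are installed before changing rcParams. The candidate font names other than SimHei are assumptions for illustration, so adjust them to whatever is available on your machine.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Names of all fonts Matplotlib has found on this system
installed = {font.name for font in font_manager.fontManager.ttflist}

# Candidate fonts with Chinese glyphs (assumed names, adjust as needed)
preferred = ["SimHei", "Microsoft YaHei", "Noto Sans CJK SC"]
available = [name for name in preferred if name in installed]

if available:
    # Put the fonts that exist in front of the current sans-serif list
    current = list(plt.rcParams["font.sans-serif"])
    plt.rcParams["font.sans-serif"] = available + current
else:
    print("No preferred Chinese font found; Chinese labels may not render.")

plt.rcParams["axes.unicode_minus"] = False
```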

Import Data

Here the random seeds of NumPy and TensorFlow are set to fixed values so that the training results are as reproducible as possible. The local path of the dataset is stored in the data_dir variable.

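import random

import matplotlib.pyplot as plt
import numpy as np
import PIL
import tensorflow as tf

# Fix the random seeds so that results are as reproducible as possible
random.seed(1)  # also seeds the random.shuffle call used below
np.random.seed(1)
tf.random.set_seed(1)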

from tensorflow import keras
from tensorflow.keras import layers, models

import pathlib

data_dir = "E:\\PythonProject\\day6\\data\\captcha\\"
data_dir = pathlib.Path(data_dir)

# Collect the paths of all images
all_image_paths = list(data_dir.glob('*'))
all_image_paths = [str(path) for path in all_image_paths]

# Shuffle the data: files are listed in file-name order by default
random.shuffle(all_image_paths)

# Extract the labels from the file names: each file is named
# "<5-character captcha>.png", so the stem of the file name is the label
all_label_names = [pathlib.Path(path).stem for path in all_image_paths]
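
To illustrate why a fixed seed matters for the shuffle step, here is a small sketch that uses only the standard library. The file names are made up for illustration, and a private random.Random instance is used instead of the global generator so that the example does not depend on any other code.

```python
import random

# Hypothetical file names, for illustration only
paths = [f"captcha_{i}.png" for i in range(10)]


def shuffled(items, seed):
    """Return a shuffled copy of items using a private, seeded generator."""
    rng = random.Random(seed)
    result = list(items)
    rng.shuffle(result)
    return result


first = shuffled(paths, seed=1)
second = shuffled(paths, seed=1)

print(first == second)                 # True: same seed, same order
print(sorted(first) == sorted(paths))  # True: shuffling only reorders the items
```

With the same seed, the split into training and validation images is the same on every run, which makes results easier to compare.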

View data volume

image_count = len(all_image_paths)
print("Total number of images:", image_count)

Show some pictures

Plot the first 20 images in four rows of five.

from matplotlib import pyplot as plt

plt.figure(figsize=(10, 5))

for i in range(20):
    plt.subplot(4, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)

    # Show the image
    images = plt.imread(all_image_paths[i])
    plt.imshow(images)
    # Show the label
    plt.xlabel(all_label_names[i])

plt.show()


Preprocessing

Manually set labels

Define a list that holds every digit and letter that can appear in a captcha.

number = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
            'v', 'w', 'x', 'y', 'z']
char_set = number + alphabet
char_set_len = len(char_set)
label_name_len = len(all_label_names[0])

def text2vec(text):
    vector = np.zeros([label_name_len, char_set_len])
    for i, c in enumerate(text):
        idx = char_set.index(c)
        vector[i][idx] = 1.0
    return vector


all_labels = [text2vec(i) for i in all_label_names]

The code may be a little confusing on its own. The result is equivalent to a three-dimensional array: the first dimension indexes the images, the second indexes the character positions within a captcha, and the third indexes the character set. For each position, the entry of the character that appears there is 1 and all other entries are 0.
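
As a quick check of the encoding described above, the following sketch encodes one label and decodes it again. The label "2b8f3" is a made-up example, and onehot2text is a helper name introduced here for illustration (it is not the vec2text function used later in the article).

```python
import numpy as np

char_set = list("0123456789abcdefghijklmnopqrstuvwxyz")  # 10 digits + 26 letters
label_len = 5  # captcha length used in this article


def text2vec(text):
    """One-hot encode a captcha string into a (label_len, 36) array."""
    vector = np.zeros([label_len, len(char_set)])
    for i, c in enumerate(text):
        vector[i][char_set.index(c)] = 1.0
    return vector


def onehot2text(vec):
    """Decode a one-hot array back into the captcha string."""
    return "".join(char_set[i] for i in np.argmax(vec, axis=-1))


vec = text2vec("2b8f3")
print(vec.shape)         # (5, 36)
print(vec.sum(axis=1))   # [1. 1. 1. 1. 1.], exactly one 1 per position
print(onehot2text(vec))  # 2b8f3
```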

Grayscale processing

In the RGB model, a color with R = G = B is a shade of gray, and the common value of R, G and B is called the grayscale value. Each pixel of a grayscale image therefore needs only one byte to store its grayscale value (also known as the intensity or brightness value), and the grayscale range is 0-255. The common conversion methods are described below.

Average method

A grayscale image is obtained by averaging the three color components of each pixel: Gray = (R + G + B) / 3. This article uses the photos processed by the average method.
[Figure: original image (left) and grayscale image produced by the average method (right)]
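
A minimal NumPy sketch of the average method, assuming the image is an (H, W, 3) uint8 array in RGB order. The tiny test image is made up for illustration.

```python
import numpy as np


def gray_average(rgb):
    """Convert an (H, W, 3) uint8 RGB image to (H, W) grayscale by averaging."""
    gray = rgb.astype(np.float32).mean(axis=-1)
    return np.round(gray).astype(np.uint8)


# A 1 x 3 test image: pure red, pure green, mid gray
rgb = np.array([[[255, 0, 0], [0, 255, 0], [128, 128, 128]]], dtype=np.uint8)
print(gray_average(rgb))  # [[ 85  85 128]]
```

Note that pure red and pure green map to the same gray level here, because every channel contributes equally.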

Weighted average method

The weighted average method multiplies the three color components by different weights, chosen according to how sensitive the human eye is to each color, and sums the results. A commonly used set of weights is Gray = 0.299 R + 0.587 G + 0.114 B.
[Figure: original image (left) and grayscale image produced by the weighted average method (right)]
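
A minimal NumPy sketch of the weighted average method. The weights 0.299, 0.587 and 0.114 are the widely used luma coefficients and are an assumption here, since the article's own formula image is not available.

```python
import numpy as np

# Assumed weights for R, G, B (common luma coefficients)
WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def gray_weighted(rgb):
    """Convert an (H, W, 3) uint8 RGB image to (H, W) grayscale by weighting."""
    gray = rgb.astype(np.float32) @ WEIGHTS
    return np.round(gray).astype(np.uint8)


# A 1 x 3 test image: pure red, pure green, mid gray
rgb = np.array([[[255, 0, 0], [0, 255, 0], [128, 128, 128]]], dtype=np.uint8)
print(gray_weighted(rgb))  # [[ 76 150 128]]
```

Unlike plain averaging, green comes out brighter than red, which matches how bright the two colors look to the eye.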

cvtColor

OpenCV's cvtColor function can also perform grayscale conversion.
[Figure: original image (left) and grayscale image produced by cvtColor (right)]

Load the data

The from_tensor_slices method is used here. It is one of the core functions of tf.data.Dataset: it slices the given data (tuples, lists, tensors and so on) along the outermost dimension, so each element of the dataset is one slice. If several features are combined, for example in a tuple, each element contains the matching slice of every component.

AUTOTUNE = tf.data.experimental.AUTOTUNE

path_ds  = tf.data.Dataset.from_tensor_slices(all_image_paths)
image_ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=AUTOTUNE)
label_ds = tf.data.Dataset.from_tensor_slices(all_labels)

image_label_ds = tf.data.Dataset.zip((image_ds, label_ds))

# Split the dataset: the first 1000 samples form the training set, the rest the validation set
train_ds = image_label_ds.take(1000)
val_ds   = image_label_ds.skip(1000)
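
The pipeline above calls load_and_preprocess_image, which is not defined anywhere in this article. The following is only a sketch of what such a function could look like, under these assumptions: the files are PNG images, they are decoded to a single grayscale channel, resized to 50 x 200 to match the model's input_shape of (50, 200, 1), and scaled to the range 0-1. The helper name preprocess_image and the constants are introduced here for illustration.

```python
import tensorflow as tf

IMG_HEIGHT, IMG_WIDTH = 50, 200  # assumed to match input_shape=(50, 200, 1)


def preprocess_image(image_bytes):
    """Decode PNG bytes to one grayscale channel, resize and scale to [0, 1]."""
    image = tf.io.decode_png(image_bytes, channels=1)
    image = tf.image.resize(image, [IMG_HEIGHT, IMG_WIDTH])  # returns float32
    return image / 255.0


def load_and_preprocess_image(path):
    """Read one image file and return a (50, 200, 1) float32 tensor."""
    image_bytes = tf.io.read_file(path)
    return preprocess_image(image_bytes)
```

The function has to be defined before the path_ds.map(...) call runs.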

Configure the dataset (speed up)

shuffle(): randomly reorders the elements of the dataset, which is useful when a task needs a random sample of the data, for example 5 random lines out of a 10-line text. The code below does not call it; the image paths were already shuffled when the data was imported.
prefetch(): prepares upcoming elements in the background while the current ones are being consumed, so that data loading overlaps with training instead of stalling it.

BATCH_SIZE = 16
train_ds = train_ds.batch(BATCH_SIZE)
train_ds = train_ds.prefetch(buffer_size=AUTOTUNE)

val_ds = val_ds.batch(BATCH_SIZE)
val_ds = val_ds.prefetch(buffer_size=AUTOTUNE)
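
For reference, here is a small sketch of how shuffle(), batch() and prefetch() can be chained on a toy dataset of the numbers 0-9. It assumes TensorFlow 2.4 or later, where tf.data.AUTOTUNE is available; the buffer size and seed are arbitrary choices for illustration.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)
ds = ds.shuffle(buffer_size=10, seed=1)  # buffer >= dataset size gives a full shuffle
ds = ds.batch(4)                         # groups of 4, last batch is smaller
ds = ds.prefetch(tf.data.AUTOTUNE)       # prepare the next batch in the background

batches = [batch.numpy().tolist() for batch in ds]
print([len(batch) for batch in batches])  # [4, 4, 2]
```

Shuffling before batching mixes individual samples; shuffling after batching would only reorder whole batches.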

Build a CNN model

The model here is roughly the same as in the previous articles, so it is not described in detail.

from tensorflow.keras import datasets, layers, models

model = models.Sequential([
    
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(50, 200, 1)),
    layers.MaxPooling2D((2, 2)),                   
    layers.Conv2D(64, (3, 3), activation='relu'),  
    layers.MaxPooling2D((2, 2)),                   
    
    layers.Flatten(),                              
    layers.Dense(1000, activation='relu'),         
    
    layers.Dense(label_name_len * char_set_len),
    layers.Reshape([label_name_len, char_set_len]),
    layers.Softmax()                               
])

model.summary()  # Print the network structure

Network structure

The network has 10 layers in total, including the input layer.

Number of parameters

The model has about 34 million parameters in total, which is large even though the dataset is small, so GPU training is recommended.

Total params: 33,991,996
Trainable params: 33,991,996
Non-trainable params: 0
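
The total above can be reproduced by hand from the layer definitions. The sketch below assumes the Keras defaults used in the model ('valid' padding and stride 1 for Conv2D, non-overlapping 2 x 2 pooling) and the values label_name_len = 5 and char_set_len = 36 from the preprocessing step. The helper function names are introduced here for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


def after_conv(size, kernel):
    """Output size for 'valid' padding and stride 1."""
    return size - kernel + 1


def after_pool(size, pool):
    """Output size for non-overlapping pooling."""
    return size // pool


h, w = 50, 200
label_name_len, char_set_len = 5, 36

conv1 = conv2d_params(3, 3, 1, 32)                                        # 320
h, w = after_pool(after_conv(h, 3), 2), after_pool(after_conv(w, 3), 2)  # 24 x 99
conv2 = conv2d_params(3, 3, 32, 64)                                       # 18,496
h, w = after_pool(after_conv(h, 3), 2), after_pool(after_conv(w, 3), 2)  # 11 x 48
flat = h * w * 64                                                         # 33,792
dense1 = dense_params(flat, 1000)                                         # 33,793,000
dense2 = dense_params(1000, label_name_len * char_set_len)                # 180,180

total = conv1 + conv2 + dense1 + dense2
print(total)  # 33991996
```

Almost all parameters sit in the first Dense layer, so reducing the flattened feature size or the number of Dense units is the most direct way to shrink the model.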

Train the model

Train the model for 10 epochs.

# Configure the optimizer, loss function and metric
model.compile(optimizer="adam",
              loss='categorical_crossentropy',
              metrics=['accuracy'])

epochs = 10

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)

Training results: after 10 epochs, the accuracy on the validation set is only 78.57%, which shows that there is still a lot of room for optimization.
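
One point worth keeping in mind when reading this number: with an output of shape (batch, 5, 36), the Keras 'accuracy' metric compares the argmax at each character position, so as far as I understand it reports per-character accuracy, not the share of captchas that are entirely correct. The sketch below shows the difference on a toy example; the function names and index values are made up for illustration.

```python
import numpy as np


def char_accuracy(y_true, y_pred):
    """Fraction of individual characters predicted correctly."""
    match = np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)
    return float(np.mean(match))


def captcha_accuracy(y_true, y_pred):
    """Fraction of captchas in which every character is correct."""
    match = np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)
    return float(np.mean(match.all(axis=1)))


# Toy data: 2 captchas, 5 positions, 36 classes
true_idx = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
pred_idx = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 0]])  # one wrong character
y_true = np.eye(36)[true_idx]
y_pred = np.eye(36)[pred_idx]

print(char_accuracy(y_true, y_pred))     # 0.9
print(captcha_accuracy(y_true, y_pred))  # 0.5
```

A single wrong character makes the whole captcha wrong, so whole-captcha accuracy is never higher than per-character accuracy.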

Model evaluation

The training history is plotted as curves, which makes it easier to optimize the model later: to judge whether it is overfitting or underfitting, whether more data is needed, and so on.

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()


Predict

Here the trained model is used to predict captchas. Six images from the validation set serve as test examples.

def vec2text(vec):
    """Map a sequence of character indices back to the captcha string."""
    return "".join(char_set[c] for c in vec)


plt.figure(figsize=(8, 8))

for images, labels in val_ds.take(1):
    for i in range(6):
        ax = plt.subplot(5, 2, i + 1)

        # Show the image (drop the trailing channel dimension for imshow)
        plt.imshow(tf.squeeze(images[i], axis=-1))

        # Add a batch dimension, since the model expects a batch of images
        img_array = tf.expand_dims(images[i], 0)

        # Predict the captcha with the model
        predictions = model.predict(img_array)
        plt.title(vec2text(np.argmax(predictions, axis=2)[0]))

        plt.axis("off")

plt.show()

The predictions show that the error rate is still quite high, and the model needs to be improved further.

Origin blog.csdn.net/qq_45254369/article/details/126329361