Text-click CAPTCHA: a hands-on CRNN project for text label recognition

Hands-on recognition of rotated printed-text CAPTCHAs

This article describes how to train a recognizer for CAPTCHAs made of rotated printed text. It is aimed at readers who have experience with CAPTCHA recognition or who are currently attempting it. The text is recognized with CRNN+CTC, and after training the recognition rate on the sample data reaches 97%. The whole project workflow is presented as working code so that it is easy to follow and try out.


Introduction to CRNN model

Model paper: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Chinese translation: CRNN paper translation (Chinese version)

CRNN basic network structure

Network architecture. The architecture consists of three parts:

  1. Convolutional layers, which extract a feature sequence from the input image. They form an ordinary CNN that produces the convolutional feature maps of the input image;
  2. Recurrent layers, which predict the label distribution of each frame. They form a deep bidirectional LSTM network that continues to extract text sequence features on top of the convolutional features;
  3. A transcription layer, which converts the per-frame predictions into the final label sequence. A softmax is applied to the RNN output, which is then emitted as characters.

At the bottom of the CRNN, the convolutional layers automatically extract a feature sequence from each input image. On top of the convolutional network, a recurrent network makes a prediction for each frame of the feature sequence output by the convolutional layers. The transcription layer at the top of the CRNN converts the per-frame predictions of the recurrent layers into a label sequence. Although a CRNN is composed of different kinds of network architectures (CNN and RNN), it can be trained jointly with a single loss function.
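The transcription step can be illustrated with a minimal greedy (best-path) CTC decoder. This is only a sketch, not necessarily the decoder used in the project: the function name `ctc_greedy_decode`, the toy two-character alphabet, and the choice of index 0 for the CTC blank are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank


def ctc_greedy_decode(frame_scores, alphabet):
    """Collapse per-frame predictions into a label string.

    frame_scores: list of T rows, each a list of scores over [blank] + alphabet.
    alphabet: string whose i-th character corresponds to class index i + 1.
    """
    # 1. Pick the best-scoring class for each frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # 2. Merge consecutive repeats of the same class.
    collapsed = [idx for idx, _ in groupby(best)]
    # 3. Drop blanks and map the remaining indices back to characters.
    return "".join(alphabet[idx - 1] for idx in collapsed if idx != BLANK)


# Six frames over the classes [blank, 'a', 'b'].
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # a (kept, because a blank separates it from the first 'a')
    [0.1, 0.1, 0.8],    # b
    [0.8, 0.1, 0.1],    # blank
]
decoded = ctc_greedy_decode(scores, "ab")
print(decoded)  # aab
```

The blank class is what allows a genuinely repeated character to survive the merge step: two identical characters are only kept apart if a blank frame falls between them.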

PyTorch implementation of the network model

import torch.nn as nn
import torch.nn.functional as F


class BidirectionalLSTM(nn.Module):
    def __init__(self, nIn, nHidden, nOut):
        super(BidirectionalLSTM, self).__init__()
        self.rnn = nn.LSTM(nIn, nHidden, bidirectional=True)
        self.embedding = nn.Linear(nHidden * 2, nOut)

    def forward(self, input):
        recurrent, _ = self.rnn(input)
        T, b, h = recurrent.size()
        t_rec = recurrent.view(T * b, h)
        output = self.embedding(t_rec)  # [T * b, nOut]
        output = output.view(T, b, -1)

        return output

class CRNN(nn.Module):
    def __init__(self, imgH, nc, nclass, nh, n_rnn=2, leakyRelu=False):
        super(CRNN, self).__init__()
        assert imgH % 16 == 0, 'imgH has to be a multiple of 16'
        ks = [3, 3, 3, 3, 3, 3, 2]
        ps = [1, 1, 1, 1, 1, 1, 0]
        ss = [1, 1, 1, 1, 1, 1, 1]
        nm = [64, 128, 256, 256, 512, 512, 512]
        cnn = nn.Sequential()

        def convRelu(i, batchNormalization=False):
            nIn = nc if i == 0 else nm[i - 1]
            nOut = nm[i]
            cnn.add_module('conv{0}'.format(i), nn.Conv2d(nIn, nOut, ks[i], ss[i], ps[i]))
            if batchNormalization:
                cnn.add_module('batchnorm{0}'.format(i), nn.BatchNorm2d(nOut))
            if leakyRelu:
                cnn.add_module('relu{0}'.format(i), nn.LeakyReLU(0.2, inplace=True))
            else:
                cnn.add_module('relu{0}'.format(i), nn.ReLU(True))

        convRelu(0)
        # Shape comments are channels x height x width, assuming a 1x32x128 input.
        cnn.add_module('pooling{0}'.format(0), nn.MaxPool2d(2, 2))  # 64x16x64
        convRelu(1)
        cnn.add_module('pooling{0}'.format(1), nn.MaxPool2d(2, 2))  # 128x8x32
        convRelu(2, True)
        convRelu(3)
        cnn.add_module('pooling{0}'.format(2), nn.MaxPool2d((2, 2), (2, 1), (0, 1)))  # 256x4x33
        convRelu(4, True)
        convRelu(5)
        cnn.add_module('pooling{0}'.format(3), nn.MaxPool2d((2, 2), (2, 1), (0, 1)))  # 512x2x34
        convRelu(6, True)  # 512x1x33

        self.cnn = cnn
        self.rnn = nn.Sequential(BidirectionalLSTM(512, nh, nh), BidirectionalLSTM(nh, nh, nclass))

    def forward(self, input):
        conv = self.cnn(input)
        b, c, h, w = conv.size()
        assert h == 1, "the height of conv must be 1"
        conv = conv.squeeze(2)
        conv = conv.permute(2, 0, 1)
        output = self.rnn(conv)
        output = F.log_softmax(output, dim=2)
        return output

    def backward_hook(self, module, grad_input, grad_output):
        # Zero out NaN gradients in place (NaN is the only value for which g != g).
        for g in grad_input:
            g[g != g] = 0
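To see how many frames the recurrent layers receive for a given image size, the layer arithmetic of the CNN above can be traced by hand. The sketch below uses the standard output-size formula for convolution and pooling layers with the kernel, stride, and padding values listed in the model. The helper names `out_size` and `crnn_feature_shape` and the example input sizes are assumptions for illustration and are not part of the project.

```python
def out_size(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one axis (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def crnn_feature_shape(img_h, img_w):
    """Return (height, width) of the final CNN feature map."""
    h, w = img_h, img_w
    # conv0..conv5 use kernel 3, stride 1, padding 1 and keep the size,
    # so only the pooling layers and conv6 change it.
    for _ in range(2):  # pooling0, pooling1: MaxPool2d(2, 2)
        h, w = out_size(h, 2, 2, 0), out_size(w, 2, 2, 0)
    for _ in range(2):  # pooling2, pooling3: MaxPool2d((2, 2), (2, 1), (0, 1))
        h, w = out_size(h, 2, 2, 0), out_size(w, 2, 1, 1)
    # conv6: kernel 2, stride 1, padding 0
    return out_size(h, 2, 1, 0), out_size(w, 2, 1, 0)


print(crnn_feature_shape(32, 128))  # (1, 33)
print(crnn_feature_shape(32, 100))  # (1, 26)
```

The height collapses to 1 only when the input height is 32, which is what the `assert h == 1` check in `forward` requires. The width of the feature map becomes the sequence length T seen by the LSTM, and CTC training needs T to be at least as long as the label.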

Model training (code published on GitHub)

Code repository

https://github.com/CaoYuGang/crnn_word_captcha

Dataset

This project has a training set of 820,000 samples and a test set of 19,000 samples. Files are named "text content_data value.jpg", and the dataset covers 2,033 distinct characters.
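With this naming scheme the label can be read straight from the file name. The following is a minimal sketch, assuming that the label is everything before the last underscore. The helper name `label_from_filename` and the example file name are hypothetical, and the repository's own loader may parse names differently.

```python
from pathlib import Path


def label_from_filename(path):
    """Return the text label encoded before the last underscore of the file name."""
    stem = Path(path).stem  # drop the directory and the ".jpg" extension
    label, sep, _suffix = stem.rpartition("_")
    if not sep or not label:
        raise ValueError(f"unexpected file name: {path!r}")
    return label


# Hypothetical file name that follows the "text content_data value.jpg" pattern.
print(label_from_filename("data/train/天地_1603.jpg"))  # 天地
```

Splitting at the last underscore rather than the first keeps the label intact if the text itself ever contains an underscore.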

Training

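# Generate the mdb files used for training. --out: output path; --folder: dataset directory. Use this command to generate both the training-set and the test-set files.
python tool/create_dataset.py --out data/train --folder path/to/folder
# Generate the mapping file for all characters. --trainfolder: training-set path; --testfolder: test-set path
python tool/select_words.py --trainfolder /path/train --testfolder path/test
# Train the model. --trainroot: path of the training mdb; --valroot: path of the test mdb
python train.py --trainroot data/train --valroot data/test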
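A character mapping of the kind produced in the second step can be pictured as follows. This is an illustrative sketch only and is not the code of `tool/select_words.py`: the function names `build_alphabet` and `encode`, the sorted ordering, and the reservation of index 0 for the CTC blank are assumptions.

```python
def build_alphabet(labels):
    """Collect every distinct character and map it to a class index.

    Index 0 is left free for the CTC blank (an assumption), so characters start at 1.
    """
    chars = sorted(set("".join(labels)))
    char_to_index = {ch: i + 1 for i, ch in enumerate(chars)}
    return chars, char_to_index


def encode(label, char_to_index):
    """Turn a label string into the list of class indices used as the CTC target."""
    return [char_to_index[ch] for ch in label]


# Labels would normally come from the training and test file names.
chars, char_to_index = build_alphabet(["ab", "bc"])
nclass = len(chars) + 1  # +1 for the blank; this is the nclass argument of CRNN
print(chars, nclass)                 # ['a', 'b', 'c'] 4
print(encode("cab", char_to_index))  # [3, 1, 2]
```

Building the mapping from both the training and the test labels guarantees that every character seen at evaluation time has a class index.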

Explanation of selected training parameters

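cuda = True # whether to train on the GPU
multi_gpu = False # whether to train on multiple GPUs
ngpu = 1 # number of GPUs when training on multiple GPUs
workers = 0 # number of data-loading workers; if the machine allows, set to 8 or 16 to reduce CPU scheduling and improve training efficiency

# training process
displayInterval = 100 # show the loss every this many training iterations
valInterval = 1000 # evaluate the model every this many training iterations
saveInterval = 10 # save the model every this many loops

nepoch = 1000 # maximum number of passes over the dataset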

About the dataset


You are welcome to join the group. As a member benefit, a complete model is available for testing, and the full dataset can be requested from the group owner.

Project statement

This project provides a practical technique for recognizing rotated-text CAPTCHAs. It is intended for personal research only; please do not use it commercially or to attack websites.


Origin blog.csdn.net/caoyugangsg/article/details/109263061