Rotated Printed-Text CAPTCHA Recognition in Practice
This article describes how to train a model that recognizes CAPTCHAs (verification codes) made of rotated printed text. It is aimed at readers who have some experience with CAPTCHA recognition or who are currently working on it. The model uses CRNN + CTC for text recognition and, after training, reaches a recognition rate of 97% on the sample data. The article walks through the whole project with hands-on code, so that it is easy to follow and try out.
Introduction to the CRNN model
Chinese translation of the paper: CRNN paper translation (Chinese version)
Basic CRNN network structure
Network architecture. The architecture consists of three parts:
- Convolutional layers: extract a feature sequence from the input image. This part is an ordinary CNN that produces the convolutional feature maps of the input image;
- Recurrent layers: predict the label distribution of each frame. This part is a deep bidirectional LSTM network that continues to extract text sequence features on top of the convolutional features;
- Transcription layer: converts the per-frame predictions into the final label sequence. A softmax is applied to the RNN output, and the result is decoded into characters.
At the bottom of the CRNN, the convolutional layers automatically extract a feature sequence from each input image. On top of the convolutional network, a recurrent network makes a prediction for each frame of that feature sequence. The transcription layer at the top of the CRNN converts the per-frame predictions of the recurrent layers into a label sequence. Although CRNN combines different kinds of network architectures (CNN and RNN), it can be trained jointly with a single loss function.
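To make the transcription step concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It assumes that class index 0 is the CTC blank and uses a made-up three-character alphabet; neither detail is taken from the project's code.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Collapse per-frame predictions into a label sequence.

    frame_scores: one list of class scores per frame.
    alphabet: characters for class indices 1..len(alphabet); index 0 is the blank.
    """
    # 1. Pick the best-scoring class for every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    # 2. Merge consecutive repeats and 3. drop blanks.
    decoded = []
    previous = blank
    for index in best_path:
        if index != blank and index != previous:
            decoded.append(alphabet[index - 1])
        previous = index
    return "".join(decoded)


# Hypothetical alphabet and scores, for illustration only.
alphabet = "abc"
frame_scores = [
    [0.1, 0.8, 0.1, 0.0],  # 'a'
    [0.1, 0.7, 0.1, 0.1],  # 'a' again: a repeat, so it is merged
    [0.9, 0.0, 0.1, 0.0],  # blank
    [0.2, 0.6, 0.1, 0.1],  # 'a': kept, because a blank separates it from the first 'a'
    [0.1, 0.1, 0.1, 0.7],  # 'c'
]
print(ctc_greedy_decode(frame_scores, alphabet))  # aac
```

The blank class is what lets the model emit the same character twice in a row: without the blank frame in the middle, the two runs of 'a' would be merged into one.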
PyTorch implementation of the network model
import torch.nn as nn
import torch.nn.functional as F


class BidirectionalLSTM(nn.Module):
    """Bidirectional LSTM followed by a linear projection applied to every frame."""

    def __init__(self, nIn, nHidden, nOut):
        super(BidirectionalLSTM, self).__init__()
        self.rnn = nn.LSTM(nIn, nHidden, bidirectional=True)
        # Forward and backward hidden states are concatenated, hence nHidden * 2.
        self.embedding = nn.Linear(nHidden * 2, nOut)

    def forward(self, input):
        recurrent, _ = self.rnn(input)  # [T, b, nHidden * 2]
        T, b, h = recurrent.size()
        t_rec = recurrent.view(T * b, h)
        output = self.embedding(t_rec)  # [T * b, nOut]
        output = output.view(T, b, -1)
        return output


class CRNN(nn.Module):
    """CNN feature extractor followed by two stacked bidirectional LSTM layers."""

    def __init__(self, imgH, nc, nclass, nh, n_rnn=2, leakyRelu=False):
        super(CRNN, self).__init__()
        assert imgH % 16 == 0, 'imgH has to be a multiple of 16'

        ks = [3, 3, 3, 3, 3, 3, 2]  # kernel sizes
        ps = [1, 1, 1, 1, 1, 1, 0]  # paddings
        ss = [1, 1, 1, 1, 1, 1, 1]  # strides
        nm = [64, 128, 256, 256, 512, 512, 512]  # output channels

        cnn = nn.Sequential()

        def convRelu(i, batchNormalization=False):
            nIn = nc if i == 0 else nm[i - 1]
            nOut = nm[i]
            cnn.add_module('conv{0}'.format(i), nn.Conv2d(nIn, nOut, ks[i], ss[i], ps[i]))
            if batchNormalization:
                cnn.add_module('batchnorm{0}'.format(i), nn.BatchNorm2d(nOut))
            if leakyRelu:
                cnn.add_module('relu{0}'.format(i), nn.LeakyReLU(0.2, inplace=True))
            else:
                cnn.add_module('relu{0}'.format(i), nn.ReLU(True))

        # Shape comments are channels x height x width for a 32x128 input.
        convRelu(0)
        cnn.add_module('pooling{0}'.format(0), nn.MaxPool2d(2, 2))  # 64x16x64
        convRelu(1)
        cnn.add_module('pooling{0}'.format(1), nn.MaxPool2d(2, 2))  # 128x8x32
        convRelu(2, True)
        convRelu(3)
        # Stride (2, 1) with padding (0, 1) halves the height and adds 1 to the width.
        cnn.add_module('pooling{0}'.format(2), nn.MaxPool2d((2, 2), (2, 1), (0, 1)))  # 256x4x33
        convRelu(4, True)
        convRelu(5)
        cnn.add_module('pooling{0}'.format(3), nn.MaxPool2d((2, 2), (2, 1), (0, 1)))  # 512x2x34
        convRelu(6, True)  # 512x1x33

        self.cnn = cnn
        self.rnn = nn.Sequential(BidirectionalLSTM(512, nh, nh), BidirectionalLSTM(nh, nh, nclass))

    def forward(self, input):
        conv = self.cnn(input)
        b, c, h, w = conv.size()
        assert h == 1, "the height of conv must be 1"
        conv = conv.squeeze(2)       # [b, c, w]
        conv = conv.permute(2, 0, 1)  # [w, b, c]: the width becomes the time axis
        output = self.rnn(conv)
        # Log-probabilities over classes for every frame, as expected by CTC loss.
        output = F.log_softmax(output, dim=2)
        return output

    def backward_hook(self, module, grad_input, grad_output):
        # Replace NaN gradients with 0 (NaN is the only value for which g != g).
        for g in grad_input:
            if g is not None:
                g[g != g] = 0
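To see how many frames the recurrent layers receive, the sketch below traces the feature-map height and width through the CNN with the standard convolution/pooling output-size formula. It needs no PyTorch. The 32x128 and 32x100 input sizes are assumptions chosen for illustration, and the function names are not part of the project.

```python
def out_size(size, kernel, stride, padding):
    """Output-size formula for Conv2d/MaxPool2d (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def crnn_feature_size(height, width):
    """Trace (height, width) through the seven convolutions and four poolings."""
    # Each entry: (kernel_h, kernel_w), (stride_h, stride_w), (pad_h, pad_w).
    conv3 = ((3, 3), (1, 1), (1, 1))        # conv0..conv5: size-preserving
    pool_square = ((2, 2), (2, 2), (0, 0))  # pooling0, pooling1: halve both axes
    pool_tall = ((2, 2), (2, 1), (0, 1))    # pooling2, pooling3: halve the height only
    conv_last = ((2, 2), (1, 1), (0, 0))    # conv6
    layers = [conv3, pool_square, conv3, pool_square,
              conv3, conv3, pool_tall,
              conv3, conv3, pool_tall,
              conv_last]
    for (kh, kw), (sh, sw), (ph, pw) in layers:
        height = out_size(height, kh, sh, ph)
        width = out_size(width, kw, sw, pw)
    return height, width


print(crnn_feature_size(32, 128))  # (1, 33)
print(crnn_feature_size(32, 100))  # (1, 26)
```

With these layer settings a 32-pixel-high image collapses to height 1, which is what the forward pass asserts, and the remaining width is the number of frames passed to the LSTM layers.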
Model training (the code is available on GitHub)
Code repository
https://github.com/CaoYuGang/crnn_word_captcha
Dataset
The project uses 820,000 training samples and 19,000 test samples. Files are named "text content_data value.jpg", and the dataset covers 2,033 distinct characters.
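As an illustration of the naming scheme, the sketch below recovers the label from a file name by taking everything before the last underscore. The helper name and the sample file names are hypothetical, and the repository's own loader may parse names differently.

```python
from pathlib import Path


def label_from_filename(path):
    """Return the text label encoded in '<text content>_<data value>.jpg'."""
    stem = Path(path).stem  # drop the directory and the '.jpg' suffix
    label, sep, _value = stem.rpartition("_")
    if not sep or not label:
        raise ValueError(f"unexpected file name: {path!r}")
    return label


# Hypothetical file names, for illustration only.
print(label_from_filename("data/train/验证码_000123.jpg"))  # 验证码
print(label_from_filename("abcd_42.jpg"))                   # abcd
```

Splitting at the last underscore rather than the first keeps the label intact if the text itself ever contains an underscore.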
Training
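# Generate the mdb files used for training. --out: output path; --folder: dataset path. Use this command to generate both the training set and the test set files
python tool/create_dataset.py --out data/train --folder path/to/folder
# Generate the mapping file of all characters. --trainfolder: training set path; --testfolder: test set path
python tool/select_words.py --trainfolder path/train --testfolder path/test
# Train the model. --trainroot: path of the training mdb; --valroot: path of the mdb used for testing
python train.py --trainroot data/train --valroot data/test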
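The mapping step presumably assigns every distinct character an integer class index. Here is a minimal sketch of that idea, assuming index 0 is reserved for the CTC blank. The function names and sample labels are hypothetical, and this is not the code in tool/select_words.py.

```python
def build_char_map(labels):
    """Map each distinct character to an index starting at 1 (0 is kept for the CTC blank)."""
    chars = sorted(set("".join(labels)))
    return {ch: i for i, ch in enumerate(chars, start=1)}


def encode(label, char_map):
    """Turn a label string into the list of class indices used as the CTC target."""
    return [char_map[ch] for ch in label]


# Hypothetical labels, for illustration only.
labels = ["cab", "abba"]
char_map = build_char_map(labels)
print(char_map)                 # {'a': 1, 'b': 2, 'c': 3}
print(encode("cab", char_map))  # [3, 1, 2]

# One extra class for the blank; under this assumption it is the value passed as nclass.
n_class = len(char_map) + 1
print(n_class)                  # 4
```

Sorting the characters makes the mapping reproducible, so a model trained with one mapping file can be evaluated later with an identical one.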
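Explanation of selected training parameters
cuda = True # whether to train on the GPU
multi_gpu = False # whether to train on multiple GPUs
ngpu = 1 # number of GPUs when training on multiple GPUs
workers = 0 # number of data-loading workers; if the machine allows, set it to 8 or 16 to reduce CPU scheduling overhead and speed up training
# training process
displayInterval = 100 # show the loss every this many training iterations
valInterval = 1000 # evaluate the model every this many training iterations
saveInterval = 10 # save the model every this many loops
nepoch = 1000 # maximum number of training loops over the dataset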
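To show how the three intervals interact, here is a framework-free sketch of a training loop that only records when each action would fire. It assumes that displayInterval and valInterval count iterations (batches) while saveInterval counts passes over the dataset, as the comments suggest. The function and the small numbers are illustrative and are not taken from train.py.

```python
def schedule(n_epochs, batches_per_epoch, display_interval, val_interval, save_interval):
    """Return the (iteration, action) events a training loop would trigger."""
    events = []
    iteration = 0
    for epoch in range(1, n_epochs + 1):
        for _ in range(batches_per_epoch):
            iteration += 1  # one optimizer step would happen here
            if iteration % display_interval == 0:
                events.append((iteration, "display loss"))
            if iteration % val_interval == 0:
                events.append((iteration, "validate"))
        if epoch % save_interval == 0:
            events.append((iteration, "save model"))
    return events


# Small illustrative numbers, not the project's defaults.
events = schedule(n_epochs=2, batches_per_epoch=4,
                  display_interval=2, val_interval=4, save_interval=1)
for event in events:
    print(event)
```

With the values listed above, the loss would be shown every 100 iterations, the model evaluated every 1000 iterations, and a checkpoint written every 10 passes, under the same assumption about what each interval counts.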
About the dataset
You are welcome to join the group. Group members can test the complete model, and the complete dataset is available on request from the group owner.
Project statement
This project provides a practical technique for recognizing rotated-text verification codes. It is for personal research only; please do not use it commercially or to attack websites.