Implementing simple verification code recognition with TensorFlow

In this article we use TensorFlow to build a deep learning model that recognizes verification codes, i.e. image CAPTCHAs. We first train a model on labeled data, and then use the trained model to recognize verification codes.

1. Preparing the verification codes

Here we use the Python captcha library to generate verification codes. This library is not installed by default, so we need to install it first, together with the pillow library (both can be installed with pip).

After installation, we can generate a simple image verification code from a text of our choosing.

The generated image shows exactly the text we defined, so we obtain a picture together with its true text. We can use this to generate any amount of training and test data.

2. Preprocessing

The data must be preprocessed before training. Because we define the verification code text before generating the image, we already have the label; generating the image from that text then gives us the input data x.

We first define the vocabulary. Uppercase letters, lowercase letters and digits together make a fairly large vocabulary: a four-character verification code drawn from it has (26 + 26 + 10) ^ 4 = 14,776,336 possible combinations, which is rather a lot to train on. So we simplify here and train on digit-only verification codes, which reduces the number of combinations to 10 ^ 4 = 10,000.

So here we define a vocabulary and two length variables.
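A small sketch of these definitions. The names VOCAB, CAPTCHA_LENGTH and VOCAB_LENGTH follow the text; full_combinations and digit_combinations are hypothetical names added only to check the arithmetic:

```python
# Digit-only vocabulary and the two length variables
VOCAB = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
CAPTCHA_LENGTH = 4
VOCAB_LENGTH = len(VOCAB)

# Number of distinct four-character codes for each vocabulary
full_combinations = (26 + 26 + 10) ** CAPTCHA_LENGTH   # letters and digits
digit_combinations = VOCAB_LENGTH ** CAPTCHA_LENGTH    # digits only
print(full_combinations, digit_combinations)
```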

Here VOCAB is the content of the vocabulary, i.e. the ten digits 0 to 9; CAPTCHA_LENGTH is the number of characters in a verification code, i.e. 4; and VOCAB_LENGTH is the length of the vocabulary, i.e. 10.

Next we define a method that generates the verification code data. The process is similar to the one above, except that the data is returned as a NumPy array.

Calling this method gives us a NumPy array, which is in fact the verification code converted into the RGB values of each pixel. Let's call the method and try it.

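Looking at the result, we can see that its shape is (60, 160, 3). This means the verification code image is 60 pixels high and 160 pixels wide, i.e. a 60 x 160 pixel image, and every pixel has an RGB value, so the last dimension holds the RGB values of the pixel.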

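Next we need to define the label. Since we are training a deep learning model, the label data is best One-Hot encoded: if the verification code text is 1234, then for each character the position of its vocabulary index is set to 1, and the total length is 40 (4 characters times 10 vocabulary entries). We implement the conversion between One-Hot encoding and text in both directions.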
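A minimal NumPy sketch of the two conversions. The names text2vec() and vec2text() come from the text, while the bodies are an assumption about how they might be written:

```python
import numpy as np

VOCAB = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
CAPTCHA_LENGTH = 4
VOCAB_LENGTH = len(VOCAB)


def text2vec(text):
    """Convert verification code text to a flat One-Hot vector of length 40."""
    if len(text) != CAPTCHA_LENGTH:
        raise ValueError('text must have exactly CAPTCHA_LENGTH characters')
    vector = np.zeros(CAPTCHA_LENGTH * VOCAB_LENGTH)
    for i, char in enumerate(text):
        # Character i owns the slice [i * VOCAB_LENGTH, (i + 1) * VOCAB_LENGTH)
        vector[i * VOCAB_LENGTH + VOCAB.index(char)] = 1
    return vector


def vec2text(vector):
    """Convert a flat One-Hot vector back to verification code text."""
    rows = np.reshape(vector, [CAPTCHA_LENGTH, VOCAB_LENGTH])
    indices = np.argmax(rows, axis=1)   # position of the 1 in each row
    return ''.join(VOCAB[i] for i in indices)


vector = text2vec('1234')
text = vec2text(vector)
print(vector)
print(text)
```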

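Here the text2vec() method converts the real text into a One-Hot encoding, and the vec2text() method converts a One-Hot encoding back into the real text.

For example, we can call these two methods to convert the text 1234 into a One-Hot encoding and then convert it back again.

In this way we can convert between text and One-Hot encoding in both directions.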

Next we can construct a batch of data. The x data is the NumPy array of the verification code, and the y data is the One-Hot encoding of the verification code text.
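The data generation could be sketched roughly as follows. get_random_text() and text2vec() are named in the text; generate_data(), save_data() and the render parameter are hypothetical helpers introduced for illustration, where render stands for any function that turns a text into an image array:

```python
import pickle
import random

import numpy as np

VOCAB = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
CAPTCHA_LENGTH = 4
VOCAB_LENGTH = len(VOCAB)


def get_random_text():
    """Return a random verification code text of CAPTCHA_LENGTH characters."""
    return ''.join(random.choice(VOCAB) for _ in range(CAPTCHA_LENGTH))


def text2vec(text):
    """Convert verification code text to a flat One-Hot vector."""
    vector = np.zeros(CAPTCHA_LENGTH * VOCAB_LENGTH)
    for i, char in enumerate(text):
        vector[i * VOCAB_LENGTH + VOCAB.index(char)] = 1
    return vector


def generate_data(count, render):
    """Build count (x, y) pairs; render(text) must return the image as an array."""
    data_x, data_y = [], []
    for _ in range(count):
        text = get_random_text()
        data_x.append(render(text))      # x: pixel array of the picture
        data_y.append(text2vec(text))    # y: One-Hot encoding of its text
    return np.asarray(data_x), np.asarray(data_y)


def save_data(path, data_x, data_y):
    """Write the x and y arrays to a pickle file."""
    with open(path, 'wb') as f:
        pickle.dump((data_x, data_y), f)
```

With an image-generating function passed as render, generate_data() followed by save_data() would produce the pickle file described here.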

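Here we define a get_random_text() method that generates a random verification code text. We then use this randomly generated text to produce the corresponding x and y data, and finally write the data to a pickle file. This completes the preprocessing.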

3. Building the model

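With the data in hand, we can start building the model. Here we again use the train_test_split() method to divide the data into three parts: a training set, a development set and a validation set.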
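To illustrate the idea of a three-way split without extra dependencies, here is a plain NumPy stand-in. It is not the train_test_split() call itself, and the function name split_three_ways and the 10% sizes are assumptions:

```python
import numpy as np


def split_three_ways(data_x, data_y, dev_size=0.1, test_size=0.1, seed=0):
    """Shuffle the data and split it into training, development and validation parts."""
    count = len(data_x)
    order = np.random.RandomState(seed).permutation(count)
    n_test = int(count * test_size)
    n_dev = int(count * dev_size)
    test_idx = order[:n_test]
    dev_idx = order[n_test:n_test + n_dev]
    train_idx = order[n_test + n_dev:]
    return ((data_x[train_idx], data_y[train_idx]),
            (data_x[dev_idx], data_y[dev_idx]),
            (data_x[test_idx], data_y[test_idx]))


# Toy data: 100 samples whose label equals their single feature
data_x = np.arange(100).reshape(100, 1)
data_y = np.arange(100)
(train_x, train_y), (dev_x, dev_y), (test_x, test_y) = split_three_ways(data_x, data_y)
print(len(train_x), len(dev_x), len(test_x))
```

The same x and y rows stay paired because a single shuffled index order is applied to both arrays.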

Next we use these three data sets to build three Dataset objects.

Then we initialize an iterator and bind it to the data set.

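Next comes the key part. We construct the network from three convolutional layers and two fully connected layers. To keep the code short, we use TensorFlow's layers module directly.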

Here the convolution kernel size is 3, the padding uses SAME mode, and the activation function is relu.


After the fully connected layers, the shape of y becomes [batch_size, n_classes]. Our label is CAPTCHA_LENGTH One-Hot vectors concatenated together, and we want to use it to compute the cross-entropy. However, when the cross-entropy is computed, the elements of the label along its last dimension must sum to 1, otherwise problems occur when the gradients are computed. For details, see the official TensorFlow documentation:

https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits


But the label is currently CAPTCHA_LENGTH One-Hot vectors concatenated together, so its elements sum to CAPTCHA_LENGTH. We therefore need to reshape it so that the elements along the last dimension sum to 1.
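The effect of the reshape can be checked with a small NumPy example. This is an illustration with made-up labels, not the TensorFlow code of the model; the same target shape could be passed to tf.reshape:

```python
import numpy as np

CAPTCHA_LENGTH = 4
VOCAB_LENGTH = 10

# A batch of two flat labels, for the texts 1234 and 0000
labels = np.zeros([2, CAPTCHA_LENGTH * VOCAB_LENGTH])
for row, text in enumerate(['1234', '0000']):
    for i, char in enumerate(text):
        labels[row, i * VOCAB_LENGTH + int(char)] = 1

print(labels.sum(axis=-1))     # each flat label sums to CAPTCHA_LENGTH

# Split the last dimension into CAPTCHA_LENGTH separate One-Hot vectors
reshaped = np.reshape(labels, [-1, CAPTCHA_LENGTH, VOCAB_LENGTH])
print(reshaped.shape)
print(reshaped.sum(axis=-1))   # every One-Hot vector now sums to 1
```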


This ensures that the last dimension has length VOCAB_LENGTH. Since it is a single One-Hot vector, its elements sum to 1.

The loss and the accuracy are then easy to compute.
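To make the two quantities concrete, here is a NumPy sketch of a softmax cross-entropy loss and a per-character accuracy on tensors of shape [batch, CAPTCHA_LENGTH, VOCAB_LENGTH]. It is an illustration rather than the TensorFlow code of the model; the function names and the per-character definition of accuracy are assumptions:

```python
import numpy as np


def softmax_cross_entropy(labels, logits):
    """Mean cross-entropy between One-Hot labels and logits over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(labels * log_probs).sum(axis=-1).mean())


def accuracy(labels, logits):
    """Fraction of characters whose predicted index matches the label."""
    return float((logits.argmax(axis=-1) == labels.argmax(axis=-1)).mean())


# One verification code with the text 1234, shape [1, 4, 10]
labels = np.eye(10)[np.array([[1, 2, 3, 4]])]
logits = np.zeros([1, 4, 10])
logits[0, [0, 1, 2], [1, 2, 3]] = 5.0   # first three characters predicted correctly
logits[0, 3, 9] = 5.0                   # last character wrongly predicted as 9

print(softmax_cross_entropy(labels, logits))
print(accuracy(labels, logits))
```

Because the argmax is taken over the last axis, a code with three of four digits right counts as 75% correct under this definition rather than as a complete miss.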

Next we run the training.

Here we first run train_initializer to bind the iterator to the training Dataset, then run train_op and fetch loss, acc, gstep and other results, which we print.

4. Training

Running the training process prints results such as the loss, the accuracy and the step as training proceeds.

5. Testing

During training we can also save the model every few epochs.

Of course, you can also choose to save the model that achieves the highest accuracy on the validation set.

For verification, we can reload the saved model and then run the verification.



Origin blog.51cto.com/14192352/2400647