Getting into Deep Learning, Part 2: The Handwritten Digit Image Recognition Problem

Disclaimer: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reproducing it, please attach the original source link and this statement.
Original link: https://blog.csdn.net/LEEANG121/article/details/101841239

Foreword

A few words up front:

1. This article is based on material from the TensorFlow Chinese community, reorganized and annotated to be friendlier to beginners. It focuses only on how to achieve the goal; the reasons behind each implementation step are not explored here and will be explained in other blog posts.

2. The data source for this article is the MNIST dataset: http://yann.lecun.com/exdb/mnist/

3. This blog post was written on October 1, 2019. Wishing the motherland prosperity.

MNIST handwritten digit recognition is the most basic and classic machine learning example, the equivalent of 'HELLO WORLD' when learning a programming language. The dataset contains images of a variety of handwritten digits:
[Figure: sample handwritten digit images]
The dataset covers 10 classes, the digits 0-9, and every image comes with a label (so the computer knows that a 3 belongs to class 3 and a 5 to class 5). The first half of this article predicts the images with a simple mathematical model, softmax regression. The second half uses a convolutional neural network on the same dataset, in order to compare how the two approaches affect the prediction results.

Handwritten digit recognition with a softmax regression model

The general process for recognizing images with a softmax regression model is: 1. import the data; 2. build the model; 3. train the model; 4. evaluate the model.

Importing the data

The data is the dataset published on the official MNIST website. Before importing it, we first import the tensorflow module. The code is as follows:

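import warnings
warnings.filterwarnings('ignore')  # suppress warnings emitted while the script runs
# Import the required modules
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data  # this tutorial module ships with TensorFlow 1.x

# Download the dataset into MNIST_data/
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)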
print(tf.__version__)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
1.14.0         

Next, inspect the downloaded dataset. The code is as follows:

# Inspect the dataset
print(mnist.train.images.shape, mnist.train.labels.shape)  # shapes of the training set
print("------------------------------------------------------------------------------------------")
print(mnist.test.images.shape, mnist.test.labels.shape)  # shapes of the test set
print("------------------------------------------------------------------------------------------")
print(mnist.validation.images.shape, mnist.validation.labels.shape)  # shapes of the validation set
(55000, 784) (55000, 10)
----------------------------------------------------------------
(10000, 784) (10000, 10)
 ----------------------------------------------------------------
(5000, 784) (5000, 10)

As the output shows, the download is split into two parts: 60,000 rows of training data and 10,000 rows of test data (mnist.test). The 60,000 training rows are further divided into 55,000 rows of training data (mnist.train) and 5,000 rows of validation data (mnist.validation).
Each image is 28x28 pixels and is stored flattened as a vector of length 784, as shown below:
[Figure: an MNIST image represented as an array of pixel values]
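The two data shapes above, (N, 784) for images and (N, 10) for labels, can be illustrated with a small sketch in plain Python. The helper names `flatten` and `one_hot` and the sample image are made up for illustration; the MNIST loader does this work internally and may differ in detail.

```python
def flatten(image):
    """Flatten a 2-D list of pixel rows into one flat list, row by row."""
    return [pixel for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# A blank 28x28 image with a single bright pixel at row 2, column 3 (made-up data).
image = [[0.0] * 28 for _ in range(28)]
image[2][3] = 1.0

vector = flatten(image)
print(len(vector))        # length of the flattened image: 28 * 28
print(vector.index(1.0))  # the bright pixel lands at index 2 * 28 + 3
print(one_hot(3))         # label "3" as a 10-element one-hot vector
```

This is why one_hot=True in read_data_sets yields label arrays with 10 columns: each label is stored as a vector with a single 1 in the position of its digit.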

Building the softmax model

With the data imported, we can start building the mathematical model. We first use softmax regression to build a simple model. The code is as follows:

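# The model computes y = softmax(xW + b)
x = tf.placeholder('float', [None, 784])  # x is not a specific value but a placeholder that is fed when the graph runs; placeholders are covered in a separate article
# None means the first dimension of this tensor (the batch size) can be any length
W = tf.Variable(tf.zeros([784, 10]))  # W holds the weights
b = tf.Variable(tf.zeros([10]))  # b holds the biases
y = tf.nn.softmax(tf.matmul(x,W) + b)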

Note that W has shape [784, 10] because we multiply each 784-dimensional image vector by it to obtain a 10-dimensional vector of evidence, one value per digit class. b has shape [10], so it can be added directly to that output.
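To make the computation y = softmax(xW + b) concrete, here is a minimal sketch in plain Python. It uses toy sizes (4 inputs and 3 classes instead of 784 and 10) with made-up numbers; the helper names `linear` and `softmax` are illustrative and are not TensorFlow APIs.

```python
import math

def softmax(evidence):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing and does not change the result.
    peak = max(evidence)
    exps = [math.exp(e - peak) for e in evidence]
    total = sum(exps)
    return [e / total for e in exps]

def linear(x, W, b):
    """Compute xW + b for one input vector x, weights W (one row per input) and biases b."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

# Toy example: 4 input values, 3 classes (made-up numbers).
x = [1.0, 0.0, 2.0, 0.5]
W = [[0.1, 0.2, 0.0],
     [0.0, 0.0, 0.0],
     [0.3, 0.0, 0.1],
     [0.0, 0.4, 0.2]]
b = [0.0, 0.1, 0.0]

y = softmax(linear(x, W, b))
print(y)                    # three probabilities, largest for class 0
print(softmax([0.0] * 10))  # all-zero evidence gives a uniform distribution
```

The second print mirrors the article's starting point: with W and b initialized to zeros, every class begins with the same probability, and training moves the weights away from that uniform guess.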

Training the model

Before training the model, we must first define a metric that tells us whether the model's output is good or bad. The common practice in machine learning is to define a loss function (also called a cost function): the smaller its value, the better the model fits the data.
In this example we use cross-entropy as the cost function. Its precise meaning is not discussed further here. The code is as follows:
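As a small worked example of the cross-entropy formula -sum(y_ * log(y)) used in the code, the sketch below compares a confident correct prediction with a poor one. The probabilities are made up, and the function skips terms whose label is 0 to avoid evaluating log(0), which is a simplification relative to the TensorFlow expression.

```python
import math

def cross_entropy(label, prediction):
    """Cross-entropy -sum(y_ * log(y)) for one example with a one-hot label."""
    # Terms where the label is 0 contribute nothing, so they are skipped.
    return -sum(t * math.log(p) for t, p in zip(label, prediction) if t > 0)

label = [0.0, 0.0, 1.0]              # the true class is class 2
good_prediction = [0.1, 0.1, 0.8]    # most probability on the true class
bad_prediction = [0.6, 0.3, 0.1]     # most probability on a wrong class

print(cross_entropy(label, good_prediction))  # equals -log(0.8), a small loss
print(cross_entropy(label, bad_prediction))   # equals -log(0.1), a large loss
```

With a one-hot label the loss reduces to the negative log of the probability assigned to the true class, so it approaches 0 as that probability approaches 1.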

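# Define the loss function used to judge the model
y_ = tf.placeholder('float', [None, 10])  # a new placeholder for the correct values (the labels)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))  # cross-entropy summed over the batch; its meaning and usage are covered in a separate article
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)  # gradient descent optimizer with a learning rate of 0.01
init = tf.global_variables_initializer()  # initialize the variables; replaces the deprecated tf.initialize_all_variables()
sess = tf.Session()  # create a session to run the graph
sess.run(init)
# Train the model for 1000 iterations
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)  # draw a batch of 100 training examples
    sess.run(train_step, feed_dict = {x: batch_xs, y_:batch_ys})  # feed the images into placeholder x and the labels into y_; y is the prediction, obtained by computation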

We train the model for 1,000 iterations, each on a batch of 100 training examples.
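What a single gradient descent update does can be sketched in plain Python for a tiny softmax model. This is an illustration under stated assumptions, not what TensorFlow executes internally: it relies on the standard result that the gradient of the cross-entropy with respect to each class score is (prediction - label), uses one made-up example instead of a batch of 100, and uses a learning rate of 0.1 rather than the article's 0.01 so the effect is visible in a few steps.

```python
import math

def softmax(z):
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, W, b):
    """y = softmax(xW + b) for one input vector."""
    z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    return softmax(z)

def loss(x, label, W, b):
    """Cross-entropy for one example with a one-hot label."""
    p = predict(x, W, b)
    return -sum(t * math.log(q) for t, q in zip(label, p) if t > 0)

def sgd_step(x, label, W, b, learning_rate):
    """Apply one gradient descent update to W and b in place."""
    p = predict(x, W, b)
    for j in range(len(b)):
        error = p[j] - label[j]          # gradient of the loss w.r.t. score j
        b[j] -= learning_rate * error
        for i in range(len(x)):
            W[i][j] -= learning_rate * error * x[i]

# Toy problem: 2 inputs, 3 classes, parameters start at zero as in the article.
x = [1.0, 0.5]
label = [0.0, 1.0, 0.0]
W = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3

before = loss(x, label, W, b)
for _ in range(10):
    sgd_step(x, label, W, b, learning_rate=0.1)
after = loss(x, label, W, b)
print(before, after)  # the loss shrinks as the updates accumulate
```

The optimizer in the article performs the same kind of update, except that the gradient is computed automatically and summed over the 100 examples of each batch.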

Evaluating the model

Once training is finished, we need to check whether the trained model is accurate, which means evaluating it. Here we evaluate it on the test data. The code is as follows:

# Evaluate the model
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))  # compare the predicted class with the labeled class; this yields a list of booleans
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))  # cast the booleans to floats and take the mean to get the fraction of correct predictions, e.g. [True, False, True, True] becomes [1,0,1,1], whose mean is 0.75
print (sess.run(accuracy, feed_dict = {x: mnist.test.images, y_: mnist.test.labels}))  # accuracy on the test set
0.9147
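The accuracy computation above (argmax, compare, cast, mean) can be traced by hand with a plain-Python sketch. The four predictions and labels below are made up so that three of the four match, reproducing the [True, False, True, True] case; the helper names are illustrative.

```python
def argmax(values):
    """Index of the largest element, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of rows where the predicted class equals the labeled class."""
    matches = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(matches) / len(matches)  # True counts as 1, False as 0

# Four examples with 3 classes each (made-up numbers).
predictions = [[0.7, 0.2, 0.1],
               [0.5, 0.4, 0.1],
               [0.1, 0.1, 0.8],
               [0.2, 0.6, 0.2]]
labels = [[1, 0, 0],
          [0, 1, 0],   # the model predicted class 0 here, so this one is wrong
          [0, 0, 1],
          [0, 1, 0]]

print(accuracy(predictions, labels))  # 0.75
```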

The final accuracy of the model is about 91%. Next we improve on this simple model by introducing a convolutional neural network, and then compare the accuracy it achieves.

Handwritten digit recognition with a convolutional neural network

A convolutional neural network needs far more weights and biases than the softmax model. To break symmetry between the weights and avoid zero gradients, the weights should be initialized with a small amount of noise. Because we use ReLU neurons, it is also good practice to initialize the biases with a small positive value, to avoid neurons whose output is always zero (dead neurons). The code is as follows:
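How much larger the convolutional model is can be checked with simple arithmetic. The sketch below counts parameters from the variable shapes used in this article (the softmax model's W and b, and the convolutional network's layers as defined further down); the helper `count_parameters` is illustrative.

```python
from math import prod  # available from Python 3.8

def count_parameters(shapes):
    """Total number of scalar values across a list of tensor shapes."""
    return sum(prod(shape) for shape in shapes)

# Shapes of W and b in the softmax model.
softmax_shapes = [(784, 10), (10,)]

# Shapes of the weights and biases in the convolutional network.
cnn_shapes = [
    (5, 5, 1, 32), (32,),          # first convolutional layer
    (5, 5, 32, 64), (64,),         # second convolutional layer
    (7 * 7 * 64, 1024), (1024,),   # densely connected layer
    (1024, 10), (10,),             # output layer
]

print(count_parameters(softmax_shapes))  # 7850
print(count_parameters(cnn_shapes))      # 3274634
```

Most of the roughly 3.3 million parameters sit in the densely connected layer, which alone accounts for 7 * 7 * 64 * 1024 weights.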

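Weight initialization

# Build a convolutional neural network to predict the same dataset for comparison
# Define helpers that create the weights and biases; the reasoning behind them is left to a later article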
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev = 0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape = shape)
    return tf.Variable(initial)
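The two initializers can be mimicked in plain Python to show what they produce. This is a rough sketch: `truncated_normal` here re-draws any sample that falls more than two standard deviations from the mean, which is the documented behavior of tf.truncated_normal, while the seed, sample count and helper names are arbitrary choices for illustration.

```python
import random

def truncated_normal(stddev, rng):
    """Draw from N(0, stddev), re-drawing values beyond two standard deviations."""
    while True:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2 * stddev:
            return value

def relu(value):
    """ReLU activation: negative inputs become 0."""
    return max(0.0, value)

rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [truncated_normal(0.1, rng) for _ in range(1000)]

# Every weight is small but they are not all equal, so units start out different.
print(max(abs(w) for w in weights) <= 0.2)
print(len(set(weights)) > 1)

# With an all-zero input, a bias of 0.1 keeps the ReLU output positive,
# whereas a negative pre-activation would be clipped to zero.
print(relu(0.0 * weights[0] + 0.1))
print(relu(-0.3))
```

A unit whose pre-activation is negative for every input outputs zero everywhere and receives no gradient, which is the dead-neuron problem the small positive bias is meant to reduce.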

Defining the convolution and pooling layers

In this example the convolution uses a stride of 1 with zero padding at the borders, and pooling is max pooling over 2x2 blocks. The code is as follows:

# Convolution and pooling
def conv2d(x, W):  # convolution layer
    return tf.nn.conv2d(x, W, strides = [1, 1, 1, 1], padding = 'SAME')  # stride of 1; 'SAME' pads the borders with zeros

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = 'SAME')
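The effect of these two operations on the image size can be worked out with a short sketch. It relies on the rule that with 'SAME' padding the output size is ceil(input size / stride), and it includes a plain-Python 2x2 max pooling over a made-up 4x4 grid; the function names are illustrative, and the pooling helper assumes even dimensions.

```python
import math

def same_output_size(size, stride):
    """Output width or height under 'SAME' padding."""
    return math.ceil(size / stride)

def max_pool_2x2(grid):
    """2x2 max pooling with stride 2 on a 2-D list with even dimensions."""
    return [[max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
             for c in range(0, len(grid[0]), 2)]
            for r in range(0, len(grid), 2)]

size = 28
size = same_output_size(size, 1)  # after a convolution with stride 1: still 28
size = same_output_size(size, 2)  # after 2x2 pooling with stride 2: 14
size = same_output_size(size, 1)  # second convolution: still 14
size = same_output_size(size, 2)  # second pooling: 7
print(size)

grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(max_pool_2x2(grid))  # [[6, 8], [14, 16]]
```

The convolution keeps the spatial size unchanged and each pooling step halves it, which is where the 7 x 7 feature maps in the densely connected layer come from.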

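Adding the layers

With the convolution and pooling layers defined, the next step is to add them to the model layer by layer. The code and explanations are as follows:

# First convolutional layer
# It consists of a convolution followed by a max pooling layer.
W_conv1 = weight_variable([5, 5, 1, 32])  # the first two dimensions are the patch size, 1 is the number of input channels, 32 is the number of output channels
b_conv1 = bias_variable([32])  # one bias per output channel
x_image = tf.reshape(x, [-1,28,28,1])  # reshape x to 4-D to match W: dimensions 2 and 3 are the image width and height, the last is the number of color channels
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # convolve x with the weights, add the bias, then apply the ReLU activation to introduce non-linearity
h_pool1 = max_pool_2x2(h_conv1)  # add the pooling layer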

# Second convolutional layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Densely connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])  # the image size has shrunk from 28 to 7 after two rounds of 2x2 max pooling
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
# This layer reduces overfitting, which would otherwise hurt the model's ability to generalize
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Output layer
# The final output layer of the convolutional network is still fully connected
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
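The role of keep_prob in the dropout layer can be sketched in plain Python. The version below keeps each value with probability keep_prob and scales the kept values by 1 / keep_prob, which matches the scaling tf.nn.dropout applies; the random seed and input values are made up, and the function is a simplified stand-in rather than the library implementation.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and rescale the survivors."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 8

# Training: about half of the activations are zeroed, the rest are doubled.
print(dropout(activations, 0.5, rng))

# Evaluation: keep_prob of 1.0 leaves every activation unchanged.
print(dropout(activations, 1.0, rng))
```

Because the survivors are scaled up, the expected sum of the activations stays the same, so the same network can be used with keep_prob set to 1.0 at evaluation time without any further adjustment.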

Training and evaluating the model

Because this model is more complex than the previous one, we replace plain gradient descent with the ADAM optimizer.

# Train and evaluate the model
sess = tf.InteractiveSession()  # required: it installs itself as the default session, which accuracy.eval() and train_step.run() below rely on
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.global_variables_initializer())  # replaces the deprecated tf.initialize_all_variables()
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i%100 == 0:
        # Report accuracy on the current batch every 100 steps, with dropout disabled
        train_accuracy = accuracy.eval(feed_dict={
            x:batch[0], y_: batch[1], keep_prob: 1.0})
        print('step %d, training accuracy %g'%(i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
step 0, training accuracy 0.1
step 100, training accuracy 0.76
step 200, training accuracy 0.92
step 300, training accuracy 0.92
step 400, training accuracy 0.9
......
step 19400, training accuracy 1
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1

test accuracy 0.9916

The final test result shows that a convolutional neural network raises the recognition accuracy on this handwritten digit problem to about 99.2%.
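To put the two results side by side, the accuracies reported in this article (0.9147 for softmax regression and 0.9916 for the convolutional network) can be converted into error counts on the 10,000 test images. The sketch below is just this arithmetic; the variable names are illustrative.

```python
softmax_accuracy = 0.9147  # test accuracy of the softmax regression model
cnn_accuracy = 0.9916      # test accuracy of the convolutional network
test_images = 10000        # size of the MNIST test set

softmax_errors = round(test_images * (1 - softmax_accuracy))
cnn_errors = round(test_images * (1 - cnn_accuracy))

print(softmax_errors, cnn_errors)   # 853 84
print(softmax_errors / cnn_errors)  # roughly a tenfold reduction in errors
```

Seen this way, the convolutional network misclassifies about one tenth as many test images as the softmax model.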
