TensorFlow Study Notes (7) - MNIST Advanced

Building a Multilayer Convolutional Network

Getting only 91% accuracy on MNIST is not good enough. In this section we use a slightly more sophisticated model, a convolutional neural network, to improve performance. This should reach roughly 99.2% accuracy, which is not the best possible, but quite respectable.

Weight initialization

To create this model we need a lot of weights and biases. Weights should generally be initialized with a small amount of noise to break symmetry and avoid zero gradients. Since we are using ReLU neurons, it is also good practice to initialize the biases with a small positive value, to avoid neurons whose output is constantly zero (dead neurons). So that we do not have to repeat this initialization over and over while building the model, we define two helper functions.

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

Convolution and pooling

TensorFlow gives us a lot of flexibility in convolution and pooling. How do we handle the borders? What stride should we use? In this example we always use the vanilla version: convolutions with a stride of 1, zero-padded ('SAME') so that the output is the same size as the input, and plain max pooling over traditional 2x2 blocks. To keep the code concise, we abstract this part into functions.

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')
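
As a quick sanity check (not part of the original tutorial), the sketch below pushes a dummy batch through these two helpers, using hypothetical tensors dummy and kernel, to show that 'SAME' padding with stride 1 preserves the 28x28 spatial size while the 2x2 max pooling halves it.

# A minimal sketch using the helpers above on a dummy 28x28 grayscale batch.
dummy = tf.zeros([1, 28, 28, 1])     # [batch, height, width, channels]
kernel = tf.zeros([5, 5, 1, 32])     # 5x5 patches, 1 input channel, 32 output channels
conv_out = conv2d(dummy, kernel)     # 'SAME' padding, stride 1
pool_out = max_pool_2x2(conv_out)    # 2x2 max pooling, stride 2
print(conv_out.get_shape())          # (1, 28, 28, 32): spatial size preserved
print(pool_out.get_shape())          # (1, 14, 14, 32): spatial size halved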

The first convolutional layer

Now we can implement the first layer. It consists of a convolution followed by max pooling. The convolution computes 32 features for each 5x5 patch. Its weight tensor has shape [5, 5, 1, 32]: the first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. There is also a bias vector with one entry per output channel.

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

To apply this layer, we first reshape x into a 4d tensor whose second and third dimensions correspond to the image width and height, and whose last dimension is the number of color channels (1 here because the images are grayscale; it would be 3 for an RGB color image).

x_image = tf.reshape(x, [-1, 28, 28, 1])

We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool.

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

The second convolutional layer

To build a deeper network, we stack several layers of this type. The second layer computes 64 features for each 5x5 patch.

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
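
If you want to verify the sizes at this point (an optional check, not in the original), the static shapes show the effect of the two pooling steps:

print(h_pool1.get_shape())   # (?, 14, 14, 32)
print(h_pool2.get_shape())   # (?, 7, 7, 64)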

Densely connected layer

By now the image size has been reduced to 7x7 (two rounds of 2x2 max pooling: 28 to 14 to 7). We add a fully connected layer with 1024 neurons to process the entire image. We reshape the tensor coming out of the pooling layer into a batch of flat vectors, multiply by a weight matrix, add a bias, and apply ReLU.

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

Dropout

To reduce overfitting, we apply dropout before the output layer. We use a placeholder for the probability that a neuron's output is kept during dropout, so that dropout can be turned on during training and off during testing. TensorFlow's tf.nn.dropout op not only masks neuron outputs but also automatically rescales the remaining ones, so no extra scaling is needed when using dropout.

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
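
As a small aside (my own illustration, assuming a default session is active as in the full program below), the automatic rescaling means that with keep_prob 0.5 the surviving activations are multiplied by 2, so their expected sum is unchanged:

ones = tf.ones([1, 10])
dropped = tf.nn.dropout(ones, keep_prob=0.5)
print(dropped.eval())   # roughly half the entries are 0.0, the rest are 2.0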

Output layer

Finally, we add a softmax layer, just like the one in the earlier single-layer softmax regression.

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

Training and evaluating the model

How well does this model perform?

For training and evaluation we use almost the same code as for the simple single-layer softmax network above. The differences are that we replace the steepest gradient descent optimizer with the more sophisticated ADAM optimizer, add the extra keep_prob parameter to feed_dict to control the dropout rate, and log the training accuracy once every 100 iterations.

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print "step %d, training accuracy %g"%(i, train_accuracy)
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print "test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})

With the above code, the final accuracy on the test set is about 99.2%.
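
As a side note (not something this tutorial does), taking tf.log of the softmax output can be numerically unstable if a predicted probability underflows to zero. A common alternative, sketched here, is to keep the unnormalized logits and let TensorFlow combine the softmax and the log in a single op:

logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
stable_cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))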

So far, we have learned how to quickly and easily build, train, and evaluate a somewhat sophisticated deep learning model with TensorFlow.

Complete program

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

import tensorflow as tf

sess = tf.InteractiveSession()

x = tf.placeholder("float", [None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

sess.run(tf.initialize_all_variables())

y = tf.nn.softmax(tf.matmul(x, W) + b)

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

for i in range(1000):
    batch = mnist.train.next_batch(50)
    train_step.run(feed_dict={x:batch[0], y_:batch[1]})

# Evaluate the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

# This returns an array of booleans. To compute the classification accuracy, we cast the booleans to floats representing right/wrong and take the mean. For example, [True, False, True, True] becomes [1, 0, 1, 1], whose mean is 0.75.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# Finally, compute the accuracy on the test data; it should be about 91%.
print(accuracy.eval(feed_dict={x:mnist.test.images, y_:mnist.test.labels}))

"""
    Weight initialization
"""
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

"""
    Convolution and pooling
"""
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# First convolutional layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second convolutional layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Densely connected layer
W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Output layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

"""
    Training and evaluating the model
"""
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())

for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i%100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_:batch[1], keep_prob:1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x:batch[0], y_:batch[1], keep_prob:0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={x:mnist.test.images, y_:mnist.test.labels, keep_prob:1.0}))

Optional reference notes

Process overview

The process breaks down into roughly three steps:
1. Build the CNN network structure;
2. Build the loss function and configure the optimizer;
3. Train and test.

Overview of the overall network structure:

The network uses two convolution + pooling layers, followed by two fully connected layers.
The first convolutional layer uses 32 kernels of size 3x3x1, stride 1, 'SAME' boundary handling (the convolution keeps the output the same size as the input), and a ReLU activation, followed by a 2x2 max pooling layer.
The second convolutional layer uses 50 kernels of size 3x3x32, stride 1, 'SAME' boundary handling, and a ReLU activation, again followed by a 2x2 max pooling layer.
The first fully connected layer has 1024 neurons, again with a ReLU activation.
The second fully connected layer has 10 neurons with a softmax activation and produces the output.

Code

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
# Read the data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
sess=tf.InteractiveSession()
# Build the CNN network structure
# Convolution helper (so later convolutions need less boilerplate)
def conv2d(x,w):
    return tf.nn.conv2d(x,w,strides=[1,1,1,1],padding='SAME')
# Pooling helper
def max_pool_2x2(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
# Placeholders sized to the sample inputs and outputs
x=tf.placeholder(tf.float32,[None,784])
y_=tf.placeholder(tf.float32,[None,10])
x_img=tf.reshape(x,[-1,28,28,1])

# First convolution and pooling layer
w_conv1=tf.Variable(tf.truncated_normal([3,3,1,32],stddev=0.1))
b_conv1=tf.Variable(tf.constant(0.1,shape=[32]))
h_conv1=tf.nn.relu(conv2d(x_img,w_conv1)+b_conv1)
h_pool1=max_pool_2x2(h_conv1)

# Second convolution and pooling layer
w_conv2=tf.Variable(tf.truncated_normal([3,3,32,50],stddev=0.1))
b_conv2=tf.Variable(tf.constant(0.1,shape=[50]))
h_conv2=tf.nn.relu(conv2d(h_pool1,w_conv2)+b_conv2)
h_pool2=max_pool_2x2(h_conv2)

# First fully connected layer
w_fc1=tf.Variable(tf.truncated_normal([7*7*50,1024],stddev=0.1))
b_fc1=tf.Variable(tf.constant(0.1,shape=[1024]))
h_pool2_flat=tf.reshape(h_pool2,[-1,7*7*50])
h_fc1=tf.nn.relu(tf.matmul(h_pool2_flat,w_fc1)+b_fc1)

# Dropout (randomly deactivating weights)
keep_prob=tf.placeholder(tf.float32)
h_fc1_drop=tf.nn.dropout(h_fc1,keep_prob)

# Second fully connected layer
w_fc2=tf.Variable(tf.truncated_normal([1024,10],stddev=0.1))
b_fc2=tf.Variable(tf.constant(0.1,shape=[10]))
y_out=tf.nn.softmax(tf.matmul(h_fc1_drop,w_fc2)+b_fc2)

# Loss function: cross entropy
loss=tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y_out),reduction_indices=[1]))
# Adam optimizer with learning rate 1e-4
train_step=tf.train.AdamOptimizer(1e-4).minimize(loss)

# Accuracy expression
correct_prediction=tf.equal(tf.argmax(y_out,1),tf.argmax(y_,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

# Define the saver
saver = tf.train.Saver()

# Feed data and train
tf.global_variables_initializer().run()
for i in range(20000):
    batch=mnist.train.next_batch(50)
    if i%100==0:
        train_accuracy=accuracy.eval(feed_dict={x:batch[0],y_:batch[1],keep_prob:1})
        print("step %d,train_accuracy= %g"%(i,train_accuracy))
    train_step.run(feed_dict={x:batch[0],y_:batch[1],keep_prob:0.5})  # the actual training computation happens here
# Where to save the model
saver.save(sess, ".\\MNIST_data\\model.ckpt")

# After training, evaluate on the test set and print the final result
print("test_accuracy= %g"% accuracy.eval(feed_dict={x:mnist.test.images,y_:mnist.test.labels,keep_prob:1}))

 

Analyzing the code section by section:

1.

# Convolution helper (so later convolutions need less boilerplate)
def conv2d(x,w):
    return tf.nn.conv2d(x,w,strides=[1,1,1,1],padding='SAME')
# Pooling helper
def max_pool_2x2(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

Both helpers convolve with a stride of 1; to use a stride of 2 instead, you would set strides=[1, 2, 2, 1] (for a two-dimensional image only the two middle values matter). The padding method is 'SAME', which keeps the input and output the same size by automatically padding one or two pixels at the boundary. The pooling helper is set up in the same way, with a 2x2 pooling window.
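
For example (an illustration of my own, not from the original post), using strides=[1, 2, 2, 1] with 'SAME' padding makes the convolution itself halve the spatial dimensions:

img = tf.zeros([1, 28, 28, 1])
kern = tf.zeros([3, 3, 1, 8])
strided = tf.nn.conv2d(img, kern, strides=[1, 2, 2, 1], padding='SAME')
print(strided.get_shape())   # (1, 14, 14, 8)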

2.

# Placeholders sized to the sample inputs and outputs
x=tf.placeholder(tf.float32,[None,784])
y_=tf.placeholder(tf.float32,[None,10])
x_img=tf.reshape(x,[-1,28,28,1])

The input and output placeholders are the entry points through which data is fed into a session. In TensorFlow, the network is designed by building a computation graph, and the computation only happens once the graph runs in a session; since that process cannot be intervened in directly, any data the session needs is supplied through placeholders.
After the placeholders are set up, x is reshaped into 28x28 image tensors with the tf.reshape() function.
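
A small illustration (not in the original) of what the -1 in tf.reshape does: that dimension is inferred from the total number of elements, so a mini-batch of flattened 784-pixel rows becomes a batch of 28x28x1 image tensors.

flat = tf.zeros([50, 784])                  # e.g. one mini-batch of 50 images
imgs = tf.reshape(flat, [-1, 28, 28, 1])
print(imgs.get_shape())                     # (50, 28, 28, 1)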

3.

# First convolution and pooling layer
w_conv1=tf.Variable(tf.truncated_normal([3,3,1,32],stddev=0.1))
b_conv1=tf.Variable(tf.constant(0.1,shape=[32]))
h_conv1=tf.nn.relu(conv2d(x_img,w_conv1)+b_conv1)
h_pool1=max_pool_2x2(h_conv1)

# Second convolution and pooling layer
w_conv2=tf.Variable(tf.truncated_normal([3,3,32,50],stddev=0.1))
b_conv2=tf.Variable(tf.constant(0.1,shape=[50]))
h_conv2=tf.nn.relu(conv2d(h_pool1,w_conv2)+b_conv2)
h_pool2=max_pool_2x2(h_conv2)

The first convolutional layer uses 3x3x1 kernels, 32 of them in total. The weights are initialized from a truncated normal distribution with a standard deviation of 0.1 (samples further than two standard deviations from the mean are re-drawn), and the biases are initialized to the constant 0.1.
The second convolutional layer is similar to the first. Its kernels are 3x3x32 (32 is the number of input channels, since the previous layer used 32 kernels and therefore produces 32 channels), and this layer uses 50 kernels in total; the other settings are the same as in the first layer.
Each convolutional layer is followed by a 2x2 max pooling operation.
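
To see where the 7*7*50 in the next block comes from, here is a quick shape trace (an optional check, not in the original):

print(h_pool1.get_shape())   # (?, 14, 14, 32): 28x28 halved once, 32 kernels
print(h_pool2.get_shape())   # (?, 7, 7, 50): halved again, 50 kernels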

4.

# First fully connected layer
w_fc1=tf.Variable(tf.truncated_normal([7*7*50,1024],stddev=0.1))
b_fc1=tf.Variable(tf.constant(0.1,shape=[1024]))
h_pool2_flat=tf.reshape(h_pool2,[-1,7*7*50])
h_fc1=tf.nn.relu(tf.matmul(h_pool2_flat,w_fc1)+b_fc1)

# Dropout (randomly deactivating weights)
keep_prob=tf.placeholder(tf.float32)
h_fc1_drop=tf.nn.dropout(h_fc1,keep_prob)

# Second fully connected layer
w_fc2=tf.Variable(tf.truncated_normal([1024,10],stddev=0.1))
b_fc2=tf.Variable(tf.constant(0.1,shape=[10]))
y_out=tf.nn.softmax(tf.matmul(h_fc1_drop,w_fc2)+b_fc2)

After the two convolutional layers come the fully connected layers. The first fully connected layer has 1024 neurons; the output of the second pooling layer is first flattened into a vector, then multiplied by the weights, and a ReLU activation produces the 1024-dimensional output.
Dropout (random deactivation of weights): this layer applies dropout, which forces some of the connections to zero and helps keep the network from overfitting. The keep ratio used here is 0.5, i.e. half of the weights are randomly kept and the other half dropped (do not feel bad about this; it is essential for good performance on the test set). The dropout ratio is set through a placeholder, because dropout is wanted during training but all of the weights should be used at test time, so the ratio has to be adjustable at run time.
The second fully connected layer has 10 neurons, corresponding to the digits 0-9 (yes, this is the layer that finally produces the recognition result). Unlike the previous layers it uses a softmax activation. My personal understanding is that softmax is "normalization with an exponential kernel". What makes it different from ordinary normalization is that the exponential amplifies the differences between the values of a distribution, so the "gaps" between values become larger and the "polarization" more pronounced (for the same input, the distribution obtained by ordinary normalization generally has higher entropy than the one obtained by softmax).
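
A tiny numeric illustration (my own example, not from the post) of this "polarization": compared with plain sum-normalization, softmax stretches the gaps between scores.

import numpy as np

scores = np.array([1.0, 2.0, 3.0])
plain = scores / scores.sum()                    # [0.17, 0.33, 0.50]
soft = np.exp(scores) / np.exp(scores).sum()     # [0.09, 0.24, 0.67]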

5.

# Loss function: cross entropy
loss=tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y_out),reduction_indices=[1]))
# Adam optimizer with learning rate 1e-4
train_step=tf.train.AdamOptimizer(1e-4).minimize(loss)

# Accuracy expression
correct_prediction=tf.equal(tf.argmax(y_out,1),tf.argmax(y_,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

Setting up the loss function is an important step. The loss used here is the cross entropy, which measures how similar two distributions are: the closer the two distributions, the smaller the cross entropy.
The Adam optimizer is used to minimize the loss, with the learning rate set to 1e-4. Then the expression for the accuracy is built (note that it is only built here; no real computation happens yet). tf.argmax(y_, 1) returns the index of the largest value, tf.equal() checks whether two values are equal, tf.cast() converts the data type (here to float32), and tf.reduce_mean() takes the mean, which gives the accuracy.
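
A toy example (my own illustration, assuming the interactive session above) of how these pieces fit together:

preds = tf.constant([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = tf.constant([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
toy_acc = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(preds, 1),
                                          tf.argmax(labels, 1)), tf.float32))
print(toy_acc.eval())   # about 0.33: only the first prediction is correct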

6.

# Feed data and train
tf.global_variables_initializer().run()
for i in range(20000):
    batch=mnist.train.next_batch(50)
    if i%100==0:
        train_accuracy=accuracy.eval(feed_dict={x:batch[0],y_:batch[1],keep_prob:1})
        print "step %d,train_accuracy= %g"%(i,train_accuracy)
    train_step.run(feed_dict={x:batch[0],y_:batch[1],keep_prob:0.5})  # the actual training computation happens here

The next step is to feed in data and train the network. First, tf.global_variables_initializer().run() initializes all the variables. Then 50 samples at a time are drawn from the MNIST training set as a mini-batch, 20,000 training steps are run in total, and every 100 steps the accuracy on the current batch is printed.
The training computation itself is train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}); feed_dict passes the training data (and other run-time parameters, such as the dropout ratio) to the session.
As the code shows, the dropout keep ratio is 0.5 during training and 1 during testing.

7.

# After training, evaluate on the test set and print the final result
print("test_accuracy= %g" % accuracy.eval(feed_dict={x:mnist.test.images,y_:mnist.test.labels,keep_prob:1}))

Finally, the test set is fed in for evaluation. Run the code to see the results.

Run results

After a long run, the final output looks like this (99.19% accuracy on the test set, not bad):

step 18800,train_accuracy= 0.98
step 18900,train_accuracy= 1
step 19000,train_accuracy= 0.98
step 19100,train_accuracy= 1
step 19200,train_accuracy= 1
step 19300,train_accuracy= 1
step 19400,train_accuracy= 1
step 19500,train_accuracy= 1
step 19600,train_accuracy= 1
step 19700,train_accuracy= 1
step 19800,train_accuracy= 1
step 19900,train_accuracy= 1
2018-09-21 03:31:26.823342: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:108] Allocation of 1003520000 exceeds 10% of system memory.
2018-09-21 03:33:03.734611: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:108] Allocation of 250880000 exceeds 10% of system memory.
2018-09-21 03:33:05.869754: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:108] Allocation of 392000000 exceeds 10% of system memory.
2018-09-21 03:34:56.028297: W T:\src\github\tensorflow\tensorflow\core\framework\allocator.cc:108] Allocation of 98000000 exceeds 10% of system memory.
test_accuracy= 0.9919

Process finished with exit code 0

 

Origin: blog.csdn.net/guoyunfei123/article/details/82855324