Deep Learning Notes (2) Training a Multilayer Convolutional Neural Network on TensorFlow

The previous note introduced the basic ideas behind convolutional neural networks. In this note, I follow the official TensorFlow documentation and use the MNIST dataset to train a multilayer convolutional neural network on TensorFlow.

Download and import the mnist dataset

First, use input_data.py to download and import the MNIST dataset. The script downloads the data and stores it in a directory named "MNIST_data".

import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

mnist is a lightweight class that stores the training, validation, and test sets as NumPy arrays.
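If you want to verify what was downloaded, a quick optional check looks like this (the batch_images / batch_labels names are mine, not part of the tutorial code):

# The three splits are plain NumPy arrays; each image is flattened to 784 values.
print(mnist.train.images.shape)      # e.g. (55000, 784)
print(mnist.train.labels.shape)      # e.g. (55000, 10) because one_hot=True
# next_batch() returns an (images, labels) tuple, which we will feed into the model later.
batch_images, batch_labels = mnist.train.next_batch(50)
print(batch_images.shape)            # (50, 784)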

Enter an interactive TensorFlow session

TensorFlow is backed by a C++ backend, and it connects to that backend through a session (Session). Normally we first build a computation graph and then launch it in a session. An InteractiveSession lets us work interactively: we can keep inserting operations into the graph while running it, whereas with a plain Session we would have to build the entire computation graph before launching it. This makes InteractiveSession more convenient, so in most cases, and especially in an interactive environment, we choose it.

import tensorflow as tf
sess = tf.InteractiveSession()
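As a tiny illustration of that convenience (the constants a and b below are a made-up example, not part of the model): once an InteractiveSession is installed as the default session, we can call eval() and run() on tensors and operations directly, without passing a session around.

a = tf.constant(2.0)
b = tf.constant(3.0)
print((a + b).eval())   # 5.0, evaluated in the default (interactive) session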

Handling input data with placeholders

Regarding the concept of a placeholder, the official explanation is that it "is not a specific value, but a value that is fed in when TensorFlow runs a computation". That is fairly easy to understand here.

x = tf.placeholder("float", shape=[None,784])

x represents the floating-point tensor of input images, so its dtype is defined as "float". A None in the shape means that dimension is left unspecified, so tensors of any size can be fed along it; here it corresponds to the yet-undetermined batch size. An MNIST image is 28x28 pixels, and 784 is the dimension of a flattened MNIST image, that is, 28x28 = 784.

y_ = tf.placeholder("float", shape=[None,10])

Since MNIST is a dataset of handwritten digits, there are only 10 classes, representing the ten digits from 0 to 9.
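As a sketch of how a placeholder only gets its value at run time (example_batch and the reduce_sum probe are illustrative, not part of the model):

# Feed one 50-example batch into x via feed_dict when the graph is actually run.
example_batch = mnist.train.next_batch(50)
print(tf.reduce_sum(x).eval(feed_dict={x: example_batch[0]}))   # sum of all pixel values in the batch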

Weight and bias term initialization

When initializing the weights, we add a small amount of noise to break symmetry and prevent zero gradients. Here we set the standard deviation of the weights to 0.1. The activation function we use is ReLU, whose definition is

y = x, if x ≥ 0;  y = 0, if x < 0   (i.e. y = max(0, x))

The graph of ReLU is the function plot on the left of the figure below.

But notice that ReLU is hard-saturated for x < 0, so as training progresses some inputs may fall into that hard-saturated region, the corresponding weights stop updating, and the neuron "dies". Later research proposed new activation functions such as PReLU and ELU to improve on this, but in our training here we simply initialize the bias terms with a small positive number to avoid neurons whose output is stuck at 0.
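A one-line check of the ReLU behaviour described above (the sample values are arbitrary, chosen only for illustration): negative inputs are clamped to 0, non-negative inputs pass through unchanged.

print(tf.nn.relu(tf.constant([-2.0, -0.5, 0.0, 1.5])).eval())   # [ 0.   0.   0.   1.5]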

# Initialize weights
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
# Initialize bias terms
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

Convolution and Pooling

Here we convolve with a stride of 1 and same padding (padding='SAME'). The difference between same padding and valid padding was explained in the previous note, so I won't repeat it here. For pooling, we use max pooling over 2x2 blocks.

# Convolution
def conv2d(x,w):
    return tf.nn.conv2d(x,w,
                        strides=[1,1,1,1],padding='SAME')
# Pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],
                        strides=[1,2,2,1],padding='SAME')
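To see what these choices do to the shapes, here is an optional check with dummy zero tensors used only for illustration (dummy and kernel are not part of the model): with a 5x5 kernel and stride 1, 'SAME' padding keeps the 28x28 size, 'VALID' would shrink it to 24x24, and the 2x2 max pool halves the spatial dimensions.

dummy = tf.zeros([1, 28, 28, 1])
kernel = tf.zeros([5, 5, 1, 32])
print(conv2d(dummy, kernel).get_shape())                 # (1, 28, 28, 32)
print(tf.nn.conv2d(dummy, kernel, strides=[1,1,1,1], padding='VALID').get_shape())   # (1, 24, 24, 32)
print(max_pool_2x2(conv2d(dummy, kernel)).get_shape())   # (1, 14, 14, 32)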

The first layer of convolution

(1 #28x28->32 #28x28)
First, 32 features are computed for each 5x5 patch. In the weight_variable shape, the first two dimensions are the patch size, the third (1) is the number of input channels, and the fourth (32) is the number of output channels (which can also be understood as the number of convolution kernels, i.e. the number of feature maps produced). There is one bias term per output channel, so there are 32 bias terms.

w_conv1 = weight_variable([5,5,1,32])
b_conv1=bias_variable([32])

To use x in the convolution, we reshape it into a four-dimensional tensor: -1 in the first dimension means it is left unspecified and inferred automatically, the second and third dimensions are the image height and width, and the fourth dimension is the number of color channels, 1 for a grayscale image and 3 for an RGB image.

x_image=tf.reshape(x,[-1,28,28,1])

Then we convolve it with the first set of weights, add the bias, and apply the ReLU activation function.

h_conv1=tf.nn.relu(conv2d(x_image,w_conv1)+b_conv1)

First pooling

(32 #28x28->32 #14x14)
This step is straightforward: max pooling over a 2x2 grid, which halves the height and width.

h_pool1=max_pool_2x2(h_conv1)
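A quick shape check of the annotation above (purely illustrative; the shapes are just printed):

print(h_conv1.get_shape())   # (?, 28, 28, 32)
print(h_pool1.get_shape())   # (?, 14, 14, 32) -- pooling halves height and width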

Second layer convolution and second pooling

(32 #14x14->64 #14x14->64 #7x7)
The process is analogous to the first convolution and first pooling.

w_conv2=weight_variable([5,5,32,64])
b_conv2=bias_variable([64])

h_conv2=tf.nn.relu(conv2d(h_pool1,w_conv2)+b_conv2)
h_pool2=max_pool_2x2(h_conv2)

Densely connected layer

By this point the feature maps have been reduced to 7x7. We add a fully connected layer with 1024 neurons, reshape the pooled output tensor into a flat vector per example, multiply by the weights, add the bias term, and pass the result through a ReLU activation function.

w_fc1=weight_variable([7*7*64,1024])
b_fc1=bias_variable([1024])

h_pool2_flat=tf.reshape(h_pool2,[-1,7*7*64])
h_fc1=tf.nn.relu(tf.matmul(h_pool2_flat,w_fc1)+b_fc1)
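An optional check of the flattened size used above: after two rounds of 2x2 pooling, the 28x28 image is 7x7 with 64 channels, so each example becomes a vector of 7*7*64 = 3136 values feeding the 1024-unit layer.

print(h_pool2.get_shape())        # (?, 7, 7, 64)
print(h_pool2_flat.get_shape())   # (?, 3136)
print(h_fc1.get_shape())          # (?, 1024)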

Dropout

keep_prob=tf.placeholder("float")
h_fc1_drop=tf.nn.dropout(h_fc1,keep_prob)

Dropout is a relatively new and very effective way to prevent overfitting; whoever thought of it was pretty much crazy. The Udacity Deep Learning course describes it like this: on every training pass, roughly half of the activations flowing through the network are selected completely at random and dropped, the dropped activations are replaced with 0, and the remaining activations are scaled up accordingly.
[Figures: QQ20161110-1.png, QQ20161110-2.png — illustrations of dropout]
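A small demonstration of the scaling just described (demo is an illustrative tensor of ones, unrelated to the model): with keep_prob = 0.5, the surviving activations are divided by 0.5, i.e. doubled, so the expected total activation stays roughly the same.

demo = tf.ones([1, 10])
print(tf.nn.dropout(demo, 0.5).eval())   # kept entries become 2.0, dropped entries become 0.0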

Class prediction and output

A simple softmax layer is applied to produce the output.

w_fc2=weight_variable([1024,10])
b_fc2=bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop,w_fc2)+b_fc2)

Model evaluation

# Cross-entropy cost function
cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
# Minimize the cost function with the Adam optimizer
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# Find the correctly predicted labels
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
# Accuracy = number of correct predictions divided by the total
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
# Train for 20000 iterations, logging every 100 steps
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g"%(i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
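One caveat: computing the softmax and then taking tf.log(y_conv) by hand, as above, can be numerically unstable if some predicted probabilities underflow to 0. A common alternative, sketched below with illustrative names (logits, stable_cross_entropy), is to let TensorFlow compute both steps in one fused op on the raw logits; this is an optional variant, not what the code above runs.

# Optional, numerically stabler variant (illustrative; not used above):
logits = tf.matmul(h_fc1_drop, w_fc2) + b_fc2
stable_cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))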

The code and parts of this article come from or reference:
The official TensorFlow documentation
深度学习中的激活函数导引 (A Guide to Activation Functions in Deep Learning)
Udacity Deep Learning

