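This post is only a set of notes on the code side of implementing a CNN in TensorFlow. It mainly covers how each layer is built and how its parameters are set. For an introduction to CNNs, see the linked article.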

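1. Introduction

This post trains and uses a LeNet-5-style network on the MNIST handwritten digit dataset. The code is split into two files, cnntest.py and lenet5_app.py: cnntest.py trains and saves the model, and lenet5_app.py loads and uses the trained model.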

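2. Building and training the model

2.1 Activation functions

Activation functions have a major influence on the quality of a deep neural network. An activation function is a function that transforms an input signal into the corresponding output signal. Deep neural networks are trained with the backpropagation algorithm (Backpropagation, BP), which updates the weights by gradient descent, and gradient descent requires the functions involved to be differentiable. Activation functions are therefore chosen to be differentiable (at least almost everywhere), and they are usually nonlinear, because a stack of purely linear layers is equivalent to a single linear layer.
Common activation functions include the linear activation function, the step function (also called the binary function), sigmoid, tanh, ReLU, and Leaky ReLU. Each has its own advantages and disadvantages.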

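Linear activation function: the simplest activation function is linear, with output proportional to input. Its problem is that its derivative is a constant that does not depend on the input, and composing linear layers only yields another linear function, so it is not suitable as the hidden-layer activation of a deep neural network trained with BP.
Step function: can only be used in the earliest type of neural network, the single-layer perceptron, and requires the input data to be linearly separable.
sigmoid function: sigmoid is commonly used for binary classification. Its advantage is that its first derivative is easy to compute, but it suffers from the vanishing gradient problem: the deeper the network, the more of the error signal is lost. Even when the training error at the output layer is large, its effect on the shallow (early) layers after being propagated back is very weak, so deep networks that use sigmoid as the activation function are hard to train.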

$f(x)=\frac{1}{1+e^{-x}}$
$f'(x)=f(x)(1-f(x))$
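As a rough illustration of the vanishing gradient argument, the following sketch in plain Python evaluates the sigmoid f(x) = 1/(1+e^(-x)) and its derivative f(x)(1-f(x)), then multiplies the largest possible derivative value (0.25, reached at x = 0) across several layers. The function names `sigmoid` and `sigmoid_grad` are chosen here for illustration and are not part of the original code. The sketch ignores the weights entirely, so it is only a simplified picture of how the activation factor shrinks with depth, not a full backpropagation.

```python
import math

def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative f'(x) = f(x) * (1 - f(x)); its maximum is 0.25 at x = 0."""
    fx = sigmoid(x)
    return fx * (1.0 - fx)

# Best case: every layer sits at x = 0, where the derivative is largest.
# Even then the activation-derivative factor shrinks geometrically with depth.
for depth in (1, 5, 10):
    print(depth, sigmoid_grad(0.0) ** depth)
```

With 10 layers the factor is already below one in a million, which matches the observation that the early layers barely receive any error signal.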
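Hyperbolic tangent (tanh) function: the graph of tanh resembles that of sigmoid, but the derivative of tanh is larger near 0, so compared with a deep network that uses sigmoid as its activation function, one that uses tanh can converge faster.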

$f(x)=\tanh(x)=\frac{1-e^{-2x}}{1+e^{-2x}}$
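To make the "larger derivative near 0" claim concrete, here is a small sketch that computes tanh from the formula (1-e^(-2x))/(1+e^(-2x)) and compares its slope at 0 with that of the sigmoid. The helper names are hypothetical, and the derivative 1 - tanh(x)^2 is the standard identity rather than something taken from the original post.

```python
import math

def tanh_from_formula(x):
    """tanh written as (1 - e^(-2x)) / (1 + e^(-2x))."""
    e = math.exp(-2.0 * x)
    return (1.0 - e) / (1.0 + e)

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = tanh_from_formula(x)
    return 1.0 - t * t

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid, for comparison."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Slopes at x = 0: tanh is four times steeper than sigmoid there.
print(tanh_grad(0.0), sigmoid_grad(0.0))  # 1.0 0.25
```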
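ReLU and its variants: ReLU has been reported to train about 6 times faster than tanh. When the input is positive the output equals the input; when the input is non-positive the output is 0. BP can still use this function even though it is not differentiable at 0: its derivative is 1 for positive inputs and 0 for negative inputs, and at exactly 0 implementations simply pick one of these values (usually 0). Variants of ReLU include Leaky ReLU and PReLU.
softmax function: softmax is mainly used for multi-class classification. It converts a set of values into posterior probabilities, that is, a probability distribution over the classes that gives the probability of the sample belonging to each class.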

$\sigma(z)_j=\frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}} \quad \text{for } j=1,\dots,K$
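The softmax definition can be sketched in a few lines of plain Python. The function name `softmax` and the example scores are illustrative assumptions. Subtracting the maximum score before exponentiating is a common numerical-stability trick that leaves the result unchanged; it is not something the original post describes.

```python
import math

def softmax(z):
    """Return softmax probabilities for a list of scores z."""
    m = max(z)  # shifting by the max does not change the result but avoids overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # largest score gets the largest probability
print(sum(probs))  # the probabilities sum to 1 (up to rounding)
```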
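Overall, ReLU and its variants are usually more effective as activation functions than sigmoid and tanh: networks with ReLU train considerably faster, and ReLU does not saturate for positive inputs, so it largely avoids the vanishing gradient problem of sigmoid and tanh. Hidden layers of deep neural networks should therefore generally avoid sigmoid and tanh.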
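The following sketch shows why ReLU behaves differently from sigmoid under backpropagation: for positive inputs its derivative is exactly 1, so the activation factor does not shrink however many layers it passes through. The function names and the Leaky ReLU slope `alpha=0.01` are assumptions made for this example, and using 0 as the derivative at x = 0 is a convention.

```python
def relu(x):
    """ReLU: identity for positive inputs, 0 otherwise."""
    return x if x > 0 else 0.0

def relu_grad(x):
    """Derivative of ReLU; 0 is used at x = 0 by convention."""
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU keeps a small slope alpha for negative inputs."""
    return x if x > 0 else alpha * x

# For a positive input the derivative factor stays 1 through 10 layers,
# whereas the best-case sigmoid factor would be 0.25 ** 10.
print(relu_grad(2.0) ** 10, 0.25 ** 10)
```

The flip side is that a unit whose input is always negative receives zero gradient, which is the motivation for variants such as Leaky ReLU.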

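2.2 Dataset

As the "hello world" of machine learning, MNIST is a handwritten digit recognition dataset, which can be downloaded from the MNIST website (see reference [3]). It comes as four compressed files: training images, training labels, test images, and test labels. The training set contains 60000 labeled samples and the test set contains 10000 labeled samples. The data can be read and parsed with TensorFlow's mnist module (the tensorflow.examples.tutorials package is part of TensorFlow 1.x), as follows: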

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".",one_hot = True)

label=mnist.test.labels
print(label.shape)
print(label[0])

Output:
(10000, 10)
[ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

x_img=mnist.test.images
x_img.shape

Output:
(10000, 784)

x_img[0].shape

Output:
(784,)
The code above shows that the data read matches the description on the website, so we can move on to the next step.
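The two shapes seen above, a 10-element one-hot label and a flat 784-element image, can be interpreted with a short sketch in plain Python. The helper names `one_hot`, `decode_one_hot`, and `to_rows` are hypothetical, and the all-zero image is a stand-in for a real MNIST sample.

```python
def one_hot(digit, n_classes=10):
    """Encode a digit as a one-hot list, e.g. 7 -> 1.0 at index 7."""
    vec = [0.0] * n_classes
    vec[digit] = 1.0
    return vec

def decode_one_hot(vec):
    """Index of the largest entry, which is what argmax computes."""
    return max(range(len(vec)), key=lambda i: vec[i])

def to_rows(flat, width=28):
    """Split a flat list of width*width pixels into rows of length width."""
    return [flat[i:i + width] for i in range(0, len(flat), width)]

label = [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]  # same values as the label printed above
print(decode_one_hot(label))      # 7
image = to_rows([0.0] * 784)      # placeholder pixels, not real data
print(len(image), len(image[0]))  # 28 28
```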

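2.3 Building the model

2.3.1 Convolutional and pooling layers

For convolutional, pooling, and fully connected layers, see the article reposted earlier, "深入理解卷积神经网络". This post only covers building the network in TensorFlow and explaining the parameters of the functions used. We first define helper functions for the convolutional and pooling layers. Both operate on tensors in [batch, height, width, channels] layout, and strides and ksize follow the same order. For the parameters of conv2d see the blog post "tf.nn.conv2d参数详解"; for the pooling layer see "池化操作 tensorflow tf.nn.max_pool".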

#pre-define
def conv2d(x,W):
    return tf.nn.conv2d(x,W,
                        strides=[1,1,1,1],
                        padding='SAME')
def max_pool_2x2(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],
                          strides=[1,2,2,1],
                          padding='SAME')
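The effect of `padding='SAME'` on the spatial size can be checked with a small calculation. The sketch below uses the output-size rules TensorFlow documents for the two padding modes (`SAME`: ceil(in / stride); `VALID`: ceil((in - kernel + 1) / stride)). The function name `out_size` is made up for this example.

```python
import math

def out_size(in_size, kernel, stride, padding):
    """Spatial output size of a conv or pooling op along one dimension."""
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

size = 28
size = out_size(size, kernel=5, stride=1, padding='SAME')  # conv2d keeps 28
size = out_size(size, kernel=2, stride=2, padding='SAME')  # max_pool_2x2 halves it
print(size)  # 14
print(out_size(28, kernel=5, stride=1, padding='VALID'))   # 24
```

So with the helpers defined above, each convolution preserves the height and width and each pooling step halves them.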

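2.3.2 Building the full model

A typical CNN structure looks like this:
input→conv→ReLU→conv→ReLU→pool→ReLU→conv→ReLU→pool→fully connected. Following this structure it is easy to build the whole network; the main difficulty is working out the sizes of the parameters. The function below uses the shorter chain input→conv→ReLU→pool→conv→ReLU→pool→fully connected→ReLU→output.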

def multilayer_preceptron(x,weights,biases):
    #now,we want to change this to a CNN network
    #first,reshape the data to 4_D ,
    x_image=tf.reshape(x,[-1,28,28,1])
    #then apply cnn layers ,cnn layer and activation function --relu
    h_conv1=tf.nn.relu(conv2d(x_image,weights['conv1'])+biases['conv_b1'])
    #first pool layer
    h_pool1=max_pool_2x2(h_conv1)
    #second cnn layer
    h_conv2=tf.nn.relu(conv2d(h_pool1,weights['conv2'])+biases['conv_b2'])
    #second pool layer
    h_pool2=max_pool_2x2(h_conv2)

    h_pool2_flat=tf.reshape(h_pool2,[-1,7*7*64])
    h_fc1=tf.nn.relu(tf.matmul(h_pool2_flat,weights['fc1'])+biases['fc1_b'])
    out_layer=tf.matmul(h_fc1,weights['out'])+biases['out_b']
    return out_layer

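2.3.3 Weights and biases

In 2.3.2 there is the operation x_image=tf.reshape(x,[-1,28,28,1]). It is needed because the input argument of tf.nn.conv2d() must be a 4-D tensor, whereas MNIST is read as a 2-D tensor (10000*784 for the test set). After the reshape it becomes a 4-D tensor of shape 10000*28*28*1.
The first convolution kernel, conv1, has shape [5,5,1,32]: the first two numbers are the kernel size, 1 is the number of input channels, and 32 is the number of output feature maps produced by the convolution. The only value that is fixed is the third one, which must equal the number of channels of the input image. The kernel size and the number of output feature maps are chosen from experience; see "卷积神经网络的卷积核大小、卷积层数、每层map个数都是如何确定下来的呢？". The second kernel has shape [5,5,32,64]: its input is the output of the previous layer, which has 32 feature maps, so the third value is 32, and this layer is set to output 64 feature maps.
The line h_fc1=tf.nn.relu(tf.matmul(h_pool2_flat,weights['fc1'])+biases['fc1_b']) in the previous section is the fully connected layer. Between the input layer and the fully connected layer there are two pooling operations with stride 2, so the feature map size goes from the initial 28*28 to 7*7, and the previous layer outputs 64 feature maps. The second dimension is the output size of the fully connected layer. It is set to 256 here; other values also work, and a power of 2 is the usual choice. So fc1 has shape [7*7*64,256], and given that input the final out parameter has shape [256,n_classes], where n_classes is the number of classes.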

weights={
    'conv1':tf.Variable(tf.random_normal([5,5,1,32])),
    'conv2':tf.Variable(tf.random_normal([5,5,32,64])),
    'fc1':tf.Variable(tf.random_normal([7*7*64,256])),
    'out':tf.Variable(tf.random_normal([256,n_classes]))
}
biases={
    'conv_b1':tf.Variable(tf.random_normal([32])),
    'conv_b2':tf.Variable(tf.random_normal([64])),
    'fc1_b':tf.Variable(tf.random_normal([256])),
    'out_b':tf.Variable(tf.random_normal([n_classes]))
}
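As a sanity check on the shapes in this section, the sketch below counts the trainable parameters implied by them. It is plain Python and does not touch TensorFlow; the dictionary and function names are chosen for illustration, and n_classes is taken to be 10 as for MNIST.

```python
from functools import reduce
from operator import mul

n_classes = 10  # MNIST digits 0-9

weight_shapes = {
    'conv1': [5, 5, 1, 32],
    'conv2': [5, 5, 32, 64],
    'fc1': [7 * 7 * 64, 256],
    'out': [256, n_classes],
}
bias_shapes = {
    'conv_b1': [32],
    'conv_b2': [64],
    'fc1_b': [256],
    'out_b': [n_classes],
}

def num_params(shape):
    """Number of scalars in a tensor of the given shape."""
    return reduce(mul, shape, 1)

for name, shape in weight_shapes.items():
    print(name, num_params(shape))

total = sum(num_params(s) for s in weight_shapes.values())
total += sum(num_params(s) for s in bias_shapes.values())
print("total:", total)
```

Almost all of the parameters sit in fc1, which is why the flattened size 7*7*64 matters so much for the model size.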

2.3.4 Training the model

The complete code for training and saving the model is as follows:


from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".",one_hot = True)

import tensorflow as tf
import os
#Parameters
learning_rate = 0.1  # large for Adam (TensorFlow's default is 0.001); lower it if the cost does not decrease
training_epochs = 5

batch_size = 100
display_step = 1
#Network Parameters
n_input = 784
n_classes = 10

#tf Graph input
x = tf.placeholder("float",[None,n_input])
y = tf.placeholder("float",[None,n_classes])

#pre-define
def conv2d(x,W):
    return tf.nn.conv2d(x,W,
                        strides=[1,1,1,1],
                        padding='SAME')
def max_pool_2x2(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],
                          strides=[1,2,2,1],
                          padding='SAME')
#Create model
def multilayer_preceptron(x,weights,biases):
    #now,we want to change this to a CNN network
    #first,reshape the data to 4_D ,
    x_image=tf.reshape(x,[-1,28,28,1])
    #then apply cnn layers ,cnn layer and activation function --relu
    h_conv1=tf.nn.relu(conv2d(x_image,weights['conv1'])+biases['conv_b1'])
    #first pool layer
    h_pool1=max_pool_2x2(h_conv1)
    #second cnn layer
    h_conv2=tf.nn.relu(conv2d(h_pool1,weights['conv2'])+biases['conv_b2'])
    #second pool layer
    h_pool2=max_pool_2x2(h_conv2)

    h_pool2_flat=tf.reshape(h_pool2,[-1,7*7*64])
    h_fc1=tf.nn.relu(tf.matmul(h_pool2_flat,weights['fc1'])+biases['fc1_b'])
    out_layer=tf.matmul(h_fc1,weights['out'])+biases['out_b']
    return out_layer

weights={
    'conv1':tf.Variable(tf.random_normal([5,5,1,32])),
    'conv2':tf.Variable(tf.random_normal([5,5,32,64])),
    'fc1':tf.Variable(tf.random_normal([7*7*64,256])),
    'out':tf.Variable(tf.random_normal([256,n_classes]))
}
biases={
    'conv_b1':tf.Variable(tf.random_normal([32])),
    'conv_b2':tf.Variable(tf.random_normal([64])),
    'fc1_b':tf.Variable(tf.random_normal([256])),
    'out_b':tf.Variable(tf.random_normal([n_classes]))
}
#Construct model
pred = multilayer_preceptron(x,weights,biases)

#Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred,labels=y))
optimizer=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
#Initializing the variables
init = tf.global_variables_initializer()
#saver model
model_saver = tf.train.Saver()

#Launch the graph
with tf.Session() as sess:
    sess.run(init)
    #Training cycle
    for epoch in range(training_epochs):
        avg_cost=0.
        total_batch=int(mnist.train.num_examples/batch_size)
        #Loop over all batches
        for i in range(total_batch):
            batch_x,batch_y=mnist.train.next_batch(batch_size)
            #run optimization op (backprop)and cost op (to get loss value)
            _,c=sess.run([optimizer,cost],feed_dict={x:batch_x,y:batch_y})
            #Compute average loss
            avg_cost+=c/total_batch
        #Display logs per epoch step
        if epoch % display_step==0:
            print("Epoch:",'%04d' % (epoch+1),"cost=","{:.9f}".format(avg_cost))
    print("Optimization Finished!")
    correct_prediction=tf.equal(tf.argmax(pred,1),tf.argmax(y,1))
    #Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction,"float"))
    print("Accuracy:",accuracy.eval({x:mnist.test.images,y:mnist.test.labels}))

    #create dir for model saver
    model_dir = "mnist"
    model_name = "cpk"

    if not os.path.exists(model_dir):
        os.makedirs(model_dir)
    model_saver.save(sess,os.path.join(model_dir,model_name))
    print("model saved successfully")
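To show what the loss and accuracy lines in the training script compute, here is a minimal plain-Python sketch of softmax cross-entropy on raw logits and of argmax-based accuracy. It mirrors the roles of tf.nn.softmax_cross_entropy_with_logits, tf.reduce_mean, and tf.argmax, but it is an independent illustration with made-up function names and toy data, not TensorFlow's implementation.

```python
import math

def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    # -sum_k y_k * log softmax(z)_k, using log softmax(z)_k = z_k - logsumexp(z)
    return -sum(y * (z - log_sum_exp) for z, y in zip(logits, one_hot_label))

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(batch_logits, batch_labels):
    """Fraction of samples whose predicted class matches the label."""
    hits = [argmax(p) == argmax(t) for p, t in zip(batch_logits, batch_labels)]
    return sum(hits) / len(hits)

# Toy batch of two samples with three classes (illustrative values only).
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
losses = [softmax_cross_entropy(z, y) for z, y in zip(logits, labels)]
print(sum(losses) / len(losses))  # mean loss over the batch
print(accuracy(logits, labels))   # 0.5: the first sample is right, the second is wrong
```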

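3. Restoring and using the model

Saving and restoring a trained model is mainly done with the tf.train.Saver() class; see "存储Tensorflow训练网络的参数". The saver only stores the variables of the model, so when using the model you must declare the same variables as in training. The following code for applying the model is the training code with the training part removed, keeping only the parts related to the network structure.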

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".",one_hot = True)

import tensorflow as tf
import os

batch_size = 100
display_step = 1
#Network Parameters
n_input = 784
n_classes = 10

#tf Graph input
x = tf.placeholder("float",[None,n_input])
y = tf.placeholder("float",[None,n_classes])

#pre-define
def conv2d(x,W):
    return tf.nn.conv2d(x,W,
                        strides=[1,1,1,1],
                        padding='SAME')
def max_pool_2x2(x):
    return tf.nn.max_pool(x,ksize=[1,2,2,1],
                          strides=[1,2,2,1],
                          padding='SAME')
#Create model
def multilayer_preceptron(x,weights,biases):
    #now,we want to change this to a CNN network
    #first,reshape the data to 4_D
    x_image=tf.reshape(x,[-1,28,28,1])
    #then apply cnn layers
    h_conv1=tf.nn.relu(conv2d(x_image,weights['conv1'])+biases['conv_b1'])
    h_pool1=max_pool_2x2(h_conv1)

    h_conv2=tf.nn.relu(conv2d(h_pool1,weights['conv2'])+biases['conv_b2'])
    h_pool2=max_pool_2x2(h_conv2)

    h_pool2_flat=tf.reshape(h_pool2,[-1,7*7*64])
    h_fc1=tf.nn.relu(tf.matmul(h_pool2_flat,weights['fc1'])+biases['fc1_b'])
    out_layer=tf.matmul(h_fc1,weights['out'])+biases['out_b']
    return out_layer

weights={
    'conv1':tf.Variable(tf.random_normal([5,5,1,32])),
    'conv2':tf.Variable(tf.random_normal([5,5,32,64])),
    'fc1':tf.Variable(tf.random_normal([7*7*64,256])),
    'out':tf.Variable(tf.random_normal([256,n_classes]))
}
biases={
    'conv_b1':tf.Variable(tf.random_normal([32])),
    'conv_b2':tf.Variable(tf.random_normal([64])),
    'fc1_b':tf.Variable(tf.random_normal([256])),
    'out_b':tf.Variable(tf.random_normal([n_classes]))
}
#Construct model
pred = multilayer_preceptron(x,weights,biases)
#create class Saver
model_saver = tf.train.Saver()

#Launch the graph
with tf.Session() as sess:
    #create dir for model saver
    model_dir = "mnist"
    model_name = "cpk"
    model_path=os.path.join(model_dir,model_name)
    model_saver.restore(sess,model_path)

    img=mnist.test.images[100].reshape(-1,784)
    img_label=sess.run(tf.argmax(mnist.test.labels[100]))

    ret=sess.run(pred,feed_dict={x:img})
    num_pred=sess.run(tf.argmax(ret,1))

    print("Predicted: %d\n" % num_pred[0])
    print("Actual:",img_label)
    print("Model restored successfully")

References

[1] Secret Sauce behind the beauty of Deep Learning: Beginners guide to Activation Functions
[2] http://blog.csdn.net/cyh_24/article/details/50593400
[3] http://yann.lecun.com/exdb/mnist/

Training and using a CNN model with TensorFlow