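These are my study notes and reflections from learning TensorFlow, for reference only. If anything is unclear, feel free to discuss it with me. This series is continuously updated.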

The material follows the official TensorFlow documentation, with my own notes and understanding added on top.

This article assumes some background in deep learning. I recommend getting the theory down first and then following along.

This time the experiment uses TensorFlow to build a CNN with two convolutional layers. The steps are as follows.

1. Basic setup

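# Written for TensorFlow 1.x: placeholders, sessions and tensorflow.examples.tutorials
# are not available in the TensorFlow 2.x top-level API.
import tensorflow as tf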
import tensorflow.examples.tutorials.mnist.input_data as input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x = tf.placeholder('float', shape=[None, 784])
y_hat = tf.placeholder('float', shape=[None, 10])

x_image = tf.reshape(x, [-1, 28, 28, 1])

sess = tf.InteractiveSession()

The three variables mnist, x and y_hat are exactly the same as in the previous blog post, so I won't explain them again.

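x_image reshapes x into a 4-D tensor. The -1 in the first dimension tells tf.reshape to infer that dimension's size from the total number of elements, so it ends up equal to the batch size. The 2nd and 3rd dimensions are the image height and width, and the last dimension is the number of color channels (1 here because the images are grayscale; an RGB color image would have 3).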
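To illustrate how the -1 is resolved, here is a small pure-Python sketch. The helper infer_reshape is hypothetical (it is not part of TensorFlow); it only mimics the rule tf.reshape applies: the -1 dimension becomes the total number of elements divided by the product of the known dimensions.

```python
from math import prod


def infer_reshape(num_elements, shape):
    """Resolve at most one -1 entry in `shape` so that the product of all
    dimensions equals num_elements (the rule tf.reshape follows)."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = prod(d for d in shape if d != -1)
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if num_elements % known != 0:
        raise ValueError("cannot infer the -1 dimension")
    return [num_elements // known if d == -1 else d for d in shape]


# A batch of 50 flattened MNIST images holds 50 * 784 values in total.
print(infer_reshape(50 * 784, [-1, 28, 28, 1]))  # [50, 28, 28, 1]
```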

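The difference between InteractiveSession() and Session() is that InteractiveSession() is meant for interactive use and installs itself as the default session when it is created. In a single-session setting you can therefore run tensors and operations, for example with accuracy.eval(...) or train_step.run(...), without specifying which session to use. For a detailed explanation, see this blog post, which covers it thoroughly.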

2. Defining helper functions

2.1 Weight initialization

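To build this model we need to create a large number of weights and biases. The weights should be initialized with a small amount of noise to break symmetry and to avoid zero gradients. Since we use ReLU neurons, it is also good practice to initialize the biases with a slightly positive value, which avoids neurons whose output is always 0 ("dead neurons"). To avoid repeating the initialization code while building the model, we define two helper functions.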

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

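tf.truncated_normal(shape, mean=0.0, stddev=1.0): shape is the shape of the generated tensor, mean is the mean and stddev is the standard deviation. It samples from a truncated normal distribution with the given mean and standard deviation: any value more than two standard deviations from the mean is discarded and re-drawn.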
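As a rough illustration of what "truncated" means, the sketch below draws from an ordinary normal distribution with Python's random.gauss and re-draws any value more than two standard deviations from the mean. The function name truncated_normal here is a hypothetical stand-in; TensorFlow's own sampler may be implemented differently, but the resulting distribution follows the same rule.

```python
import random


def truncated_normal(n, mean=0.0, stddev=0.1, seed=None):
    """Rejection-sampling sketch: keep only normal samples that lie within
    two standard deviations of the mean, re-drawing the rest."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2 * stddev:
            samples.append(x)
    return samples


# With stddev=0.1 (as in weight_variable), every weight lies in [-0.2, 0.2].
weights = truncated_normal(1000, stddev=0.1, seed=0)
print(min(weights) >= -0.2, max(weights) <= 0.2)  # True True
```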

tf.constant(0.1, shape=shape) creates a tensor of the given shape in which every element is 0.1.

These two functions are used to create the initial variables W and b of each layer.

2.2 Convolution and pooling

Define two functions, conv2d and max_pool_2x2:

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

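tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None) is the convolution function. Apart from name, its five parameters are:

input: the input images, a Tensor of shape [batch, height, width, in_channels], with a floating-point dtype such as float32 or float64.
filter: the convolution kernels, a Tensor of shape [filter_height, filter_width, in_channels, out_channels], with the same dtype as input. Its in_channels must equal the in_channels of input.
strides: the step size of the sliding window along each dimension of input, a 1-D list of length 4. It is usually [1, stride, stride, 1], because only the 2nd and 3rd dimensions of input (height and width) are spatial; the stride over batch and channels stays 1.
padding: a string, either 'SAME' or 'VALID'. 'SAME' pads the image with zeros so that, with stride 1, the output has the same height and width as the input (in general the output size is ceil(in_size / stride)). 'VALID' adds no padding, so the output shrinks to ceil((in_size - filter_size + 1) / stride).
use_cudnn_on_gpu: a bool, whether to use cuDNN acceleration; defaults to True.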

The result is again a Tensor, of shape [batch, out_height, out_width, out_channels]; this is what we call the feature map.
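To make the 'SAME'/'VALID' size rules and the sliding-window computation concrete, here is a minimal pure-Python sketch for a single channel with stride 1. The helpers output_size and conv2d_same are hypothetical. conv2d_same assumes an odd kernel size so that the zero padding is symmetric, and, like tf.nn.conv2d, it does not flip the kernel (strictly speaking it computes a cross-correlation).

```python
import math


def output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension, per the TensorFlow docs."""
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def conv2d_same(image, kernel):
    """Single-channel convolution, stride 1, 'SAME' zero padding.
    Assumes an odd-sized kernel (e.g. 3x3 or 5x5)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # zeros added on each side
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    if 0 <= r < h and 0 <= c < w:  # outside = padded zero
                        total += image[r][c] * kernel[di][dj]
            out[i][j] = total
    return out


print(output_size(28, 5, 1, 'SAME'), output_size(28, 5, 1, 'VALID'))  # 28 24

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
out = conv2d_same(image, ones)
print(len(out), len(out[0]))  # 4 4 ('SAME' keeps the size)
print(out[0][0], out[1][1])   # 14.0 54.0
```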

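tf.nn.max_pool(value, ksize, strides, padding, name=None) performs max pooling. Its four main parameters are similar to those of the convolution:

value: the input to pool. A pooling layer usually follows a convolutional layer, so the input is normally a feature map, still of shape [batch, height, width, channels].
ksize: the pooling window size, a 4-element list, usually [1, height, width, 1]. We don't want to pool across the batch or channel dimensions, so those two entries are set to 1.
strides: as in convolution, the step size of the window along each dimension, usually [1, stride, stride, 1].
padding: as in convolution, either 'VALID' or 'SAME'.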
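The pooling step can be sketched the same way. The hypothetical max_pool_2x2_naive below handles a single channel and assumes an even height and width. In that case a 2x2 window with stride 2 needs no padding, so 'SAME' and 'VALID' give the same result.

```python
def max_pool_2x2_naive(image):
    """2x2 max pooling with stride 2 on one channel.
    Assumes even height and width (no padding needed)."""
    h, w = len(image), len(image[0])
    return [[max(image[i][j], image[i][j + 1],
                 image[i + 1][j], image[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool_2x2_naive(image))  # [[6, 8], [14, 16]]
```

Each output value is the maximum of one non-overlapping 2x2 block, so height and width are both halved.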

These two functions exist purely to keep the code concise, since every layer calls convolution and pooling.

3. Building the CNN

# First convolutional layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
# Second convolutional layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
# Fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
# Dropout
keep_prob = tf.placeholder('float')
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# Output layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
# softmax
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

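The overall CNN structure is easy to follow given the code above and basic CNN knowledge. Each convolution uses 'SAME' padding with stride 1, so it leaves the spatial size unchanged, and each 2x2 max pooling with stride 2 halves it: 28x28 → 14x14 → 7x7. That is why the output of the second pooling layer, which has 64 channels, is flattened into 7 * 7 * 64 values before the fully connected layer. One point worth stressing: as is common, dropout is applied here only to the fully connected layer, not to the convolutional layers or the output layer. keep_prob is the fraction of units to keep; it is typically set to 0.5 during training and 1.0 during testing. y_conv is the output of the whole CNN.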
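To show what keep_prob does, here is a pure-Python sketch of the behavior documented for tf.nn.dropout in TensorFlow 1.x: each element is kept with probability keep_prob and scaled by 1 / keep_prob, otherwise it is set to 0, so the expected value of each activation is unchanged. The dropout function here is hypothetical and works on a flat list.

```python
import random


def dropout(values, keep_prob, seed=None):
    """Keep each element with probability keep_prob and scale the kept
    elements by 1 / keep_prob; dropped elements become 0.0."""
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]


activations = [1.0] * 10
print(dropout(activations, keep_prob=1.0))          # unchanged at test time
print(dropout(activations, keep_prob=0.5, seed=0))  # each entry is 0.0 or 2.0
```

The 1 / keep_prob scaling is why the same network can be evaluated with keep_prob = 1.0 at test time without any other adjustment.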

4. Training and evaluating the model

# Train and evaluate the model
# Clip y_conv away from 0 before taking the log; log(0) would make the loss NaN
cross_entropy = -tf.reduce_sum(y_hat * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_hat, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))
sess.run(tf.global_variables_initializer())
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_hat: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_hat: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_hat: mnist.test.labels, keep_prob: 1.0
}))

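Training and evaluation work essentially the same way as for the regression model in the previous post, so there is nothing special to explain. One note: if you don't have a GPU, or just want to see the overall behavior, I don't recommend running that many iterations. With the full 20000 steps, the model's test accuracy reaches about 99.2%.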
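The cross-entropy and accuracy expressions can be checked by hand on a tiny batch. The pure-Python sketch below (the helper names are hypothetical) computes the same quantities as -tf.reduce_sum(y_hat * tf.log(y_conv)) and the argmax comparison used for accuracy. The 1e-10 clipping floor is an assumption added to avoid log(0).

```python
import math


def cross_entropy(labels, probs, eps=1e-10):
    """Sum over the batch of -sum_k y_k * log(p_k), with probabilities
    clipped to [eps, 1] so that log(0) cannot appear."""
    total = 0.0
    for y, p in zip(labels, probs):
        total -= sum(yk * math.log(min(max(pk, eps), 1.0))
                     for yk, pk in zip(y, p))
    return total


def argmax(row):
    return max(range(len(row)), key=row.__getitem__)


def accuracy(labels, probs):
    """Fraction of rows whose predicted class (argmax of probs) matches
    the true class (argmax of the one-hot label)."""
    correct = [argmax(y) == argmax(p) for y, p in zip(labels, probs)]
    return sum(correct) / len(correct)


labels = [[0, 1, 0], [1, 0, 0]]               # one-hot true classes 1 and 0
probs = [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]]    # predicted classes 1 and 1
print(accuracy(labels, probs))                # 0.5
print(round(cross_entropy(labels, probs), 4)) # -log(0.8) - log(0.3)
```

Only the probability assigned to the true class contributes to the loss, because every other term is multiplied by a 0 in the one-hot label.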

5. Complete code

The complete code is below. Comments and discussion are welcome.

import tensorflow as tf
import tensorflow.examples.tutorials.mnist.input_data as input_data


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
sess = tf.InteractiveSession()

x = tf.placeholder('float', shape=[None, 784])
y_hat = tf.placeholder('float', shape=[None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])
# First convolutional layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
# Second convolutional layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
# Fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
# Dropout
keep_prob = tf.placeholder('float')
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# Output layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
# softmax
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
# Train and evaluate the model
# Clip y_conv away from 0 before taking the log; log(0) would make the loss NaN
cross_entropy = -tf.reduce_sum(y_hat * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_hat, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))
sess.run(tf.global_variables_initializer())
# 200 steps for a quick run; section 4 uses 20000 steps to reach about 99.2%
for i in range(200):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_hat: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_hat: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_hat: mnist.test.labels, keep_prob: 1.0
}))

TensorFlow for Beginners (Part 3): Advanced MNIST