TensorFlow MNIST Handwritten Digit Recognition: Code Notes (2)

tf.random_normal
tf.zeros
Neural network model definition
tf.nn.softmax
Defining the loss function
tf.reduce_mean

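Last time I looked closely at how the network's nodes and connections are defined. Nodes are of type tf.Tensor and are created with the function tf.placeholder; connections are of type tf.Variable and are created directly with its constructor. One question remains: why are nodes created indirectly through placeholder instead of being defined directly as a Tensor? I'll flag this and come back to it later.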

tf.random_normal

Code:

W = tf.Variable(tf.random_normal([784, 10]))

Try the following in the Python interactive shell:

>>> p = tf.random_normal([784, 10])
>>> x = tf.placeholder(tf.float32, [None, 784])
>>> p
<tf.Tensor 'random_normal_2:0' shape=(784, 10) dtype=float32>
>>> x
<tf.Tensor 'Placeholder_1:0' shape=(?, 784) dtype=float32>

The function tf.random_normal([784, 10]) returns a tf.Tensor, the same tensor type as the network nodes. tf.Variable takes a tensor as its initial value. Let's look at the contents of p = tf.random_normal([784, 10]):

>>> p
<tf.Tensor 'random_normal_2:0' shape=(784, 10) dtype=float32>

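Unfortunately, tf.random_normal only defines the operation that generates the random numbers; nothing is actually computed at this point. Why? As I understand it, TensorFlow separates defining a computation from executing it, and the actual computation can only be carried out through a Session: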

>>> sess = tf.Session()
>>> sess.run(p)
array([[-0.31686196,  1.4718755 ,  0.39538386, ..., -0.35172328,
        -1.1372167 ,  1.4963826 ],
       [ 1.8683176 , -0.50464696,  0.5987486 , ...,  0.06921218,
         1.4666113 , -0.12063375],
       [-0.5960838 ,  0.67847234, -1.2668965 , ..., -0.5763056 ,
         0.083912  ,  0.27317154],
       ...,
       [ 0.2990143 ,  0.87359935,  0.09930705, ..., -0.6843181 ,
         0.6503788 ,  0.03628052],
       [-1.2101758 ,  0.6168854 , -1.0640091 , ..., -0.82683617,
         0.37335992, -1.0628037 ],
       [ 0.7527495 , -1.7079144 , -0.50450134, ...,  0.43036935,
        -0.23441634,  1.2643566 ]], dtype=float32)
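The split between defining and running can be imitated in a few lines of plain Python. The sketch below is only a toy analogy, not how TensorFlow is implemented: LazyOp, ToySession and lazy_random_normal are hypothetical names made up for illustration.

```python
import random


class LazyOp:
    """A toy stand-in for a graph node: it stores a recipe, not a value."""

    def __init__(self, name, compute):
        self.name = name
        self._compute = compute  # called only when the op is run

    def __repr__(self):
        # Like printing a tf.Tensor: shows a description, never the numbers.
        return "<LazyOp '%s'>" % self.name


class ToySession:
    """A toy stand-in for tf.Session: the only place where ops are executed."""

    def run(self, op):
        return op._compute()


def lazy_random_normal(shape, rng):
    """Describe a rows x cols matrix of standard normal samples."""
    rows, cols = shape
    return LazyOp(
        "random_normal",
        lambda: [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)],
    )


p = lazy_random_normal([3, 2], random.Random(0))
print(p)  # only the description; no random numbers exist yet

values = ToySession().run(p)  # the numbers are generated here
print(len(values), len(values[0]))
```

Printing p shows only the description, just as the interactive session above shows the tf.Tensor metadata; the values appear only once run is called.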

tf.random_normal([784, 10]) returns a 784x10 tensor of random numbers. Its signature is:

tf.random_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)

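shape: shape of the output tensor (required)
mean: mean of the normal distribution, default 0.0
stddev: standard deviation of the normal distribution, default 1.0
dtype: type of the output, default tf.float32
seed: random seed, an integer; once set, the generated random numbers are reproducible from run to run
name: name of the operation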
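To get a feel for these parameters without starting a Session, here is a minimal pure-Python sketch. The helper random_normal below is a hypothetical stand-in built on the standard library's random.gauss; it mimics the shape, mean, stddev and seed parameters only and is not TensorFlow's generator.

```python
import random


def random_normal(shape, mean=0.0, stddev=1.0, seed=None):
    """Return a rows x cols nested list of normally distributed samples."""
    rng = random.Random(seed)  # a fixed seed makes the sequence repeatable
    rows, cols = shape
    return [[rng.gauss(mean, stddev) for _ in range(cols)] for _ in range(rows)]


w1 = random_normal([784, 10], seed=42)
w2 = random_normal([784, 10], seed=42)
w3 = random_normal([784, 10], seed=7)

print(len(w1), len(w1[0]))  # 784 10
print(w1 == w2)             # same seed, same numbers
print(w1 == w3)             # different seed, different numbers

# With 7840 samples the sample statistics should be close to mean=0, stddev=1.
flat = [v for row in w1 for v in row]
sample_mean = sum(flat) / len(flat)
sample_std = (sum((v - sample_mean) ** 2 for v in flat) / len(flat)) ** 0.5
print(round(sample_mean, 2), round(sample_std, 2))
```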

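Now it's clear: W = tf.Variable(tf.random_normal([784, 10])) initializes the connection weights with random numbers. If I'm guessing right, the model used in this experiment is a single-layer network with 784 input nodes and 10 output nodes, with no hidden layer in between.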

tf.zeros

Code:

b = tf.Variable(tf.zeros([10]))

No long explanation needed; just try it:

>>> q = tf.zeros([10])
>>> sess.run(q)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)

Neural network model definition

Code:

pred = tf.nn.softmax(tf.matmul(x, W) + b)

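I think I now understand how TensorFlow defines a model: the line above is simply the model's algebraic expression. matmul is matrix multiplication, so all that's left is to understand softmax.

At this point I was genuinely surprised. The model is remarkably simple: a single-layer network plus softmax, and yet its test accuracy on handwritten digits is above 92%.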
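To make the expression tf.matmul(x, W) + b concrete, here is a small pure-Python sketch of the same arithmetic. It uses 2 samples with 4 features and 3 classes instead of 784 and 10, and the helpers matmul and add_bias are hypothetical names written for this illustration; the numbers are made up.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix given as nested lists."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add_bias(m, bias):
    """Add the bias vector to every row, like the broadcast in `... + b`."""
    return [[v + bias[j] for j, v in enumerate(row)] for row in m]


# Tiny stand-ins for the real shapes: x is [2, 4], W is [4, 3], b is [3].
x = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 0.0, 3.0]]
W = [[0.5, -1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 0.5, -0.5],
     [-1.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.0]  # zeros, as tf.zeros would give

logits = add_bias(matmul(x, W), b)
print(logits)  # [[-0.5, 1.0, 0.0], [-2.0, 3.0, 5.0]]
```

Each row of the result holds one score per class for one sample; softmax is then applied to each row.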

tf.nn.softmax

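From the definitions above, tf.matmul(x, W) + b yields a 10-dimensional vector for each input sample. softmax normalizes that vector so that every component is positive and the 10 components sum to 1. The formula is:
$y_i = \frac{e^{x_i}}{e^{x_1}+e^{x_2}+\cdots+e^{x_{10}}},\quad i = 1,2,\ldots,10$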
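The formula translates directly into a few lines of Python. This is a minimal sketch rather than TensorFlow's implementation. One detail is an addition of mine: subtracting the maximum before exponentiating does not change the result mathematically, but it avoids overflow for large inputs. The scores are made-up values.

```python
import math


def softmax(logits):
    """Map a list of real scores to positive values that sum to 1."""
    m = max(logits)  # shift for numerical stability; the ratios are unchanged
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1, -1.0, 0.0, 3.0, 0.5, -0.5, 1.5, 0.2]  # 10 made-up scores
probs = softmax(scores)

print(sum(probs))               # 1, up to floating-point rounding
print(probs.index(max(probs)))  # 5: the largest score gets the largest probability
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5], no overflow thanks to the shift
```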

Defining the loss function

Code:

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

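Having got this far, I basically understand TensorFlow's pattern: use the functions in tf to write down the formula for the model and the formula for the loss. What remains is to work out which functions tf actually provides.

As I understand it, a loss function like the following should be enough: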

cost = tf.reduce_sum(tf.abs(pred - y))

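The loss function in the example is quite different from what I imagined, and considerably more complex. Let's look into why.

First, with the original cost and 50 iterations, the recognition accuracy is 0.86. After switching to my simplified cost, the accuracy is only 0.2. This suggests that my cost converges far more slowly.

I then raised the learning rate to 20, and with my cost the accuracy rose to 0.92, which looks decent. Switching back to the original cost with the learning rate set to 0.5, the accuracy also reached 0.92. So my version is not bad either.

Still, I don't really understand the original cost. I'll flag it for a dedicated study.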
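As a starting point for that study, the original cost is the cross-entropy between the one-hot label y and the prediction pred, averaged over the batch. The sketch below computes both costs by hand on two made-up samples with 3 classes; the function names are my own. One plausible reason for the difference in convergence, offered as a hypothesis rather than a conclusion, is that the absolute-difference cost can never exceed 2 per sample, while cross-entropy grows without bound as the probability of the true class approaches 0, so confident mistakes are penalized much more heavily.

```python
import math


def cross_entropy(y, pred):
    """-sum(y * log(pred)) for one sample; pred entries must be > 0."""
    return -sum(t * math.log(p) for t, p in zip(y, pred))


def abs_diff(y, pred):
    """sum(|pred - y|) for one sample, the simplified cost."""
    return sum(abs(p - t) for t, p in zip(y, pred))


# Two made-up samples with one-hot labels.
y_batch = [[0, 0, 1],
           [1, 0, 0]]
pred_batch = [[0.10, 0.20, 0.70],   # mostly right
              [0.01, 0.90, 0.09]]   # confidently wrong

ce_per_sample = [cross_entropy(y, p) for y, p in zip(y_batch, pred_batch)]
ad_per_sample = [abs_diff(y, p) for y, p in zip(y_batch, pred_batch)]

# Original cost: mean over the batch of the per-sample cross-entropy.
original_cost = sum(ce_per_sample) / len(ce_per_sample)
# Simplified cost: sum of absolute differences over the whole batch.
simple_cost = sum(ad_per_sample)

print([round(v, 3) for v in ce_per_sample])  # [0.357, 4.605]
print([round(v, 3) for v in ad_per_sample])  # [0.6, 1.98]
print(round(original_cost, 3), round(simple_cost, 3))
```

The confidently wrong sample costs about 13 times as much as the mostly right one under cross-entropy, but only about 3 times as much under the absolute difference.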

tf.reduce_mean

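tf.reduce_mean() computes the mean of a tensor along a specified axis (one of the tensor's dimensions). It is mainly used to reduce dimensions or to compute the mean of a tensor (for example, an image).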

tf.reduce_mean(
    input_tensor,
    axis=None,
    keep_dims=False,
    name=None,
    reduction_indices=None
)

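Parameters:

input_tensor: the input tensor to reduce
axis: the axis to reduce along; if not specified, the mean of all elements is computed
keep_dims: whether to keep the reduced dimensions, default False. With True, the result has the same number of dimensions as the input, and each reduced dimension has length 1; with False, the reduced dimensions are removed. (Later TensorFlow 1.x releases rename this argument to keepdims.)
name: name of the operation
reduction_indices: used in earlier versions to specify the axis; deprecated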


Example 1:

import tensorflow as tf
x = [[1,2,3], [4,5,6]]
y = tf.cast(x, tf.float32)

mean_all = tf.reduce_mean(y)
mean_0 = tf.reduce_mean(y, axis=0)
mean_1 = tf.reduce_mean(y, axis=1)

with tf.Session() as sess:
    m_a,m_0,m_1 = sess.run([mean_all, mean_0, mean_1])
 
print(m_a)
print(m_0)
print(m_1)

> 3.5
> [2.5 3.5 4.5]
> [2. 5.]

Example 2:

import tensorflow as tf
x = [[1,2,3],
     [4,5,6]]
y = tf.cast(x, tf.float32)

mean_all = tf.reduce_mean(y, keep_dims=True)
mean_0 = tf.reduce_mean(y, axis=0, keep_dims=True)
mean_1 = tf.reduce_mean(y, axis=1, keep_dims=True)

with tf.Session() as sess:
    m_a,m_0,m_1 = sess.run([mean_all, mean_0, mean_1])
 
print(m_a)
print(m_0)
print(m_1)

> [[3.5]]
> [[2.5 3.5 4.5]]
> [[2.]
>  [5.]]

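To keep the original number of dimensions in the result, set keep_dims=True.

Similar functions include:

tf.reduce_sum: computes the sum of all elements along the specified axis of the tensor;
tf.reduce_max: computes the maximum of the elements along the specified axis of the tensor;
tf.reduce_all: computes the logical AND of the elements along the specified axis of the tensor;
tf.reduce_any: computes the logical OR of the elements along the specified axis of the tensor.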
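What these reductions have in common can be sketched in plain Python for the 2-D case. The helper reduce_axis below is a hypothetical illustration, not a TensorFlow function: axis=None reduces over every element, axis=0 reduces down each column, and axis=1 reduces across each row.

```python
def reduce_axis(rows, fn, axis=None):
    """Apply the reducer fn to a 2-D nested list along the given axis."""
    if axis is None:
        return fn([v for row in rows for v in row])   # all elements
    if axis == 0:
        return [fn(list(col)) for col in zip(*rows)]  # one result per column
    if axis == 1:
        return [fn(row) for row in rows]              # one result per row
    raise ValueError("this sketch only handles 2-D input")


def mean(values):
    return sum(values) / len(values)


data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
flags = [[True, False, True],
         [True, True, True]]

print(reduce_axis(data, mean))          # 3.5
print(reduce_axis(data, mean, axis=0))  # [2.5, 3.5, 4.5]
print(reduce_axis(data, sum, axis=1))   # [6.0, 15.0]
print(reduce_axis(data, max, axis=0))   # [4.0, 5.0, 6.0]
print(reduce_axis(flags, all, axis=0))  # [True, False, True]
print(reduce_axis(flags, any, axis=1))  # [True, True]
```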

quicmous
