Learning TensorFlow, Part 2: Softmax Regression

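Download the MNIST data: following the reference at https://blog.csdn.net/i8088/article/details/79126150, download the four files, create a folder named MNIST_data in the same directory as the Python script you will run, and move the four files into it.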

Now for the actual code:

PART 1 Importing the MNIST dataset

# encoding=utf-8
import tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

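Line 1

In Python 2.x, a source file that contains Chinese (or any other non-ASCII) characters must declare its encoding, here utf-8.

Line 2

Module tensorflow.examples.tutorials.mnist.input_data
Functions for downloading and reading MNIST data.

Line 3

The read_data_sets function reads the dataset from the MNIST_data folder. With one_hot=True, each label is a one-hot vector: exactly one entry is 1 and all the others are 0. The function returns a DataSets object.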
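To make the one_hot flag concrete, here is a minimal sketch in plain Python. It assumes a hypothetical helper named to_one_hot (not part of TensorFlow or the MNIST loader) and 10 digit classes:

```python
def to_one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

# The digit 3 becomes a length-10 vector with a single 1.0 at index 3.
three = to_one_hot(3)
print(three)
```

Reading the label back is just a matter of finding the index of the 1.0 entry, which is what argmax does later on.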

PART 2

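import tensorflow as tf
# A placeholder must be given a value through feed_dict, otherwise an error is raised. placeholder returns a Tensor object
x = tf.placeholder(tf.float32, [None, 784])  # x is a placeholder; its value is supplied when the computation runs
W = tf.Variable(tf.zeros([784, 10]))  # W is a variable (a parameter to be learned)
# zeros takes a shape and returns a Tensor; the Variable constructor takes that Tensor as its initial value
b = tf.Variable(tf.zeros([10]))
# softmax: takes logits, returns a Tensor
y = tf.nn.softmax(tf.matmul(x, W) + b)  # y = softmax(x * W + b)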

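Line 1

Imports the tensorflow package under the alias tf.

Line 3

Creates a float32 placeholder x with unknown height and a width of 784. A placeholder must be filled through feed_dict at run time; here it will later be filled with sample data. The height is the number of samples, so it is unknown (None); the width is the number of pixels per sample, 784 (28 × 28). Syntactically, placeholder returns a Tensor object, which can be thought of as a multi-dimensional array; in this case, a matrix.

Line 4

tf.zeros returns an all-zero Tensor, which is used as the initial value to construct a Variable object W. A Variable can be understood as a parameter to be solved for.

Line 6

Same as above: constructs a variable b.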

Line 8

softmax documentation:

Computes softmax activations.
For each batch i and class j we have
softmax[i, j] = exp(logits[i, j]) / sum(exp(logits[i]))

$\mathrm{softmax}_{i,j}=\frac{e^{logits_{i,j}}}{\sum_{k=1}^{n}e^{logits_{i,k}}}$

In other words, softmax(element) = exp(element) / (sum of exp over every element in the same row).

logits: (first argument)
A Tensor. Must be one of the following types: float32, float64. 2-D with shape [batch_size, num_classes]. That is, each row is one sample and each column is one class.
returns: (return value)
A Tensor. Has the same type as logits. Same shape as logits.

The return value is assigned to y.
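The row-wise formula can be checked by hand. Below is a minimal sketch in plain Python, not TensorFlow's implementation; the function name softmax_rows is made up for illustration, and subtracting the row maximum is an extra step added here for numerical safety:

```python
import math

def softmax_rows(logits):
    """Apply softmax to each row of a 2-D list of floats."""
    result = []
    for row in logits:
        # Subtracting the row maximum does not change the result
        # but keeps exp() from overflowing on large inputs.
        row_max = max(row)
        exps = [math.exp(v - row_max) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

# Two samples (rows), three classes (columns).
probs = softmax_rows([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
print(probs)
```

Each output row sums to 1, so it can be read as a probability distribution over the classes; equal logits give equal probabilities.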

PART 3

y_ = tf.placeholder("float", [None, 10])  # input for the correct labels
#  cross-entropy
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))  # reduce_sum computes the sum
# Syntactically: first construct a gradient descent optimizer object, then call its minimize method; argument: A Tensor containing the value to minimize
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)  # specifies the training method

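Line 1

y_ is a placeholder that will be filled with the correct answers (labels) of the samples. A height of None means the number of samples is unknown; a width of 10 is the length of each one-hot label.

Line 3

reduce_sum can simply be read as sum. y_ is the true answer for each sample and y is the predicted answer. Cross-entropy $= -\sum y\_ \cdot \log y$
The * used here is element-wise multiplication; matrix multiplication uses matmul, as above.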
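To see what the cross-entropy measures, here is a minimal sketch in plain Python. The function name cross_entropy and the sample numbers are made up for illustration; unlike the TensorFlow expression, this sketch skips terms whose label entry is 0 so that log is never taken for them:

```python
import math

def cross_entropy(y_true, y_pred):
    """Sum of -y_true * log(y_pred) over every sample and class."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t != 0.0:  # terms with t == 0 contribute nothing
                total -= t * math.log(p)
    return total

y_true = [[0.0, 1.0, 0.0]]  # one-hot label: the correct class is 1
good = cross_entropy(y_true, [[0.1, 0.8, 0.1]])  # most probability on class 1
bad = cross_entropy(y_true, [[0.8, 0.1, 0.1]])   # most probability on class 0
print(good, bad)
```

With one-hot labels only the predicted probability of the correct class matters, so a confident correct prediction gives a smaller loss than a confident wrong one.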

Line 5

Constructs a gradient descent optimizer with a learning rate of 0.01 and calls its minimize method.

minimize documentation

Add operations to minimize loss by updating var_list.

returns:
An Operation that updates the variables in var_list. If global_step was not None, that operation also increments global_step.

Operation is a type we have not seen before. W and b above are both objects of type Variable.
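What a single update of W and b looks like can be sketched by hand. The following is a minimal sketch in plain Python, not TensorFlow's implementation: it assumes the standard result that the gradient of the summed cross-entropy with respect to the logits is (y - y_), and the tiny dataset and the helper names (predict, loss, gradient_step) are made up for illustration.

```python
import math

def predict(xs, W, b):
    """softmax(x * W + b) for each sample row in xs."""
    out = []
    for x in xs:
        logits = [sum(x[k] * W[k][j] for k in range(len(x))) + b[j]
                  for j in range(len(b))]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def loss(xs, ys, W, b):
    """Summed cross-entropy between one-hot labels ys and the predictions."""
    preds = predict(xs, W, b)
    return -sum(t * math.log(p)
                for t_row, p_row in zip(ys, preds)
                for t, p in zip(t_row, p_row) if t)

def gradient_step(xs, ys, W, b, lr=0.01):
    """One gradient descent update of W and b, in place."""
    preds = predict(xs, W, b)  # computed once, before any parameter changes
    for x, t_row, p_row in zip(xs, ys, preds):
        for j in range(len(b)):
            diff = p_row[j] - t_row[j]  # d(loss)/d(logit_j) for this sample
            b[j] -= lr * diff
            for k in range(len(x)):
                W[k][j] -= lr * x[k] * diff

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                  # 3 samples, 2 features
ys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # 3 classes, one-hot
W = [[0.0] * 3 for _ in range(2)]                          # all zeros, like tf.zeros
b = [0.0] * 3

loss_before = loss(xs, ys, W, b)
for _ in range(100):
    gradient_step(xs, ys, W, b)
loss_after = loss(xs, ys, W, b)
print(loss_before, loss_after)
```

The Operation returned by minimize plays the role of gradient_step here: each time it is run, the variables move a small step in the direction that lowers the loss.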

PART 4

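# Start assigning values
init = tf.initialize_all_variables()  # deprecated in later TensorFlow releases in favor of tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    [batch_xs, batch_ys] = mnist.train.next_batch(100)    # take a batch of 100 training samples
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})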

Line 2

initialize_all_variables documentation

Returns an Op that initializes all variables.
This is just a shortcut for initialize_variables(all_variables())

returns:
An Op that initializes all variables in the graph.

Line 3

Starts a session.

Line 4

Runs the initialization op.

Line 5

range documentation

range(stop) -> list of integers
range(start, stop[, step]) -> list of integers

Return a list containing an arithmetic progression of integers. range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0. When step is given, it specifies the increment (or decrement). For example, range(4) returns [0, 1, 2, 3]. The end point is omitted! These are exactly the valid indices for a list of 4 elements.
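The documentation quoted above is from Python 2, where range returns a list. In Python 3 it returns a lazy range object instead, so a small sketch that behaves the same in both versions wraps the call in list():

```python
# list() makes the result a real list in both Python 2 and Python 3.
indices = list(range(4))           # stop only; start defaults to 0
evens = list(range(0, 10, 2))      # start, stop, step
countdown = list(range(5, 0, -1))  # a negative step counts down

print(indices)
print(evens)
print(countdown)
```

In the training loop, range(1000) is only iterated over, so it works unchanged in either version.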

Line 6

Takes a batch of 100 samples from the training data.
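Drawing a batch can be sketched as follows. This is a minimal illustration in plain Python, assuming a hypothetical helper named sample_batch; it is not how mnist.train.next_batch is implemented, only the general idea of picking a subset of samples together with their matching labels:

```python
import random

def sample_batch(images, labels, batch_size, rng=random):
    """Pick batch_size distinct sample indices; return the matching images and labels."""
    idx = rng.sample(range(len(images)), batch_size)
    return [images[i] for i in idx], [labels[i] for i in idx]

# Toy data: 10 "images" of 2 pixels each; each label equals its sample index.
images = [[float(i), float(i) + 0.5] for i in range(10)]
labels = list(range(10))

# A seeded generator keeps the sketch reproducible.
batch_xs, batch_ys = sample_batch(images, labels, 4, random.Random(0))
print(batch_ys)
```

The important property is that images and labels are selected with the same indices, so every image in the batch stays paired with its own label.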

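Line 7

Runs the training step. sess.run accepts Operation objects such as train_step, and it also accepts Tensor objects such as accuracy, which is evaluated after training.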

# returns a Tensor of boolean values
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))    # y here stands for the whole softmax expression, similar to symbolic computation in MATLAB
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))     # cast bool to float and take the mean
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

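Line 2

tf.argmax(y, 1) returns the position of the largest element in each row of y, forming a vector with one index per sample. y is the prediction and y_ is the ground truth. tf.equal then checks element by element whether the indices from y match the indices from y_, and the result is a boolean vector.

Line 3

Casts every value of that boolean vector to float and takes the mean.

Line 4

Computes the accuracy. The values fed here are the test portion of the data.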
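The same three steps (argmax per row, compare, cast and average) can be sketched in plain Python. The helper names argmax and accuracy and the four sample rows are made up for illustration:

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(y_pred, y_true):
    """Fraction of rows whose predicted class matches the true class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(y_pred, y_true)]  # booleans
    return sum(1.0 if c else 0.0 for c in correct) / len(correct)       # cast, then mean

y_pred = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.5, 0.4, 0.1]]
y_true = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
acc = accuracy(y_pred, y_true)  # the first three rows match, the last does not
print(acc)
```

Because True becomes 1.0 and False becomes 0.0, the mean of the cast values is exactly the fraction of correct predictions.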

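Summary

# TensorFlow also does its main work outside Python, but it goes a step further to avoid the overhead of switching back to Python after every operation.

# Rather than running a single expensive operation independently outside Python, it first lets us describe a graph of interacting operations,

# and then runs that graph entirely outside Python.

# The Python code therefore serves to build this externally executed computation graph and to specify which parts of the graph should be run.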
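The split between building a graph and running it can be illustrated with a toy sketch. Everything here (the Node class, placeholder and run) is hypothetical and written only for this illustration; it runs inside Python and says nothing about how TensorFlow executes graphs, but it shows the same two-phase pattern of describing first and evaluating later:

```python
class Node(object):
    """A node in a tiny computation graph; nothing is computed at construction."""
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = inputs

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

def placeholder():
    """A node whose value must be supplied through feed_dict at run time."""
    return Node("placeholder")

def run(node, feed_dict):
    """Evaluate `node` by recursively evaluating only the nodes it depends on."""
    if node.op == "placeholder":
        return feed_dict[node]
    left, right = (run(n, feed_dict) for n in node.inputs)
    return left + right if node.op == "add" else left * right

# Build phase: these lines only describe the computation.
a = placeholder()
b = placeholder()
total = a + b
product = total * b

# Run phase: values are fed in and only the requested part is evaluated.
result_total = run(total, {a: 2.0, b: 3.0})
result_product = run(product, {a: 2.0, b: 3.0})
print(result_total, result_product)
```

Asking for total never touches the multiplication node, which mirrors how sess.run evaluates only the part of the graph needed for what was requested.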