Multi-Class Logistic Regression on the MNIST Dataset with TensorFlow


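Multi-class logistic regression combines logistic regression (LR) with softmax and is widely used for multi-class classification tasks. This article implements it with TensorFlow (tf) on the MNIST dataset.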

For the basics of multi-class logistic regression and an implementation on Spark, see:

The tf functions used in this article are explained below:

tf.nn.softmax

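Softmax is used very widely in machine learning and deep learning. In particular, for multi-class problems (C > 2), the classifier's final output units need the softmax function to normalize their values. Softmax is defined as:

$S_i=\frac{e^{V_i}}{\sum_{j=1}^{C}e^{V_j}}$
Here $V_i$ is the output of the classifier's preceding output unit, $i$ is the class index, and $C$ is the total number of classes. $S_i$ is the ratio of the exponential of the current element to the sum of the exponentials of all elements. Softmax turns the raw multi-class outputs into relative probabilities, which are easier to interpret and compare. Consider the following example.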

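Take a multi-class problem with C = 4. The final output layer of a linear classifier produces four values:
$V=\left[ \begin{matrix} -3 \\ 2 \\ -1 \\ 0 \end{matrix} \right]$
After softmax, the values become relative probabilities:
$S=\left[ \begin{matrix} 0.0057 \\ 0.8390 \\ 0.0418 \\ 0.1135 \end{matrix} \right]$
The softmax output expresses the relative probability of each class. The second entry, 0.8390 (produced by the score 2), is clearly the largest, so the sample is most likely to belong to the second class (class index 1 when counting from 0). Because softmax maps continuous scores to relative probabilities, the result is easier to interpret.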
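The worked example above can be checked with a few lines of plain Python. This is a minimal sketch using only the standard library; the function name `softmax` is chosen here for illustration and is not the tf implementation.

```python
import math

def softmax(values):
    """Naive softmax: exponentiate each score and divide by the total."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

V = [-3.0, 2.0, -1.0, 0.0]
S = softmax(V)
print([round(s, 4) for s in S])  # [0.0057, 0.839, 0.0418, 0.1135]
```

The probabilities sum to 1, and the largest one sits at list index 1, matching the vector S above.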

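In practice, softmax can overflow numerically. Because it exponentiates its inputs, large values in V can exceed the floating-point range. The usual remedy is to subtract the maximum of V from every element before exponentiating, which leaves the result unchanged:
$D=\max(V)$ $S_i=\frac{e^{V_i-D}}{\sum_{j=1}^{C}e^{V_j-D}}$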
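The effect of subtracting the maximum can be seen in a small standard-library sketch. The name `stable_softmax` and the sample scores are assumptions made for illustration only.

```python
import math

def stable_softmax(values):
    """Softmax with the maximum subtracted first, so every exponent is <= 0."""
    d = max(values)
    exps = [math.exp(v - d) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

big = [1000.0, 1002.0, 999.0]

# The naive form cannot even start: math.exp(1000.0) raises OverflowError.
try:
    math.exp(big[0])
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

shifted = stable_softmax(big)
# The same scores shifted down by 1002 give identical probabilities.
reference = stable_softmax([-2.0, 0.0, -3.0])
print(naive_overflowed)
print([round(p, 4) for p in shifted])
```

Shifting every score by the same constant cancels in the ratio, which is why the two calls agree.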

The softmax interface in tf is:

tf.nn.softmax(
    logits,
    axis=None,
    name=None,
    dim=None
)

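logits: a non-empty tensor
axis: the dimension along which softmax is computed; defaults to None, which means the last dimension
name: a name for the operation; defaults to None
dim: a deprecated alias for axis; defaults to None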

tf.log()

Computes the natural logarithm of each element.

tf.reduce_sum()

Sums the elements of a tensor, over all dimensions by default or along a given axis.

tf.reduce_mean()

Computes the mean of the elements of a tensor, over all dimensions by default or along a given axis.
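The final implementation combines tf.log, tf.reduce_sum and tf.reduce_mean into its cross-entropy cost. A minimal plain-Python sketch of the same arithmetic follows; the function name and the sample labels and probabilities are made up for illustration.

```python
import math

def cross_entropy_cost(labels, probs):
    """Mean over samples of -sum_k y_k * log(p_k)."""
    per_sample = []
    for y_row, p_row in zip(labels, probs):
        # Row-wise sum of y * log(p), negated (the reduce_sum step).
        per_sample.append(-sum(y * math.log(p) for y, p in zip(y_row, p_row)))
    # Average over the batch (the reduce_mean step).
    return sum(per_sample) / len(per_sample)

labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]   # one-hot labels
probs = [[0.1, 0.8, 0.1], [0.5, 0.3, 0.2]]    # softmax outputs
cost = cross_entropy_cost(labels, probs)
print(round(cost, 4))  # 0.4581
```

Because the labels are one-hot, only the log-probability of the true class contributes to each row.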

tf.argmax()

Returns the index of the largest value along the given axis of a tensor, e.g.:


import tensorflow as tf

A = [[1,3,4,5,6]]
B = [[1,3,4], [2,4,1]]

# Index of the largest value in each row (axis 1)
with tf.Session() as sess:
    print(sess.run(tf.argmax(A, 1)))
    print(sess.run(tf.argmax(B, 1)))

Output:

[4]
[2 1]
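The axis argument decides whether the search runs across each row or down each column. The plain-Python sketch below mirrors both cases for the matrix B from the example above; the helper names are hypothetical and not part of tf.

```python
def argmax_axis1(rows):
    """Index of the largest value within each row (like axis=1)."""
    return [row.index(max(row)) for row in rows]

def argmax_axis0(rows):
    """Row index of the largest value within each column (like axis=0)."""
    columns = list(zip(*rows))
    return [col.index(max(col)) for col in columns]

B = [[1, 3, 4], [2, 4, 1]]
print(argmax_axis1(B))  # [2, 1]
print(argmax_axis0(B))  # [1, 1, 0]
```

With one-hot labels, the axis=1 form recovers the class index of each sample, which is how the final implementation uses it.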

tf.cast()

Converts a tensor to a different data type, e.g.:

import tensorflow as tf

a = tf.Variable([1,0,0,1,1])
b = tf.cast(a, dtype=tf.bool)  # nonzero values become True, zeros become False
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b))

Output:

[ True False False  True  True]
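In the final implementation, tf.argmax, tf.equal, tf.cast and tf.reduce_mean are chained to measure accuracy. The following plain-Python sketch shows that chain step by step; the function name and the sample data are assumptions for illustration.

```python
def accuracy(pred_probs, one_hot_labels):
    """Fraction of samples whose most probable class matches the label."""
    hits = []
    for p_row, y_row in zip(pred_probs, one_hot_labels):
        predicted = p_row.index(max(p_row))  # argmax over the probabilities
        actual = y_row.index(max(y_row))     # argmax over the one-hot label
        hits.append(predicted == actual)     # element-wise equality
    as_float = [float(h) for h in hits]      # cast booleans to 1.0 / 0.0
    return sum(as_float) / len(as_float)     # mean of the casted values

probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
print(accuracy(probs, labels))  # 0.75
```

Three of the four predictions match their labels, so the mean of the casted booleans is 0.75.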

Final implementation

# -*- coding: utf-8 -*-
"""
    Author: Thinkgamer
    Date: 2019-02-26
"""
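# This script targets the TensorFlow 1.x API
# (tf.placeholder, tf.Session, tensorflow.examples.tutorials.mnist).
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data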


# Load the MNIST dataset
mnist = input_data.read_data_sets("./data/mnist", one_hot=True)
train_img = mnist.train.images
train_label = mnist.train.labels
test_img = mnist.test.images
test_label = mnist.test.labels

print("Mnist数据集加载成功！")  # prints "MNIST dataset loaded successfully!"
print(train_img.shape)
print(train_label.shape)
print(test_img.shape)
print(test_label.shape)
print(train_label[0])

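# Build the model's computation graph
x = tf.placeholder(dtype=tf.float32, shape=[None, 784])  # flattened 28x28 images
y = tf.placeholder(dtype=tf.float32, shape=[None, 10])   # one-hot labels
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Softmax over the linear scores gives the class probabilities
actv = tf.nn.softmax(tf.matmul(x, w) + b)
# Cross-entropy cost, averaged over the batch
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(actv), axis=1))
learning_ratio = 0.01
optm = tf.train.GradientDescentOptimizer(learning_ratio).minimize(cost)

# Whether each prediction matches its label
pred = tf.equal(tf.argmax(actv, 1), tf.argmax(y, 1))
# Accuracy: the fraction of correct predictions
accr = tf.reduce_mean(tf.cast(pred, tf.float32))
# Variable initializer
init = tf.global_variables_initializer()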

# Train the model
training_epochs = 50
batch_size      = 100
display_step    = 5

with tf.Session() as sess:
    sess.run(init)
    # Optimize with mini-batch gradient descent
    for epoch in range(training_epochs):
        avg_cost = 0.
        num_batch = mnist.train.num_examples // batch_size
        for i in range(num_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            feeds = {x: batch_xs, y: batch_ys}
            sess.run(optm, feed_dict=feeds)
            # Accumulate the post-update batch cost into the epoch average
            avg_cost += sess.run(cost, feed_dict=feeds) / num_batch

        # Report progress every display_step epochs
        if epoch % display_step == 0:
            # Note: train_acc is measured on the last mini-batch only
            feeds_train = {x: batch_xs, y: batch_ys}
            feeds_test = {x: mnist.test.images, y: mnist.test.labels}
            train_acc = sess.run(accr, feed_dict=feeds_train)
            test_acc = sess.run(accr, feed_dict=feeds_test)
            print("Epoch: %03d/%03d cost: %.9f train_acc: %.3f test_acc: %.3f"
                  % (epoch, training_epochs, avg_cost, train_acc, test_acc))
    print("DONE")

Running the code prints:

Mnist数据集加载成功！
(55000, 784)
(55000, 10)
(10000, 784)
(10000, 10)
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

Epoch: 000/050 cost: 1.176693809 train_acc: 0.890 test_acc: 0.856
Epoch: 005/050 cost: 0.440978021 train_acc: 0.870 test_acc: 0.895
Epoch: 010/050 cost: 0.383333536 train_acc: 0.900 test_acc: 0.904
Epoch: 015/050 cost: 0.357253272 train_acc: 0.970 test_acc: 0.909
Epoch: 020/050 cost: 0.341492714 train_acc: 0.910 test_acc: 0.913
Epoch: 025/050 cost: 0.330541673 train_acc: 0.920 test_acc: 0.915
Epoch: 030/050 cost: 0.322343116 train_acc: 0.910 test_acc: 0.915
Epoch: 035/050 cost: 0.315980227 train_acc: 0.900 test_acc: 0.917
Epoch: 040/050 cost: 0.310762640 train_acc: 0.910 test_acc: 0.918
Epoch: 045/050 cost: 0.306386642 train_acc: 0.900 test_acc: 0.919
DONE


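The final results show that:

The cost decreases steadily, from 1.177 at epoch 0 to 0.306 at epoch 45, which indicates that the model is fitting the data.
Test accuracy rises from 0.856 to 0.919, so the model performs well. The train_acc column is measured on the last mini-batch of 100 samples only, which is why it fluctuates between 0.870 and 0.970.
The results can of course be tuned further by adjusting parameters such as batch_size and the number of epochs.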
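One further point concerns the cost in the implementation above, which takes the logarithm of the softmax output. If a probability underflows to exactly 0, that logarithm is no longer finite. A common alternative is to compute log-softmax directly from the raw scores. TensorFlow 1.x also provides tf.nn.softmax_cross_entropy_with_logits_v2, which takes raw scores rather than probabilities. The standard-library sketch below illustrates the idea; the function name `log_softmax` and the sample scores are assumptions, not part of the original code.

```python
import math

def log_softmax(values):
    """log(softmax(v)) computed as (v - D) - log(sum_j exp(v_j - D))."""
    d = max(values)
    log_total = math.log(sum(math.exp(v - d) for v in values))
    return [(v - d) - log_total for v in values]

logits = [0.0, -800.0]

# Softmax first, log second: exp(-800) underflows to 0.0 and log(0.0) fails.
d = max(logits)
exps = [math.exp(v - d) for v in logits]
probs = [e / sum(exps) for e in exps]
try:
    math.log(probs[1])
    two_step_failed = False
except ValueError:
    two_step_failed = True

print(two_step_failed)      # True
print(log_softmax(logits))  # [0.0, -800.0]
```

In plain Python the two-step form raises an error, which makes the problem visible; the direct form stays finite because the logarithm is only ever taken of a sum that is at least 1.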