Machine Learning and TensorFlow Programming (3): Softmax Regression

0. References

1. Overview

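The Softmax Regression model generalizes Logistic Regression to multi-class classification.
Overall approach:
- Suppose there are m training samples, each input has n attributes, and the output falls into one of K classes.
- The input matrix X is m×(n+1); the extra column is the constant 1 that multiplies the bias (intercept) parameters.
- The parameter matrix $\theta$ is (n+1)×K. Its column $\theta_j$ is an (n+1)×1 vector holding the parameters for class j.
- The output matrix $h_\theta(x)$ and the sample labels y are both m×K matrices (y is one-hot encoded).
- The Softmax function turns the input and parameter matrices into the output matrix. Each row of $h_\theta(x)$ gives the probability of every class for one sample.
- The cost is built from these probabilities as $-\log(h_\theta(x))$, evaluated at the probability assigned to the true class:
  - when that probability is close to 1, the cost is close to 0;
  - when it is close to 0, the cost becomes very large.
- Differentiating the cost gives the update for the parameters at each step, and gradient descent (or another optimizer) finds the final result.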
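The shapes in the list above can be checked with a small pure-Python sketch (no NumPy, so it runs anywhere). The helper names `add_bias_column`, `matmul`, `softmax_rows` and `predict_proba` are hypothetical, and the input (m = 3, n = 2, K = 4) is made up. When $\theta$ starts at all zeros, as the TensorFlow code later does with `tf.zeros`, every class gets probability 1/K. The last two lines show how $-\log p$ behaves for a confident and for a badly wrong prediction.

```python
import math

def add_bias_column(X):
    """Prepend a constant 1 to every row: m x n -> m x (n+1)."""
    return [[1.0] + list(row) for row in X]

def matmul(A, B):
    """Plain-Python product of an (a x b) and a (b x c) matrix."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*B)] for row in A]

def softmax_rows(S):
    """Apply softmax to each row of the m x K score matrix S."""
    out = []
    for row in S:
        mx = max(row)                      # shift for numerical stability
        exps = [math.exp(s - mx) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def predict_proba(X, theta):
    """h_theta(X): m x K matrix of class probabilities."""
    return softmax_rows(matmul(add_bias_column(X), theta))

# Hypothetical data: m = 3 samples, n = 2 attributes, K = 4 classes
X = [[0.5, -1.0], [2.0, 3.0], [0.0, 0.0]]
theta = [[0.0] * 4 for _ in range(3)]      # (n+1) x K, all zeros
H = predict_proba(X, theta)                # every row is [0.25, 0.25, 0.25, 0.25]

# -log(p): small when the true-class probability is near 1, large near 0
cost_confident = -math.log(0.99)           # about 0.01
cost_wrong = -math.log(0.01)               # about 4.61
```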

2. The Softmax Function

Softmax is the general form of the logistic function:
- It maps a K-dimensional vector with components in $(-\infty, +\infty)$ to a K-dimensional vector whose components lie in $(0,1)$ and sum to 1.
- See Wikipedia and Zhihu for details.
Function form: it converts the K-dimensional score vector $\theta^T x$ into the K-dimensional vector $h_\theta(x)$:
$h_\theta(x) = \left[ \begin{matrix} P(y=1|x;\theta) \\ P(y=2|x;\theta) \\ \vdots \\ P(y=K|x;\theta) \end{matrix} \right] = \frac{1}{\sum_{j=1}^K \exp(\theta^{T}_{j}x)} \left[ \begin{matrix} \exp(\theta^{T}_1x) \\ \exp(\theta^{T}_2x) \\ \vdots \\ \exp(\theta^{T}_Kx) \end{matrix} \right]$
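Here is a minimal pure-Python sketch of the formula above, with hypothetical function names. It subtracts $\max(z)$ before exponentiating. Adding the same constant to every score leaves the result unchanged, and the shift keeps `math.exp` from overflowing on large scores. The last check illustrates the claim that Softmax generalizes the logistic function: for K = 2, $\frac{e^a}{e^a + e^b} = \frac{1}{1 + e^{-(a-b)}}$, which is the sigmoid of $a - b$.

```python
import math

def softmax(z):
    """Map a K-dim real vector z to K probabilities in (0, 1) that sum to 1."""
    m = max(z)                              # shift: result is unchanged, exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(t):
    """Logistic function (fine for moderate t; very negative t would overflow exp)."""
    return 1.0 / (1.0 + math.exp(-t))

p = softmax([1.0, 2.0, 3.0])                # p ≈ [0.0900, 0.2447, 0.6652]
big = softmax([1000.0, 1001.0, 1002.0])     # same result as p, no overflow

# K = 2: the first softmax component equals the sigmoid of the score difference
a, b = 0.7, -1.2
two_class = softmax([a, b])[0]
logistic = sigmoid(a - b)
```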

3. Cost Function (Loss Function)

Function form, where $1\{\cdot\}$ is the indicator function (1 if the condition holds, 0 otherwise):
$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^m \sum_{j=1}^K 1\{y^{(i)}=j\}\log \frac{\exp(\theta^{T}_jx^{(i)})}{\sum_{l=1}^K \exp(\theta^{T}_{l}x^{(i)})} \right]$
Extracting the Softmax part:
$p(y^{(i)}=j|x^{(i)};\theta) = \frac{\exp(\theta^{T}_jx^{(i)})}{\sum_{l=1}^K \exp(\theta^{T}_{l}x^{(i)})}$
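The cost $J(\theta)$ can be evaluated directly, as in the minimal sketch below (helper names are hypothetical). The indicator $1\{y^{(i)}=j\}$ keeps only the true class, so each sample contributes $\log p(y^{(i)}|x^{(i)};\theta)$. That log-probability is computed as $s_{y} - \log\sum_l e^{s_l}$ (log-sum-exp) rather than as `log(softmax(...))`, which avoids `log(0)` on extreme scores. Labels are 0-based here (0..K-1), unlike the 1..K of the formulas. With $\theta = 0$, every class has probability 1/K, so $J = \log K$.

```python
import math

def log_softmax_at(scores, k):
    """log p_k = s_k - log(sum_l exp(s_l)), shifted by max(scores) for stability."""
    mx = max(scores)
    return scores[k] - (mx + math.log(sum(math.exp(s - mx) for s in scores)))

def cost(X, y, theta):
    """J(theta) = -(1/m) * sum_i log p(y_i | x_i; theta).

    X: m rows, each already starting with the bias entry 1 (length n+1).
    y: m integer labels, 0-based.
    theta: (n+1) x K list of lists; column j holds theta_j.
    """
    K = len(theta[0])
    total = 0.0
    for x_i, y_i in zip(X, y):
        # scores s_j = theta_j^T x_i for every class j
        scores = [sum(x * theta[r][j] for r, x in enumerate(x_i)) for j in range(K)]
        total += log_softmax_at(scores, y_i)
    return -total / len(X)

X = [[1.0, 0.5, -1.0], [1.0, 2.0, 3.0]]   # bias column already prepended
y = [0, 2]
theta_zero = [[0.0] * 3 for _ in range(3)]                            # J = log(3)
theta_good = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]      # favors the true classes
```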
Differentiating (the derivation is somewhat involved; see articles 1 and 2) gives the gradient with respect to $\theta_j$, an (n+1)-dimensional vector:
$\nabla_{\theta_j}J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[ x^{(i)}\left(1\{y^{(i)}=j\}-p(y^{(i)}=j|x^{(i)};\theta)\right) \right]$
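Because the derivation is left to the references, a numerical check is a cheap way to confirm the gradient formula. The minimal sketch below computes $\nabla_{\theta_j}J(\theta)$ exactly as written and compares it with central finite differences of $J$. The helper names and the small dataset are hypothetical. It also shows a side property: for each attribute, the gradient entries summed over the K classes equal 0, because $\sum_j \left(1\{y^{(i)}=j\} - p_j\right) = 1 - 1 = 0$.

```python
import math

def probs(x_i, theta):
    """Softmax probabilities p(y = j | x_i; theta) for j = 0..K-1."""
    K = len(theta[0])
    scores = [sum(x * theta[r][j] for r, x in enumerate(x_i)) for j in range(K)]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    tot = sum(exps)
    return [e / tot for e in exps]

def cost(X, y, theta):
    return -sum(math.log(probs(x_i, theta)[y_i]) for x_i, y_i in zip(X, y)) / len(X)

def gradient(X, y, theta):
    """Column j is grad_j J = -(1/m) * sum_i x_i * (1{y_i = j} - p_ij)."""
    m, d, K = len(X), len(theta), len(theta[0])
    grad = [[0.0] * K for _ in range(d)]
    for x_i, y_i in zip(X, y):
        p = probs(x_i, theta)
        for j in range(K):
            coef = (1.0 if y_i == j else 0.0) - p[j]
            for r in range(d):
                grad[r][j] -= x_i[r] * coef / m
    return grad

def numerical_gradient(X, y, theta, eps=1e-6):
    """Central finite differences of J, used only to check gradient()."""
    d, K = len(theta), len(theta[0])
    num = [[0.0] * K for _ in range(d)]
    for r in range(d):
        for j in range(K):
            old = theta[r][j]
            theta[r][j] = old + eps
            plus = cost(X, y, theta)
            theta[r][j] = old - eps
            minus = cost(X, y, theta)
            theta[r][j] = old                     # restore the parameter
            num[r][j] = (plus - minus) / (2 * eps)
    return num

X = [[1.0, 0.5, -1.0], [1.0, 2.0, 3.0], [1.0, -0.3, 0.8]]   # bias column first
y = [0, 2, 1]
theta = [[0.1, -0.2, 0.05], [0.3, 0.0, -0.1], [-0.2, 0.4, 0.1]]
g = gradient(X, y, theta)
g_num = numerical_gradient(X, y, theta)
```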
Other notes:
- This cost function is the cross-entropy cost function.
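The link to cross-entropy takes one step. Write the label of sample i as the one-hot vector $q^{(i)}$ with $q^{(i)}_j = 1\{y^{(i)}=j\}$, and let $p^{(i)}_j = p(y^{(i)}=j|x^{(i)};\theta)$. The cross-entropy between the two distributions reduces to the log-probability of the true class, and averaging it over the m samples gives exactly $J(\theta)$:

$$H\left(q^{(i)}, p^{(i)}\right) = -\sum_{j=1}^K q^{(i)}_j \log p^{(i)}_j = -\log p^{(i)}_{y^{(i)}}, \qquad J(\theta) = \frac{1}{m}\sum_{i=1}^m H\left(q^{(i)}, p^{(i)}\right)$$

The TensorFlow code in the next section relies on this. `tf.nn.softmax_cross_entropy_with_logits` takes one-hot `labels` and unnormalized `logits` (the scores $\theta^T x$) and computes the softmax together with this per-sample cross-entropy. `tf.reduce_mean` then supplies the $\frac{1}{m}$.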

4. TensorFlow Code

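Softmax Regression is covered in the TensorFlow tutorial; see the tutorial for details.
The code below is my own, written with the tutorial example as a reference.
Data source: [mnist dataset][10]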
Softmax Regression

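```python
# Uses the TensorFlow 1.x graph API (placeholders, Session) through
# tf.compat.v1, so it also runs on TensorFlow 2.x.
import tensorflow.compat.v1 as tf
import numpy as np
import mnist_dataset as md  # my own loader module; see the GitHub source

tf.disable_v2_behavior()

# Load training data
train_images = md.load_train_images_file()  # 60000*784
train_labels = md.load_train_labels_file()  # 60000*10, one-hot
# Prepend a column of ones so the first row of W acts as the bias: 60000*785
train_images_2 = np.concatenate(
    (np.ones([train_images.shape[0], 1]), train_images), 1)

# Load test data
test_images = md.load_test_images_file()  # 10000*784
test_labels = md.load_test_labels_file()  # 10000*10, one-hot
test_images_2 = np.concatenate(
    (np.ones([test_images.shape[0], 1]), test_images), 1)

# Build the Softmax Regression model
x = tf.placeholder(tf.float32, [None, 785])   # input X, including the bias column
y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot labels
W = tf.Variable(tf.zeros([785, 10]))          # parameter matrix theta
y = tf.matmul(x, W)                           # logits; softmax is applied inside the loss

# Build the cost function (mean cross-entropy)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

# Optimizer: plain gradient descent with learning rate 0.5
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Create a TensorFlow session and initialize all variables
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Gradient descent: 1000 full-batch steps over all 60000 training samples
for i in range(1000):
  sess.run(train_step, feed_dict={x: train_images_2, y_: train_labels})

# Evaluate the model's accuracy on the test data
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: test_images_2,
                                    y_: test_labels}))
# Reported test accuracy: 0.9202
# Training took about 7 minutes
```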
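The same pipeline can be traced without TensorFlow or MNIST. The steps are: a bias column of ones, an all-zero W, full-batch gradient descent with learning rate 0.5, and argmax accuracy. The minimal pure-Python sketch below uses the gradient formula from section 3 in place of TensorFlow's automatic differentiation. The dataset (three Gaussian clusters in 2-D, K = 3) and the names `train` and `accuracy` are hypothetical stand-ins chosen so the example runs in well under a second. It is not a reproduction of the MNIST result.

```python
import math
import random

def softmax(z):
    mx = max(z)
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]

def scores(x, W):
    """theta_j^T x for every class j (x already includes the bias entry)."""
    return [sum(xi * W[r][j] for r, xi in enumerate(x)) for j in range(len(W[0]))]

def train(X, y, K, lr=0.5, steps=200):
    """Full-batch gradient descent on J(theta), starting from an all-zero W."""
    X = [[1.0] + list(row) for row in X]       # bias column of ones
    m, d = len(X), len(X[0])
    W = [[0.0] * K for _ in range(d)]          # analogous to tf.zeros([785, 10])
    for _ in range(steps):
        grad = [[0.0] * K for _ in range(d)]
        for x, label in zip(X, y):
            p = softmax(scores(x, W))
            for j in range(K):
                coef = (1.0 if label == j else 0.0) - p[j]
                for r in range(d):
                    grad[r][j] -= x[r] * coef / m
        for r in range(d):
            for j in range(K):
                W[r][j] -= lr * grad[r][j]
    return W

def accuracy(X, y, W):
    """Fraction of samples whose argmax score matches the label."""
    correct = 0
    for row, label in zip(X, y):
        s = scores([1.0] + list(row), W)
        correct += int(max(range(len(s)), key=s.__getitem__) == label)
    return correct / len(X)

# Hypothetical toy data: 30 points around each of three centers
rng = random.Random(0)
centers = [(-2.0, 0.0), (2.0, 0.0), (0.0, 2.5)]
X, y = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(30):
        X.append([cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)])
        y.append(label)

W = train(X, y, K=3)
acc = accuracy(X, y, W)
```

On MNIST, each gradient step in this loop would touch all 60000 × 785 × 10 weights-times-inputs. That cost is why the full-batch TensorFlow version above needs minutes for 1000 steps.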
