Deep Learning (5): Softmax Regression

Introduction

  Unlike linear regression, softmax regression has multiple output units instead of one, and it introduces the softmax operation to make the outputs better suited to predicting and training on discrete values.
  The complete code has been uploaded to GitHub:
  https://github.com/InkiInki/Python/blob/master/Python1/deepLearning/SoftMaxRegression.py
  https://github.com/InkiInki/Python/blob/master/Python1/deepLearning/TorchSoftmaxRegression.py

1.1 Classification problems

  Consider a simple image classification problem whose input images are 2 pixels in both height and width, in grayscale, so that each pixel value can be represented by a single scalar.
  Denote the four pixels of an image as $x_1, x_2, x_3, x_4$. Suppose the true labels of the images in the training set are dog, cat, and chicken (assuming these three animals can be distinguished from 4 pixels); these labels then correspond to the discrete values $y_1, y_2, y_3$.

1.2 The softmax regression model

  Like linear regression, softmax regression forms a linear combination of the input features and the weights. The difference is that the number of outputs of softmax regression equals the number of label classes. Since there are 4 features and 3 output classes, the weights comprise 12 scalars and the biases 3 scalars, and for each input we compute the 3 outputs $o_1, o_2, o_3$:
$$o_i = \sum_{j=1}^{4} x_j w_{ji} + b_i \tag{1}$$
  The figure below depicts the computation in Equation (1):
[Figure: the single-layer network computing Equation (1). Source: *Dive into Deep Learning* by Mu Li, Aston Zhang, et al.]

  Since classification requires discrete predictions, a simple approach is to treat the output value $o_i$ as the confidence that the predicted class is $i$, and to take the class with the largest output as the prediction, i.e. to output $\arg\max_i o_i$. For example, if $o_1, o_2, o_3$ are $0.1, 10, 0.1$ respectively, the predicted class is 2.
  However, using the output layer's values directly raises two problems:
  1) the range of the output values is not fixed;
  2) the error between the true labels and output values of uncertain range is hard to measure.

  The softmax operator solves both problems by transforming the output values into a probability distribution whose entries are positive and sum to 1:
$$\hat{y}_1, \hat{y}_2, \hat{y}_3 = \mathrm{softmax}(o_1, o_2, o_3) \tag{2}$$

where

$$\hat{y}_1 = \frac{\exp(o_1)}{\sum_{i=1}^{3}\exp(o_i)},\quad \hat{y}_2 = \frac{\exp(o_2)}{\sum_{i=1}^{3}\exp(o_i)},\quad \hat{y}_3 = \frac{\exp(o_3)}{\sum_{i=1}^{3}\exp(o_i)}$$
  It is easy to verify that $\hat{y}_1 + \hat{y}_2 + \hat{y}_3 = 1$ and $0 \leq \hat{y}_1, \hat{y}_2, \hat{y}_3 \leq 1$, so the result is a valid probability distribution.
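
  As a quick numerical check, here is a minimal sketch in plain Python (no framework needed) that applies the definition above to the earlier outputs $o_1, o_2, o_3 = 0.1, 10, 0.1$:

import math

def softmax(o):
    # Map raw outputs to a probability distribution via Eq. (2)
    exps = [math.exp(v) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.1, 10, 0.1])
print(probs)       # approx. [5.0e-05, 0.9999, 5.0e-05]
print(sum(probs))  # 1.0: a valid probability distribution

  Even though the raw outputs sit on very different scales, the results land in $[0, 1]$ and the predicted class (class 2) is unchanged, since softmax preserves the ordering of its inputs.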

1.3 Vectorized expression for a single sample

  Staying with the image classification problem above, suppose the weight and bias parameters of softmax regression are:
$$\boldsymbol{W} = \begin{bmatrix} w_{11} & w_{12} & w_{13}\\ w_{21} & w_{22} & w_{23}\\ w_{31} & w_{32} & w_{33}\\ w_{41} & w_{42} & w_{43} \end{bmatrix},\quad \boldsymbol{b} = \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix} \tag{3}$$
  Let the features of sample $i$, an image whose height and width are both 2 pixels, be:
$$\boldsymbol{x}_i = \begin{bmatrix} x_{i1} & x_{i2} & x_{i3} & x_{i4} \end{bmatrix} \tag{4}$$
  The output of the output layer is:
$$\boldsymbol{o}_i = \begin{bmatrix} o_{i1} & o_{i2} & o_{i3} \end{bmatrix} \tag{5}$$
  The predicted probability distribution over dog, cat, and chicken is:
$$\boldsymbol{\hat{y}}_i = \begin{bmatrix} \hat{y}_{i1} & \hat{y}_{i2} & \hat{y}_{i3} \end{bmatrix} \tag{6}$$
  The vectorized expression is:
$$\boldsymbol{o}_i = \boldsymbol{x}_i \boldsymbol{W} + \boldsymbol{b}, \qquad \boldsymbol{\hat{y}}_i = \mathrm{softmax}(\boldsymbol{o}_i) \tag{7}$$

1.4 Vectorized expression for a mini-batch

  To further improve computational efficiency, computations are usually vectorized over mini-batches of data. Given a mini-batch of $n$ samples with $d$ features and $q$ classes, let the batch features be $\boldsymbol{X} \in \mathbb{R}^{n \times d}$. Suppose the weight and bias parameters of softmax regression are $\boldsymbol{W} \in \mathbb{R}^{d \times q}$ and $\boldsymbol{b} \in \mathbb{R}^{1 \times q}$; the vectorized expression of softmax regression is then the following, where $\boldsymbol{O}, \boldsymbol{\hat{Y}} \in \mathbb{R}^{n \times q}$:
$$\boldsymbol{O} = \boldsymbol{X}\boldsymbol{W} + \boldsymbol{b}, \qquad \boldsymbol{\hat{Y}} = \mathrm{softmax}(\boldsymbol{O}) \tag{8}$$
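
  As a shape check, the following sketch evaluates Equation (8) with PyTorch for the running example ($n = 2$ samples, $d = 4$ features, $q = 3$ classes; all numbers are illustrative); broadcasting adds $\boldsymbol{b}$ to every row of $\boldsymbol{X}\boldsymbol{W}$:

import torch

n, d, q = 2, 4, 3                # batch size, feature count, class count
X = torch.rand(n, d)             # batch features, shape (n, d)
W = torch.randn(d, q) * 0.01     # weights, shape (d, q)
b = torch.zeros(1, q)            # bias, broadcast over the n rows

O = torch.mm(X, W) + b           # Eq. (8): O = XW + b, shape (n, q)
O_exp = O.exp()
Y_hat = O_exp / O_exp.sum(dim=1, keepdim=True)  # row-wise softmax
print(O.shape, Y_hat.shape)      # both torch.Size([2, 3])
print(Y_hat.sum(dim=1))          # tensor([1., 1.]): each row sums to 1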

1.5 Cross-entropy loss function

  We could use the squared loss $\|\boldsymbol{\hat{y}}_i - \boldsymbol{y}_i\|^2 / 2$, as in linear regression. However, a correct prediction does not require the predicted probabilities to equal the label probabilities exactly. In the image classification example, if $y_i = 3$ it suffices that $\hat{y}_{i3}$ is larger than the other predicted values. The squared loss is thus overly strict; a better approach is to use a function suited to measuring the difference between two probability distributions. Cross entropy is a commonly used such measure:
$$H(\boldsymbol{y}_i, \boldsymbol{\hat{y}}_i) = -\sum_{j=1}^{q} y_{ij} \log \hat{y}_{ij} \tag{9}$$

where $y_{ij} \in \{0, 1\}$: in the one-hot vector $\boldsymbol{y}_i$ only the $y_i$-th element, $y_{i y_i}$, is 1 and the rest are 0; note the distinction between the vector $\boldsymbol{y}_i$ and the discrete class label $y_i$ of sample $i$.
  Cross entropy only cares about the predicted probability of the correct class: as long as that value is large enough, the classification result is guaranteed to be correct. Of course, when a sample carries multiple labels, e.g. when an image contains more than one object, this simplification no longer applies.
  Suppose the training set contains $n$ samples; the cross-entropy loss function is defined as:
$$\ell(\boldsymbol{\Theta}) = \frac{1}{n} \sum_{i=1}^{n} H(\boldsymbol{y}_i, \boldsymbol{\hat{y}}_i) \tag{10}$$

where $\boldsymbol{\Theta}$ denotes the model parameters.
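
  As a worked example (with illustrative numbers): if the true class of sample $i$ is 3, so $\boldsymbol{y}_i = (0, 0, 1)$, and the model predicts $\boldsymbol{\hat{y}}_i = (0.1, 0.3, 0.6)$, then Eq. (9) reduces to $-\log 0.6 \approx 0.511$; a more confident correct prediction such as $(0.01, 0.04, 0.95)$ yields the smaller loss $-\log 0.95 \approx 0.051$, while the wrong-leaning $(0.6, 0.3, 0.1)$ yields $-\log 0.1 \approx 2.303$.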

1.6 Model prediction and evaluation

  We usually take the class with the highest predicted probability as the output class. If it matches the true class, the prediction is correct.

2 Implementation from scratch

  The required modules are imported as follows:

import torch
import torchvision
import numpy as np
import sys
from ImageMnist import *

  The GitHub address of ImageMnist is:
  https://github.com/InkiInki/Python/blob/master/Python1/deepLearning/ImageMnist.py

2.1 Obtaining and reading the data

  The dataset used is Fashion-MNIST; an introduction can be found at:
  https://blog.csdn.net/weixin_44575152/article/details/104678779

class SoftMaxRegression():
    
    def __init__(self, batch_size=256):
        self.batch_size = batch_size
        
if __name__ == '__main__':
    im = ImageMnist()
    train_iter, test_iter = im.mnist_train, im.mnist_test
    print(len(train_iter))

2.2 Initializing model parameters

  Each input sample is an image whose height and width are both 28 pixels, so the length of the model's input vector is $28 \times 28 = 784$: each element of the vector corresponds to one pixel. The images fall into ten classes, so the output layer of the single-layer network has $10$ outputs. The weight and bias parameters of softmax regression are therefore matrices of shapes $784 \times 10$ and $1 \times 10$ respectively.
  The original constructor is modified as follows:

    def __init__(self, num_inputs=784, num_outputs=10, batch_size=256):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.batch_size = batch_size
        self.w = torch.tensor(np.random.normal(0, 0.01, (self.num_inputs, self.num_outputs)), 
                              dtype=torch.float, requires_grad=True)
        self.b = torch.zeros(self.num_outputs, dtype=torch.float, requires_grad=True)

2.3 Implementing the softmax computation

  First, let us look at how to operate on a multi-dimensional Tensor along a given dimension:

if __name__ == '__main__':
    x = torch.tensor([[1, 2, 3], [4, 5, 6]])
    print(x.sum(dim=0, keepdim=True))
    print(x.sum(dim=1, keepdim=True))

  Output:

tensor([[5, 7, 9]])
tensor([[ 6],
        [15]])

  The softmax operation is then defined as follows:

	...
    def softmax(self, x):
        x_exp = x.exp()
        partition = x_exp.sum(dim=1, keepdim=True)
        return x_exp / partition
        
if __name__ == '__main__':
    test = SoftMaxRegression()
    x = torch.rand((2, 5))
    x_prob = test.softmax(x)
    print(x_prob, x_prob.sum(dim=1))

  Output:

tensor([[0.2122, 0.2135, 0.2442, 0.1720, 0.1581],
        [0.1578, 0.2423, 0.2119, 0.1472, 0.2407]]) tensor([1., 1.])

  As the result shows, for a random input every element is mapped to a non-negative number and each row sums to 1.
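
  One caveat: `x.exp()` overflows to `inf` for large inputs, so the implementation above (which follows the book and is fine for this example) is not numerically stable. Since softmax is invariant to subtracting a constant from every element of a row, a stable variant subtracts the row-wise maximum first; a minimal sketch of such a drop-in replacement:

import torch

def stable_softmax(x):
    # exp(x - m) / sum(exp(x - m)) == exp(x) / sum(exp(x)) for any shift m,
    # so subtracting the row max avoids overflow without changing the result
    x = x - x.max(dim=1, keepdim=True)[0]
    x_exp = x.exp()
    return x_exp / x_exp.sum(dim=1, keepdim=True)

print(stable_softmax(torch.tensor([[1000., 0., 0.]])))  # tensor([[1., 0., 0.]]), no inf/nan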

2.4 Defining the model

  With the softmax operation in place, the softmax regression model can be defined; `view` reshapes each original image into a vector of length num_inputs:

	...
    def net(self, x):
        return self.softmax(torch.mm(x.view((-1, self.num_inputs)), self.w) + self.b)

2.5 Defining the loss function

  Section 1.5 introduced the cross-entropy loss function used by softmax regression. To obtain the predicted probability of each label, we can use the gather function:

if __name__ == '__main__':
    test = SoftMaxRegression()
    y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
    y = torch.LongTensor([0, 2])
    print(y_hat.gather(1, y.view(-1, 1)))

  Output:

tensor([[0.1000],
        [0.5000]])

  The cross-entropy loss is then implemented as follows:

    def cross_entropy(self, y_hat, y):
        return - torch.log(y_hat.gather(1, y.view(-1, 1)))

2.6 Computing classification accuracy

  Given a predicted probability distribution y_hat over the classes, we take the class with the highest predicted probability as the output class. If it matches the true class y, the prediction is correct. Classification accuracy is the ratio of the number of correct predictions to the total number of predictions:

	...
    def accuracy(self, y_hat, y):
        return (y_hat.argmax(dim=1) == y).float().mean().item()
    # argmax(dim=1) returns the index of the largest element in each row of y_hat
    # float() converts the boolean comparisons into a Tensor of 0./1. floats

if __name__ == '__main__':
    test = SoftMaxRegression()
    y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
    y = torch.LongTensor([0, 2])
    y_hat.gather(1, y.view(-1, 1))
    print(test.accuracy(y_hat, y))

  Output:

0.5

  Similarly, the accuracy of net on data_iter can be evaluated as follows:

	...
    def evaluate_accuracy(self, data_iter, net):
        acc_sum, n = 0., 0
        for x, y in data_iter:
            acc_sum += (net(x).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
        return acc_sum / n
        
if __name__ == '__main__':
    test = SoftMaxRegression()
    im = ImageMnist()
    im.__init__()
    train_iter, test_iter = im.data_iter(test.batch_size)
    print(test.evaluate_accuracy(test_iter, test.net))

  Output:

0.0994

  Since the model is randomly initialized here, the accuracy is low, close to the 0.1 expected from randomly guessing among ten classes.

2.7 Training the model

  The training procedure is very similar to that of linear regression:

	...
    def train_softmax(self, train_iter, test_iter, num_epochs, 
                      params=None, lr=None, optimizer=None):
        for epoch in range(num_epochs):
            train_l_sum, train_acc_sum, n = 0., 0., 0
            for x, y in train_iter:
                y_hat = self.net(x)
                l = self.cross_entropy(y_hat, y).sum()
                
                # Zero the gradients
                if optimizer is not None:
                    optimizer.zero_grad()
                elif params is not None and params[0].grad is not None:
                    for param in params:
                        param.grad.data.zero_()
                        
                l.backward()
                if optimizer is None:
                    self.sgd(params, lr)
                else:
                    optimizer.step()    # used by the concise torch implementation in Section 3
                    
                train_l_sum += l.item()
                train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
                n += y.shape[0]
            test_acc = self.evaluate_accuracy(test_iter, self.net)
            print("Epoch %d, loss %.4f, train acc %.3f, test acc %.3f" %
                  (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
            
    def sgd(self, params, lr):
        for param in params:
            param.data -= lr * param.grad / self.batch_size
        
if __name__ == '__main__':
    test = SoftMaxRegression()
    im = ImageMnist()
    im.__init__()
    train_iter, test_iter = im.data_iter(test.batch_size)
    test.train_softmax(train_iter, test_iter, 5, [test.w, test.b], 0.1)

  Output:

Epoch 1, loss 0.7851, train acc 0.749, test acc 0.794
Epoch 2, loss 0.5725, train acc 0.811, test acc 0.812
Epoch 3, loss 0.5255, train acc 0.825, test acc 0.819
Epoch 4, loss 0.5010, train acc 0.833, test acc 0.825
Epoch 5, loss 0.4847, train acc 0.837, test acc 0.829

2.8 Prediction

  After training, given a batch of images, we can compare their true labels with the model's predictions:

if __name__ == '__main__':
    test = SoftMaxRegression()
    im = ImageMnist()
    im.__init__()
    train_iter, test_iter = im.data_iter(test.batch_size)
    x, y = next(iter(test_iter))
    
    true_labels = im.get_text_labels(y.numpy())
    pred_labels = im.get_text_labels(test.net(x).argmax(dim=1).numpy())
    titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]
    im.show_mnist(x[ : 9], titles[ : 9])

  Output:

[Figure: the first nine test images, each titled with its true label above the predicted label.]

2.9 Complete code

'''
@(#)SoftMax.py
The class of SoftMax.
Author: inki
Email: [email protected]
Created on May 06, 2020
Last Modified on May 08, 2020
'''
import torch
import torchvision
import numpy as np
import sys
import torch.optim as optim
from ImageMnist import *

class SoftMaxRegression():

    def __init__(self, num_inputs=784, num_outputs=10, batch_size=256):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.batch_size = batch_size
        self.w = torch.tensor(np.random.normal(0, 0.01, (self.num_inputs, self.num_outputs)), 
                              dtype=torch.float, requires_grad=True)
        self.b = torch.zeros(self.num_outputs, dtype=torch.float, requires_grad=True)
        
    def softmax(self, x):
        x_exp = x.exp()
        partition = x_exp.sum(dim=1, keepdim=True)
        return x_exp / partition

    def net(self, x):
        return self.softmax(torch.mm(x.view((-1, self.num_inputs)), self.w) + self.b)

    def cross_entropy(self, y_hat, y):
        return - torch.log(y_hat.gather(1, y.view(-1, 1)))

    def accuracy(self, y_hat, y):
        return (y_hat.argmax(dim=1) == y).float().mean().item()

    def evaluate_accuracy(self, data_iter, net):
        acc_sum, n = 0., 0
        for x, y in data_iter:
            acc_sum += (net(x).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
        return acc_sum / n

    def train_softmax(self, train_iter, test_iter, num_epochs, 
                      params=None, lr=None, optimizer=None):
        for epoch in range(num_epochs):
            train_l_sum, train_acc_sum, n = 0., 0., 0
            for x, y in train_iter:
                y_hat = self.net(x)
                l = self.cross_entropy(y_hat, y).sum()
                
                # Zero the gradients
                if optimizer is not None:
                    optimizer.zero_grad()
                elif params is not None and params[0].grad is not None:
                    for param in params:
                        param.grad.data.zero_()
                        
                l.backward()
                if optimizer is None:
                    self.sgd(params, lr)
                else:
                    optimizer.step()
                    
                train_l_sum += l.item()
                train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
                n += y.shape[0]
            test_acc = self.evaluate_accuracy(test_iter, self.net)
            print("Epoch %d, loss %.4f, train acc %.3f, test acc %.3f" %
                  (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
            
    def sgd(self, params, lr):
        # Mini-batch stochastic gradient descent
        for param in params:
            param.data -= lr * param.grad / self.batch_size

if __name__ == '__main__':
    test = SoftMaxRegression()
    im = ImageMnist()
    im.__init__()
    train_iter, test_iter = im.data_iter(test.batch_size)
    # Train first (Section 2.7), then visualize predictions (Section 2.8)
    test.train_softmax(train_iter, test_iter, 5, [test.w, test.b], 0.1)
    x, y = next(iter(test_iter))
    
    true_labels = im.get_text_labels(y.numpy())
    pred_labels = im.get_text_labels(test.net(x).argmax(dim=1).numpy())
    titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]
    im.show_mnist(x[ : 9], titles[ : 9])

3 Implementation with torch

  First import the following packages:

import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys
from ImageMnist import *

3.1 Obtaining and reading the data

if __name__ == '__main__':
    batch_size = 256
    im = ImageMnist()
    im.__init__()
    train_iter, test_iter = im.data_iter(batch_size)

3.2 Defining and initializing the model

class TorchSoftmaxRegression(nn.Module):
    
    def __init__(self, num_inputs=784, num_outputs=10):
        super(TorchSoftmaxRegression, self).__init__()
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.linear = nn.Linear(self.num_inputs, self.num_outputs)
        
    def forward(self, x):    # x: a Fashion-MNIST batch of shape (batch, 1, 28, 28)
        y = self.linear(x.view(x.shape[0], -1))
        return y
        
if __name__ == '__main__':
    batch_size = 256
    num_inputs=784
    num_outputs=10
    im = ImageMnist()
    im.__init__()
    train_iter, test_iter = im.data_iter(batch_size)
    net = TorchSoftmaxRegression(num_inputs=784, num_outputs=10)
    print(net)

  Output:

TorchSoftmaxRegression(
  (linear): Linear(in_features=784, out_features=10, bias=True)
)

  For convenient shape conversion, define the following layer:

class FlattenLayer(nn.Module):
    
    def __init__(self):
        super(FlattenLayer, self).__init__()
        
    def forward(self, x):
        return x.view(x.shape[0], -1)     
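
  A quick sanity check of the layer, reusing the imports above, on a fake batch shaped like the Fashion-MNIST batches used here:

if __name__ == '__main__':
    x = torch.rand(2, 1, 28, 28)      # two fake 1-channel 28x28 images
    print(FlattenLayer()(x).shape)    # torch.Size([2, 784])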

  The model can then be defined and its parameters initialized conveniently:

import warnings
warnings.filterwarnings('ignore')
from collections import OrderedDict

if __name__ == '__main__':
    net = nn.Sequential(
        OrderedDict([
            ('flatten', FlattenLayer()),
            ('linear', nn.Linear(num_inputs, num_outputs))])
        )
    init.normal_(net.linear.weight, mean=0, std=0.01)
    init.constant_(net.linear.bias, val=0)
    print(net)

  Output:

Sequential(
  (flatten): FlattenLayer()
  (linear): Linear(in_features=784, out_features=10, bias=True)
)

3.3 Cross-entropy loss function

	loss = nn.CrossEntropyLoss()
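
  Note that `nn.CrossEntropyLoss` expects raw logits: it combines a log-softmax with the negative log-likelihood internally, which is more numerically stable than the separate softmax plus `torch.log` used in Section 2.5. This is also why the models in this section return the linear layer's output directly, without a softmax. A minimal check of the equivalence (illustrative numbers only):

import torch
import torch.nn.functional as F

logits = torch.tensor([[0.1, 10.0, 0.1]])
target = torch.tensor([1])
# CrossEntropyLoss == NLLLoss applied to the log-softmax of the logits
print(F.cross_entropy(logits, target))
print(F.nll_loss(F.log_softmax(logits, dim=1), target))  # same value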

3.4 Defining the optimization algorithm

optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

3.5 Complete code

'''
@(#)SoftMax.py
The class of SoftMax.
Author: inki
Email: [email protected]
Created on May 08, 2020
Last Modified on May 08, 2020
'''
import warnings
warnings.filterwarnings('ignore')
import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys
from ImageMnist import *
from collections import OrderedDict
        
def train_softmax(net, train_iter, test_iter, loss, num_epochs, 
                  batch_size, params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0., 0., 0
        for x, y in train_iter:
            y_hat = net(x)
            l = loss(y_hat, y).sum()
            
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
                    
            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()
                
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print("Epoch %d, loss %.4f, train acc %.3f, test acc %.3f" %
              (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))
            
def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size
        
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0., 0
    for x, y in data_iter:
        acc_sum += (net(x).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
    
class FlattenLayer(nn.Module):
    
    def __init__(self):
        super(FlattenLayer, self).__init__()
        
    def forward(self, x):
        return x.view(x.shape[0], -1)        

if __name__ == '__main__':
    batch_size = 256
    num_inputs=784
    num_outputs=10
    num_epochs = 5
    im = ImageMnist()
    im.__init__()
    train_iter, test_iter = im.data_iter(batch_size)
    net = nn.Sequential(
        OrderedDict([
            ('flatten', FlattenLayer()),
            ('linear', nn.Linear(num_inputs, num_outputs))])
        )
    init.normal_(net.linear.weight, mean=0, std=0.01)
    init.constant_(net.linear.bias, val=0)
    loss = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
    train_softmax(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)   
