Hands-On Deep Learning 09: Softmax Regression + Loss Function + Image Classification Dataset

Image Classification Dataset

The dataset used in the course is Fashion-MNIST.
First, let's look at how to download and use this dataset:

import torch
import torchvision
from torch.utils import data
from torchvision import transforms
import matplotlib.pyplot as plt
# Define a transform that converts images to Tensor type
trans = transforms.ToTensor()
# root: where the dataset is stored; train: training set vs. test set;
# transform: image preprocessing; download: whether to download the data
# Training set
mnist_train = torchvision.datasets.FashionMNIST(
        root="./data", train=True,transform=trans,download=True)
# Test set
mnist_test = torchvision.datasets.FashionMNIST(
        root="./data", train=False,transform=trans, download=True)
# Print the lengths to check
print(len(mnist_train))
print(len(mnist_test))
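# Expected output: 60000 (training samples) and 10000 (test samples)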
# Two helper functions for visualizing the dataset
def get_fashion_mnist_labels(labels):
    """返回Fashion-MNIST数据集的文本标签。"""
    text_labels = [
        't-shirt', 'trouser', 'pullover', 'dress', 'coat', 'sandal', 'shirt',
        'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):
    """Plot a list of images."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if torch.is_tensor(img):
            ax.imshow(img.numpy())
        else:
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            ax.set_title(titles[i])
    plt.show()
    return axes

X, y = next(iter(data.DataLoader(mnist_train, batch_size=18)))
show_images(X.reshape(18, 28, 28), 2, 9, titles=get_fashion_mnist_labels(y))

The result: a 2x9 grid of sample images from this clothing dataset, each titled with its text label.
[figure: sample Fashion-MNIST images with labels]

Softmax Classification

Softmax

The obvious difference between regression and classification: a regression problem predicts a continuous value, such as a house price, while a classification problem predicts which category an input belongs to.
Statisticians long ago invented a simple way to represent categorical data: one-hot encoding. A one-hot encoding is a vector with as many components as there are classes; the component corresponding to the category is set to 1 and all other components are set to 0. In our example the label y is a three-dimensional vector, where (1, 0, 0) corresponds to "cat", (0, 1, 0) to "chicken", and (0, 0, 1) to "dog".
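As a quick illustration, here is a minimal sketch using PyTorch's built-in one_hot (the cat/chicken/dog ordering is just the example above):

import torch
import torch.nn.functional as F

# Class indices: 0 = cat, 1 = chicken, 2 = dog
y = torch.tensor([0, 1, 2])
print(F.one_hot(y, num_classes=3))
# tensor([[1, 0, 0],
#         [0, 1, 0],
#         [0, 0, 1]])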
We want the model to output three values for one image (assuming there are only three categories), each representing the probability that the image belongs to that category.
We then choose the category with the highest probability as the prediction.
Suppose we still use a linear model, now with 4 input features and 3 output categories. We have one set of inputs but need multiple outputs. The formula is as follows:
$$o_1 = x_1 w_{11} + x_2 w_{12} + x_3 w_{13} + x_4 w_{14} + b_1$$
$$o_2 = x_1 w_{21} + x_2 w_{22} + x_3 w_{23} + x_4 w_{24} + b_2$$
$$o_3 = x_1 w_{31} + x_2 w_{32} + x_3 w_{33} + x_4 w_{34} + b_3$$

We can write this in vector form: $o = Wx + b$, where $W$ is a 3x4 matrix. But note that the $o$ computed this way does not yet meet our requirements: we want the three outputs to be the predicted probabilities of the three categories, and a probability must satisfy at least two conditions: (1) it is non-negative, and (2) the values sum to 1.
Therefore, the unnormalized predictions cannot be used directly as our output. This is where the softmax function comes in.
The softmax function transforms unnormalized predictions into non-negative numbers that sum to 1, while keeping the model differentiable.
We first exponentiate each unnormalized prediction, which guarantees the outputs are non-negative. To ensure the final output probabilities sum to 1, we then divide each exponentiated value by their sum.
$$\hat{y}_j = \frac{\exp(o_j)}{\sum_k \exp(o_k)}$$
The softmax operation does not change the ordering among the unnormalized predictions; it only determines the probability assigned to each class. So after the softmax operation, we still select the class with the maximum value as the prediction.
Although softmax is a nonlinear function, the output of softmax regression is still determined by the affine transformation of the input features. Therefore, softmax regression is a linear model.
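A minimal sketch to see softmax in action (plain PyTorch; the values are arbitrary):

import torch

o = torch.tensor([1.0, 2.0, 0.5])  # unnormalized predictions
y_hat = torch.exp(o) / torch.exp(o).sum()
print(y_hat)           # tensor([0.2312, 0.6285, 0.1402]), approximately
print(y_hat.sum())     # tensor(1.)
print(y_hat.argmax())  # tensor(1), the same index as o.argmax()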

Cross Entropy

With softmax in hand, let's talk about cross entropy.
Previously, when we used a linear model with a single output, we used the mean squared error to measure the gap between the predicted value and the true value.
So should we also use the mean squared error here, to measure the gap between our predicted probabilities and the true probabilities? No.
(In classification problems, where sigmoid/softmax produces the probabilities, training with the MSE loss under gradient descent can learn very slowly when the model starts training.) Instead, we use cross entropy. (For background, see this article: https://zhuanlan.zhihu.com/p/35709485)
Cross entropy is commonly used to measure the difference between two probability distributions. Assuming the two distributions are p and q, the cross entropy is:
$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
Now let's use cross entropy as the loss function. Here $y$ is the true (one-hot) label, $\hat{y}$ is our prediction, and the loss function $l(y, \hat{y})$ is:

$$l(y, \hat{y}) = -\sum_{j} y_j \log \hat{y}_j$$

Since $y$ is one-hot, with a single 1 at the true class, every other term vanishes and the loss reduces to $-\log \hat{y}_j$ for the true class $j$: the negative log of the predicted probability assigned to the correct class.
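A minimal sketch of this loss under the one-hot view (manual computation; note that PyTorch's built-in nn.CrossEntropyLoss expects raw logits rather than probabilities, so it is not used here):

import torch

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])  # predicted probabilities
y = torch.tensor([0, 2])                                   # true class indices
# Pick out the predicted probability of the true class, then take -log
loss = -torch.log(y_hat[range(len(y_hat)), y])
print(loss)  # tensor([2.3026, 0.6931]): -log(0.1) and -log(0.5)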

Code

Before walking through the full code, let me go over a few Python/PyTorch idioms:
1.

import torch

X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# Sum down each column (collapse along dim 0)
print(X.sum(0, keepdim=True))
# Sum across each row (collapse along dim 1)
print(X.sum(1, keepdim=True))

Output:
tensor([[5., 7., 9.]])
tensor([[ 6.],
        [15.]])
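keepdim=True keeps the reduced dimension, so the result can broadcast back against X; the softmax implementation below relies on exactly this. A minimal sketch:

import torch

X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# (2, 3) / (2, 1) broadcasts: each row is divided by its own sum
print(X / X.sum(1, keepdim=True))
# tensor([[0.1667, 0.3333, 0.5000],
#         [0.2667, 0.3333, 0.4000]])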

2.

import torch

# Suppose there are three classes: cat, dog, chicken
# y holds the sample labels
# There are two samples in y
# Their true classes are 0 and 2 (cat and chicken)
y = torch.tensor([0, 2])
# y_hat holds our predictions for the two samples: a 2x3 tensor,
# each row giving the predicted probabilities of the three classes
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
# This indexing syntax accesses elements of y_hat:
# the row indices are [0, 1], the column indices are y = [0, 2],
# so it reads y_hat[0, 0] and y_hat[1, 2] in turn.
# Since y holds the true classes, this pulls out, for each sample,
# the predicted probability of its true class.
print(y_hat[[0, 1], y])

Output:
tensor([0.1000, 0.5000])
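For reference (my own aside, not from the course), the same lookup can also be written with torch.gather:

import torch

y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
# gather along dim 1: out[i][0] = y_hat[i][y[i]]
print(torch.gather(y_hat, 1, y.unsqueeze(1)).squeeze(1))
# tensor([0.1000, 0.5000])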

3. In the course video the teacher works in Jupyter. I use PyCharm, so there are two points to note:
1. In the Animator class in d2l.torch, add two lines, plt.draw() and plt.pause(0.001), near the end of the add function so that the line chart is displayed. Jupyter automatically calls the image display machinery on every update, which is why the unmodified code shows the chart there but not in PyCharm.
2. d2l.show_images() does not open a window in PyCharm either; you need to add d2l.plt.show() after it.

Code:

import torch
import torchvision
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l
import matplotlib.pyplot as plt
from IPython import display

# Define a transform that converts images to Tensor type
trans = transforms.ToTensor()
# root: where the dataset is stored; train: training set vs. test set;
# transform: image preprocessing; download: whether to download the data
# Training set
mnist_train = torchvision.datasets.FashionMNIST(
        root="./data", train=True,transform=trans,download=True)
# Test set
mnist_test = torchvision.datasets.FashionMNIST(
        root="./data", train=False,transform=trans, download=True)

# X is (batch_size, 784)
# The images here are 28x28 with a single channel, so flattening gives 784
# Normally we would apply convolutions before flattening, but here we only
# want to implement softmax regression, so we flatten directly
# So the input size is 784, and we want a 10-class prediction as output
num_inputs = 784
num_outputs = 10
# The weight matrix therefore has 784 rows and 10 columns; each column is
# one set of weights, so X (batch, 784) times W (784, 10) gives (batch, 10)
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
# The bias is a vector of length 10
b = torch.zeros(num_outputs, requires_grad=True)
# Learning rate
lr = 0.1

def load_data_fashion_mnist(batch_size, resize=None):
    """下载Fashion-MNIST数据集,然后将其加载到内存中。"""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root="./data",train=True,
                                                    transform=trans,download=True)
    mnist_test = torchvision.datasets.FashionMNIST(root="./data",train=False,
                                                   transform=trans,download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,num_workers=4),
            data.DataLoader(mnist_test, batch_size, shuffle=False,num_workers=4))
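# Usage sketch: load_data_fashion_mnist(256, resize=64) would yield batches of
# 64x64 images instead of 28x28 (the values here are just an example)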

# A class that maintains n running variables
# It supports two operations: accumulate and reset
# add() takes n values and adds each one onto the corresponding variable
class Accumulator:
    """在`n`个变量上累加。"""
    def __init__(self, n):
        self.data = [0.0] * n
    # Accumulation: for example, with args=(5, 10), 5 is added onto data[0]
    # and 10 onto data[1]
    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
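# A quick usage sketch (hypothetical values):
#   metric = Accumulator(2)
#   metric.add(5, 10)      # data is now [5.0, 10.0]
#   metric.add(1, 2)       # data is now [6.0, 12.0]
#   metric[0] / metric[1]  # 0.5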

# A utility class that plots data in an animation; a packaged version of this lives in the d2l library
class Animator:
    """在动画中绘制数据。"""
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)

        if nrows * ncols == 1:
            self.axes = [self.axes,]
        self.config_axes = lambda: d2l.set_axes(self.axes[
            0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
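        # The next two lines are the PyCharm fix described above: force a redraw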
        d2l.plt.draw()
        d2l.plt.pause(0.001)
        display.display(self.fig)
        display.clear_output(wait=True)
def softmax(X):
    """softmax。"""
    # 对X中所有元素做指数操作
    X_exp = torch.exp(X)
    # X.sum(1,keepdim=True)代表 按行求和
    partition = X_exp.sum(1, keepdim=True)
    # 这里使用广播机制  每个元素值都会除以其对应的partition(该行之和)
    return X_exp / partition
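# Sanity check (a sketch): every row of the softmax output is a distribution
#   X_prob = softmax(torch.normal(0, 1, (2, 5)))
#   X_prob.sum(1)  # tensor([1., 1.]): each row sums to 1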
def net(X):
    # Flatten each image to a 784-vector, then apply XW + b and softmax
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)
def cross_entropy(y_hat, y):
    """交叉熵损失。"""
    # 对所有样本的预测概率  拿出其对应的真实值的预测概率
    return -torch.log(y_hat[range(len(y_hat)), y])
def accuracy(y_hat, y):
    """计算预测正确的数量。"""
    # 拿出每一行概率最大的那个  当作预测结果
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    # Compare with the ground truth; cmp is a bool tensor
    cmp = y_hat.type(y.dtype) == y
    # Sum to count how many predictions are correct
    return float(cmp.type(y.dtype).sum())
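# For example, with the earlier y_hat / y: the per-row argmax is [2, 2] and
# y is [0, 2], so only the second sample matches and accuracy() returns 1.0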
def evaluate_accuracy(net, data_iter):
    """计算在指定数据集上模型的精度。"""
    if isinstance(net, torch.nn.Module):
        net.eval() # switch the model to evaluation mode
    metric = Accumulator(2)
    for X, y in data_iter:
        # Pass in two numbers: the count of correct predictions in this batch,
        # and y.numel(), the total number of samples in this batch
        metric.add(accuracy(net(X), y), y.numel())
    # Divide to get the accuracy
    return metric[0] / metric[1]

def train_epoch_ch3(net, train_iter, loss, updater):
    """训练模型一个迭代周期。"""
    if isinstance(net, torch.nn.Module):# 如果我们的net是nn.Module类型的话
        net.train() # 训练模式(计算梯度)
    metric = Accumulator(3)
    for X, y in train_iter: # 遍历数据
        y_hat = net(X) # 计算预测值
        l = loss(y_hat, y) # 计算损失
        if isinstance(updater, torch.optim.Optimizer):# 如果我们的updater是用了pytorch的话
            updater.zero_grad() # 梯度清零
            l.backward() # 计算梯度
            updater.step() # 更新参数
            metric.add( # 记录并累加三个数:损失,正确数量,样本数
                float(l) * len(y), accuracy(y_hat, y),
                y.size().numel())
        else:# 如果我们使用的是自己实现的 交叉熵损失
            l.sum().backward() # 累加 并求梯度
            updater(X.shape[0]) # 更新参数
            metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # 对损失求平均  准确数/总数=准确率
    return metric[0] / metric[2], metric[1] / metric[2]
# Training function
def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    """训练模型。"""
    # 展示数据
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    # 每一代训练
    for epoch in range(num_epochs):
        # 训练一代
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        # 测试并评估
        test_acc = evaluate_accuracy(net, test_iter)
        # 展示数据
        animator.add(epoch + 1, train_metrics + (test_acc,))
    # loss 和 准确率
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc


# Optimizer: mini-batch SGD
def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)


def predict_ch3(net, test_iter, n=6):
    """预测标签(定义见第3章)。"""
    for X, y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])
    d2l.plt.show()
if __name__=="__main__":
    batch_size = 128
    # Get iterators over the training and test sets
    train_iter, test_iter = load_data_fashion_mnist(batch_size)
    num_epochs = 10
    # Train
    train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)
    # Run a quick prediction
    predict_ch3(net, test_iter)

Output:
[figure: training curves (train loss, train acc, test acc) over the epochs]
[figure: test images titled with their true and predicted labels]
