Hands-on deep learning [2] - softmax regression

Hands-on deep learning website: Dive into Deep Learning (d2l)
Note: this part only briefly introduces the basics and attaches a complete code implementation. For more details, please refer to the website above.

Foreword

In the previous section we talked about linear regression, which solves the problem of predicting a numeric value. But in daily life, besides predicting values, we also care about classification problems, such as whether an image shows a cat or a dog.

Softmax regression, introduced in this section, outputs a set of probability values indicating how likely the input belongs to each category.

Brief description

Classification problem

Suppose we have a 2×2 grayscale image. We can represent each pixel with a scalar, so each image corresponds to four features x1, x2, x3, x4. Also assume that each image belongs to one of the categories "cat", "chicken", and "dog".

In this example we use one-hot encoding to represent the categories. A one-hot encoding is a vector with as many components as there are classes: the component corresponding to the true category is set to 1, and all other components are set to 0. Here the label y is a 3-dimensional vector, where (1, 0, 0) corresponds to "cat", (0, 1, 0) to "chicken", and (0, 0, 1) to "dog". Taking cats as an example, (1, 0, 0) means that if the input image is a cat, only the component corresponding to "cat" is 1 and the others are 0. This is the representation of the true label. So y is expressed as:

y ∈ {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
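
As a quick side illustration (not from the original post), these one-hot vectors can be produced in PyTorch with torch.nn.functional.one_hot; the numeric labels 0, 1, 2 for "cat", "chicken", "dog" are assumed just for this example:

import torch
import torch.nn.functional as F

# Hypothetical numeric labels for "cat", "chicken", "dog"
labels = torch.tensor([0, 1, 2])
one_hot = F.one_hot(labels, num_classes=3)
print(one_hot)
# tensor([[1, 0, 0],
#         [0, 1, 0],
#         [0, 0, 1]])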

Network Architecture

Softmax regression is also computed by a fully connected layer.
Fully connected layer property: every element of the output layer depends on every element of the input layer.
(Figure: a fully connected layer mapping the four inputs x1, x2, x3, x4 to the three outputs o1, o2, o3.)
Described by formulas:

o1 = x1*w11 + x2*w12 + x3*w13 + x4*w14 + b1
o2 = x1*w21 + x2*w22 + x3*w23 + x4*w24 + b2
o3 = x1*w31 + x2*w32 + x3*w33 + x4*w34 + b3
The vector form is:

o = Wx + b
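
As a rough sketch (not from the original post), this affine computation for a single 4-feature sample and 3 output classes looks as follows in PyTorch; like the implementation later in this post, the code uses the row-vector convention o = xW + b, and the numbers are made up:

import torch

x = torch.tensor([0.1, 0.2, 0.3, 0.4])   # 4 input features of one sample
W = torch.randn(4, 3) * 0.01              # weights mapping 4 inputs to 3 outputs
b = torch.zeros(3)                         # one bias per output
o = torch.matmul(x, W) + b                 # unnormalized outputs (logits), shape (3,)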

Softmax operation

Idea: we want the output of the network to represent the probability that the input belongs to each class, and then select the class with the highest probability as the prediction. For example, if the predicted y1, y2, y3 are 0.1, 0.8, and 0.1 respectively, then we predict category 2 because it has the highest probability.
Implementation:

ŷ = softmax(o), where ŷ_j = exp(o_j) / Σ_k exp(o_k)
The softmax function transforms the unnormalized predictions into non-negative numbers that sum to 1, while keeping the model differentiable.
Why is normalization needed? Because we want to interpret the outputs as probabilities. The raw outputs may be negative, which violates the properties of probabilities, and we need the outputs to sum to 1 so that we can judge the class by comparing probabilities.
Finally, the category with the highest probability is our prediction:
argmax_j ŷ_j = argmax_j o_j  (softmax does not change the ordering of the entries of o)
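
For instance, applying softmax to some made-up logits turns them into probabilities that sum to 1 and keeps the largest entry largest (a sketch only, not part of the original code):

import torch

o = torch.tensor([1.0, 3.0, 1.0])           # unnormalized predictions (logits)
probs = torch.exp(o) / torch.exp(o).sum()   # softmax as defined above
print(probs)                                # ≈ tensor([0.1065, 0.7870, 0.1065])
print(probs.argmax())                       # tensor(1): class 2 has the highest probability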

Mini-batch vectorization

As before, we cannot feed all the data into training at once (that would require too much memory), so we read a mini-batch at a time. The formulas are:
O = XW + b
Ŷ = softmax(O)
where X is an n×d matrix (batch size n, feature dimension d), W is d×q, b is a 1×q row vector, and the output has q categories.
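
A shape-only sketch of the mini-batch version, with n=5, d=4 and q=3 chosen arbitrarily for illustration:

import torch

n, d, q = 5, 4, 3
X = torch.randn(n, d)          # a mini-batch of n samples with d features each
W = torch.randn(d, q) * 0.01   # weights: d inputs -> q outputs
b = torch.zeros(q)             # biases, broadcast over the rows of XW
O = torch.matmul(X, W) + b     # logits for the whole batch
print(O.shape)                 # torch.Size([5, 3])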

Loss function

We use maximum likelihood estimation, as in the linear regression part.
The softmax function gives a vector ŷ, which we can interpret as the conditional probability of each class given the input x.
The likelihood is:

P(Y | X) = ∏_{i=1}^{n} P(y^(i) | x^(i))
According to maximum likelihood estimation, we need to maximize the expression above, which is equivalent to minimizing the negative log-likelihood:

-log P(Y | X) = Σ_{i=1}^{n} -log P(y^(i) | x^(i)) = Σ_{i=1}^{n} l(y^(i), ŷ^(i))

where, for a label y and prediction ŷ over q classes,

l(y, ŷ) = -Σ_{j=1}^{q} y_j log ŷ_j

The loss function here is the cross-entropy loss.
Note that the true label vector y here is one-hot encoded: the component at the position of the true class is 1 and all others are 0, so the loss reduces to the negative logarithm of the predicted probability for the true class.
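
To make this concrete, here is a small sketch (with invented numbers) comparing the full cross-entropy sum with the shortcut of taking the negative log of the true-class probability only:

import torch

y_hat = torch.tensor([0.1, 0.8, 0.1])   # predicted probabilities for one sample
y = torch.tensor([0.0, 1.0, 0.0])       # one-hot true label ("chicken")

full = -(y * torch.log(y_hat)).sum()    # full cross-entropy sum over all classes
shortcut = -torch.log(y_hat[1])         # negative log of the true-class probability only
print(full, shortcut)                   # both ≈ 0.2231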

Softmax and its derivatives

Expanding the loss function, the ŷ inside the log can be expanded using the softmax definition:

l(y, ŷ) = -Σ_j y_j log( exp(o_j) / Σ_k exp(o_k) ) = log Σ_k exp(o_k) - Σ_j y_j o_j

The partial derivative with respect to the unnormalized prediction o_j is:

∂l/∂o_j = exp(o_j) / Σ_k exp(o_k) - y_j = softmax(o)_j - y_j

That is, the gradient is the difference between the predicted probability and the corresponding entry of the one-hot label.
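
This gradient formula can be sanity-checked with autograd; the following is only an illustrative check with made-up logits, not part of the original post:

import torch

o = torch.tensor([1.0, 3.0, 1.0], requires_grad=True)   # logits
y = torch.tensor([0.0, 1.0, 0.0])                        # one-hot label

# Expanded cross-entropy loss: log-sum-exp minus the true-class logit
l = torch.log(torch.exp(o).sum()) - (y * o).sum()
l.backward()

print(o.grad)                                  # gradient computed by autograd
print(torch.softmax(o.detach(), dim=0) - y)    # matches softmax(o) - y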

Cross-entropy loss

Entropy comes from information theory, which deals with encoding, decoding, transmitting, and processing information or data as concisely as possible.
Definition of entropy: in information theory, the entropy of a distribution P is defined as

H[P] = Σ_j -P(j) log P(j)

It is the expected amount of information when the assigned probabilities truly match the data-generating process.
Amount of information of an event with probability P(j):

-log P(j)

This can be read as: the more probable an event is, the less information its occurrence carries. When P(j) = 1 the information content is 0, so the event's occurrence adds no information at all.

Examining cross-entropy: the cross-entropy from P to Q, H(P, Q) = Σ_j -P(j) log Q(j), is the expected surprise of an observer with subjective probabilities Q when seeing data that was actually generated according to probabilities P.
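
A tiny numeric sketch of these two quantities, with distributions made up for illustration:

import torch

P = torch.tensor([0.50, 0.25, 0.25])   # probabilities of the data-generating process
Q = torch.tensor([0.80, 0.10, 0.10])   # observer's subjective probabilities

entropy = -(P * torch.log(P)).sum()         # H[P]
cross_entropy = -(P * torch.log(Q)).sum()   # H(P, Q), never smaller than H[P]
print(entropy, cross_entropy)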

The code

The specific task here is to train a softmax regression model on the Fashion-MNIST dataset and evaluate it on the test set.
Import related packages:

import torch
import torchvision
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l
from IPython import display

1. Prepare the dataset

As with any task, the first step is definitely to prepare the dataset.
First, let's test things out with a few statements, as follows:

# Read the dataset
## ToTensor converter: divides by 255 so pixel values fall in the range 0-1
trans = transforms.ToTensor()
## Training set
mnist_train = torchvision.datasets.FashionMNIST(
    root="/kaggel/output/data", train=True, transform=trans, download=True)
## Test set
mnist_test = torchvision.datasets.FashionMNIST(
    root="/kaggel/output/data", train=False, transform=trans, download=True)

These statements use PyTorch's built-in dataset utilities, which download the dataset from the Internet and apply the given transform (here trans, which converts the images to tensors and normalizes them).
Then define the text labels corresponding to the numeric labels; we will need them later to display the predicted labels.

# Return the text labels corresponding to the numeric labels
def get_fashion_mnist_labels(labels):
    # Text labels
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    # Map each numeric label to its text label
    return [text_labels[int(i)] for i in labels]
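
For instance, calling this helper on a couple of numeric labels (chosen arbitrarily here) maps them back to text:

print(get_fashion_mnist_labels([0, 9]))   # ['t-shirt', 'ankle boot']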

Then if you want to visualize the data, you can do it by the following method:

# Visualize samples
def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):
    # Figure size
    figsize = (num_cols * scale, num_rows * scale)
    # Split the figure into a grid of num_rows x num_cols subplots
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if torch.is_tensor(img):
            # Image tensor
            ax.imshow(img.numpy())
        else:
            # PIL image
            ax.imshow(img)
        # Hide the coordinate axes
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            # Set the title
            ax.set_title(titles[i])
    return axes

# X holds the returned images, y their corresponding labels
X, y = next(iter(data.DataLoader(mnist_train, batch_size=18)))
# X has shape (18, 1, 28, 28), so reshape it before plotting
show_images(X.reshape(18, 28, 28), 2, 9, titles=get_fashion_mnist_labels(y));

After obtaining the dataset, we need to load it in mini-batches; here we use the following statements:

batch_size = 256

def get_dataloader_workers():
    """Use 4 worker processes to read the data"""
    return 4

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
                             num_workers=get_dataloader_workers())

The final function for reading the dataset is:

def load_data_fashion_mnist(batch_size, resize=None):  
    """下载Fashion-MNIST数据集,然后将其加载到内存中"""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="/kaggel/output/data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="/kaggel/output/data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))

This function obtains the training and test sets, combining the pieces discussed above: first it builds the datasets, then it wraps each of them in a data.DataLoader.
We can also use a helper that comes with d2l to load the data; the code is as follows:

import torch
from IPython import display
from d2l import torch as d2l
batch_size = 256
# Use the loader provided by d2l
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

2. Define the softmax function

Compute according to the previous formula, but pay attention to summing along the rows: axis=0 sums down each column, while axis=1 sums across each row; since each row is one sample, we sum with axis 1.

def softmax(X):
    X_exp = torch.exp(X)
    # Sum over each row, since each row is one sample
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # Broadcasting is applied here
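
A quick check of this softmax on a small random input (the shape 2 x 5 is arbitrary) shows that every entry becomes non-negative and each row sums to 1:

X = torch.normal(0, 1, (2, 5))   # two hypothetical samples with 5 "classes"
X_prob = softmax(X)
print(X_prob)
print(X_prob.sum(1))             # each row sums to 1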

3. Define the loss function and network structure

Loss function: as discussed above, we use the cross-entropy loss. Note that the true label y is one-hot encoded, with a 1 at the position of the correct class and 0 everywhere else, so the cross-entropy loss becomes the negative logarithm of the predicted probability at that position.
Network structure: as mentioned before, the network is a single fully connected layer, expressed by the formula XW + b.
So the code is:

def cross_entropy(y_hat, y):
    # Pick out, for each sample, the predicted probability of its true class, then take -log
    return - torch.log(y_hat[range(len(y_hat)), y])

def net(X):
    # Flatten each image into a row vector, apply the affine transform, then softmax
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)
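
As a small sketch with made-up values, the fancy indexing y_hat[range(len(y_hat)), y] picks out, for each sample, the predicted probability of its true class:

y = torch.tensor([0, 2])                    # true class indices for two samples
y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])     # predicted probabilities
print(y_hat[[0, 1], y])                     # tensor([0.1000, 0.5000])
print(cross_entropy(y_hat, y))              # tensor([2.3026, 0.6931])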

4. Calculation of accuracy

Accuracy is the ratio of the number of correct predictions to the total number of predictions, so we first need to count the correct predictions. Note that the prediction must be converted to the same data type as the true labels, because == is used to determine which predictions are correct and which are not.

"""计算预测正确的数量"""
def accuracy(y_hat, y): 
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())
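
Continuing the hypothetical y_hat and y from the sketch above, accuracy counts how many argmax predictions match the labels:

# argmax of each row gives predictions [2, 2]; only the second matches y = [0, 2]
print(accuracy(y_hat, y) / len(y))   # 0.5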

Then compute the accuracy over a whole dataset; the code is as follows:

def evaluate_accuracy(net, data_iter):
    """Compute the model's accuracy on the given dataset"""
    if isinstance(net, torch.nn.Module):
        net.eval()  # Set the model to evaluation mode
    metric = Accumulator(2)  # number of correct predictions, total number of predictions
    with torch.no_grad():
        for X, y in data_iter:
            # y.numel() returns the number of elements in y
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

Here we use a class to accumulate the number of correct predictions and the total count. The definition of this class is:

"""在n个变量上累加"""
class Accumulator: 
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
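
A short usage sketch of this accumulator (the numbers are arbitrary):

metric = Accumulator(2)   # e.g. number of correct predictions, total number of predictions
metric.add(3, 10)
metric.add(2, 10)
print(metric[0] / metric[1])   # 0.25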

5. Prepare for training

First of all, let's consider the training of a single epoch. The overall process is as follows:

  • Loop over the input data X and its labels y
  • Feed the input data X into the network to get the predicted labels ŷ
  • Compute the loss between the true and predicted labels
  • Backpropagate according to the loss

At the same time we also return the training accuracy and training loss.
The code is as follows:

# Training code for one epoch
def train_epoch_ch3(net, train_iter, loss, updater): 
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, sum of training accuracy, number of samples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update the parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Use PyTorch's built-in optimizer and loss function
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # Use the custom optimizer and loss function
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return the training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

Note that the updater parameter is handled in two cases: one uses l.mean() and the other uses l.sum(). This is because different implementations are used: the former is for PyTorch's built-in optimizer, while the latter is for the custom sgd helper from d2l, which divides by the batch size itself.

Once training a single epoch is in place, training the whole model simply means looping over all the epochs. The code is as follows:

# Train the whole model
def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        print(train_metrics)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

Here, for convenient visualization, another class is added to update the plotted curves during training. The implementation is as follows:

# Animation helper
class Animator: 
    """Plot data incrementally in an animation"""
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        # Draw multiple lines incrementally
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        # Use a lambda to capture the axis-configuration arguments
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # Add multiple data points to the chart
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        display.clear_output(wait=True)

After all of the above is in place, we can train the model. The code is as follows:

# Parameter initialization
# The original images are 28 x 28; flattened into a vector, that gives 784 inputs
num_inputs = 784
# There are 10 classes in total, so the number of outputs is 10
num_outputs = 10

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)
lr = 0.1

# Use mini-batch stochastic gradient descent (d2l.sgd) as the updater
def updater(batch_size):
    with torch.no_grad():
        return d2l.sgd([W, b], lr, batch_size)
num_epochs = 10
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

The results obtained are as follows:
(Figure: curves of train loss, train acc and test acc over 10 epochs.)
If train_loss becomes nan during training, you can try running the program again.

6. Test

A good model must not only achieve good results on the training set, but also perform well on the test set. To check this, we evaluate the trained model on test data.

# Prediction
def predict_ch3(net, test_iter, n=6):  #@save
    for X, y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true +'\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(
        X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])
predict_ch3(net, test_iter, n=10)

The result is:
(Figure: ten test images from Fashion-MNIST, each titled with its true label above the predicted label.)
It can be seen that the prediction results are all correct.
