Article directory

Foreword
1. Training set and test set
2. Steps
Summary

Foreword

Softmax regression, also known as multinomial or multi-class logistic regression, is the generalization of logistic regression to multi-class classification problems.
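For reference, the model built in the steps below can be written compactly as follows. The notation is my own and is chosen to match the variable names used later in the code: X is a batch of n flattened images (n × 784), W is the 784 × 10 weight matrix, b is the bias of length 10, q = 10 is the number of classes, and y_i is the true class of sample i.

```latex
\begin{aligned}
O &= XW + b, \\
\hat{y}_{ij} &= \operatorname{softmax}(O)_{ij}
             = \frac{\exp(o_{ij})}{\sum_{k=1}^{q} \exp(o_{ik})}, \\
\ell_i &= -\log \hat{y}_{i,\,y_i}.
\end{aligned}
```

The second line is what the softmax function in step 4 computes row by row, and the third line is the per-sample cross-entropy loss defined in step 5.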

1. Training set and test set

We use the Fashion-MNIST dataset obtained in the previous section.

2. Steps

1. Import library

import torch
import torchvision
import numpy as np
import sys
sys.path.append("..")  # so that d2lzh_pytorch can be imported from the parent directory
from d2lzh_pytorch import *
import d2lzh_pytorch as d2l

2. Read data

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

d2l.load_data_fashion_mnist(batch_size)

This function bundles the work done in the previous section into a single call.
It has been saved in the d2lzh_pytorch package:
def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))  # skipped here because resize defaults to None
    trans.append(torchvision.transforms.ToTensor())  # convert images to tensors

    transform = torchvision.transforms.Compose(trans)  # chain the transforms into one callable
    # Training and test sets, downloaded if missing (see the previous section)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    if sys.platform.startswith('win'):
        num_workers = 0  # 0 means no extra processes are used to speed up data loading
    else:
        num_workers = 4  # use 4 worker processes to load data
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    return train_iter, test_iter
The following explains the second if statement (the one that sets num_workers):
We train the model on the training set and evaluate the trained model on the test set. As mentioned earlier, mnist_train is a subclass of torch.utils.data.Dataset, so we can pass it to torch.utils.data.DataLoader to create a DataLoader instance that reads mini-batches of samples.
In practice, data reading is often the performance bottleneck of training, especially when the model is relatively simple or the computing hardware is fast. A handy feature of PyTorch's DataLoader is that it can use multiple processes to speed up data reading. Here num_workers is set to 4 so that 4 processes read data; on Windows the function uses 0 instead, so data is loaded in the main process.
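To see how a Dataset is wrapped by a DataLoader without downloading anything, here is a minimal sketch. The random tensors are only a stand-in for Fashion-MNIST, and the names features, labels, dataset and loader are mine, not from d2lzh_pytorch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for Fashion-MNIST: 1000 random 1x28x28 "images"
# with integer labels in 0..9 (an assumption made for illustration only).
features = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(features, labels)  # a torch.utils.data.Dataset subclass

# num_workers=0 loads data in the main process; a positive value such as 4
# starts that many worker processes instead.
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=0)

batch_shapes = [(X.shape, y.shape) for X, y in loader]
print(len(batch_shapes))   # number of mini-batches
print(batch_shapes[0])     # shapes of the first (full) batch
print(batch_shapes[-1])    # the last batch holds the remaining samples
```

With 1000 samples and a batch size of 256, the loader yields three full batches and one final batch of 232 samples.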

3. Initialize model parameters

num_inputs = 784  # each 28 x 28 image is flattened into a vector of length 784
num_outputs = 10  # Fashion-MNIST has 10 classes

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs,num_outputs)),
                 dtype=torch.float)
b = torch.zeros(num_outputs, dtype=torch.float)

W.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

4. Define the model

def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    return X_exp / partition

def net(X):
    return softmax(torch.mm(X.view((-1, num_inputs)), W) + b)

X.exp()

Returns e raised to the power of each element of X (the elementwise exponential).

X_exp.sum(dim=1, keepdim=True)

torch.sum() sums the input tensor along a given dimension. Two usages matter here: dim=0 sums the elements in each column, and dim=1 sums the elements in each row. With keepdim=True the result keeps both the row and column dimensions instead of dropping the summed one.

.mm() performs matrix multiplication.
.view() returns the same data with a new shape.
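To make dim, keepdim, .view() and .mm() concrete, here is a small sketch on toy tensors. The softmax function is copied from above; the other names (col_sum, row_sum, imgs, flat, weights, out, probs) are mine and used only for illustration.

```python
import torch

X = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

col_sum = X.sum(dim=0, keepdim=True)  # sum down each column, shape (1, 3)
row_sum = X.sum(dim=1, keepdim=True)  # sum across each row, shape (2, 1)
print(col_sum)  # values 5, 7, 9
print(row_sum)  # values 6 and 15

# .view() flattens a batch of images; .mm() multiplies two 2-D matrices.
imgs = torch.zeros(4, 1, 28, 28)
flat = imgs.view(-1, 784)        # shape (4, 784)
weights = torch.zeros(784, 10)
out = torch.mm(flat, weights)    # shape (4, 10)

def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    return X_exp / partition     # broadcasting divides each row by its own sum

probs = softmax(X)
print(probs.sum(dim=1))  # each row sums to 1, up to floating-point rounding
```

Because keepdim=True leaves partition with shape (n, 1), the division broadcasts row by row, which is exactly what turns each row into a probability distribution.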

5. Define the loss function

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.LongTensor([0, 2])
y_hat.gather(1, y.view(-1, 1))

def cross_entropy(y_hat,y):
    return - torch.log(y_hat.gather(1, y.view(-1,1)))

torch.LongTensor()

Creates a tensor of 64-bit integers (dtype torch.int64), which is the index type that gather() requires.

torch.gather(input, dim, index, out=None) → Tensor

The returned tensor has the same size as index.
dim specifies the dimension that the values in index refer to. With dim=1, entry (i, j) of the output is input[i][index[i][j]], which makes it easy to pick out the element at a specified position in every row. (The result is a tensor with the shape of index, not a single scalar.)
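The gather call can be traced by hand on the two-sample example above. This sketch reuses y_hat and y from that example; index, picked, same and loss are names I introduce for illustration.

```python
import torch

y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])          # true class of each sample (int64 by default)

index = y.view(-1, 1)             # shape (2, 1): one column index per row
picked = y_hat.gather(1, index)   # picked[i][0] == y_hat[i][y[i]]
print(picked)                     # values 0.1 and 0.5, shape (2, 1)

# The same selection written with integer-array indexing, shape (2,)
same = y_hat[torch.arange(len(y)), y]

loss = -torch.log(picked)         # per-sample cross-entropy
print(loss)                       # about 2.3026 and 0.6931
```

The first sample assigns only 0.1 to its true class and is penalized much more heavily than the second, which assigns 0.5.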

6. Calculate the classification accuracy

def accuracy(y_hat,y):
    return (y_hat.argmax(dim=1) == y).float().mean().item()

print(accuracy(y_hat, y))

.argmax(dim=1)

Returns, for each row, the index of the largest value, which is the predicted class.
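Breaking the accuracy function into its intermediate steps shows what each call contributes. This is only a walk-through on the same toy y_hat and y; the names pred, correct and acc are mine.

```python
import torch

y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([0, 2])

pred = y_hat.argmax(dim=1)   # index of the largest value in each row
correct = (pred == y)        # elementwise comparison gives a boolean tensor
acc = correct.float().mean().item()  # fraction of correct predictions

print(pred)     # tensor([2, 2])
print(correct)  # tensor([False,  True])
print(acc)      # 0.5
```

Both rows predict class 2, but only the second label is 2, so the accuracy is 0.5.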

.item()

import torch
x = torch.randn(2, 2)
print(x) 
print(x[1,1])
print(x[1,1].item())

tensor([[ 0.4702,  0.5145],
        [-0.0682, -1.4450]]) 
tensor(-1.4450)
-1.445029854774475 
The printed tensor is rounded to four decimal places, while item() shows the full value. item() converts a one-element tensor into a plain Python number (a float here), so when accumulating loss or accuracy we generally call item() rather than using the tensor element x[1, 1] directly.

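# This function is saved in the d2lzh_pytorch package for later use. It will be improved step by step: its full implementation is described in the "Image Augmentation" section.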
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
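Evaluation does not need gradients, so one optional refinement is to wrap the loop in torch.no_grad(). The sketch below is a hypothetical variant, not the version stored in d2lzh_pytorch, and the toy network and data exist only to make it runnable.

```python
import torch

def evaluate_accuracy_no_grad(data_iter, net):
    """Hypothetical variant of evaluate_accuracy that disables autograd."""
    acc_sum, n = 0.0, 0
    with torch.no_grad():  # skip building the computation graph during evaluation
        for X, y in data_iter:
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    return acc_sum / n

# Toy check: a "network" that returns its input, and two small batches.
toy_net = lambda X: X
toy_iter = [
    (torch.tensor([[0.9, 0.1], [0.2, 0.8]]), torch.tensor([0, 1])),  # both correct
    (torch.tensor([[0.6, 0.4], [0.7, 0.3]]), torch.tensor([1, 0])),  # one correct
]
toy_acc = evaluate_accuracy_no_grad(toy_iter, toy_net)
print(toy_acc)  # 0.75
```

The accuracy is accumulated as a count of correct predictions over all batches and divided by the total number of samples, so batches of different sizes are weighted correctly.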

7. Train the model

num_epochs, lr = 4, 0.1
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)

train_ch3() is saved in the d2lzh_pytorch package:

def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            
            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
                    
            l.backward()
            if optimizer is None:
                sgd(params, lr, batch_size)
            else:
                optimizer.step()  # used in the "Concise implementation of softmax regression" section
                
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)

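net: the model
train_iter: training data iterator
test_iter: test data iterator
loss: the loss function (cross_entropy here)
num_epochs: number of training epochs
batch_size: mini-batch size
params=[W, b]: the model parameters to update
lr: learning rate (step size)
optimizer=None: no optimizer object is passed, so the sgd function performs the update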
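train_ch3 calls sgd(params, lr, batch_size), which comes from d2lzh_pytorch and is not shown in these notes. The sketch below is my assumption of what such a helper does (a plain mini-batch SGD step); check the package source for the actual definition.

```python
import torch

def sgd(params, lr, batch_size):
    """Mini-batch SGD step (a sketch of what the helper is assumed to do)."""
    for param in params:
        # Divide by batch_size because train_ch3 sums the loss over the batch
        # instead of averaging it.
        param.data -= lr * param.grad / batch_size

# Toy check on a two-element parameter.
w = torch.tensor([1.0, 2.0], requires_grad=True)
toy_loss = (w ** 2).sum()   # gradient is 2 * w = [2, 4]
toy_loss.backward()
sgd([w], lr=0.1, batch_size=2)
print(w)  # values 0.9 and 1.8
```

Updating param.data in place changes the values without recording the step in the autograd graph, which is why the gradients must still be zeroed before the next backward pass.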

8. Predict

X, y = next(iter(test_iter))  # the built-in next() works on Python 3 iterators; .next() does not

true_labels = d2l.get_fashion_mnist_labels(y.numpy())
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(dim=1).numpy())
titles = [true +'\n' + pred for true, pred in zip(true_labels, pred_labels)]

d2l.show_fashion_mnist(X[0:9], titles[0:9])

Summary

These are study notes on section 3.6, "Implementing softmax regression from scratch", of "Hands-on Deep Learning" (PyTorch version).