Hands-on deep learning notes (5) - softmax regression implementation

We found it easier to implement linear regression through the high-level APIs of deep learning frameworks. Similarly, softmax regression can be implemented just as conveniently through those same high-level APIs.

1.1 Image Classification Dataset

The MNIST dataset is one of the most widely used datasets in image classification, but it is too simple to serve as a benchmark today. We will use the similar but more complex Fashion-MNIST dataset instead.

%matplotlib inline
import torch
import torchvision
from torch.utils import data
from torchvision import transforms
from d2l import torch as d2l

d2l.use_svg_display()

1.1.1 Read the dataset

We can download and read the Fashion-MNIST dataset into memory through built-in functions in the framework.

# The ToTensor instance converts the image data from PIL type to 32-bit floating point
# and divides by 255 so that all pixel values lie between 0 and 1
trans = transforms.ToTensor()
mnist_train = torchvision.datasets.FashionMNIST(
    root="../data", train=True, transform=trans, download=True)
mnist_test = torchvision.datasets.FashionMNIST(
    root="../data", train=False, transform=trans, download=True)

Fashion-MNIST consists of images from 10 categories, each represented by 6,000 images in the training dataset and 1,000 images in the test dataset. The training and test sets therefore contain 60,000 and 10,000 images, respectively. The test dataset is not used for training, only to evaluate model performance.

len(mnist_train), len(mnist_test)
(60000, 10000)

The height and width of each input image are 28 pixels. The dataset consists of grayscale images, so the number of channels is 1. For brevity, we denote the shape of an image with height h pixels and width w pixels as h×w or (h, w).

mnist_train[0][0].shape
torch.Size([1, 28, 28])

The 10 categories included in Fashion-MNIST are t-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. The following function converts between numeric label indices and their text names.

def get_fashion_mnist_labels(labels):  
    """返回Fashion-MNIST数据集的文本标签"""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]
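
For example, a quick usage sketch: look up the text name of the first training sample's numeric label.

# Convert a numeric label to its text name
print(get_fashion_mnist_labels([mnist_train[0][1]]))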

Create a function to visualize these samples.

def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):  #@save
    """绘制图像列表"""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if torch.is_tensor(img):
            # Tensor image
            ax.imshow(img.numpy())
        else:
            # PIL image
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            ax.set_title(titles[i])
    return axes

Below are the images of the first few samples in the training dataset and their corresponding labels.

X, y = next(iter(data.DataLoader(mnist_train, batch_size=18)))
show_images(X.reshape(18, 28, 28), 2, 9, titles=get_fashion_mnist_labels(y));

[Figure: the first 18 training images shown in two rows with their text labels as titles]

1.1.2 Reading mini-batches

To make reading the training and test sets easier, we use a built-in data iterator rather than creating one from scratch. Recall that in each iteration, the data loader reads a mini-batch of size batch_size. With a built-in data iterator, we can also randomly shuffle all the samples, so the mini-batches are read without bias.

batch_size = 256

def get_dataloader_workers():  
    """使用4个进程来读取数据"""
    return 4

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
                             num_workers=get_dataloader_workers())
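
As a rough sanity check on loading throughput, we can time one full pass over the training set. A minimal sketch using the Timer helper from the d2l package:

timer = d2l.Timer()
for X, y in train_iter:
    continue
print(f'{timer.stop():.2f} sec')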

1.1.3 Integrate all components

Now we define the load_data_fashion_mnist function, which obtains and reads the Fashion-MNIST dataset. It returns the data iterators for both the training set and the validation set. In addition, it accepts an optional argument resize, which resizes images to another shape.

def load_data_fashion_mnist(batch_size, resize=None):  
    """下载Fashion-MNIST数据集,然后将其加载到内存中"""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))

Below, we test the image resizing capabilities of the load_data_fashion_mnist function by specifying the resize parameter.

train_iter, test_iter = load_data_fashion_mnist(32, resize=64)
for X, y in train_iter:
    print(X.shape, X.dtype, y.shape, y.dtype)
    break
torch.Size([32, 1, 64, 64]) torch.float32 torch.Size([32]) torch.int64

1.2 Initialize model parameters

import torch
from torch import nn
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

The output layer of softmax regression is a fully connected layer. Therefore, to implement our model, we only need to add a fully connected layer with 10 outputs to a Sequential. Again, Sequential is not strictly necessary here, but it is the basis for implementing deep models. We once more initialize the weights randomly with mean 0 and standard deviation 0.01.

# PyTorch does not implicitly reshape the inputs. Therefore,
# we define a flatten layer before the linear layer to adjust the shape of the network input
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights);
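
As a quick sanity check (a sketch, not part of the original recipe), we can pass a dummy batch through the network to confirm that each 28×28 image is flattened to 784 features and mapped to 10 output logits:

X = torch.rand(2, 1, 28, 28)  # a dummy batch of two grayscale images
print(net(X).shape)           # expected: torch.Size([2, 10])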

1.3 Loss function

loss = nn.CrossEntropyLoss()
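
Note that nn.CrossEntropyLoss combines log-softmax and negative log-likelihood in a single, numerically stable step, so it expects raw, unnormalized logits together with integer class indices. A minimal sketch with hypothetical scores:

logits = torch.tensor([[2.0, 1.0, 0.1]])  # hypothetical unnormalized scores for 3 classes
label = torch.tensor([0])                 # the true class index
print(loss(logits, label))                # equals -log_softmax(logits)[0, 0]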

1.4 Optimization algorithm

Here, we use mini-batch stochastic gradient descent with a learning rate of 0.1 as the optimization algorithm. This is the same setup as in the linear regression example, which illustrates the generality of the optimizer.

trainer = torch.optim.SGD(net.parameters(), lr=0.1)
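
To make the update rule explicit, here is a sketch of a single manual training step; the training function used below runs steps like this inside its loop:

X, y = next(iter(train_iter))
l = loss(net(X), y)   # mean cross-entropy over the mini-batch
trainer.zero_grad()   # clear gradients left over from any previous step
l.backward()          # backpropagate to fill p.grad for every parameter p
trainer.step()        # update each parameter: p <- p - lr * p.grad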

1.5 Training

num_epochs = 10
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

[Figure: training loss, training accuracy, and test accuracy plotted over the epochs]

As before, the algorithm converges to a reasonably high accuracy, and this time with far less code.
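
With the model trained, we can classify new images. A minimal prediction sketch, using the saved d2l.get_fashion_mnist_labels helper (the same function we defined above):

X, y = next(iter(test_iter))
preds = net(X).argmax(dim=1)                    # the most likely class for each image
print(d2l.get_fashion_mnist_labels(preds[:6]))  # predicted names
print(d2l.get_fashion_mnist_labels(y[:6]))      # true names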
