[youcans hands-on model] Wide ResNet model

Welcome to the "youcans hands-on model" series
The content and resources of this column are synchronized to GitHub/youcans



This article uses PyTorch to implement the WideResNet network model and trains it on the CIFAR10 dataset for image classification.


1. Wide ResNet convolutional neural network model

Sergey Zagoruyko and Nikos Komodakis published the paper "Wide Residual Networks" in 2016, proposing the wide residual network (WRN) architecture, which improves the accuracy and efficiency of residual networks by increasing their width and reducing their depth.

[Paper download address]: [Wide Residual Networks] https://arxiv.org/abs/1605.07146

[GitHub address]: reference implementation 1: https://github.com/xternalz/WideResNet-pytorch ; reference implementation 2: https://github.com/meliketoy/wide-resnet.pytorch


1.1 Model Introduction

Wide ResNet is a wide residual network (WRN) architecture that improves the accuracy and efficiency of residual networks by increasing their width and reducing their depth.



1.2 Introduction to the paper

【Abstract】

Deep residual networks have been shown to scale to thousands of layers and still improve performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers. Training very deep residual networks therefore suffers from diminishing feature reuse, which makes these networks very slow to train.

We conduct a detailed experimental study of the ResNet architecture and find that the power of residual networks comes mainly from the residual blocks, while network depth plays only a supplementary role.

We propose the Wide Residual Network (WRN) architecture, which reduces the depth and increases the width of residual networks. This structure is far superior to the commonly used very deep and narrow networks: even a simple 16-layer wide residual network outperforms all previous deep residual networks in accuracy and efficiency, including thousand-layer networks. WRN achieves state-of-the-art results on the CIFAR, SVHN and COCO datasets, and significant improvements on ImageNet.


【Background】

From AlexNet and VGG to Inception and residual networks, performance has improved as the number of layers has grown. However, training ever-deeper networks is difficult because of exploding/vanishing gradients and degradation. The residual network ResNet achieved great success: residual connections accelerate the convergence of deep networks. The earlier Highway network could also successfully train very deep networks.

The residual block introduces the skip connection, which adds the input of a layer directly to its output along a shortcut path, so that the network only has to learn the residual part. Its mathematical description is as follows:

$$
\begin{aligned}
\text{Conv layer:} \quad & x_l = F(x_{l-1}, W_{l-1}) \\
\text{Res block:} \quad & x_l = F(x_{l-1}, W_{l-1}) + x_{l-1}
\end{aligned}
$$

Residual blocks come in two structural forms.

(1) Basic residual block: as shown in figure (a) basic, two consecutive 3×3 convolutions, each with batch normalization (BN) and a ReLU activation.

(2) Bottleneck residual block: as shown in figure (b) bottleneck, a 3×3 convolutional layer with a 1×1 convolutional layer before and after it, for dimension reduction and expansion respectively. Specifically, a 1×1 convolution reduces the channel dimension of the input feature map, a 3×3 convolution is applied, and a final 1×1 convolution restores the dimension. The 3×3 layer is kept thin, forming a bottleneck that reduces the computational cost of the block.
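As a concrete sketch (not code from the paper; the channel counts are chosen only for illustration), a bottleneck residual block can be written in PyTorch as:

import torch.nn as nn

class Bottleneck(nn.Module):
    """Sketch of a (b) bottleneck block: 1x1 reduce -> 3x3 -> 1x1 expand, plus shortcut."""
    def __init__(self, channels=64, bottleneck=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False),  # 1x1 dimension reduction
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False),  # thin 3x3 convolution
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False),  # 1x1 dimension expansion
            nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # add the shortcut, then activate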

So far, research on residual networks has mainly focused on the order of operations within the ResNet block and on network depth. Compared with the original ResNet architecture, the improved residual block changes the order of batch normalization, activation and convolution from conv-BN-ReLU to BN-ReLU-conv; the latter has been shown to train faster and reach better results.
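A minimal sketch of the two orderings (the channel count 16 is arbitrary, for illustration only):

import torch.nn as nn

# original "post-activation" order: conv-BN-ReLU
post_act = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
# improved "pre-activation" order used in WRN blocks: BN-ReLU-conv
pre_act = nn.Sequential(nn.BatchNorm2d(16), nn.ReLU(), nn.Conv2d(16, 16, 3, padding=1))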

Our goal is to explore a richer set of network architectures for ResNet blocks and investigate how several different aspects besides activation order affect performance.

Width and Depth in Residual Networks

Circuit complexity theory suggests that shallow circuits may require exponentially more components than deep ones. The authors of the residual network therefore tried to make it as thin as possible in order to increase depth while keeping the parameter count down, even introducing the "bottleneck" block to make ResNet blocks thinner still.

However, while residual blocks allow very deep networks to be trained, nothing forces the gradients to flow through the residual block weights as they pass through the network: only a few blocks may learn useful representations, while many blocks share little information and contribute little to the final objective. This problem is known as diminishing feature reuse.

We try to answer the question of how wide a deep residual network should be. Experiments show that the performance of deep residual networks comes mainly from the residual blocks, while depth plays a supplementary role.

Our research shows that widening the ResNet block improves the performance of residual networks more effectively than increasing their depth. We propose the wider Wide Residual Network (Wide ResNet), which is 50 times less deep than a comparable thin network yet more than 2 times faster.

Using Dropout in ResNet block

Dropout was originally applied mainly to the top, parameter-heavy layers to prevent overfitting, and was later largely displaced by batch normalization (BN). Batch normalization also acts as a regularizer, and experiments show that networks with BN reach better accuracy than networks with dropout.

Since widening the residual block increases the number of parameters, we study the effect of dropout for regularizing training and preventing overfitting, and argue that dropout should be inserted between the convolutional layers. Experiments on wide residual networks confirm that dropout is effective: for example, a 16-layer wide residual network with dropout achieves 1.64% error on SVHN.


【Main innovation】

There are three simple ways to increase the representational power of residual blocks:

  • Add more layers: more convolutional layers per residual block;
  • Increase the width: more feature maps to widen the convolutional layers;
  • Increase the kernel size: larger convolution kernels in the convolutional layers.

Since studies such as VGG demonstrated the effectiveness of small convolution kernels, we do not consider kernels larger than 3×3.

We introduce two factors: a deepening factor $l$ and a widening factor $k$, where $l$ is the number of convolutional layers in a residual block and $k$ multiplies the number of feature channels. The residual block shown in figure (a) basic corresponds to $l=2, k=1$, and the block shown in figure (c) basic-wide corresponds to $l=2, k>1$.

Increasing the width of the convolutional layers (increasing $k$) increases the number of parameters and the computational complexity. However, because GPUs process large tensors in parallel very efficiently, widened layers make better use of the hardware. Architectures that preceded residual networks, such as VGG and Inception, used much wider convolutional layers. We therefore look for the optimal ratio between the number of residual blocks $d$ and the widening factor $k$ by experiment.
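As a concrete illustration of how depth and width interact (a hypothetical helper, consistent with the WRN-depth-k configurations used in the code below), the per-group channel widths follow directly from the widening factor:

def wrn_config(depth, k):
    """Channel widths and blocks-per-group for a WRN-depth-k (hypothetical helper)."""
    assert (depth - 4) % 6 == 0, "depth must be of the form 6n+4"
    n = (depth - 4) // 6  # residual blocks per group (each with l=2 convolutions)
    widths = [16, 16 * k, 32 * k, 64 * k]  # conv1 and the three widened groups
    return n, widths

print(wrn_config(28, 10))  # (4, [16, 160, 320, 640]) for WRN-28-10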

Additionally, we regularize with dropout. Residual networks are already regularized by batch normalization, and we add a dropout layer inside the residual block to prevent overfitting, as shown in figure (d) wide-dropout.


【Model structure】

The general structure of our residual network is shown in Table 1: first an initial convolutional layer conv1, followed by 3 sets (each of size N) of residual blocks conv2, conv3 and conv4, and finally an average pooling layer and classification layer.

[Table 1: general structure of wide residual networks: conv1, then three groups conv2–conv4 of residual blocks widened by factor k, then average pooling and the classifier]

In all our experiments the size of conv1 is fixed, while the widening factor $k$ scales the width of the residual blocks in the three groups conv2–conv4.

Let $B(M)$ denote the residual block structure, where $M$ is the list of kernel sizes of the convolutional layers in the block. For example, $B(3,1)$ denotes a block with a 3×3 convolutional layer followed by a 1×1 convolutional layer, and $B(1,3,1)$ denotes the bottleneck block of figure (b): a 1×1, a 3×3, and a 1×1 convolutional layer.


【Performance】

  1. On CIFAR-10 and CIFAR-100, without dropout the best configuration is WRN-28-10, i.e. depth 28 and widening factor 10; adding dropout improves the results further.
  2. The bottleneck design is only needed for large datasets and very deep networks, such as WRN-50-2-bottleneck on ImageNet. Overall, WRN achieves the best results on CIFAR-10, CIFAR-100, SVHN and COCO.

2. Define the WideResNet model class in PyTorch

2.1 Custom WideResNet model class

PyTorch provides a high-level API through the torch.nn module to build networks from scratch.

# https://github.com/xternalz/WideResNet-pytorch/blob/master/wideresnet.py
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, in_planes, out_planes, stride, dropRate=0.0):
        super(BasicBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.relu2 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1,
                               padding=1, bias=False)
        self.droprate = dropRate
        self.equalInOut = (in_planes == out_planes)
        self.convShortcut = (not self.equalInOut) and nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride,
                               padding=0, bias=False) or None
    def forward(self, x):
        if not self.equalInOut:
            x = self.relu1(self.bn1(x))
        else:
            out = self.relu1(self.bn1(x))
        out = self.relu2(self.bn2(self.conv1(out if self.equalInOut else x)))
        if self.droprate > 0:
            out = F.dropout(out, p=self.droprate, training=self.training)
        out = self.conv2(out)
        return torch.add(x if self.equalInOut else self.convShortcut(x), out)

class NetworkBlock(nn.Module):
    def __init__(self, nb_layers, in_planes, out_planes, block, stride, dropRate=0.0):
        super(NetworkBlock, self).__init__()
        self.layer = self._make_layer(block, in_planes, out_planes, nb_layers, stride, dropRate)
    def _make_layer(self, block, in_planes, out_planes, nb_layers, stride, dropRate):
        layers = []
        for i in range(int(nb_layers)):
            layers.append(block(i == 0 and in_planes or out_planes, out_planes, i == 0 and stride or 1, dropRate))
        return nn.Sequential(*layers)
    def forward(self, x):
        return self.layer(x)

class WideResNet1(nn.Module):
    def __init__(self, depth, num_classes, widen_factor=1, dropRate=0.0):
        super(WideResNet1, self).__init__()
        nChannels = [16, 16*widen_factor, 32*widen_factor, 64*widen_factor]
        assert((depth - 4) % 6 == 0)
        n = (depth - 4) // 6  # number of residual blocks in each group
        block = BasicBlock
        # 1st conv before any network block
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=nChannels[0], kernel_size=3,
                               stride=1, padding=1, bias=False)
        # 1st block
        self.conv2 = NetworkBlock(n, nChannels[0], nChannels[1], block, 1, dropRate)
        # 2nd block
        self.conv3 = NetworkBlock(n, nChannels[1], nChannels[2], block, 2, dropRate)
        # 3rd block
        self.conv4 = NetworkBlock(n, nChannels[2], nChannels[3], block, 2, dropRate)
        # global average pooling and classifier
        self.bn = nn.BatchNorm2d(nChannels[3])
        self.relu = nn.ReLU(inplace=True)
        self.fc = nn.Linear(nChannels[3], num_classes)
        self.nChannels = nChannels[3]

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.bias.data.zero_()

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.relu(self.bn(out))
        out = F.avg_pool2d(out, 8)
        out = out.view(-1, self.nChannels)
        return self.fc(out)
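A quick sanity check of this class with a random CIFAR-sized batch (a sketch; the WRN-28-10 hyperparameters follow the paper):

model = WideResNet1(depth=28, num_classes=10, widen_factor=10, dropRate=0.3)
x = torch.randn(4, 3, 32, 32)  # a random batch of CIFAR-10-sized images
print(model(x).shape)  # expected: torch.Size([4, 10])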

2.2 A second custom WideResNet model class

The following is an alternative implementation of the same WRN architecture, adapted from another open-source repository.

# https://github.com/meliketoy/wide-resnet.pytorch/blob/master/networks/wide_resnet.py
class wide_basic(nn.Module):
    def __init__(self, ch_in, ch_out, dropout_rate, stride=1):
        super(wide_basic, self).__init__()
        self.bn1 = nn.BatchNorm2d(ch_in)
        self.conv1 = nn.Conv2d(ch_in, ch_out, kernel_size=3, padding=1, bias=True)
        self.dropout = nn.Dropout(p=dropout_rate)
        self.bn2 = nn.BatchNorm2d(ch_out)
        self.conv2 = nn.Conv2d(ch_out, ch_out, kernel_size=3, stride=stride, padding=1, bias=True)

        self.shortcut = nn.Sequential()
        if stride != 1 or ch_in != ch_out:
            self.shortcut = nn.Sequential(
                nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=stride, bias=True),
            )

    def forward(self, x):
        out = self.dropout(self.conv1(F.relu(self.bn1(x))))
        out = self.conv2(F.relu(self.bn2(out)))
        out += self.shortcut(x)
        return out

class WideResNet2(nn.Module):
    def __init__(self, depth, num_classes, widen_factor=1, dropRate=0.0):
        super(WideResNet2, self).__init__()
        self.ch_in = 16

        assert ((depth-4)%6 ==0), 'Wide-resnet depth should be 6n+4'
        n = (depth-4)//6  # number of residual blocks in each group
        k = widen_factor
        nStages = [16, 16*k, 32*k, 64*k]
        print('| Wide-Resnet %dx%d' %(depth, k))

        # 1st conv before any network block
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=nStages[0], kernel_size=3,
                               stride=1, padding=1, bias=False)
        # Wide Res Block
        self.conv2 = self._wide_layer(wide_basic, nStages[1], n, dropRate, stride=1)
        self.conv3 = self._wide_layer(wide_basic, nStages[2], n, dropRate, stride=2)
        self.conv4 = self._wide_layer(wide_basic, nStages[3], n, dropRate, stride=2)
        # global average pooling and classifier
        self.bn = nn.BatchNorm2d(nStages[3])
        self.relu = nn.ReLU(inplace=True)
        self.fc = nn.Linear(nStages[3], num_classes)

    def _wide_layer(self, block, planes, num_blocks, dropRate, stride):
        strides = [stride] + [1]*(int(num_blocks)-1)
        layers = []
        for stride in strides:
            layers.append(block(self.ch_in, planes, dropRate, stride))
            self.ch_in = planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.relu(self.bn(out))
        out = F.avg_pool2d(out, 8)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out
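Both classes implement the same WRN-depth-k topology. A quick comparison sketch (the parameter counts differ slightly because the second implementation uses biased convolutions and a different shortcut):

m1 = WideResNet1(depth=28, num_classes=10, widen_factor=10)
m2 = WideResNet2(depth=28, num_classes=10, widen_factor=10)
print(sum(p.numel() for p in m1.parameters()))  # about 36.5M parameters for WRN-28-10
print(sum(p.numel() for p in m2.parameters()))  # close to m1, small differences from conv biases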

2.3 Load a predefined model from torchvision.models

Torchvision comes with many classic network models that can be loaded directly. We can instantiate a model object from a predefined model class alone (without loading pre-trained parameters) and train it ourselves, load the pre-trained parameters while instantiating the model, or fine-tune / perform transfer learning starting from the pre-trained model.

The Wide ResNet model is available both in the torchvision.models package and through Torch Hub. torchvision.models provides the Wide ResNet model classes and pre-trained weights, which can be used directly; the original implementation can be found in the torchvision source code.

The following model builders can be used to instantiate a Wide ResNet model, with or without pre-trained weights. All model builders internally rely on the torchvision.models.resnet.ResNet base class.

wide_resnet50_2(*[, weights, progress])  # Wide ResNet-50-2 model from Wide Residual Networks.

wide_resnet101_2(*[, weights, progress])  # Wide ResNet-101-2 model from Wide Residual Networks.
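For example, the builders can be instantiated directly from torchvision.models (a sketch; the weights argument is the torchvision 0.13+ API, older versions use pretrained=True instead):

from torchvision import models

model = models.wide_resnet50_2(weights=None)  # model class only, no pre-trained weights
model = models.wide_resnet50_2(weights=models.Wide_ResNet50_2_Weights.IMAGENET1K_V1)  # ImageNet weights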

Torch Hub provides the Wide ResNet model classes, along with weights pre-trained on the ImageNet dataset, which can be used directly for image classification or transfer learning.

# load WRN-50-2 via torch.hub
model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet50_2', pretrained=True)

# load WRN-101-2 via torch.hub
model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet101_2', pretrained=True)

3. CIFAR10 image classification based on WideResNet model

3.1 The basic steps of building a neural network model in PyTorch

The basic steps to build, train and use a neural network model with PyTorch are as follows.

  1. Prepare dataset: Load the dataset and preprocess the data.
  2. Design the model: instantiate the model class, define the loss function and optimizer, and determine the model structure and training method.
  3. Model training: use the training data set to train the model and determine the model parameters.
  4. Model inference: use the trained model to perform inference and predict outputs for input data.
  5. Model saving/loading: Save the trained model for later use or deployment.

The following steps explain the routine using the WideResNet model.


3.2 Loading the CIFAR10 dataset

Public benchmark datasets are typically class-balanced, informative, and organized in a standard, easy-to-process format. Training a neural network on a common dataset not only saves effort, but also makes it easy to evaluate and compare model performance.

PyTorch preloads some commonly used image datasets in the torchvision.datasets module. The torchvision package implements the core classes and methods required for vision models, including popular datasets, model architectures, and common image transforms.

The CIFAR dataset is a classic small image-classification dataset with two versions, CIFAR10 and CIFAR100. CIFAR10 has 10 classes and CIFAR100 has 100. Each CIFAR10 image is 32×32, belonging to one of 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. CIFAR10 contains 60,000 images in total: 50,000 for training and 10,000 for testing, with 6,000 images per class, so the dataset is balanced.

The method to load and use the CIFAR dataset is:

torchvision.datasets.CIFAR10()
torchvision.datasets.CIFAR100()

The CIFAR dataset can be downloaded from the official website (http://www.cs.toronto.edu/~kriz/cifar.html) and used locally, or it can be loaded automatically through the datasets class (if the file is not found under the local path, it is downloaded automatically).

When loading the dataset, the predefined transform pipelines preprocess the data: resizing images, normalizing them, and converting them to tensors. The per-channel mean and standard deviation of the CIFAR10 training images, used for normalization, are (0.4914, 0.4822, 0.4465) and (0.2470, 0.2435, 0.2616). During training, transform_train adds random augmentations to improve generalization.
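These statistics can be recomputed from the raw training images; a minimal sketch (it stacks all 50,000 images in memory, several hundred MB):

import torch, torchvision
import torchvision.transforms as transforms

raw_set = torchvision.datasets.CIFAR10(root='../dataset', train=True, download=True,
                                       transform=transforms.ToTensor())
data = torch.stack([img for img, _ in raw_set])  # tensor of shape [50000, 3, 32, 32]
print(data.mean(dim=(0, 2, 3)))  # per-channel mean, about (0.4914, 0.4822, 0.4465)
print(data.std(dim=(0, 2, 3)))   # per-channel std, about (0.2470, 0.2435, 0.2616)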

A large training set cannot be loaded all at once, so the DataLoader class is used to load data automatically. DataLoader is an iterator: it wraps a Dataset object and yields batches of size batch_size.

The routine for loading the CIFAR-10 dataset using the DataLoader class is as follows.

    # (1) Convert a [0,1] PILImage to a normalized Tensor
    transform_train = transforms.Compose([
        transforms.RandomHorizontalFlip(),  # random horizontal flip
        transforms.RandomRotation(10),  # random rotation
        transforms.RandomAffine(0, shear=10, scale=(0.8, 1.2)),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        transforms.Resize(32),  # resize the image to (w,h)=(32,32)
        transforms.ToTensor(),  # convert the image to a Tensor
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])
    # no data augmentation for the test set
    transform = transforms.Compose([
        transforms.Resize(32),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])

    # (2) Load the CIFAR10 dataset
    batchsize = 128
    # if loading from the root path fails, download automatically
    # training set: 50,000 images
    train_set = torchvision.datasets.CIFAR10(root='../dataset', train=True,
                                            download=True, transform=transform_train)
    # train_loader = torch.utils.data.DataLoader(train_set, batch_size=batchsize)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batchsize,
                                              shuffle=True, num_workers=8)
    # validation set: 10,000 images
    test_set = torchvision.datasets.CIFAR10(root='../dataset', train=False,
                                           download=True, transform=transform)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=1000,
                                              shuffle=True, num_workers=8)
    # create an iterator and fetch one batch with next()
    valid_data_iter = iter(test_loader)  # _SingleProcessDataLoaderIter object
    valid_images, valid_labels = next(valid_data_iter)  # images: [batch,3,32,32], labels: [batch]
    valid_size = valid_labels.size(0)  # size of the validation batch
    print(valid_images.shape, valid_labels.shape)

    # class names of the 10 CIFAR10 categories
    classes = ('plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck')


3.3 Building a WideResNet network model

Establish a WideResNet network model for training, including three steps:

  • Instantiate the WideResNet model object;
  • Set the loss function for training;
  • Set the optimizer for training.

The torch.nn module provides various built-in loss functions; this example uses the cross-entropy loss function nn.CrossEntropyLoss.

The torch.optim module provides various optimizers; this example uses SGD with momentum. Note that model.parameters() must be passed to the optimizer object so that the optimizer knows which parameters to update.

    # (3) Build the WideResNet network model
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # select the compute device
    model = WideResNet1(depth=28, num_classes=10, widen_factor=2, dropRate=0.1)  # instantiate the WideResNet model
    model.to(device)  # move the network to the specified device
    print(model)

    # define the loss function and the optimizer
    criterion = nn.CrossEntropyLoss()  # cross-entropy loss
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)  # SGD optimizer

3.4 WideResNet model training

The basic steps of PyTorch model training are:

  1. Forward pass: compute the model outputs;
  2. Compute the loss;
  3. Backward pass: compute the gradients of the weights and biases;
  4. Update the model parameters from the gradients;
  5. Zero the gradients (for the next iteration).

During training, the validation data can be used to evaluate model accuracy and monitor the training process. Validation runs inference only: the model output is obtained by a forward pass, and the error is not backpropagated, so the validation pass should be wrapped in torch.no_grad().

The routine for model training using PyTorch is as follows.

    # (4) Train the WideResNet model
    epoch_list = []  # record the training epochs
    loss_list = []  # record the training-set loss
    accu_list = []  # record the validation-set accuracy
    num_epochs = 100  # number of training epochs
    for epoch in range(num_epochs):  # training epoch
        running_loss = 0.0  # reset the accumulated loss at the start of each epoch
        for step, data in enumerate(train_loader, start=0):  # load data from the iterator
            optimizer.zero_grad()  # zero the loss gradients

            inputs, labels = data  # inputs: [batch,3,32,32] labels: [batch]
            outputs = model(inputs.to(device))  # forward pass
            loss = criterion(outputs, labels.to(device))  # compute the loss
            loss.backward()  # backward pass
            optimizer.step()  # update the parameters

            # accumulate the training loss
            running_loss += loss.item()
            # if step%100==99:  # print training info every 100 steps
            #     print("\t epoch {}, step {}: loss = {:.4f}".format(epoch, step, loss.item()))

        # compute the validation accuracy for each epoch
        with torch.no_grad():  # validation: do not compute gradients
            outputs_valid = model(valid_images.to(device))  # inference on the validation batch, [batch, 10]
        pred_labels = torch.max(outputs_valid, dim=1)[1]  # predicted classes, [batch]
        accuracy = torch.eq(pred_labels, valid_labels.to(device)).sum().item() / valid_size * 100  # accuracy
        print("Epoch {}: train loss={:.4f}, accuracy={:.2f}%".format(epoch, running_loss, accuracy))

        # record the training statistics
        epoch_list.append(epoch)  # record the epoch index
        loss_list.append(running_loss)  # record the training loss
        accu_list.append(accuracy)  # record the validation accuracy

The result of the program running is as follows:

Epoch 0: train loss=717.2183, accuracy=44.00%
Epoch 1: train loss=556.0241, accuracy=55.60%
Epoch 2: train loss=473.6305, accuracy=63.00%

Epoch 97: train loss=87.7806, accuracy=90.90%
Epoch 98: train loss=90.4091, accuracy=89.60%
Epoch 99: train loss=89.8074, accuracy=91.00%

After about 20 epochs of training, the accuracy on the 1000-image validation batch exceeds 80%. Continued training further reduces the training loss, and the validation accuracy reaches about 90%.

[Figure: training loss and validation accuracy curves over the training epochs]


3.5 Saving and loading of WideResNet model

After training, save the model for later use. There are two main ways to save a model in PyTorch: saving only the model weights, or saving the entire model. This example calls model.state_dict() to obtain the weights as a dictionary and torch.save() to serialize that dictionary to disk as a .pth file.

    # (5) Save the WideResNet network model
    save_path = "../models/WideResNet_Cifar1"
    model_cpu = model.cpu()  # move the model to the CPU
    model_path = save_path + ".pth"  # model file path
    torch.save(model.state_dict(), model_path)  # save the model weights
    # write the training statistics to a data file
    result_path = save_path + ".csv"  # results file path
    WriteDataFile(epoch_list, loss_list, accu_list, result_path)  # user-defined helper (not shown)

To use the trained model, first instantiate the model class, and then call the load_state_dict() method to load the weight parameters of the model.
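A minimal sketch of this loading step, assuming the WideResNet1 class definition and the save path used above:

    # (6) Load the trained weights into a newly instantiated model
    model = WideResNet1(depth=28, num_classes=10, widen_factor=2, dropRate=0.1)
    model.load_state_dict(torch.load("../models/WideResNet_Cifar1.pth"))
    model.to(device)
    model.eval()  # switch to inference mode (affects BatchNorm and dropout)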

    # visualize the training results
    plt.figure(figsize=(11, 5))
    plt.suptitle("WideResNet Model in CIFAR10")
    plt.subplot(121), plt.title("Train loss")
    plt.plot(epoch_list, loss_list)
    plt.xlabel('epoch'), plt.ylabel('loss')
    plt.subplot(122), plt.title("Valid accuracy")
    plt.plot(epoch_list, accu_list)
    plt.xlabel('epoch'), plt.ylabel('accuracy')
    plt.show()

Special attention should be paid to:

(1) A .pth file in PyTorch only stores the model's weight parameters, not its structure. Therefore the model object must be instantiated before the parameters are loaded.

(2) The model object must exactly match the saved parameters before it can be used. Note that even among WideResNet models, the class definitions may differ slightly; if the model class comes from one source and the parameter file from another, the structure and the parameters can easily mismatch.

(3) Whether the model and parameters come from the PyTorch model zoo, from a pre-trained model obtained elsewhere, or from a model you trained yourself, the loading method is the same, and the same care must be taken that the model structure matches the parameters.


3.6 Model checking

Use the trained WideResNet model to run inference on the test set and evaluate its classification accuracy.

Run model inference on the test set and compare the predictions against the image labels to verify the model's accuracy. Strictly speaking, the validation set and the test set should not be used interchangeably, but to keep the routine simple this program does not distinguish between them.

    # (7) Model checking
    correct = 0
    total = 0
    for data in test_loader:  # iterate over the test set
        imgs, labels = data  # torch.Size([batch,3,32,32]) torch.Size([batch])
        # print(imgs.shape, labels.shape)
        outputs = model(imgs.to(device))  # forward pass / inference, [batch, 10]
        labels_pred = torch.max(outputs, dim=1)[1]  # predicted classes, [batch]
        # _, labels_pred = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += torch.eq(labels_pred, labels.to(device)).sum().item()
    accuracy = 100. * correct / total
    print("Test samples: {}".format(total))
    print("Test accuracy={:.2f}%".format(accuracy))

Running inference on the test set, the model reaches an accuracy of 90.03%.

Test samples: 10000
Test accuracy=90.03%


3.7 Model Inference

Using the trained WideResNet model, input a new image for inference; the model output gives the category of the input image.

Extract several images from the test set, or read image files, run model inference, and obtain each image's predicted class. When extracting images or reading files, take care to convert the image format and size appropriately.

    # (8) Model inference on images taken from the test set
    batch = 8  # batch size
    data_set = torchvision.datasets.CIFAR10(root='../dataset', train=False,
                                           download=False, transform=None)
    plt.figure(figsize=(9, 6))
    for i in range(batch):
        imgPIL = data_set[i][0]  # the PIL image
        label = data_set[i][1]  # the image label
        # preprocessing / inference / postprocessing
        imgTrans = transform(imgPIL)  # preprocessing transform, torch.Size([3, 32, 32])
        imgBatch = torch.unsqueeze(imgTrans, 0)  # add a batch dimension, torch.Size([1, 3, 32, 32])
        outputs = model(imgBatch.to(device))  # inference, returns [batch=1, 10]
        indexes = torch.max(outputs, dim=1)[1]  # note: shape [batch=1], on device
        index = indexes[0].item()  # predicted class index (int)
        # draw the i-th image
        imgNP = np.array(imgPIL)  # PIL -> Numpy
        out_text = "label:{}/model:{}".format(classes[label], classes[index])
        plt.subplot(2, 4, i+1)
        plt.imshow(imgNP)
        plt.title(out_text)
        plt.axis('off')
    plt.tight_layout()
    plt.show()

The result is as follows.

[Figure: eight test images titled with their true labels and the model's predictions]

    # (9) Model inference on an image read from a file
    from PIL import Image
    filePath = "../images/img_plane_01.jpg"  # path and file name of the image
    imgPIL = Image.open(filePath)  # read the image with PIL, <class 'PIL.Image.Image'>

    # preprocessing / inference / postprocessing
    imgTrans = transform(imgPIL)  # preprocessing transform, torch.Size([3, 32, 32])
    imgBatch = torch.unsqueeze(imgTrans, 0)  # add a batch dimension, torch.Size([1, 3, 32, 32])
    outputs = model(imgBatch.to(device))  # inference, returns [batch=1, 10]
    indexes = torch.max(outputs, dim=1)[1]  # note: shape [batch=1], on device
    percentages = nn.functional.softmax(outputs, dim=1)[0] * 100
    index = indexes[0].item()  # predicted class index (int)
    percent = percentages[index].item()  # probability of the predicted class (float)

    # draw the image
    imgNP = np.array(imgPIL)  # PIL -> Numpy
    out_text = "Prediction:{}, {}, {:.2f}%".format(index, classes[index], percent)
    print(out_text)
    plt.imshow(imgNP)
    plt.title(out_text)
    plt.axis('off')
    plt.tight_layout()
    plt.show()

4. Image classification using WideResNet pre-trained model

The torchvision.models package and Torch Hub provide not only the Wide ResNet model classes but also weights pre-trained on the ImageNet dataset, which can be used directly for image classification or transfer learning.

The complete routine for image classification using the Wide ResNet pretrained model is as follows.

# Begin_WideResNet_3.py
# WideResNet model for beginner with PyTorch
# Load the WideResNet pre-trained model and weights, and classify an image
# Copyright: [email protected]
# Created: Huang Shan, 2023/06/10

# _*_coding:utf-8_*_
import torch
from torchvision import models
import torchvision.transforms as transforms
from matplotlib import pyplot as plt
import numpy as np

if __name__ == '__main__':

    # (1) Load the WideResNet/PyTorch pre-trained model
    # load WRN-50-2 via torch.hub:
    model = torch.hub.load('pytorch/vision:v0.10.0', 'wide_resnet50_2', pretrained=True)
    model.eval()

    # (2) Define the input preprocessing, converting a [0,1] PILImage to a normalized Tensor
    transform = transforms.Compose([  # compose the image transforms
        transforms.Resize([256,256]),  # resize the image to (w,h)=(256,256)
        transforms.CenterCrop([224,224]),  # center-crop the image to (w,h)=(224,224)
        transforms.ToTensor(),  # convert the image to a Tensor
        transforms.Normalize(  # normalize the image
            mean=[0.485, 0.456, 0.406],  # mean
            std=[0.229, 0.224, 0.225]  # standard deviation
        )])

    # (3) Load the input image and preprocess it
    from PIL import Image
    filePath = "../images/img_car_01.jpg"  # path and file name of the image
    imgPIL = Image.open(filePath)  # read the image with PIL, <class 'PIL.Image.Image'>
    # preprocessing / inference / postprocessing
    imgTrans = transform(imgPIL)  # preprocessing transform, torch.Size([3,224,224])
    input_batch = torch.unsqueeze(imgTrans, 0)  # add a batch dimension, torch.Size([1,3,224,224])

    # (4) Model inference
    with torch.no_grad():
        outputs = model(input_batch)  # confidence scores for all classes, torch.Size([batch, 1000])
    # _, index = torch.max(outputs, 1)  # index of the Top-1 class, tensor([208])
    # print("index: ", index.item())  # 208 : sports car, sport car

    # (5) Post-process the model output
    # read the ImageNet class names from a text file
    with open("../dataset/imagenet_classes.txt") as f:  # class names saved as a txt file
        categories = [line.strip() for line in f.readlines()]
    print(type(categories), len(categories))  # <class 'list'> 1000

    # compute the probabilities of all classes
    probabilities = torch.nn.functional.softmax(outputs[0], dim=0) * 100  # class probabilities, torch.Size([1000])
    # find the indices of the Top-5 classes
    top5_prob, top5_idx = torch.topk(probabilities, 5)  # Top-5 probabilities and indices, torch.Size([5])
    print("Top-5 possible categories:")
    for i in range(top5_prob.size(0)):
        print(top5_idx[i], categories[top5_idx[i]], top5_prob[i].item())

    # (6) Visualize the classification result
    import cv2
    imgCV = cv2.cvtColor(np.asarray(imgPIL), cv2.COLOR_RGB2BGR)  # PIL -> OpenCV format
    out_text = f"{categories[top5_idx[0]]}, {top5_prob[0].item():.3f}"  # class label + probability
    cv2.putText(imgCV, out_text, (25, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)  # draw the label on the image
    cv2.imshow('Image classification', imgCV)
    key = cv2.waitKey(0)  # delay=0, do not close the window automatically
    cv2.destroyAllWindows()

The result is as follows.

<class 'list'> 1000
Top-5 possible categories:
tensor(436) 436: 'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon', 49.345211029052734
tensor(656) 656: 'minivan', 31.259532928466797
tensor(581) 581: 'grille, radiator grille', 12.88940715789795
tensor(479) 479: 'car wheel', 2.3268837928771973
tensor(627) 627: 'limousine, limo', 2.073991060256958

[Figure: the input image annotated with the Top-1 class label and its probability]


References

  1. Sergey Zagoruyko, Nikos Komodakis, Wide Residual Networks, 2016

  2. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, 2015

  3. Wide ResNet model | PyTorch (torchvision documentation)


【End of this section】


Copyright statement:
Welcome to the "youcans hands-on model" series.
For reposting, please cite the original link:
[youcans hands-on model] Wide ResNet model
Copyright 2023 youcans, XUPT
Created: 2023-07-02

