ResNet network analysis and practical cases

The deeper a network is, the more information it can extract and the richer its features become. In practice, however, simply stacking layers makes optimization harder: as the network deepens, accuracy on both the training data and the test data degrades. The figure below compares the training error and test error of a 20-layer and a 56-layer network; the deeper network is worse on both.

[Figure: training error and test error of 20-layer vs. 56-layer networks]

To address this degradation problem of deep networks, Kaiming He and colleagues proposed the residual network (ResNet) in the 2015 ImageNet image recognition challenge. It won the championship and profoundly influenced the design of later deep neural networks.

1 The residual block

Suppose F(x) denotes a mapping with only two layers, where x is the input and F(x) the output, and assume the two have the same dimensions. During training we want to fit an ideal mapping H(x) (the ideal mapping from input to output) by adjusting the weights w and biases b in the network; that is, the goal is to make the prediction F(x) approach the true value H(x). If we change our thinking and instead use F(x) to approximate H(x) - x, the final output becomes F(x) + x (the addition here is element-wise: values at corresponding positions are added). The path that connects the input directly to the output is called the shortcut, and the whole structure is the residual block, the basic building module of ResNet.

[Figure: residual block with shortcut connection]
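
As a minimal, self-contained sketch of this idea (a toy block built from linear layers with matching input/output dimensions; it is illustrative only, not the actual ResNet block):

import torch
import torch.nn as nn

class TinyResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)  # first layer of F(x)
        self.fc2 = nn.Linear(dim, dim)  # second layer of F(x)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.fc2(self.relu(self.fc1(x)))  # F(x): output of the two-layer branch
        return self.relu(out + x)               # F(x) + x: element-wise addition via the shortcut

x = torch.randn(4, 64)
print(TinyResidualBlock(64)(x).shape)  # torch.Size([4, 64])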

Suppose the network structure is as shown in the figure below, with a shortcut added at layer 19 that bypasses layers 20 and 21. If layers 20 and 21 perform poorly during training, their weights keep shrinking over the iterations, possibly all the way to zero; at that point the network has effectively given up on these two layers and routes around them through the shortcut. If layers 20 and 21 perform well, their weights keep growing during training. In other words, the network itself chooses whether to use layers 20 and 21 or to abandon them. As Kaiming He once put it: "I dare not say how much ResNet improves things, but at least it is no worse than the original." In practice many such residual blocks are stacked, so it does not matter if some layers perform poorly; as long as some of them perform well, the overall performance can improve.
[Figure: shortcut connection bypassing layers 20 and 21]
The residual structure consists of a main branch and a shortcut branch (also called a short connection). It comes in two forms, the basic module (BasicBlock) and the bottleneck module (Bottleneck), corresponding to the left and right halves of the figure below.

[Figure: BasicBlock (left) and Bottleneck (right) residual structures]
These two structures are used for ResNet18/34 (left) and ResNet50/101/152 (right), respectively. The purpose of the bottleneck module on the right is to reduce the number of parameters.

In the bottleneck module, the first 1x1 convolution reduces the number of channels to 1/4 (e.g. from 256 down to 64), and the last 1x1 convolution expands it back by a factor of 4. In the official PyTorch implementation this factor is the class attribute expansion = 4.
[Figure: channel changes inside the bottleneck module]

The official torchvision implementation of ResNet lives at: anaconda\envs\Lib\site-packages\torchvision\models\resnet.py

The bottleneck module replaces two 3x3 convolutional layers with a 1x1 + 3x3 + 1x1 stack: the first 1x1 convolution reduces the dimensionality (from 256 channels down to 64) so that the middle 3x3 convolution operates on fewer channels, and the final 1x1 convolution restores the original 256 channels. The total number of weights is 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69632. Without the bottleneck, two 3x3x256 convolutions would be used instead, as shown in the figure below, giving 3x3x256x256x2 = 1179648 weights, about 16.94 times as many.
[Figure: two plain 3x3x256 convolutions, without a bottleneck]
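
The saving is easy to verify with a few lines of arithmetic (counting convolution weights only, ignoring biases and BatchNorm parameters):

# Bottleneck: 1x1 (256->64), 3x3 (64->64), 1x1 (64->256)
bottleneck_params = 1*1*256*64 + 3*3*64*64 + 1*1*64*256
# Plain design: two 3x3 convolutions at 256 channels
plain_params = 3*3*256*256 * 2
print(bottleneck_params, plain_params, round(plain_params / bottleneck_params, 2))
# 69632 1179648 16.94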

2 Networks with different numbers of layers

Common ResNet variants are resnet18, 34, 50, 101 and 152. The following table shows their structures:
[Table: stage-by-stage architecture of ResNet-18/34/50/101/152]
Look first at the leftmost column of the table: every network is divided into 5 stages, namely conv1, conv2_x, conv3_x, conv4_x, conv5_x. Take the 101-layer column as an example. First there is a 7x7, 64-channel convolution; then come 3 + 4 + 23 + 3 = 33 residual blocks of 3 layers each, i.e. 33 x 3 = 99 layers; finally there is a fully connected layer for classification. That gives 1 + 99 + 1 = 101 layers in total. Comparing the 50-layer and 101-layer columns, the only difference is conv4_x: ResNet50 has 6 blocks there while ResNet101 has 23, a difference of 17 blocks, i.e. 17 x 3 = 51 layers.

Note: the layer count includes only convolutional and fully connected layers; activation and pooling layers are not counted.
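
With this counting rule, the 101 in ResNet101 can be reproduced directly from the table (a quick arithmetic check):

blocks = [3, 4, 23, 3]   # residual blocks in conv2_x .. conv5_x for ResNet101
layers_per_block = 3     # each bottleneck block contains three conv layers
total = 1 + sum(blocks) * layers_per_block + 1   # conv1 + residual layers + fc
print(total)  # 101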

Figure 1 below explains the symbols used, and Figure 2 shows the structure diagrams of resnet18 and resnet34:
[Figure 1: symbol legend]
[Figure 2: structure diagrams of resnet18 and resnet34]

In the table, the residual structures are divided into 4 stages: conv2_x, conv3_x, conv4_x and conv5_x (shown in different colors in the figure above). In conv3_x, conv4_x and conv5_x, the first residual block of each stage downsamples the input feature map by a factor of 2 and adjusts the channels to what the following blocks require. Accordingly, its shortcut branch (the dotted line in the figure) uses a 1x1 convolution for downsampling and channel adjustment, so that the shortcut output and the main-branch output have the same size and can be added directly. The conv2_x stage does not downsample, because the max-pooling layer with stride 2 placed before it has already reduced the height and width. Note that for ResNet50/101/152, the shortcut branch of the first residual block of conv2_x also uses a 1x1 convolution; its role is purely to adjust the number of channels (without downsampling) so that the main-branch output and the shortcut output can be added directly. For that block, the shortcut input has size [56, 56, 64] and output [56, 56, 256], matching the main-branch output of [56, 56, 256], so the two can be added directly.

As described above, there are two types of shortcut branches. In the first, the input is passed through unchanged (identity mapping), drawn as a solid line; in the second, the input goes through a 1x1 convolutional layer before the addition, drawn as a dotted line.
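
As a minimal sketch of the second (dotted-line) type, the projection shortcut is just a strided 1x1 convolution followed by BatchNorm, mirroring how the downsample branch is built in _make_layer later in this article:

import torch.nn as nn

def projection_shortcut(in_ch, out_ch, stride):
    # the 1x1 conv adjusts the channel count; stride=2 also halves height/width,
    # so the shortcut output can be added to the main-branch output
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_ch))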

3 Network construction


3.1 Basic residual block construction

The residual blocks needed to build ResNet18/34: the figure below shows the first residual block of the first stage (conv2_x) in its left half and the first residual block of the second stage (conv3_x) in its right half. In the left half the shortcut is drawn as a solid line, and the block's input and output channel counts are equal; in the right half the shortcut is drawn as a dotted line, and the block has twice as many output channels as input channels.
[Figure: first residual blocks of conv2_x (left) and conv3_x (right) in ResNet18/34]

It is implemented by the class BasicBlock; the code is as follows:

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out
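
A quick sanity check of the two cases (a usage sketch, assuming the BasicBlock class above):

import torch
import torch.nn as nn

# Solid-line block: same channels, stride 1, no projection needed
block = BasicBlock(64, 64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])

# Dotted-line block: channels double and the feature map is halved,
# so the shortcut needs a 1x1 projection
down = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128))
block = BasicBlock(64, 128, stride=2, downsample=down)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 128, 28, 28])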

3.2 Bottleneck module construction

The bottleneck module is used to build the much deeper ResNet50/101/152 networks. The figures below show the specific structure of the first residual block of the first stage (conv2_x) and of the first residual block of the second stage (conv3_x). In both blocks the shortcut contains a 1x1 convolution (a dotted-line shortcut): in the conv2_x block it only adjusts the channels (64 in, 256 out, no downsampling), while in the conv3_x block it both downsamples and adjusts the channels (256 in, 512 out).

[Figures: first residual blocks of conv2_x and conv3_x in ResNet50/101/152]

It is implemented by the class Bottleneck; the code is as follows:

class Bottleneck(nn.Module):
    """
    注意:原论文中,在虚线残差结构的主分支上,第一个1x1卷积层的步距是2,第二个3x3卷积层步距是1。
    但在pytorch官方实现过程中是第一个1x1卷积层的步距是1,第二个3x3卷积层步距是2,
    这么做的好处是能够在top1上提升大概0.5%的准确率。
    可参考Resnet v1.5 https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out
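
As a quick usage sketch (assuming the Bottleneck class above), the first block of conv2_x keeps the 56x56 resolution but expands 64 input channels to 64 * expansion = 256 output channels, so its shortcut also carries a 1x1 projection:

import torch
import torch.nn as nn

down = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False),  # channel adjustment only, no downsampling
    nn.BatchNorm2d(256))
block = Bottleneck(64, 64, stride=1, downsample=down)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])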

3.3 ResNet network construction

Build the ResNet network from the residual modules created above.

[Figure: overall ResNet structure]

The code looks like this:

class ResNet(nn.Module):

    def __init__(self,
                 block,  # choice of residual block: BasicBlock for ResNet18/34, Bottleneck for ResNet50/101/152
                 blocks_num,  # number of residual blocks in each stage, given as a list
                 num_classes=1000,  # number of output classes
                 ):
        super(ResNet, self).__init__()
        self.in_channel = 64
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # self.layer1 .. self.layer4 below are the four stages of residual blocks, corresponding to conv2_x, conv3_x, conv4_x, conv5_x in the table above
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive (i.e. global) average pooling: pools the feature map down to 1x1
        # after global average pooling the feature map is 1x1 with 512 * block.expansion channels, so the flattened vector, and hence the fc layer's input, has dimension 512 * block.expansion
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        # initialize the parameters of the convolutional layers
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)


        x = self.avgpool(x)  # adaptive (global) average pooling down to 1x1
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

3.4 Building networks of different depths

To build networks of different depths, set the number of residual blocks in conv2_x, conv3_x, conv4_x and conv5_x according to the table below, and choose between the bottleneck module and the basic residual module. Here we build the resnet18/34/50/101/152 networks:

[Table: residual block counts per stage for resnet18/34/50/101/152]

def resnet18(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet18-5c106cde.pth
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes=num_classes)


def resnet34(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes)


def resnet50(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes)


def resnet101(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes)


def resnet152(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet152-b121ed2d.pth
    return ResNet(Bottleneck, [3, 8, 36, 3], num_classes=num_classes)
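
A quick check of the constructors (a sketch, assuming the classes above; the layer count below works for resnet34 because its only 1x1 convolutions are the projection shortcuts, which by convention are not counted):

import torch
import torch.nn as nn

model = resnet34()
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])

# count conv + fc layers, excluding the 1x1 projection-shortcut convs
n_convs = sum(1 for m in model.modules()
              if isinstance(m, nn.Conv2d) and m.kernel_size != (1, 1))
n_fc = sum(1 for m in model.modules() if isinstance(m, nn.Linear))
print(n_convs + n_fc)  # 34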

3.5 Overall code

import torch.nn as nn
import torch


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    """
    注意:原论文中,在虚线残差结构的主分支上,第一个1x1卷积层的步距是2,第二个3x3卷积层步距是1。
    但在pytorch官方实现过程中是第一个1x1卷积层的步距是1,第二个3x3卷积层步距是2,
    这么做的好处是能够在top1上提升大概0.5%的准确率。
    可参考Resnet v1.5 https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self,
                 block,  # choice of residual block: BasicBlock for ResNet18/34, Bottleneck for ResNet50/101/152
                 blocks_num,  # number of residual blocks in each stage, given as a list
                 num_classes=1000,  # number of output classes
                 ):
        super(ResNet, self).__init__()
        self.in_channel = 64
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # self.layer1 .. self.layer4 below are the four stages of residual blocks, corresponding to conv2_x, conv3_x, conv4_x, conv5_x in the table above
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive (i.e. global) average pooling: pools the feature map down to 1x1
        # after global average pooling the feature map is 1x1 with 512 * block.expansion channels, so the flattened vector, and hence the fc layer's input, has dimension 512 * block.expansion
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        # initialize the parameters of the convolutional layers
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)


        x = self.avgpool(x)  # adaptive (global) average pooling down to 1x1
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x


def resnet18(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet18-5c106cde.pth
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes=num_classes)


def resnet34(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes)


def resnet50(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes)


def resnet101(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes)


def resnet152(num_classes=1000):
    # pre-trained weights download link: https://download.pytorch.org/models/resnet152-b121ed2d.pth
    return ResNet(Bottleneck, [3, 8, 36, 3], num_classes=num_classes)




3.6 The ResNet code packaged officially in PyTorch

In fact, ResNet is already packaged in PyTorch's torchvision, so a single line of code is enough to call it; there is no need to build the model ourselves.

resnet18

  • Import the pre-trained resnet18 model:
    import torchvision
    model = torchvision.models.resnet18(pretrained=True)
  • If you only need the resnet18 network structure, without initializing from the pre-trained parameters:
    model = torchvision.models.resnet18(pretrained=False)

resnet50

  • Import the pre-trained resnet50 model:
    import torchvision
    model = torchvision.models.resnet50(pretrained=True)
  • If you only need the resnet50 network structure, without initializing from the pre-trained parameters:
    model = torchvision.models.resnet50(pretrained=False)

resnet101

  • Import the pre-trained resnet101 model:
    import torchvision
    model = torchvision.models.resnet101(pretrained=True)
  • If you only need the resnet101 network structure, without initializing from the pre-trained parameters:
    model = torchvision.models.resnet101(pretrained=False)

resnet152

  • Import the pre-trained resnet152 model:
    import torchvision
    model = torchvision.models.resnet152(pretrained=True)
  • If you only need the resnet152 network structure, without initializing from the pre-trained parameters:
    model = torchvision.models.resnet152(pretrained=False)
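
Note: in newer torchvision releases (0.13 and later) the pretrained= argument is deprecated in favor of explicit weight enums. A sketch of the equivalent calls:

import torchvision
from torchvision.models import ResNet18_Weights, ResNet50_Weights

model18 = torchvision.models.resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)  # pre-trained
model50 = torchvision.models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)  # pre-trained
model_random = torchvision.models.resnet18(weights=None)  # structure only, random init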

4 Fine-tuning ResNet-18 for binary classification

Next we take a ResNet-18 pre-trained on ImageNet and fine-tune it for binary classification.
Pre-trained model weights download address: https://download.pytorch.org/models/resnet18-5c106cde.pth

The ants-vs-bees binary classification dataset contains:
Training set: about 120 images per class. Validation set: about 70 images per class.
The images for each class are stored in separate folders, and the folder name is the label name. The amount of data is very small, so the model can only be fine-tuned.
Dataset download address: https://download.pytorch.org/tutorial/hymenoptera_data.zip
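
The training script below reads from data/train and data/val, so it assumes the archive has been extracted into the project's data/ folder with roughly this layout (a sketch; the folder names double as the label names):

data/
├── train/
│   ├── ants/   # ~120 training images (*.jpg)
│   └── bees/
└── val/
    ├── ants/   # ~70 validation images (*.jpg)
    └── bees/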

The fine-tuned part is the final fully connected layer, changing its original 1000 output neurons to 2. All the code, datasets and model weights for this binary classification project are in my GitHub repository: https://github.com/mojieok/classification

[Figure: replacing the 1000-way fc layer with a 2-way fc layer]

  • Overall project arrangement:
    [Figure: overall project layout]
  • A custom AntsDataset class, which inherits from torch.utils.data.Dataset; it is located in the tools/my_dataset file
import numpy as np
import torch
import os
import random
from PIL import Image
from torch.utils.data import Dataset

class AntsDataset(Dataset):
    def __init__(self, data_dir, transform=None):
        # images for each class are stored in separate folders; the folder name is the label name
        self.label_name = {"ants": 0, "bees": 1}  # label names
        self.data_info = self.get_img_info(data_dir)  # a list of (image path, label) pairs
        self.transform = transform

    def __getitem__(self, index):
        path_img, label = self.data_info[index]
        img = Image.open(path_img).convert('RGB')

        if self.transform is not None:
            img = self.transform(img)

        return img, label

    def __len__(self):
        return len(self.data_info)  # total number of samples in the dataset

    def get_img_info(self, data_dir):  # data_dir is the folder containing the data
        data_info = list()
        for root, dirs, _ in os.walk(data_dir):  # walk the data directory
            # iterate over class sub-folders
            for sub_dir in dirs:
                img_names = os.listdir(os.path.join(root, sub_dir))
                img_names = list(filter(lambda x: x.endswith('.jpg'), img_names))

                # iterate over images
                for i in range(len(img_names)):
                    img_name = img_names[i]
                    path_img = os.path.join(root, sub_dir, img_name)
                    label = self.label_name[sub_dir]
                    data_info.append((path_img, int(label)))

        if len(data_info) == 0:
            # raise an exception if data_dir contains no images
            raise Exception("\ndata_dir:{} is an empty dir! Please check your path to images!".format(data_dir))
        return data_info


  • The random-seed function, located in the tools/common_tools file
import torch
import random
import numpy as np
def set_seed(seed=1):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
  • Model training, located at ./finetune_resnet18
import os
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
import torch.optim as optim
from matplotlib import pyplot as plt
from tools.my_dataset import AntsDataset
from tools.common_tools import set_seed
import torchvision.models as models
import torchvision
BASEDIR = os.path.dirname(os.path.abspath(__file__))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("use device :{}".format(device))

set_seed(1)  # set the random seed
label_name = {"ants": 0, "bees": 1}

# hyperparameter settings
MAX_EPOCH = 25
BATCH_SIZE = 16
LR = 0.001
log_interval = 10
val_interval = 1
classes = 2
start_epoch = -1
lr_decay_step = 7


# ============================ step 1/5 data ============================
data_dir = os.path.join(BASEDIR, "data")
train_dir = os.path.join(data_dir, "train")
valid_dir = os.path.join(data_dir, "val")

norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])

valid_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])

# build the AntsDataset instances
train_data = AntsDataset(data_dir=train_dir, transform=train_transform)
valid_data = AntsDataset(data_dir=valid_dir, transform=valid_transform)

# build the DataLoaders
train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
valid_loader = DataLoader(dataset=valid_data, batch_size=BATCH_SIZE)

# ============================ step 2/5 model ============================

# 1/3 build the model
resnet18_ft = models.resnet18()  # build the resnet18 architecture via torchvision.models (pre-trained weights are loaded below)

# 2/3 load the pre-trained parameters
# flag = 0
flag = 1
if flag:
    path_pretrained_model = os.path.join(BASEDIR,  "data/resnet18-5c106cde.pth")
    state_dict_load = torch.load(path_pretrained_model)
    resnet18_ft.load_state_dict(state_dict_load)  # load the pre-trained parameters into the model

# Option 1: freeze the convolutional layers (suitable when the current dataset is too small
# to train the conv layers, so only the final fully connected layer is trained)
flag_m1 = 1
# flag_m1 = 0
if flag_m1:
    for param in resnet18_ft.parameters():
        param.requires_grad = False
    # print the kernels of the first conv layer; because the conv parameters are frozen,
    # the printed values would not change between iterations
    # print("conv1.weights[0, 0, ...]:\n {}".format(resnet18_ft.conv1.weight[0, 0, ...]))


# 3/3 replace the fc layer
num_ftrs = resnet18_ft.fc.in_features  # first get the input size of the model's final fully connected layer
resnet18_ft.fc = nn.Linear(num_ftrs, classes)  # replace the original output layer with a new fc layer; classes=2 for this binary task


resnet18_ft.to(device)
# ============================ step 3/5 loss function ============================
criterion = nn.CrossEntropyLoss()                                                   # choose the loss function

# ============================ step 4/5 optimizer ============================
# Option 2: use a smaller learning rate for the convolutional layers
# flag = 1
flag = 0
if flag:
    # collect the memory addresses of the final fc layer's parameters into a list
    fc_params_id = list(map(id, resnet18_ft.fc.parameters()))     # ids are the parameters' memory addresses
    # filter the final fc layer's parameters out of resnet18's parameter list
    base_params = filter(lambda p: id(p) not in fc_params_id, resnet18_ft.parameters())
    # the two lines above separate the conv layers from the fc layer, so each part can get its own learning rate
    optimizer = optim.SGD([
        {'params': base_params, 'lr': LR*0.1},   # conv layers get 0.1x the base learning rate
        #{'params': base_params, 'lr': LR * 0},  # setting the conv lr to 0 would freeze the conv layers entirely
        {'params': resnet18_ft.fc.parameters(), 'lr': LR}], momentum=0.9)

else:
    optimizer = optim.SGD(resnet18_ft.parameters(), lr=LR, momentum=0.9)               # choose the optimizer

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_decay_step, gamma=0.1)     # learning-rate decay schedule


# ============================ step 5/5 training ============================
train_curve = list()
valid_curve = list()

for epoch in range(start_epoch + 1, MAX_EPOCH):

    loss_mean = 0.
    correct = 0.
    total = 0.

    resnet18_ft.train()
    for i, data in enumerate(train_loader):

        # forward
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = resnet18_ft(inputs)

        # backward
        optimizer.zero_grad()
        loss = criterion(outputs, labels)
        loss.backward()

        # update weights
        optimizer.step()

        # classification statistics
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).squeeze().cpu().sum().numpy()

        # print training information
        loss_mean += loss.item()
        train_curve.append(loss.item())
        if (i+1) % log_interval == 0:
            loss_mean = loss_mean / log_interval
            print("Training:Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} Acc:{:.2%}".format(
                epoch, MAX_EPOCH, i+1, len(train_loader), loss_mean, correct / total))
            loss_mean = 0.

            # if flag_m1:
            #print("epoch:{} conv1.weights[0, 0, ...] :\n {}".format(epoch, resnet18_ft.conv1.weight[0, 0, ...]))

    scheduler.step()  # update the learning rate
    
    # save the model weights
    checkpoint = {"model_state_dict": resnet18_ft.state_dict(),
                  "optimizer_state_dict": optimizer.state_dict(),
                  "epoch": epoch}
    PATH = f'./checkpoint_{epoch}_epoch.pkl'
    torch.save(checkpoint,PATH)


    # validate the model
    if (epoch+1) % val_interval == 0:

        correct_val = 0.
        total_val = 0.
        loss_val = 0.
        resnet18_ft.eval()
        with torch.no_grad():
            for j, data in enumerate(valid_loader):
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = resnet18_ft(inputs)
                loss = criterion(outputs, labels)

                _, predicted = torch.max(outputs.data, 1)
                total_val += labels.size(0)
                correct_val += (predicted == labels).squeeze().cpu().sum().numpy()

                loss_val += loss.item()

            loss_val_mean = loss_val/len(valid_loader)
            valid_curve.append(loss_val_mean)
            print("Valid:\t Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] Loss: {:.4f} Acc:{:.2%}".format(
                epoch, MAX_EPOCH, j+1, len(valid_loader), loss_val_mean, correct_val / total_val))
        resnet18_ft.train()

train_x = range(len(train_curve))
train_y = train_curve

train_iters = len(train_loader)
valid_x = np.arange(1, len(valid_curve)+1) * train_iters*val_interval # valid_curve records one loss per epoch, so convert the epoch indices to iteration counts
valid_y = valid_curve

plt.plot(train_x, train_y, label='Train')
plt.plot(valid_x, valid_y, label='Valid')

plt.legend(loc='upper right')
plt.ylabel('loss value')
plt.xlabel('Iteration')
plt.show()
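
Because each saved checkpoint stores the model state, optimizer state, and epoch, an interrupted run can be resumed later (a sketch, assuming one of the checkpoint_*_epoch.pkl files written above exists):

checkpoint = torch.load('./checkpoint_14_epoch.pkl')
resnet18_ft.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch']  # the training loop above then continues from start_epoch + 1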


  • Model prediction, located at ./resnet_inference
import os
import time
import torch.nn as nn
import torch
import torchvision.transforms as transforms
from PIL import Image
from matplotlib import pyplot as plt
import torchvision.models as models
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = torch.device("cpu")

# config
vis = True
# vis = False
vis_row = 4

norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]

inference_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])

classes = ["ants", "bees"]


def img_transform(img_rgb, transform=None):
    """
    将数据转换为模型读取的形式
    :param img_rgb: PIL Image
    :param transform: torchvision.transform
    :return: tensor
    """

    if transform is None:
        raise ValueError("找不到transform!必须有transform对img进行处理")

    img_t = transform(img_rgb)
    return img_t


def get_img_name(img_dir, format="jpg"):
    """
    获取文件夹下format格式的文件名
    :param img_dir: str
    :param format: str
    :return: list
    """
    file_names = os.listdir(img_dir)
    img_names = list(filter(lambda x: x.endswith(format), file_names))

    if len(img_names) < 1:
        raise ValueError("{}下找不到{}格式数据".format(img_dir, format))
    return img_names


def get_model(m_path, vis_model=False):

    resnet18 = models.resnet18()
    num_ftrs = resnet18.fc.in_features
    resnet18.fc = nn.Linear(num_ftrs, 2)

    checkpoint = torch.load(m_path)
    resnet18.load_state_dict(checkpoint['model_state_dict'])

    if vis_model:
        from torchsummary import summary
        summary(resnet18, input_size=(3, 224, 224), device="cpu")

    return resnet18


if __name__ == "__main__":

    img_dir = os.path.join( "data/val/bees")
    model_path = "./checkpoint_14_epoch.pkl"
    time_total = 0
    img_list, img_pred = list(), list()

    # 1. data
    img_names = get_img_name(img_dir)
    num_img = len(img_names)

    # 2. model
    resnet18 = get_model(model_path, True)
    resnet18.to(device)
    resnet18.eval()  # at inference time, always call eval() to switch the model from training mode to evaluation mode

    with torch.no_grad():  # at inference time, disable gradient computation; no gradients need to be stored, which speeds up inference and saves memory
        for idx, img_name in enumerate(img_names):

            path_img = os.path.join(img_dir, img_name)

            # step 1/4: convert the image to RGB format
            img_rgb = Image.open(path_img).convert('RGB')

            # step 2/4: convert the RGB image into a tensor
            img_tensor = img_transform(img_rgb, inference_transform)
            # add a batch dimension, turning the 3-D tensor into a 4-D tensor
            img_tensor.unsqueeze_(0)
            img_tensor = img_tensor.to(device)

            # step 3/4: feed the tensor into the model
            time_tic = time.time()
            outputs = resnet18(img_tensor)
            time_toc = time.time()

            # step 4/4 : visualization
            _, pred_int = torch.max(outputs.data, 1)
            pred_str = classes[int(pred_int)]

            if vis:
                img_list.append(img_rgb)
                img_pred.append(pred_str)

                if (idx+1) % (vis_row*vis_row) == 0 or num_img == idx+1:
                    for i in range(len(img_list)):
                        plt.subplot(vis_row, vis_row, i+1).imshow(img_list[i])
                        plt.title("predict:{}".format(img_pred[i]))
                    plt.show()
                    plt.close()
                    img_list, img_pred = list(), list()

            time_s = time_toc-time_tic
            time_total += time_s

            print('{:d}/{:d}: {} {:.3f}s '.format(idx + 1, num_img, img_name, time_s))

    print("\ndevice:{} total time:{:.1f}s mean:{:.3f}s".
          format(device, time_total, time_total/num_img))
    if torch.cuda.is_available():
        print("GPU name:{}".format(torch.cuda.get_device_name()))
