Neural network learning notes 59 - Building a platform of common classification networks with PyTorch (VGG16, MobileNetV2, ResNet50)

Preface to study
Source code download
Common forms of classification networks
Introduction to classification networks
Training of classification network

Preface to study

I realized that after so many blog posts and videos, I had never systematically put together a classification network. Building a proper classification network is good for your health.

Source code download

https://github.com/bubbliiiiing/classification-pytorch
If you like it, you can give it a star.

Common forms of classification networks

A common classification network can be divided into two parts: a feature extraction part and a classification part.
The feature extraction part extracts features from the input image. Good features make it easier to tell targets apart, so this part is generally built from various kinds of convolutions, which have strong feature extraction capabilities.

The classification part uses the features produced by the feature extraction part to classify the image. It is generally made up of fully connected layers. After flattening or pooling, the features from the feature extraction part are generally one-dimensional vectors, which can be fed directly into the fully connected layers.
The feature extraction part is usually one of the familiar networks such as VGG, MobileNet or ResNet, and the classification part is one or several fully connected layers. In the end we obtain a one-dimensional vector of length num_classes.
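
The two-part structure described above can be sketched with a toy model. This is only an illustration under stated assumptions: `TinyClassifier`, its layer sizes and the 64x64 input are hypothetical and are not taken from the repository.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical toy model: feature extraction part + classification part."""

    def __init__(self, num_classes=10):
        super().__init__()
        # Feature extraction part: convolutions and pooling.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        # Classification part: a single fully connected layer.
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # (batch, 32, 1, 1) -> (batch, 32)
        return self.classifier(x)


model = TinyClassifier(num_classes=10)
logits = model(torch.randn(2, 3, 64, 64))
print(logits.shape)  # one vector of length num_classes per image
```

The networks below follow the same pattern, only with a much larger feature extraction part.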

Introduction to classification networks

1. Introduction to VGG16 network

VGG is a convolutional neural network model proposed by Simonyan and Zisserman in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". Its name is the abbreviation of the Visual Geometry Group at the University of Oxford, where the authors work.
This model took part in the 2014 ImageNet classification and localization challenge and achieved excellent results: second place in the classification task and first place in the localization task.
Its structure is shown in the figure below:
[Figure: VGG16 network structure]
This picture of VGG16 has been used to death, but it does reflect the structure of VGG16 very well. The whole of VGG16 is made up of three kinds of layers: convolutional layers, max pooling layers and fully connected layers.
The forward pass of VGG16 proceeds as follows:
1. The original image is resized to (224,224,3).
2. conv1: two [3,3] convolutions with 64 output channels, giving an output of (224,224,64), followed by 2x2 max pooling, giving (112,112,64).
3. conv2: two [3,3] convolutions with 128 output channels, giving (112,112,128), followed by 2x2 max pooling, giving (56,56,128).
4. conv3: three [3,3] convolutions with 256 output channels, giving (56,56,256), followed by 2x2 max pooling, giving (28,28,256).
5. conv4: three [3,3] convolutions with 512 output channels, giving (28,28,512), followed by 2x2 max pooling, giving (14,14,512).
6. conv5: three [3,3] convolutions with 512 output channels, giving (14,14,512), followed by 2x2 max pooling, giving (7,7,512).
7. Flatten the result.
8. Apply two fully connected layers with 4096 neurons each.
9. Apply a fully connected layer with 1000 outputs for classification.
The final output is the predicted score for each class.
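
The shapes in the steps above can be checked with a few lines of plain Python. This is a sketch, not part of the repository: `trace_shapes` is a hypothetical helper, and it relies on the fact that a 3x3 convolution with padding 1 keeps the spatial size while a 2x2 max pooling with stride 2 halves it.

```python
# VGG16 layout: an integer is the output channel count of a 3x3 convolution,
# 'M' is a 2x2 max pooling layer.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']


def trace_shapes(cfg, size=224, channels=3):
    """Return the (height, width, channels) shape after every layer in cfg."""
    shapes = []
    for v in cfg:
        if v == 'M':
            size //= 2        # pooling halves height and width
        else:
            channels = v      # padded 3x3 convolution only changes channels
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes(cfg)
pooled = [shape for v, shape in zip(cfg, shapes) if v == 'M']
print(pooled)  # shapes after conv1 ... conv5

# Size of the flattened vector fed to the first fully connected layer.
flat = pooled[-1][0] * pooled[-1][1] * pooled[-1][2]
print(flat)  # 25088, i.e. 512 * 7 * 7
```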

The implementation code is as follows:

import torchvision
import torch
import torch.nn as nn
from torch.hub import load_state_dict_from_url

model_urls = {
    
    
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
}


class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

cfgs = {
    
    
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
}

def vgg16(pretrained=False, progress=True, num_classes=1000):
    model = VGG(make_layers(cfgs['D']))
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['vgg16'], model_dir='./model_data',
                                              progress=progress)
        model.load_state_dict(state_dict,strict=False)

    if num_classes!=1000:
        model.classifier =  nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
    return model

2. Introduction to MobilenetV2 network

MobileNetV2 is an upgraded version of MobileNet. Its most important feature is the Inverted resblock, and the whole of MobileNetV2 is built from Inverted resblocks.

An Inverted resblock can be divided into two parts:
The left side is the trunk. It first uses a 1x1 convolution to increase the number of channels, then a 3x3 depthwise convolution for feature extraction, and finally a 1x1 convolution to reduce the number of channels.
The right side is the residual edge, which connects the input directly to the output. In the code below, the residual edge is only used when the stride is 1 and the input and output channel counts are equal.
[Figure: structure of the Inverted resblock]
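
To get a feel for why the trunk is cheap even though it expands the channels, the convolution weights can be counted by hand. The sketch below is an illustration only: both helper functions are hypothetical, and BatchNorm parameters are left out (the convolutions themselves have no bias in the implementation below).

```python
def inverted_residual_params(inp, oup, expand_ratio):
    """Count the convolution weights of one Inverted resblock trunk."""
    hidden = inp * expand_ratio
    # 1x1 expansion convolution (skipped when expand_ratio is 1).
    expand = inp * hidden if expand_ratio != 1 else 0
    # 3x3 depthwise convolution: one 3x3 filter per channel.
    depthwise = 3 * 3 * hidden
    # 1x1 projection convolution back down to oup channels.
    project = hidden * oup
    return expand + depthwise + project


def standard_conv_params(inp, oup, kernel_size=3):
    """Count the weights of an ordinary convolution without bias."""
    return kernel_size * kernel_size * inp * oup


# A block with 32 input channels, 32 output channels and expansion ratio 6
# works internally on 32 * 6 = 192 channels.
block = inverted_residual_params(32, 32, 6)
dense = standard_conv_params(192, 192)
print(block, dense)  # 14016 331776
```

The whole block needs far fewer weights than a single ordinary 3x3 convolution on the same 192 expanded channels.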

The overall network structure is as follows (each Inverted resblock performs the operations described above):
[Figure: overall MobileNetV2 structure]

After the feature extraction part has processed the input image, global average pooling reduces the feature layer to a feature vector. A fully connected layer on that vector then gives the final classification result.
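
Global average pooling simply averages each channel over its height and width. A minimal sketch, assuming a random tensor with the shape that the MobileNetV2 feature extractor produces for a 224x224 input (1280 channels, 7x7):

```python
import torch
import torch.nn.functional as F

# Stand-in for the output of the feature extraction part: (batch, C, H, W).
features = torch.randn(4, 1280, 7, 7)

# Averaging over the two spatial dimensions, as done with x.mean([2, 3]).
vec_mean = features.mean([2, 3])

# The same result via adaptive average pooling followed by flattening.
vec_pool = torch.flatten(F.adaptive_avg_pool2d(features, 1), 1)

print(vec_mean.shape)  # one 1280-dimensional vector per image
```

Either form leaves one value per channel, which is why the classifier only needs `last_channel` inputs.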

The implementation code is as follows:

from torch import nn
from torch.hub import load_state_dict_from_url


__all__ = ['MobileNetV2', 'mobilenet_v2']


model_urls = {
    
    
    'mobilenet_v2': 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth',
}


def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True)
        )

class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=None, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280

        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]

        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
            raise ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}".format(inverted_residual_setting))

        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
        features = [ConvBNReLU(3, input_channel, stride=2)]

        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel

        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1))
        self.features = nn.Sequential(*features)

        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, num_classes),
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = x.mean([2, 3])
        x = self.classifier(x)
        return x


def mobilenet_v2(pretrained=False, progress=True, num_classes=1000):
    model = MobileNetV2()
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['mobilenet_v2'], model_dir='./model_data',
                                              progress=progress)
        model.load_state_dict(state_dict)

    if num_classes!=1000:
        model.classifier = nn.Sequential(
                nn.Dropout(0.2),
                nn.Linear(model.last_channel, num_classes),
            )
    return model

3. Introduction to ResNet50 network

a. What is a residual network?

Residual net (residual network):
The output of an earlier layer skips several layers and is added directly to the input of a later layer.
This means that part of the content of a later feature layer is contributed linearly by an earlier layer.
Its structure is as follows:
[Figure: residual connection]
Deep residual networks were designed to overcome the low learning efficiency and stalled accuracy that come with making a network deeper.
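
The skip connection can be sketched in a few lines. This is a simplified illustration, not the Bottleneck used by ResNet50: `ResidualBlock` and its two 3x3 convolutions are assumptions chosen to keep the example short.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Hypothetical minimal residual block: output = relu(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # The input skips the body and is added back to its output.
        return self.relu(self.body(x) + x)


block = ResidualBlock(8)
x = torch.randn(2, 8, 16, 16)
out = block(x)
print(out.shape)  # same shape as the input

# If the last normalization layer is scaled to zero, the body contributes
# nothing and only the skipped input passes through.
nn.init.zeros_(block.body[-1].weight)
out_identity = block(x)
```

The zeroing trick at the end is the same idea as the `zero_init_residual` option in the ResNet code below: each block starts out close to an identity mapping.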

b. What is ResNet50 model?

ResNet50 has two basic blocks, named Conv Block and Identity Block. The input and output dimensions of a Conv Block are different, so Conv Blocks cannot be stacked back to back; its function is to change the dimensions of the network. The input and output dimensions of an Identity Block are the same, so Identity Blocks can be stacked back to back; its function is to deepen the network.
The structure of the Conv Block is shown below. It can be divided into two parts. The left part is the trunk, which has two rounds of convolution, normalization and activation followed by one round of convolution and normalization. The right part is the residual edge, which has one round of convolution and normalization. Because the residual edge contains a convolution, a Conv Block can change the width, height and number of channels of the output feature layer:
[Figure: Conv Block structure]
The structure of the Identity Block is shown below. It can also be divided into two parts. The left part is the trunk, which has two rounds of convolution, normalization and activation followed by one round of convolution and normalization. The right part is the residual edge, which is connected directly to the output. Because the residual edge contains no convolution, the input and output feature layers of an Identity Block have the same shape, so it can be used to deepen the network:
[Figure: Identity Block structure]

Conv Block and Identity Block are both residual structures.
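
How the two block types are arranged can be traced without building the network. The sketch below is a hypothetical helper, assuming the usual ResNet50 layout of [3, 4, 6, 3] blocks per stage with a channel expansion of 4, and an input size divisible by 32 so that every stride-2 layer halves the size exactly.

```python
def resnet50_stages(input_size=224, layers=(3, 4, 6, 3), expansion=4):
    """Return output channels, spatial size and block types for each stage."""
    size = input_size // 2   # 7x7 convolution with stride 2
    size = size // 2         # 3x3 max pooling with stride 2
    stages = []
    for index, (planes, num_blocks) in enumerate(zip((64, 128, 256, 512), layers)):
        stride = 1 if index == 0 else 2
        size = size // stride
        # The first block of a stage changes the shape (Conv Block);
        # the remaining blocks keep it (Identity Block).
        blocks = ["Conv Block"] + ["Identity Block"] * (num_blocks - 1)
        stages.append({"channels": planes * expansion, "size": size, "blocks": blocks})
    return stages


stages = resnet50_stages()
for stage in stages:
    print(stage["channels"], stage["size"], stage["blocks"])

# Each block has 3 convolutions on its trunk; together with the first
# convolution and the final fully connected layer this gives the "50".
num_blocks = sum(len(stage["blocks"]) for stage in stages)
depth = 1 + 3 * num_blocks + 1
print(depth)  # 50
```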

The overall network structure is as follows:
[Figure: overall ResNet50 structure]
The implementation code is as follows:

import torch
import torch.nn as nn
from torch.hub import load_state_dict_from_url

model_urls = {
    
    
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
}


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)


def conv1x1(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)



class Bottleneck(nn.Module):
    expansion = 4
    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            replace_stride_with_dilation = [False, False, False]

        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

def resnet50(pretrained=False, progress=True, num_classes=1000):
    model = ResNet(Bottleneck, [3, 4, 6, 3])
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['resnet50'], model_dir='./model_data',
                                              progress=progress)
        model.load_state_dict(state_dict)

    if num_classes!=1000:
        model.fc = nn.Linear(512 * Bottleneck.expansion, num_classes)
    return model

Training of classification network

1. Introduction to LOSS

Generally speaking, the loss function used by a classification network is the cross-entropy loss (Cross Entropy). Its formula is as follows:

$$L = -\frac{1}{N}\sum_{i}\sum_{c=1}^{M} y_{ic}\log(p_{ic})$$

where:

[ $M$ ] - the number of categories;
[ $y_{ic}$ ] - the true label (0 or 1): 1 when the i-th sample belongs to class c, otherwise 0;
[ $p_{ic}$ ] - the prediction: the predicted probability that the i-th sample belongs to class c;
[ $i$ ] - the sample index;
[ $N$ ] - the number of samples.
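
The cross-entropy loss described above can be worked through in plain Python. This is a sketch under two assumptions: the loss is averaged over the samples, and the probabilities come from a softmax over the network's raw outputs. Because the true label is 1 for exactly one class per sample, the inner sum reduces to the log-probability of the true class.

```python
import math


def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, labels):
    """probs: one list of M probabilities per sample; labels: true class indices."""
    n = len(probs)
    return -sum(math.log(p[c]) for p, c in zip(probs, labels)) / n


# Two samples, three classes (the numbers are made up for illustration).
probs = [softmax([2.0, 0.5, 0.1]), softmax([0.2, 0.1, 3.0])]
loss_good = cross_entropy(probs, labels=[0, 2])  # confident and correct
loss_bad = cross_entropy(probs, labels=[1, 0])   # confident and wrong
print(loss_good < loss_bad)  # True

# A uniform prediction over M classes always costs log(M).
uniform = cross_entropy([[0.25] * 4], labels=[2])
```

In PyTorch, `nn.CrossEntropyLoss` combines the softmax and the logarithm, so it expects the raw outputs of the network rather than probabilities.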

2. Use classification network for training

First download the repository from GitHub. After downloading, unpack it with decompression software and open the folder in your editor or IDE.
Make sure the root directory you open is correct, that is, the directory where the files are stored. Otherwise the relative paths will be wrong and the code will not run.

a. Preparation of data set

The datasets folder stores the images and is divided into two parts: train contains the training images and test contains the test images.
Before training, prepare the data set. Under both the train and test folders, the data is split into one folder per category: the name of each folder is the category name, and the pictures inside it are the pictures of that category.

b. Data set processing

After preparing the data set, run txt_annotation.py in the root directory to generate the cls_train.txt required for training.

Before running it, modify the classes in it so that they match the classes you want to classify.
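
What such an annotation script does can be sketched as follows. This is not the repository's txt_annotation.py: `build_annotations`, the `cat` and `dog` classes and the `class_id;image_path` line format are assumptions made for illustration, so check the real script for the exact format it writes.

```python
import os
import tempfile


def build_annotations(split_dir, classes):
    """List every image under split_dir/<class name>/ as 'class_id;path'."""
    lines = []
    for cls_id, cls_name in enumerate(classes):
        cls_dir = os.path.join(split_dir, cls_name)
        for file_name in sorted(os.listdir(cls_dir)):
            if file_name.lower().endswith((".jpg", ".jpeg", ".png")):
                lines.append(f"{cls_id};{os.path.join(cls_dir, file_name)}")
    return lines


# Demonstration on a throwaway folder with the layout described above.
with tempfile.TemporaryDirectory() as root:
    for cls_name in ("cat", "dog"):
        os.makedirs(os.path.join(root, "train", cls_name))
        for i in range(2):
            # Empty placeholder files stand in for real images.
            open(os.path.join(root, "train", cls_name, f"{i}.jpg"), "w").close()

    lines = build_annotations(os.path.join(root, "train"), ["cat", "dog"])
    for line in lines:
        print(line)
```

The order of the class list decides which integer each category gets, which is why the class list must stay consistent everywhere it is used.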

c. Start network training

With txt_annotation.py we have generated cls_train.txt and cls_test.txt, so we can now start training.

Training has many parameters; read the comments carefully after downloading the repository. The most important step is to modify cls_classes.txt under the model_data folder so that it also matches the classes you want to classify.
After choosing the network and weights you want in train.py, you can start training!
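
For readers who want to see what a training step boils down to, here is a minimal sketch. It is not the repository's train.py: the tiny linear model, the random data, the learning rate and the number of steps are all assumptions chosen so that the example runs in a moment on the CPU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 3
# Stand-in for a real network such as VGG16, MobileNetV2 or ResNet50.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, num_classes))
criterion = nn.CrossEntropyLoss()  # expects raw outputs and class indices
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Random stand-ins for one batch of images and their labels.
images = torch.randn(16, 3, 8, 8)
labels = torch.randint(0, num_classes, (16,))

losses = []
for step in range(20):
    optimizer.zero_grad()                    # clear old gradients
    loss = criterion(model(images), labels)  # forward pass and loss
    loss.backward()                          # backpropagation
    optimizer.step()                         # parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])  # the loss on this batch goes down
```

A real run adds a data loader over cls_train.txt, evaluation on the test set, learning-rate scheduling and checkpoint saving around this loop.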