Reproducing ResNet (Residual Network)

Problems with deep networks

In principle, the deeper the network, the more information it can extract and the richer its features. In practice, however, experiments show that as a plain network gets deeper, optimization becomes harder and accuracy degrades on both the training and the test data. One major cause is that very deep networks suffer from exploding and vanishing gradients.
One partial remedy is to normalize the input data and the activations of the intermediate layers (e.g. with batch normalization). This keeps the gradients in a reasonable range during back-propagation, so a network trained with stochastic gradient descent (SGD) can still converge. However, this only helps for networks of a few dozen layers; once the network goes much deeper, normalization alone is no longer enough.
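As a minimal sketch (my own illustration, not from the original post), this is what "normalizing the data of the middle layers" looks like in PyTorch: a plain convolutional stack with nn.BatchNorm2d after each convolution.

import torch
import torch.nn as nn

# A plain (non-residual) stack: each conv is followed by batch normalization,
# which keeps the intermediate activations well-scaled during SGD training.
plain_stack = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 32, 32)   # dummy input batch
print(plain_stack(x).shape)     # torch.Size([1, 64, 32, 32])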

Residual Module

(Figure: a residual block with a shortcut connection)
The structure is somewhat similar to a "short circuit" in an electric circuit, so the skip connection is called a shortcut connection, and the residual block itself is often simply called a shortcut. The principle is as follows: for a stack of a few layers, let x be the input and H(x) the mapping we want the stack to learn. Instead of learning H(x) directly, we let the stack learn the residual F(x) = H(x) - x, so the block's output becomes F(x) + x. Learning the residual is easier than learning the full mapping: if the residual is 0, the stacked layers simply perform an identity mapping, so at worst the network's performance does not degrade. In practice the residual is not exactly 0, so the stacked layers still learn new features on top of the input features and can achieve better performance.

In short, each block only has to learn the residual with respect to its input rather than the entire output, which is why ResNet is called a residual network.

In other words, with the residual module, small changes in the desired output are highlighted in what the block has to learn. For example, if the input is 5 and the target output moves from 5.1 to 5.2, the full mapping H changes by only about 2%, while the residual F changes from 0.1 to 0.2, a 100% change, so the residual branch receives a much stronger learning signal. Together with the shortcut, which gives gradients a direct path back through the block, this is what tackles the vanishing-gradient problem. That is the key idea of ResNet!
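A quick way to see the effect on gradients (my own sketch, not from the original post): the block computes F(x) + x, so its derivative with respect to x is F'(x) + 1. Even when the gradient through the learned branch F is tiny, the identity term still carries the gradient back. A small autograd check with a deliberately weak F:

import torch

x = torch.randn(4, requires_grad=True)

def weak_branch(t):
    # stand-in for a learned F whose gradient contribution is nearly zero
    return 1e-6 * t.sum()

plain_out = weak_branch(x)            # no shortcut
res_out = weak_branch(x) + x.sum()    # with the identity shortcut: F(x) + x

plain_grad, = torch.autograd.grad(plain_out, x)
res_grad, = torch.autograd.grad(res_out, x)

print(plain_grad)  # ~1e-6 everywhere: the gradient has almost vanished
print(res_grad)    # ~1.000001: the shortcut adds an identity term to the gradient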

ResNet implementation

import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo

def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

class BasicBlock(nn.Module):
    # Basic residual block used in ResNet-18/34: two 3x3 convolutions with
    # batch norm and ReLU, plus an identity (or projected) shortcut.
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x  # keep the input for the shortcut connection

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)  # project the shortcut when shape/channels change

        out += residual  # F(x) + x: add the shortcut before the final ReLU
        out = self.relu(out)

        return out

class ResNet(nn.Module):
    # Full network: a 7x7 stem, four residual stages (layer1-layer4),
    # global average pooling and a fully connected classifier.
    # `layers` gives the number of blocks in each stage.

    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        # He (Kaiming) initialization for conv weights; BatchNorm starts at scale 1, bias 0.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        # Build one stage of `blocks` residual blocks. When the first block
        # changes resolution or channel count, a 1x1 conv + BN projects the
        # shortcut so it can be added to the block output.
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

# Weight URL referenced below (from torchvision's published pretrained models).
model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
}

def resnet18(pretrained=False, **kwargs):
    """Constructs a ResNet-18 model.
    pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
    return model
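As a quick sanity check (my own addition, not part of the original post), the model can be built and run on a dummy batch; with the default num_classes=1000 the output has shape [batch, 1000]:

import torch

model = resnet18(pretrained=False)  # pretrained=True downloads the ImageNet weights
x = torch.randn(1, 3, 224, 224)     # AvgPool2d(7) in ResNet.forward assumes 224x224 inputs
out = model(x)
print(out.shape)                    # torch.Size([1, 1000])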
Origin blog.csdn.net/qq_32146369/article/details/105361909