Paper reading notes: ResNeXt

1. ResNeXt

Xie, Saining, et al. “Aggregated residual transformations for deep neural networks.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

The multi-branch structure was first proposed in Inception, but each branch in Inception has a different structure, which makes designing the network cumbersome. As a member of the Inception-like family, ResNeXt also uses the split-transform-merge strategy (a 1x1 convolution reduces the channel dimension, a 3x3 or 5x5 convolution then extracts features, and the outputs of the branches are finally merged). However, the authors argue that using the same topology in every branch greatly simplifies network design, and that a simple design reduces the risk of over-tailoring the architecture (the author cites VGG as an example; compared with it, the earlier AlexNet-style designs were rather ad hoc). The design of a neural network should therefore follow simple, easy-to-understand principles rather than stacking assorted complex modules.

First of all, the article proposes a new term: cardinality. This is a new hyperparameter that refers to the number of parallel branches (paths) in a block. The authors argue that increasing cardinality improves image classification accuracy, and that, unlike blindly increasing network depth, adding branches is a more efficient way to spend model capacity.

It is worth noting that cardinality is easily confused with width, which here refers to the number of output channels of the middle convolutional layer in a bottleneck block. The usual bottleneck block first reduces the channel dimension and then restores it.
[Figure: a ResNet bottleneck block (left) and a ResNeXt block with cardinality 32 (right)]
As shown in the figure above, the ResNet block is on the left and the ResNeXt block is on the right; the ResNeXt block has a cardinality of 32, each branch has a width of 4, and a residual connection is used at the end. The only difference between the two is the design of the branches.
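
To check the claim that the two blocks are comparable, one can count their convolution weights (a rough sketch that ignores biases and BatchNorm; count_params is a hypothetical helper, not from the paper):

def count_params(layers):
    # layers: (in_channels, out_channels, kernel_size) for each conv in one branch
    return sum(c_in * c_out * k * k for c_in, c_out, k in layers)

# ResNet bottleneck: 256 -> 64 -> 64 -> 256
resnet_block = count_params([(256, 64, 1), (64, 64, 3), (64, 256, 1)])

# ResNeXt block: 32 branches, each 256 -> 4 -> 4 -> 256
resnext_block = 32 * count_params([(256, 4, 1), (4, 4, 3), (4, 256, 1)])

print(resnet_block, resnext_block)  # 69632 70144 -- roughly the same complexity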

[Figure: three equivalent forms of the ResNeXt block: (a) aggregated branches, (b) early concatenation, (c) grouped convolution]
The author also gives three equivalent structures. In (a), each branch outputs 256 channels and the branch outputs are summed at the end, which requires a large amount of computation. In (b), each branch outputs only part of the final channel count, and the outputs are concatenated. (c) is essentially the same as (b), except that the intermediate 3x3 convolutions are replaced by a single grouped convolution, which keeps the branches independent while using far fewer parameters than an ordinary dense convolution of the same size.
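
A minimal sketch of the trick behind form (c): in PyTorch, the groups argument of nn.Conv2d splits the input and output channels into independent groups, so a single layer carries all 32 branches. The channel count 128 below corresponds to 32 branches of width 4 and is chosen only for illustration.

import torch
import torch.nn as nn

dense   = nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False)             # one ordinary 3x3 conv
grouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32, bias=False)  # 32 branches of width 4

print(sum(p.numel() for p in dense.parameters()))    # 147456 = 128*128*3*3
print(sum(p.numel() for p in grouped.parameters()))  # 4608   = 32*(4*4*3*3)

x = torch.randn(1, 128, 56, 56)
print(grouped(x).shape)  # torch.Size([1, 128, 56, 56])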

2. Code

import torch
import torch.nn as nn
import torchvision
from torch.utils import data
import matplotlib.pyplot as plt
import copy
import math

def conv1x1(in_channels, out_channels, stride=1, groups=1, bias=False):
    # 1x1 convolution
    return nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                     kernel_size=1, stride=stride, groups=groups, bias=bias)

def conv3x3(in_channels, out_channels, stride=1, padding=1, dilation=1, groups=1, bias=False):
    # 3x3 convolution
    # stride=1 by default, i.e. no downsampling
    return nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                     kernel_size=3, stride=stride, padding=padding, dilation=dilation,
                     groups=groups, bias=bias)

class ResNextBottleneck(nn.Module):

    def __init__(self, in_channels, out_channels, stride, cardinality, bottleneck_width):
        # cardinality: number of branches
        # bottleneck_width: width (number of channels) of each branch
        super(ResNextBottleneck, self).__init__()
        mid_channels = out_channels // 4
        # per-branch width D: equals bottleneck_width when mid_channels is 64, and scales with it
        D = int(math.floor(mid_channels * (bottleneck_width / 64.0)))
        group_width = cardinality * D  # total width across all branches

        # 1x1 conv: reduce channels to group_width
        self.conv1 = conv1x1(
            in_channels=in_channels,
            out_channels=group_width
        )
        self.bn1 = nn.BatchNorm2d(group_width)
        # grouped 3x3 conv: one group per branch
        self.conv2 = conv3x3(
            in_channels=group_width,
            out_channels=group_width,
            stride=stride,
            groups=cardinality
        )
        self.bn2 = nn.BatchNorm2d(group_width)
        # 1x1 conv: restore the output channel count
        self.conv3 = conv1x1(
            in_channels=group_width,
            out_channels=out_channels
        )
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.activ = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.activ(self.bn1(self.conv1(x)))
        x = self.activ(self.bn2(self.conv2(x)))
        # no ReLU after the last BN: the residual unit applies it after the addition
        x = self.bn3(self.conv3(x))
        return x

class ResNextUnit(nn.Module):

    def __init__(self, in_channels, out_channels, stride, cardinality, bottleneck_width):
        super(ResNextUnit, self).__init__()
        self.resize_identity = (in_channels != out_channels) or (stride != 1)  # whether the shortcut must be reshaped (channels or spatial size)

        self.body = ResNextBottleneck(
            in_channels = in_channels,
            out_channels = out_channels,
            stride = stride,
            cardinality = cardinality,
            bottleneck_width = bottleneck_width
        )

        if self.resize_identity:
            self.identity_conv = conv1x1(
                in_channels=in_channels,
                out_channels=out_channels,
                stride=stride
            )
        self.activ = nn.ReLU(inplace=True)

    def forward(self, x):
        if self.resize_identity:
            identity = self.identity_conv(x)
        else:
            identity = x
        x = self.body(x)
        x = x + identity
        return self.activ(x)
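
A quick sanity check of the unit defined above (using the 32x4d configuration from the figure; the tensor sizes are arbitrary example values):

unit = ResNextUnit(in_channels=256, out_channels=256, stride=1,
                   cardinality=32, bottleneck_width=4)
x = torch.randn(1, 256, 56, 56)
print(unit(x).shape)       # torch.Size([1, 256, 56, 56])

# a downsampling unit: the 1x1 identity_conv reshapes the shortcut to match
unit_down = ResNextUnit(in_channels=256, out_channels=512, stride=2,
                        cardinality=32, bottleneck_width=4)
print(unit_down(x).shape)  # torch.Size([1, 512, 28, 28])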

Origin: blog.csdn.net/loki2018/article/details/124911774