Paper reading notes: SPPNet

1. SPPNet

He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(9): 1904-1916.

Existing convolutional neural networks typically require input images of a fixed size, such as the commonly used 224 × 224. Images that do not match this size must be preprocessed first, for example by cropping or warping (stretching), and this manual processing can hurt the prediction accuracy of the network. To remove this constraint and let the model accept images of arbitrary size, the paper proposes spatial pyramid pooling (SPP).

Why does a CNN require a fixed input size? The constraint comes from the final linear classifier, not from the convolutional layers. The classifier operates on the flattened feature map, so its input dimension depends on the number of channels and the spatial size of the last convolutional output, and both must be known in advance. Global pooling is one way around this, because the feature map after global pooling always has shape C × 1 × 1. However, global pooling is equivalent to max pooling or average pooling with a kernel as large as the feature map itself, so it discards spatial information.
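To make the size dependence concrete, here is a minimal sketch, assuming PyTorch. The single toy convolution and the input sizes are arbitrary choices for illustration and do not come from the paper. It shows that the flattened length changes with the input size, while the globally pooled length does not.

```python
import torch
import torch.nn as nn

# Hypothetical toy feature extractor: one conv layer with 8 output channels.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=2, padding=1)
global_pool = nn.AdaptiveAvgPool2d(output_size=1)  # global average pooling


def feature_lengths(height, width):
    """Return (flattened length, globally pooled length) for one input image."""
    x = torch.randn(1, 3, height, width)
    fmap = conv(x)  # [1, 8, H', W']
    flat_len = torch.flatten(fmap, start_dim=1).shape[1]
    pooled_len = torch.flatten(global_pool(fmap), start_dim=1).shape[1]
    return flat_len, pooled_len


small = feature_lengths(32, 32)  # feature map is 8 x 16 x 16
large = feature_lengths(64, 48)  # feature map is 8 x 32 x 24
print(small, large)
```

A linear layer sized for the first flattened length could not consume the second one, whereas the pooled length is 8 in both cases.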

To retain more spatial information than global pooling does, SPPNet adds a multi-scale pooling layer before the classifier: the feature map is pooled at several scales, and the pooled results are flattened and concatenated into a fixed-size feature vector.
Suppose the output of the last convolutional layer has shape C × H × W. Pooling it into 4 × 4, 2 × 2, and 1 × 1 grids gives 16, 4, and 1 bins per channel, so the concatenated feature vector has length (16 + 4 + 1) × C, regardless of H and W.
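The length arithmetic can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper name `spp_output_length` and the value C = 256 are assumptions made for the example.

```python
def spp_output_length(channels, levels):
    """Length of the SPP feature vector: channels * sum of n * n over all levels."""
    return channels * sum(n * n for n in levels)


# Three-level pyramid from the text: 4 x 4, 2 x 2 and 1 x 1 bins.
bins = sum(n * n for n in (4, 2, 1))        # 16 + 4 + 1
length = spp_output_length(256, (4, 2, 1))  # assumed example value C = 256
print(bins, length)
```

The result depends only on the number of channels and the pyramid levels, which is why the classifier can be sized without knowing H or W.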

2. Code implementation

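import math

import torch
import torch.nn as nn


class SPPNet(nn.Module):
    """Spatial pyramid pooling layer: pools to each level's grid and concatenates."""

    def __init__(self, in_channels, levels=None):
        super(SPPNet, self).__init__()
        # in_channels is unused (the layer has no weights); kept for the interface.
        if levels is None:
            self.levels = [6, 3, 2, 1]
        else:
            self.levels = levels

    def forward(self, x):
        # x: [batch_size, C, H, W]
        H, W = x.shape[2], x.shape[3]
        ret = []
        for level in self.levels:
            # Kernel (= stride) chosen so that `level` windows cover each dimension.
            h_kernel = int(math.ceil(H / level))
            w_kernel = int(math.ceil(W / level))
            # Symmetric padding so the padded size reaches kernel * level.
            h_pad = int(math.ceil((h_kernel * level - H) / 2))
            w_pad = int(math.ceil((w_kernel * level - W) / 2))
            # Note: nn.MaxPool2d rejects padding larger than half the kernel size,
            # so feature maps that are small relative to `level` raise an error.
            maxpool = nn.MaxPool2d(kernel_size=(h_kernel, w_kernel),
                                   stride=(h_kernel, w_kernel),
                                   padding=(h_pad, w_pad))
            # Expected: [batch_size, C, level, level] -> [batch_size, C, level * level]
            ret.append(torch.flatten(maxpool(x), start_dim=2))
        # Concatenate all levels per channel, then flatten to one vector per sample.
        return torch.flatten(torch.cat(ret, dim=-1), start_dim=1)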
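The explicit kernel and padding arithmetic above relies on `nn.MaxPool2d` accepting the computed padding, which it does not when the padding exceeds half the kernel size (for example, a 4 × 4 feature map with level 6). As a possible alternative, the sketch below swaps in PyTorch's adaptive max pooling, which picks the windows itself. This is a substitute technique, not the implementation from the post: the class name `AdaptiveSPP` is hypothetical, and the elements of the output vector are ordered level by level rather than channel by channel, although the length is the same.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveSPP(nn.Module):
    """SPP layer built on adaptive max pooling (hypothetical name, not from the post)."""

    def __init__(self, levels=(6, 3, 2, 1)):
        super().__init__()
        self.levels = tuple(levels)

    def forward(self, x):
        # x: [batch_size, C, H, W]
        pooled = [
            # [batch_size, C, n, n] -> [batch_size, C * n * n]
            torch.flatten(F.adaptive_max_pool2d(x, output_size=n), start_dim=1)
            for n in self.levels
        ]
        # Concatenate the per-level vectors into one fixed-length vector per sample.
        return torch.cat(pooled, dim=1)


spp = AdaptiveSPP()
out_a = spp(torch.randn(2, 512, 28, 28))
out_b = spp(torch.randn(2, 512, 13, 17))
out_c = spp(torch.randn(2, 512, 4, 4))
print(out_a.shape, out_b.shape, out_c.shape)
```

All three inputs yield vectors of the same length, (36 + 9 + 4 + 1) × 512, including the small and non-square feature maps.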

We embed this module into our custom convolutional model:

class ConvNet(nn.Module):

    def __init__(self, num_classes=10, levels=None):
        super(ConvNet, self).__init__()
        if levels is None:
            levels = [6, 3, 2, 1]
        # Total number of pooling bins per channel, as a plain int for nn.Linear.
        classifier_in = sum(level ** 2 for level in levels)
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3,
                               stride=2, padding=1)
        self.bn1 = nn.BatchNorm2d(num_features=64)
        self.relu1 = nn.ReLU(inplace=True)

        self.conv2 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3,
                               stride=2, padding=1)
        self.bn2 = nn.BatchNorm2d(num_features=128)
        self.relu2 = nn.ReLU(inplace=True)

        self.conv3 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3,
                               stride=2, padding=1)
        self.bn3 = nn.BatchNorm2d(num_features=256)
        self.relu3 = nn.ReLU(inplace=True)

        self.conv4 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3,
                               stride=1, padding=1)
        self.bn4 = nn.BatchNorm2d(num_features=512)
        self.relu4 = nn.ReLU(inplace=True)

        self.spp = SPPNet(in_channels=512, levels=levels)
        self.relu5 = nn.ReLU(inplace=True)

        self.classifier = nn.Linear(in_features=classifier_in * 512, out_features=num_classes)
        self._init_params()

    def _init_params(self):
        for name, module in self.named_modules():
            if isinstance(module, nn.Conv2d):
                nn.init.kaiming_normal_(module.weight)

    def forward(self, x):
        x = self.relu1(self.bn1(self.conv1(x)))
        x = self.relu2(self.bn2(self.conv2(x)))
        x = self.relu3(self.bn3(self.conv3(x)))
        x = self.relu4(self.bn4(self.conv4(x)))
        x = self.relu5(self.spp(x))
        x = self.classifier(x)
        return x
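To see why the classifier size in ConvNet does not depend on the input resolution, the following sketch traces one spatial dimension through the four convolutions using the standard Conv2d output-size formula (dilation 1). The helper names and the example input sizes are assumptions for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of one spatial dimension after a Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def convnet_feature_size(size):
    """Spatial size after the four conv layers of ConvNet (strides 2, 2, 2, 1)."""
    for stride in (2, 2, 2, 1):
        size = conv_out(size, stride=stride)
    return size


levels = [6, 3, 2, 1]
classifier_in = sum(n * n for n in levels) * 512  # bins per channel * channels
sizes = {s: convnet_feature_size(s) for s in (224, 180, 96)}
print(classifier_in, sizes)
```

The feature map entering the SPP layer differs for each input size, but the classifier always receives a vector of the same length.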

Origin blog.csdn.net/loki2018/article/details/125296334