Towards Depth | VGG (2)

Let's start with a picture: the network architecture of vgg11 (the complete version is at the end of the article).

vgg11 (part)

Next, we analyze vgg11 using the official PyTorch code. Taking vgg11 as the entry point, we work from shallow to deep to understand the VGG architecture.

Source code analysis

demo

import torchvision.models as models
vgg11 = models.vgg11(pretrained=True)
print(vgg11)

Ctrl + left-click on vgg11 to jump into the vgg.py file; the first thing that comes into view is the vgg11 constructor function.

vgg11 comes in two variants:

Without BN layer (parameter after 'A' is False)

def vgg11(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") from
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11', 'A', False, pretrained, progress, **kwargs)  # 'A' is column A of Table 1 in the paper; False means no BN layers, True adds them

With BN layer (parameter after 'A' is True)

def vgg11_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11_bn', 'A', True, pretrained, progress, **kwargs)
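
The two variants differ only in whether a BatchNorm2d follows each convolution. As a rough sketch of what that costs in learnable parameters, the snippet below assumes the eight convolution widths of configuration "A" as listed in torchvision's cfgs table; the helper name `extra_bn_parameters` is made up for illustration.

```python
# Output channels of the eight conv layers in configuration "A"
# (taken from the cfgs table in vgg.py, pooling markers omitted).
CONV_CHANNELS_A = [64, 128, 256, 256, 512, 512, 512, 512]


def extra_bn_parameters(conv_channels):
    """Learnable parameters added by one BatchNorm2d per conv layer.

    BatchNorm2d(c) has a weight and a bias of length c. Its running
    mean/variance are buffers, not parameters, so they are not counted.
    """
    return sum(2 * c for c in conv_channels)


print(extra_bn_parameters(CONV_CHANNELS_A))  # 5504
```

So vgg11_bn carries only a few thousand more learnable parameters than vgg11; the difference between them is in training behaviour, not in model size.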

The constructor calls the _vgg() function to create the network

def _vgg(arch: str, cfg: str, batch_norm: bool, pretrained: bool, progress: bool, **kwargs: Any) -> VGG:
    if pretrained:  # when loading pretrained weights, skip weight initialization
        kwargs['init_weights'] = False
    model = VGG(make_layers(cfgs[cfg], batch_norm=batch_norm), **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model

Look at the first argument passed to VGG: make_layers(cfgs[cfg], batch_norm=batch_norm)

Ctrl + left-click on cfgs to jump to its definition

cfgs: Dict[str, List[Union[str, int]]] = {  # dictionary holding the structural parameters of each network configuration
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],  # numbers are conv-layer output channels; 'M' is a max-pooling layer
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}
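
The names vgg11, vgg13, vgg16 and vgg19 can be recovered from this table by counting weight layers. The sketch below copies the four configurations and assumes the three fully connected layers of the classifier are added on top; `weight_layers` is a hypothetical helper, not part of torchvision.

```python
# Configurations copied from the cfgs dictionary in vgg.py.
cfgs = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M',
          512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M',
          512, 512, 512, 512, 'M'],
}

FC_LAYERS = 3  # the classifier has three nn.Linear layers


def weight_layers(cfg):
    """Count layers with weights: every integer entry is one conv layer."""
    conv_layers = sum(1 for v in cfg if v != 'M')
    return conv_layers + FC_LAYERS


for name, cfg in cfgs.items():
    print(name, weight_layers(cfg))
# A 11
# B 13
# D 16
# E 19
```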

Then go back up one level and Ctrl + left-click on make_layers

This function is responsible for building the feature extraction network

def make_layers(cfg: List[Union[str, int]], batch_norm: bool = False) -> nn.Sequential:
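    layers: List[nn.Module] = []  # start with an empty list of layers
    in_channels = 3  # RGB images have 3 channels
    for v in cfg:
        if v == 'M':  # add a max-pooling layer; kernel size and stride are both 2
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:  # add a convolutional layer
            v = cast(int, v)  # typing.cast only informs the static type checker; nothing is checked at runtime
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)  # 3×3 convolution
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v  # the next layer's input channels equal this layer's output channels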
    return nn.Sequential(*layers)
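
To see what make_layers produces without building any tensors, the loop can be mirrored symbolically. This is only a sketch: `trace_features` is a hypothetical helper, and the 224×224 input size is an assumption (the usual ImageNet crop). It relies on two facts from the code above: a 3×3 convolution with padding=1 keeps the spatial size, and each 2×2 max pooling with stride 2 halves it.

```python
def trace_features(cfg, batch_norm=False, in_channels=3, size=224):
    """Mirror make_layers symbolically.

    Returns the list of module names that would be appended and the
    (channels, height, width) of the output for a square input.
    """
    modules = []
    channels = in_channels
    for v in cfg:
        if v == 'M':
            modules.append('MaxPool2d')
            size //= 2  # kernel_size=2, stride=2 halves height and width
        else:
            modules.append('Conv2d')  # 3x3 with padding=1 keeps the spatial size
            if batch_norm:
                modules.append('BatchNorm2d')
            modules.append('ReLU')
            channels = v
    return modules, (channels, size, size)


cfg_a = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']

modules, shape = trace_features(cfg_a)
print(len(modules), shape)  # 21 (512, 7, 7)

modules_bn, shape_bn = trace_features(cfg_a, batch_norm=True)
print(len(modules_bn), shape_bn)  # 29 (512, 7, 7)
```

The 512 × 7 × 7 output is exactly the input size expected by the first nn.Linear layer of the classifier.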

Next, go back up one level

In the _vgg() function, the VGG class is instantiated to create the model

class VGG(nn.Module):

    def __init__(
        self,
        features: nn.Module,
        num_classes: int = 1000,
        init_weights: bool = True
    ) -> None:
        super(VGG, self).__init__()
        self.features = features  # feature extraction part
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))  # adaptive average pooling: pools the feature map to 7×7
        # classification part
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:  # initialize the weights
            self._initialize_weights()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # feature extraction
        x = self.features(x)
        # adaptive average pooling
        x = self.avgpool(x)
        # flatten
        x = torch.flatten(x, 1)
        # classification
        x = self.classifier(x)
        return x
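        # Note: the official PyTorch implementation makes a small improvement by using
        # adaptive average pooling, so that the following flatten step always receives a
        # fixed-size tensor and the network can be fed inputs of different sizes.
        # In the classifier, the official PyTorch implementation includes two Dropout layers.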

    def _initialize_weights(self) -> None:
        for m in self.modules():
            if isinstance(m, nn.Conv2d):  # conv layers use Kaiming initialization
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:  # biases are initialized to 0
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):  # BN weights are initialized to 1, biases to 0
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):  # fully connected weights: normal distribution with mean 0 and std 0.01
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
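
The role of the adaptive average pooling can be checked in isolation. The sketch below is a minimal example, assuming torch is installed; it feeds dummy 512-channel feature maps of different spatial sizes through the same nn.AdaptiveAvgPool2d((7, 7)) used in the class and reports the length of the flattened vector. The helper name and the chosen sizes are illustrative only.

```python
import torch
import torch.nn as nn

avgpool = nn.AdaptiveAvgPool2d((7, 7))  # same module as self.avgpool in VGG


def pooled_feature_length(height, width, channels=512):
    """Flattened feature length after adaptive pooling of a dummy feature map."""
    feature_map = torch.zeros(1, channels, height, width)
    pooled = avgpool(feature_map)
    return torch.flatten(pooled, 1).shape[1]


# Five 2x2 poolings shrink 224, 256 and 320 pixel inputs to 7, 8 and 10.
for side in (7, 8, 10):
    print(side, pooled_feature_length(side, side))
# 7 25088
# 8 25088
# 10 25088
```

Whatever the input resolution, the classifier always receives 512 × 7 × 7 = 25088 values, which is why the first nn.Linear can keep a fixed input size.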

To summarize the analysis: VGG stacks small convolution kernels to reach the same receptive field as a single large kernel (two stacked 3×3 convolutions cover a 5×5 region, three cover 7×7), while making the network deeper and inserting more nonlinearities, which strengthens its fitting ability. In terms of parameter count, a stack of small kernels needs fewer parameters than the single large kernel it replaces. In terms of kernel size, replacing a large kernel with several stacked small ones acts like a regularization imposed on the large kernel, which helps the network extract more robust features.
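
The receptive-field and parameter claims in this summary can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with the same number C of input and output channels and ignores biases; the function names and the value C = 512 are chosen only for illustration.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stride-1 convolutions applied in sequence."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (biases ignored) when every layer maps channels -> channels."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 512

# Three stacked 3x3 layers see the same 7x7 window as one 7x7 layer.
print(stacked_receptive_field(3, 3), stacked_receptive_field(7, 1))  # 7 7

# ...but need 27 * C^2 weights instead of 49 * C^2.
print(conv_weights(3, C, num_layers=3))  # 7077888
print(conv_weights(7, C))                # 12845056
```

Under these assumptions the stack of three 3×3 layers uses roughly 55% of the weights of the single 7×7 layer, and it applies three ReLU nonlinearities instead of one.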

Finally, compare the complete network structure of vgg11 to understand the vgg architecture.

vgg11 (complete)

Origin blog.csdn.net/wl1780852311/article/details/123139645