Let's start with a picture: the network architecture of vgg11 (the full version is at the bottom of the article).
Next, we analyze vgg11 using the official PyTorch code. Taking vgg11 as the entry point, we work from the simple to the detailed to understand the VGG architecture.
Source code analysis
demo
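```python
import torchvision.models as models

# pretrained=True downloads the ImageNet weights on first use.
# Note: torchvision 0.13 and later deprecate `pretrained` in favor of the `weights` argument.
vgg11 = models.vgg11(pretrained=True)
print(vgg11)
```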
Ctrl + left-click on vgg11 to jump into the vgg.py file. The first thing that comes into view is the function that builds vgg11.
vgg11 comes in two variants:
Without BN layers (the argument after 'A' is False)
```python
def vgg11(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") from
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    # 'A' is column A of Table 1 in the paper. The False that follows means
    # no BN layers; passing True would add BN layers.
    return _vgg('vgg11', 'A', False, pretrained, progress, **kwargs)
```
With BN layers (the argument after 'A' is True)
```python
def vgg11_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11_bn', 'A', True, pretrained, progress, **kwargs)
```
Both builder functions call the _vgg() function to create the network
```python
def _vgg(arch: str, cfg: str, batch_norm: bool, pretrained: bool, progress: bool, **kwargs: Any) -> VGG:
    if pretrained:
        # When loading a pretrained model, skip random weight initialization
        kwargs['init_weights'] = False
    model = VGG(make_layers(cfgs[cfg], batch_norm=batch_norm), **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model
```
Look at the first argument passed to VGG: make_layers(cfgs[cfg], batch_norm=batch_norm)
Ctrl + left-click on cfgs to jump to its definition
```python
from typing import Dict, List, Union

# cfgs stores the structure parameters of each network configuration.
# A number is the output channel count of a conv layer; 'M' is a max-pooling layer.
cfgs: Dict[str, List[Union[str, int]]] = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}
```
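To see where the names vgg11, vgg13, vgg16 and vgg19 come from, the following minimal sketch counts the weight layers of each configuration. It assumes that only the conv layers and the three fully connected layers of the classifier (shown later in the VGG class) count as weight layers. The helper name count_weight_layers is made up for illustration and is not part of torchvision.

```python
# Copied from the cfgs dictionary above.
cfgs = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}

# Assumption: the classifier contributes three nn.Linear layers.
NUM_FC_LAYERS = 3


def count_weight_layers(cfg):
    """Count conv layers (every entry that is not 'M') plus the fully connected layers."""
    num_conv = sum(1 for v in cfg if v != 'M')
    return num_conv + NUM_FC_LAYERS


depths = {name: count_weight_layers(cfg) for name, cfg in cfgs.items()}
print(depths)  # {'A': 11, 'B': 13, 'D': 16, 'E': 19}
```

Configuration 'A' has 8 conv layers, which together with the 3 fully connected layers gives the 11 in vgg11.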
Then go back up one level and Ctrl + left-click on make_layers
This function is responsible for building the feature extraction network
```python
from typing import List, Union, cast

import torch.nn as nn


def make_layers(cfg: List[Union[str, int]], batch_norm: bool = False) -> nn.Sequential:
    layers: List[nn.Module] = []  # start with an empty layer list
    in_channels = 3  # an RGB image has 3 channels
    for v in cfg:
        if v == 'M':
            # Max-pooling layer: kernel size and stride are both 2
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            # Conv layer. cast() is only a hint for static type checkers;
            # it performs no runtime check.
            v = cast(int, v)
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)  # 3x3 convolution
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v  # the next layer's in_channels is this layer's out_channels
    return nn.Sequential(*layers)
```
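The loop in make_layers can be traced without PyTorch. The sketch below mirrors its control flow in plain Python and records the output shape after every layer, assuming a 3×224×224 input. The helper describe_layers is hypothetical and only imitates make_layers; it builds no real modules.

```python
CFG_A = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']


def describe_layers(cfg, batch_norm=False, in_channels=3, size=224):
    """Mirror the loop in make_layers, returning (layer name, output shape) pairs."""
    layers = []
    channels = in_channels
    for v in cfg:
        if v == 'M':
            size //= 2  # 2x2 max pooling with stride 2 halves height and width
            layers.append(('MaxPool2d', (channels, size, size)))
        else:
            channels = int(v)  # 3x3 conv with padding=1 keeps height and width
            layers.append(('Conv2d', (channels, size, size)))
            if batch_norm:
                layers.append(('BatchNorm2d', (channels, size, size)))
            layers.append(('ReLU', (channels, size, size)))
    return layers


plain = describe_layers(CFG_A, batch_norm=False)
with_bn = describe_layers(CFG_A, batch_norm=True)

for name, shape in plain:
    print(f"{name:<12}{shape}")

# 8 convs + 8 ReLUs + 5 pools = 21 modules; BN adds one module per conv = 29
print(len(plain), len(with_bn), plain[-1][1])
```

Under these assumptions the feature extractor ends with a 512×7×7 feature map, which matches the 512 * 7 * 7 input size of the first fully connected layer in the classifier.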
Next, go back up one level
In the _vgg() function, the VGG class is instantiated to create the model
```python
import torch
import torch.nn as nn


class VGG(nn.Module):

    def __init__(
        self,
        features: nn.Module,
        num_classes: int = 1000,
        init_weights: bool = True
    ) -> None:
        super(VGG, self).__init__()
        self.features = features  # feature extraction part
        # Adaptive average pooling: pools the feature map to 7x7
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # Classification part
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:  # weight initialization
            self._initialize_weights()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Feature extraction
        x = self.features(x)
        # Adaptive average pooling
        x = self.avgpool(x)
        # Flatten
        x = torch.flatten(x, 1)
        # Classification
        x = self.classifier(x)
        return x
```
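As a rough cross-check of the layer sizes in the class above, the following sketch counts the learnable parameters of vgg11 without BN by hand. It assumes the default num_classes of 1000 and configuration 'A' for the feature extractor. The helper names conv_params and classifier_params are illustrative and do not exist in torchvision.

```python
CFG_A = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']


def conv_params(cfg, in_channels=3, kernel_size=3):
    """Weights plus biases of all conv layers described by cfg."""
    total = 0
    for v in cfg:
        if v == 'M':
            continue  # pooling layers have no parameters
        total += in_channels * v * kernel_size * kernel_size + v
        in_channels = v
    return total


def classifier_params(num_classes=1000):
    """Weights plus biases of the three nn.Linear layers in the classifier."""
    sizes = [512 * 7 * 7, 4096, 4096, num_classes]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


features = conv_params(CFG_A)
classifier = classifier_params()
print(features, classifier, features + classifier)
# 9220480 123642856 132863336
```

Most of the parameters sit in the classifier, and the first fully connected layer alone accounts for over 100 million of them.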
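The official PyTorch implementation makes one small improvement: it uses adaptive average pooling so that the following flatten step always receives a fixed-size tensor, which lets the network accept inputs of different sizes.
In the classifier, the PyTorch implementation places a dropout layer after each of the first two fully connected layers.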
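To see why the fixed 7×7 output matters, the sketch below computes the side length of the feature map after the five max-pooling layers for a few input sizes. It assumes square inputs and relies on the fact that the 3×3 convolutions with padding=1 leave the spatial size unchanged. The helper feature_map_side is a hypothetical name.

```python
def feature_map_side(input_side, num_pools=5):
    """Side length after num_pools 2x2 max-pooling layers with stride 2."""
    side = input_side
    for _ in range(num_pools):
        side //= 2
    return side


# Input size expected by the first nn.Linear of the classifier
EXPECTED_IN_FEATURES = 512 * 7 * 7

for input_side in (224, 256, 320):
    side = feature_map_side(input_side)
    flat_without_pool = 512 * side * side  # flattened size if avgpool were skipped
    flat_with_pool = 512 * 7 * 7           # AdaptiveAvgPool2d((7, 7)) always yields 7x7
    print(input_side, side, flat_without_pool, flat_with_pool == EXPECTED_IN_FEATURES)
```

Only the 224×224 input yields 25088 features without the pooling step. For the larger inputs the flattened vector would not fit the first fully connected layer, and the adaptive pooling layer removes that mismatch.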
```python
    # Method of the VGG class above
    def _initialize_weights(self) -> None:
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # Conv layers use Kaiming initialization
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    # Biases are initialized to 0
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                # BN layers: weight initialized to 1, bias to 0
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                # Fully connected layers: weights drawn from N(0, 0.01^2), bias set to 0
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
```
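For intuition about the scale Kaiming initialization produces, the following sketch computes the standard deviation that kaiming_normal_ uses with mode='fan_out' and nonlinearity='relu', namely sqrt(2) / sqrt(fan_out), where fan_out is the number of output channels times the kernel area. The function kaiming_normal_std is an illustrative helper, not a PyTorch API.

```python
import math


def kaiming_normal_std(out_channels, kernel_size):
    """Std of the normal distribution for mode='fan_out' with a ReLU gain."""
    fan_out = out_channels * kernel_size * kernel_size
    gain = math.sqrt(2.0)  # gain associated with ReLU
    return gain / math.sqrt(fan_out)


# Output channel counts that appear in the VGG configurations
for out_channels in (64, 128, 256, 512):
    print(out_channels, round(kaiming_normal_std(out_channels, 3), 4))
```

Wider layers get a smaller standard deviation, which keeps the variance of the signal roughly stable as it passes through the stacked ReLU layers.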
To summarize the analysis: VGG stacks small convolution kernels to obtain the same receptive field as a large kernel, while increasing the depth of the network, introducing more nonlinearities, and strengthening the fitting ability of the network. In terms of parameter count, several stacked small kernels need fewer parameters than one large kernel with the same receptive field. In terms of kernel size, replacing a large kernel with a stack of small ones can be viewed as imposing a regularization on the large kernel, which makes the features the network extracts more robust.
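The receptive field and parameter claims can be checked with a few lines of arithmetic. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores biases. The channel count C = 512 is only an example value, and the helper names are made up for illustration.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return num_layers * (kernel_size - 1) + 1


def conv_weights(kernel_size, channels):
    """Weight count of one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels


C = 512  # example channel count
for n in (2, 3):
    rf = stacked_receptive_field(n)   # 5 for two layers, 7 for three
    stacked = n * conv_weights(3, C)  # n stacked 3x3 layers
    single = conv_weights(rf, C)      # one large kernel with the same receptive field
    print(n, rf, stacked, single, round(stacked / single, 2))
```

Two 3×3 layers cover a 5×5 receptive field with 18C² instead of 25C² weights, and three 3×3 layers cover 7×7 with 27C² instead of 49C² weights.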
Finally, compare this analysis against the complete network structure of vgg11 to consolidate your understanding of the VGG architecture.