Detailed explanation of VGG network structure based on CIFAR100


1 Dataset overview

1.1 CIFAR100

CIFAR100 contains 100 fine-grained classes grouped into 20 coarse superclasses, with 50,000 images in the training set and 10,000 images in the test set. Each image is a 32x32 color image.
CIFAR100 download address: http://www.cs.toronto.edu/~kriz/cifar.html

1.2 showdata.py view data

import cv2
import numpy as np
import pickle
import os


# Unpickle a CIFAR file and return the resulting dictionary
def unpickle(file):
    with open(file, 'rb') as fo:
        data = pickle.load(fo, encoding='latin1')
    return data


def cifar100_to_images():
    tar_dir = './data/cifar-100-python/'  # directory of the extracted original dataset
    train_root_dir = './data/cifar100/train/'  # directory where the images are saved
    test_root_dir = './data/cifar100/test/'
    if not os.path.exists(train_root_dir):
        os.makedirs(train_root_dir)
    if not os.path.exists(test_root_dir):
        os.makedirs(test_root_dir)

    # Get the class names for each label: 20 coarse classes and 100 fine classes
    meta_Name = tar_dir + "meta"
    Meta_dic = unpickle(meta_Name)
    coarse_label_names = Meta_dic['coarse_label_names']
    fine_label_names = Meta_dic['fine_label_names']
    print(fine_label_names)

    # Generate the training images. For PNG output, just change the file extension.
    dataName = tar_dir + "train"
    Xtr = unpickle(dataName)
    print(dataName + " is loading...")
    for i in range(0, Xtr['data'].shape[0]):
        # Each row of Xtr['data'] holds 3072 bytes: 1024 red, 1024 green, 1024 blue
        image = np.reshape(Xtr['data'][i], (-1, 1024))
        r = image[0, :].reshape(32, 32)  # red channel
        g = image[1, :].reshape(32, 32)  # green channel
        b = image[2, :].reshape(32, 32)  # blue channel
        # Rebuild the color image. cv2.imwrite expects BGR channel order,
        # so blue goes first; uint8 keeps the original 0-255 pixel values.
        img = np.zeros((32, 32, 3), dtype=np.uint8)
        img[:, :, 0] = b
        img[:, :, 1] = g
        img[:, :, 2] = r
        # File name: fineLabel_coarseLabel_&fineClass&_coarseClass_index.jpg
        picName = train_root_dir + str(Xtr['fine_labels'][i]) + '_' + str(Xtr['coarse_labels'][i]) + '_&' + \
                  fine_label_names[Xtr['fine_labels'][i]] + '&_' + coarse_label_names[
                      Xtr['coarse_labels'][i]] + '_' + str(i) + '.jpg'
        cv2.imwrite(picName, img)
    print(dataName + " loaded.")

    print("test_batch is loading...")
    # Generate the test images
    testXtr = unpickle(tar_dir + "test")
    for i in range(0, testXtr['data'].shape[0]):
        img = np.reshape(testXtr['data'][i], (3, 32, 32))
        # CHW (RGB) -> HWC, then reverse the channels to BGR for cv2.imwrite
        img = np.ascontiguousarray(img.transpose(1, 2, 0)[:, :, ::-1])
        picName = test_root_dir + str(testXtr['fine_labels'][i]) + '_' + str(testXtr['coarse_labels'][i]) + '_&' + \
                  fine_label_names[testXtr['fine_labels'][i]] + '&_' + coarse_label_names[
                      testXtr['coarse_labels'][i]] + '_' + str(i) + '.jpg'
        cv2.imwrite(picName, img)
    print("test_batch loaded.")

if __name__ == '__main__':
    cifar100_to_images()
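
The file names written by the script encode both labels and both class names, so they can be parsed back when the images are loaded later. The following is a minimal sketch of that idea; the helper name parse_image_name and the example file name are hypothetical and are not part of the original script.

```python
import re

# Pattern for the names written by cifar100_to_images():
# <fine_label>_<coarse_label>_&<fine_class>&_<coarse_class>_<index>.jpg
NAME_PATTERN = re.compile(r'^(\d+)_(\d+)_&(.+)&_(.+)_(\d+)\.jpg$')


def parse_image_name(file_name):
    """Hypothetical helper: split a saved file name back into its fields."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError("unexpected file name: " + file_name)
    fine_label, coarse_label, fine_class, coarse_class, index = match.groups()
    return {
        "fine_label": int(fine_label),
        "coarse_label": int(coarse_label),
        "fine_class": fine_class,
        "coarse_class": coarse_class,
        "index": int(index),
    }


# Illustrative file name only; real names depend on the dataset contents.
print(parse_image_name("19_11_&cattle&_large_omnivores_and_herbivores_0.jpg"))
```

Class names may themselves contain underscores, which is why the fine class name is wrapped in & characters and the index is taken from the last underscore-separated field.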


2 VGG network structure

2.1 Overview of network structure

In the VGG network, every convolution kernel is 3x3 with padding=1 (and stride 1), so a convolution does not change the size of the feature map.
The only thing that changes the size of the feature map is the pooling operation: each 2x2 max pooling with stride 2 changes the size from (H, W) to (H/2, W/2).

With these two points in mind, the VGG network is easy to follow. Take VGG16 as an example: the input image is 224x224x3 and the network contains 5 pooling operations, so the feature map before flattening is 7x7 (224 / 2^5 = 224 / 32 = 7).
The channel count is even simpler: only a convolution changes it, and the number of output channels equals the number of kernels (filters) in that convolutional layer.
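
The size arithmetic can be checked with the standard output-size formula, floor((size + 2*padding - kernel) / stride) + 1. The sketch below uses plain Python; the function names are illustrative, and each stage is simplified to one convolution followed by one pooling, which does not affect the result because the convolutions preserve the size.

```python
def conv_output_size(size, kernel_size=3, padding=1, stride=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_output_size(size, kernel_size=2, stride=2):
    """Output size of a max pooling along one spatial dimension."""
    return (size - kernel_size) // stride + 1


size = 224
for stage in range(5):
    size = conv_output_size(size)   # 3x3 conv, padding=1: size unchanged
    size = pool_output_size(size)   # 2x2 pool, stride=2: size halved
    print("after stage", stage + 1, "->", size)
```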


2.2 VGG network structure source code

class VGG(nn.Module):

    def __init__(self, features, num_class=100):
        super().__init__()
        self.features = features

        self.classifier = nn.Sequential(
            nn.Linear(512, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_class)
        )

    def forward(self, x):
        output = self.features(x)
        output = output.view(output.size()[0], -1)
        output = self.classifier(output)

        return output

The CIFAR100 version of the VGG network is defined very concisely and can be understood at a glance.

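As the forward function shows, the network definition consists of two main parts: features and classifier.
features: the convolutional and pooling layers, which only change the feature map size and the number of channels. This covers every operation before the fully connected layers in VGG.
classifier: the fully connected layers plus classification. The feature map produced by features is flattened and passed through 3 fully connected layers, which map it to 100 outputs for class prediction.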

2.3 Features construction process in VGG

cfg = {
    'A' : [64,     'M', 128,      'M', 256, 256,           'M', 512, 512,           'M', 512, 512,           'M'],# vgg11
    'B' : [64, 64, 'M', 128, 128, 'M', 256, 256,           'M', 512, 512,           'M', 512, 512,           'M'],# vgg13
    'D' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256,      'M', 512, 512, 512,      'M', 512, 512, 512,      'M'],# vgg16
    'E' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'] # vgg19
}

def make_layers(cfg, batch_norm=False):
    layers = []

    input_channel = 3
    for l in cfg:
        if l == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            continue

        layers += [nn.Conv2d(input_channel, l, kernel_size=3, padding=1)]

        if batch_norm:
            layers += [nn.BatchNorm2d(l)]

        layers += [nn.ReLU(inplace=True)]
        input_channel = l

    return nn.Sequential(*layers)

def vgg16_bn():
    return VGG(make_layers(cfg['D'], batch_norm=True))

2.3.1 Dictionary cfg

First, the features part of the network is described by a dictionary, cfg. In each list, a number is the number of output channels of a convolutional layer, and the letter 'M' stands for a max pooling operation.
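
Counting the entries of each list also shows where the names vgg11, vgg13, vgg16 and vgg19 come from: the number of convolutional layers plus the 3 fully connected layers of the classifier. A small sketch in plain Python (count_layers is an illustrative helper, not part of the original code):

```python
cfg = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M',
          512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M',
          512, 512, 512, 512, 'M'],
}


def count_layers(layer_cfg):
    """Return (number of conv layers, number of max pooling layers)."""
    convs = sum(1 for item in layer_cfg if item != 'M')
    pools = sum(1 for item in layer_cfg if item == 'M')
    return convs, pools


for name, layer_cfg in cfg.items():
    convs, pools = count_layers(layer_cfg)
    # Weighted layers = conv layers + 3 fully connected layers
    print(name, "convs:", convs, "pools:", pools, "weighted layers:", convs + 3)
```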

2.3.2 make_layers+vgg16_bn

make_layers builds the network from the structure given in cfg. Taking vgg16_bn as an example, for each entry in the list:
1) Determine whether the entry is a convolution or a pooling operation. For a convolution, add a 3x3 convolutional layer whose output channel count is the number given in cfg. For a pooling operation, add a max pooling layer with kernel size 2 and stride 2, which halves the feature map size, and move on to the next entry.
2) If batch_norm is True, add a batch normalization layer (nn.BatchNorm2d) after the convolution.
3) Finally, add the ReLU activation for the current layer.
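
To make the resulting layer order concrete, the sketch below mirrors the control flow of make_layers but collects descriptive strings instead of PyTorch modules, so it runs without torch. The function describe_layers is a hypothetical stand-in written for illustration only.

```python
def describe_layers(layer_cfg, batch_norm=False):
    """Mirror make_layers, returning layer descriptions instead of modules."""
    names = []
    input_channel = 3
    for item in layer_cfg:
        if item == 'M':
            names.append('MaxPool2d(2, stride=2)')
            continue
        names.append('Conv2d(%d -> %d, 3x3, padding=1)' % (input_channel, item))
        if batch_norm:
            names.append('BatchNorm2d(%d)' % item)
        names.append('ReLU')
        input_channel = item  # the next conv takes this layer's output channels
    return names


# First stage of the vgg16 configuration, with batch normalization enabled
for name in describe_layers([64, 64, 'M'], batch_norm=True):
    print(name)
```

Each convolution is followed by batch normalization (when enabled) and then ReLU, and each 'M' entry contributes a single pooling layer.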

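In CIFAR100 the input image size is 3*32*32. After features, the feature map size becomes 512*1*1, so after flattening only the channel dimension of 512 remains, and this 512 is also the input size of the classifier.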
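
The shape change can be traced by walking the vgg16 configuration: a number replaces the channel count and 'M' halves the height and width. This is a minimal sketch in plain Python; trace_shape is an illustrative helper that relies on the convolutions preserving the spatial size.

```python
def trace_shape(layer_cfg, height, width, channels=3):
    """Follow (channels, height, width) through a VGG features configuration."""
    for item in layer_cfg:
        if item == 'M':
            height, width = height // 2, width // 2  # 2x2 pooling, stride 2
        else:
            channels = item  # 3x3 conv with padding=1 keeps height and width
    return channels, height, width


vgg16_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

c, h, w = trace_shape(vgg16_cfg, 32, 32)
print("CIFAR100 input:", (c, h, w), "flattened:", c * h * w)

c, h, w = trace_shape(vgg16_cfg, 224, 224)
print("224x224 input:", (c, h, w), "flattened:", c * h * w)
```

With a 224x224 input the flattened size would be 512*7*7 = 25088 instead of 512, so the nn.Linear(512, 4096) layer used here only fits 32x32 inputs.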

In fact, this make_layers pattern is very common in model source code. The ResNet source code (a work with just over 30,000 citations) builds its layers in a similar way.

2.4 Classifier

512->4096->4096->100

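Three fully connected layers form the final classifier. Note that the code contains no explicit softmax layer: the last nn.Linear outputs raw class scores (logits), and softmax is typically applied inside the loss function (for example nn.CrossEntropyLoss) or at inference time.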
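
As a rough sense of scale, the parameter count of the 512->4096->4096->100 chain can be worked out by hand: each fully connected layer has in_features * out_features weights plus out_features biases. A small sketch in plain Python (linear_params is an illustrative helper):

```python
def linear_params(in_features, out_features):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features


sizes = [512, 4096, 4096, 100]
per_layer = [linear_params(a, b) for a, b in zip(sizes, sizes[1:])]

for (a, b), count in zip(zip(sizes, sizes[1:]), per_layer):
    print("%d -> %d: %d parameters" % (a, b, count))
print("classifier total:", sum(per_layer))
```

Most of the classifier's parameters sit in the middle 4096->4096 layer.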

2.5 Complete network source code

import torch.nn as nn

cfg = {
    'A' : [64,     'M', 128,      'M', 256, 256,           'M', 512, 512,           'M', 512, 512,           'M'],
    'B' : [64, 64, 'M', 128, 128, 'M', 256, 256,           'M', 512, 512,           'M', 512, 512,           'M'],
    'D' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256,      'M', 512, 512, 512,      'M', 512, 512, 512,      'M'],
    'E' : [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']
}

class VGG(nn.Module):

    def __init__(self, features, num_class=100):
        super().__init__()
        self.features = features

        self.classifier = nn.Sequential(
            nn.Linear(512, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, num_class)
        )

    def forward(self, x):
        output = self.features(x)
        output = output.view(output.size()[0], -1)
        output = self.classifier(output)

        return output

def make_layers(cfg, batch_norm=False):
    layers = []

    input_channel = 3
    for l in cfg:
        if l == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            continue

        layers += [nn.Conv2d(input_channel, l, kernel_size=3, padding=1)]

        if batch_norm:
            layers += [nn.BatchNorm2d(l)]

        layers += [nn.ReLU(inplace=True)]
        input_channel = l

    return nn.Sequential(*layers)

def vgg11_bn():
    return VGG(make_layers(cfg['A'], batch_norm=True))

def vgg13_bn():
    return VGG(make_layers(cfg['B'], batch_norm=True))

def vgg16_bn():
    return VGG(make_layers(cfg['D'], batch_norm=True))

def vgg19_bn():
    return VGG(make_layers(cfg['E'], batch_norm=True))

3 A poem that has been echoing in my mind lately

Sleepwalking Tianmu sings farewell
Li Bai
Haike talks about Yingzhou, the misty waves are faint, and letters are hard to find;
the more people talk about Tianmu, the clouds can be seen or faded.
Tianmu stretches toward the sky, pulls out the five mountains to cover Chicheng.
The rooftop is 48,000 feet long, and it is about to fall to the southeast.
I want to fly to the mirror lake and moon overnight because of my dream Wuyue.
The lake moon shines on my shadow and sends me to Shanxi.
Xie Gong's residence is still there today, and the Lushui is rippling and the apes are singing.
Wear Xie Gong clogs and climb the Qingyun Ladder.
Seeing the sun on the half wall, smelling chickens in the sky.
Thousands of rocks turn around and the road is uncertain, and the lost flowers lean on the stone and it is suddenly dark.
The bear roars and the dragon chants Yin Yanquan, and the deep chestnut forest surprises the top of the layer.
The clouds are green and green, and there is rain, and the water is dull, and there is smoke.
There is a lack of thunderbolt, and the hills and mountains collapse.
The stone door in the cave opened in a loud voice.
The Qingming mighty bottomless, the sun and the moon shine on Jinyintai.
Ni is the clothes, the wind is the horse, and the king of clouds is coming and going.
The tiger drums and the luan return to the carriage, and the immortals are lined up like hemp.
Suddenly the soul palpitates with the soul, suddenly starts and sighs.
I only feel the pillow mat of time, and lose the haze that came before.
The same is true for worldly pleasures. Since ancient times, everything has flowed eastward.
Don't go, when will you return? And put the white deer among the green cliffs, and ride to visit famous mountains as soon as you have to go.
An Neng smashes eyebrows and bends waist to serve the rich and powerful, which makes me unhappy!

OVER

Origin blog.csdn.net/weixin_43427721/article/details/122122427