VGG

1. Development

In 1989, Yann LeCun proposed a convolutional neural network trained with backpropagation, called LeNet.

In 1998, Yann LeCun proposed a refined convolutional neural network, also trained with backpropagation, called LeNet-5.

AlexNet was the champion network of the ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge) competition, designed by Hinton and his student Alex Krizhevsky. It raised classification accuracy from the traditional 70%+ to 80%+, and it was after that year that deep learning began to develop rapidly.


VGG was proposed by the Visual Geometry Group (VGG) of Oxford University in 2014. It won first place in the Localization Task and second place in the Classification Task of the ImageNet competition that year.

GoogLeNet (2014) and ResNet (2015) came after it.

2. VGG

2.1 Detailed explanation

Convolution kernel stacking: stacking small convolution kernels achieves the receptive field of a large kernel while using fewer parameters (see Section 3).

All convolution layers use 3×3 kernels with stride 1 and padding 1, so they preserve spatial size; all max-pooling layers use size 2 and stride 2, so each pool halves it (224 → 112 → 56 → 28 → 14 → 7 after the five pools).

 

3. Receptive field

In a convolutional neural network, the receptive field is the region of the input layer that determines a single element in the output of a given layer. Put simply, it is the size of the input-layer area that one unit on an output feature map corresponds to.

Stacking three 3×3 convolutions gives the same receptive field as one 7×7 convolution, and stacking two 3×3 convolutions gives the same receptive field as one 5×5 convolution.

(A 5×5 kernel requires 25 parameters, while two stacked 3×3 kernels require only 18, yet cover the same 5×5 receptive field.)
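As a quick check of these numbers, a minimal sketch (plain Python, not from the original post): the receptive field is computed layer by layer with rf = (rf - 1) * stride + kernel, walking from the top layer back to the input.

# Receptive-field check: fold the layers from the output back to the input.
def receptive_field(layers):
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

print(receptive_field([(3, 1), (3, 1)]))          # 5 -> two 3x3 convs ~ one 5x5
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7 -> three 3x3 convs ~ one 7x7
print(5 * 5, 'vs', 2 * 3 * 3)                     # 25 vs 18 parameters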

4. VGG implementation

4.1 One implementation compatible with all VGG versions

from tensorflow import keras
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# Implement several VGG variants at once
cfgs = {
    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'], 
    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']
}


def make_feature(cfg):
    # Build the convolutional feature extractor from a config list:
    # an integer means a 3x3 Conv2D with that many output channels,
    # 'M' means a 2x2 max-pooling layer with stride 2.
    feature_layers = []
    for v in cfg:
        if v == 'M':
            feature_layers.append(keras.layers.MaxPool2D(pool_size=2, strides=2))
        else:
            feature_layers.append(keras.layers.Conv2D(v, kernel_size=3, 
                                                      padding='SAME', 
                                                      activation='relu'))
    return keras.Sequential(feature_layers, name='feature')
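A quick shape check (an illustrative snippet, not from the original post): since the convolutions use 'same' padding and stride 1, only the five pooling layers shrink the image, taking 224×224 down to 7×7.

# Build the vgg16 extractor and run a dummy batch through it.
features = make_feature(cfgs['vgg16'])
out = features(tf.zeros((1, 224, 224, 3)))
print(out.shape)  # (1, 7, 7, 512)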

 

4.2 Model implementation

# Define the VGG network structure
def VGG(feature, im_height=224, im_width=224, num_classes=1000):
    input_image = keras.layers.Input(shape=(im_height, im_width, 3), dtype='float32')
    x = feature(input_image)
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dropout(rate=0.5)(x)
    # The original paper uses 4096 here
    x = keras.layers.Dense(2048, activation='relu')(x)
    x = keras.layers.Dropout(rate=0.5)(x)
    x = keras.layers.Dense(2048, activation='relu')(x)
    x = keras.layers.Dense(num_classes)(x)
    output = keras.layers.Softmax()(x)
    
    model = keras.models.Model(inputs=input_image, outputs=output)
    return model

4.3 Data preparation

train_dir = './training/training/'
valid_dir = './validation/validation/'

# Image data generator with augmentation for training
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale = 1. / 255,
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    vertical_flip = True,
    fill_mode = 'nearest'
)

height = 224
width = 224
channels = 3
batch_size = 32
num_classes = 10

train_generator = train_datagen.flow_from_directory(train_dir,
                                 target_size = (height, width),
                                 batch_size = batch_size,
                                 shuffle = True,
                                 seed = 7,
                                 class_mode = 'categorical')

valid_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale = 1. / 255
)
valid_generator = valid_datagen.flow_from_directory(valid_dir,
                                 target_size = (height, width),
                                 batch_size = batch_size,
                                 shuffle = True,
                                 seed = 7,
                                 class_mode = 'categorical')
print(train_generator.samples)
print(valid_generator.samples)
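flow_from_directory infers one class per subdirectory. The resulting label mapping can be inspected like this (a small check, assuming the ten-class directory layout above):

# Inspect the class-name -> one-hot-index mapping inferred from folder names.
print(train_generator.class_indices)
print(train_generator.num_classes)  # should match num_classes = 10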

4.4 Get model

# Get the VGG model by name
def vgg(model_name='vgg16', im_height=224, im_width=224, num_classes=1000):
    cfg = cfgs[model_name]
    model = VGG(make_feature(cfg), im_height=im_height, im_width=im_width, num_classes=num_classes)
    return model


vgg16 = vgg(num_classes=10)
vgg16.summary()

4.5 Training

vgg16.compile(optimizer='adam', 
              loss='categorical_crossentropy',
              metrics=['acc'])

history = vgg16.fit(train_generator,
                   steps_per_epoch=train_generator.samples // batch_size,
                   epochs=10,
                   validation_data=valid_generator,
                   validation_steps = valid_generator.samples // batch_size
                   )
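The matplotlib import at the top can now be put to use. A minimal sketch of plotting the accuracy curves from the returned History object (the 'acc'/'val_acc' key names follow from metrics=['acc'] above):

# Plot training vs. validation accuracy over the 10 epochs.
plt.plot(history.history['acc'], label='train acc')
plt.plot(history.history['val_acc'], label='valid acc')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()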
