Neural Network Learning Notes 55 - Building Common Classification Networks in Keras (VGG16, MobileNet, ResNet50)

Preface

I realized that after making so many blog posts and videos, I had never systematically built a classification network. Building a proper classification network is good for you.

Source code download

https://github.com/bubbliiiiing/classification-keras
If you like it, you can give it a star.

Common forms of classification networks

Common classification networks can be divided into two parts: a feature extraction part and a classification part.
The feature extraction part extracts features from the input image. Good features make targets easier to distinguish, so this part generally consists of various kinds of convolutions, which have powerful feature extraction capabilities.

The classification part uses the features obtained by the feature extraction part to perform the actual classification. It generally consists of fully connected layers; the features coming out of the feature extraction part are usually flattened into a one-dimensional vector, which can be fed directly into the fully connected layers.

Usually, the feature extraction part is one of the familiar neural networks, such as VGG, MobileNet or ResNet, and the classification part is one or more fully connected layers. In the end we obtain a one-dimensional vector of length num_classes.
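As a minimal sketch of this two-part pattern (the tiny backbone here is purely illustrative, standing in for any real feature extractor):

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

num_classes = 10  # illustrative number of classes

# Feature extraction part: a toy stack of convolution + pooling
inputs = Input(shape=(224, 224, 3))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x = MaxPooling2D((2, 2))(x)

# Classification part: flatten to a 1D vector, then a fully connected layer
x = Flatten()(x)
outputs = Dense(num_classes, activation='softmax')(x)

model = Model(inputs, outputs)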

Introduction to classification networks

1. Introduction to VGG16 network

VGG is a convolutional neural network model proposed by Simonyan and Zisserman in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". Its name is an abbreviation of the Visual Geometry Group at the University of Oxford, where the authors work.
The model entered the 2014 ImageNet image classification and localization challenge and achieved excellent results: second place in the classification task and first place in the localization task.
Its structure is shown in the figure below:
(figure: VGG16 network structure)
This diagram of VGG16 has been reproduced everywhere, but it does reflect the structure of VGG16 very well. The entire VGG16 network consists of three kinds of layers: convolutional layers, max pooling layers and fully connected layers.
The specific execution method of VGG16 is as follows:
1. The input image is resized to (224,224,3).
2. conv1: two [3,3] convolutions with 64 output channels; the output is (224,224,64). A 2x2 max pooling follows, giving (112,112,64).
3. conv2: two [3,3] convolutions with 128 output channels; the output is (112,112,128). A 2x2 max pooling follows, giving (56,56,128).
4. conv3: three [3,3] convolutions with 256 output channels; the output is (56,56,256). A 2x2 max pooling follows, giving (28,28,256).
5. conv4: three [3,3] convolutions with 512 output channels; the output is (28,28,512). A 2x2 max pooling follows, giving (14,14,512).
6. conv5: three [3,3] convolutions with 512 output channels; the output is (14,14,512). A 2x2 max pooling follows, giving (7,7,512).
7. Flatten the result into a one-dimensional vector.
8. Apply two fully connected layers with 4096 neurons each.
9. Apply a final fully connected layer with 1000 outputs for classification.

The final output is the prediction for each class.

The implementation code is as follows:

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dense, Flatten

def VGG16(input_shape=None, classes=1000):
    img_input = Input(shape=input_shape)

    # Block 1
    x = Conv2D(64, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block1_conv1')(img_input)
    x = Conv2D(64, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block1_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

    # Block 2
    x = Conv2D(128, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block2_conv1')(x)
    x = Conv2D(128, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block2_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

    # Block 3
    x = Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv1')(x)
    x = Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv2')(x)
    x = Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

    # Block 4
    x = Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv1')(x)
    x = Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv2')(x)
    x = Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

    # Block 5
    x = Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv1')(x)
    x = Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv2')(x)
    x = Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

    x = Flatten(name='flatten')(x)
    x = Dense(4096, activation='relu', name='fc1')(x)
    x = Dense(4096, activation='relu', name='fc2')(x)
    x = Dense(classes, activation='softmax', name='predictions')(x)

    inputs = img_input

    model = Model(inputs, x, name='vgg16')
    return model
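Like the other listings in this post, it can be sanity-checked by instantiating the model and printing a summary:

if __name__ == '__main__':
    model = VGG16(input_shape=(224, 224, 3))
    model.summary()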

2. Introduction to the MobileNetV1 network

The MobileNetV1 model is a lightweight deep neural network proposed by Google for mobile phones and other embedded devices. Its core building block is the depthwise separable convolution block.

The depthwise separable convolution block consists of two parts: a depthwise convolution and an ordinary 1x1 (pointwise) convolution. The kernel size of the depthwise convolution is generally 3x3; for ease of understanding, we can regard it as the feature extraction step, while the 1x1 convolution adjusts the number of channels.

The following figure is a schematic structural diagram of the depthwise separable convolution block:
(figure: depthwise separable convolution block)
The purpose of the depthwise separable convolution block is to replace an ordinary 3x3 convolution using fewer parameters.

We can compare an ordinary convolution with a depthwise separable convolution block; the arithmetic is checked in the snippet after this comparison.

For an ordinary convolution, assume a 3×3 convolutional layer with 16 input channels and 32 output channels. Each of the 32 convolution kernels of size 3×3 traverses the data in all 16 channels, yielding the required 32 output channels; the number of parameters is 16×32×3×3 = 4608.

For the depthwise separable convolution block, assume again 16 input channels and 32 output channels. First, 16 convolution kernels of size 3×3 traverse the 16 channels separately, one kernel per channel, producing 16 feature maps. Then, to fuse them, 32 convolution kernels of size 1×1 traverse these 16 feature maps; the number of parameters is 16×3×3 + 16×32×1×1 = 656.

It can be seen that the depthwise separable convolution block greatly reduces the number of parameters of the model.
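This can be verified with a few lines of Python:

# Ordinary 3x3 convolution: every output channel looks at all 16 input channels
in_c, out_c, k = 16, 32, 3
ordinary = in_c * out_c * k * k                  # 16*32*3*3 = 4608

# Depthwise separable block: one 3x3 kernel per input channel,
# then a 1x1 pointwise convolution to mix the channels
separable = in_c * k * k + in_c * out_c * 1 * 1  # 144 + 512 = 656

print(ordinary, separable, round(separable / float(ordinary), 3))  # 4608 656 0.142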

The code for depthwise separable convolution is as follows:

def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha,
                          depth_multiplier=1, strides=(1, 1), block_id=1):

    pointwise_conv_filters = int(pointwise_conv_filters * alpha)

    # 3x3 depthwise convolution: one kernel per input channel
    x = DepthwiseConv2D((3, 3),
                        padding='same',
                        depth_multiplier=depth_multiplier,
                        strides=strides,
                        use_bias=False,
                        name='conv_dw_%d' % block_id)(inputs)

    x = BatchNormalization(name='conv_dw_%d_bn' % block_id)(x)
    # relu6 (ReLU capped at 6) is defined in the full MobileNet listing below
    x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)

    # 1x1 pointwise convolution: adjusts the number of channels
    x = Conv2D(pointwise_conv_filters, (1, 1),
               padding='same',
               use_bias=False,
               strides=(1, 1),
               name='conv_pw_%d' % block_id)(x)
    x = BatchNormalization(name='conv_pw_%d_bn' % block_id)(x)
    return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x)

An intuitive way to understand the depthwise separable convolution block: each 3x3 kernel is only one channel thick and slides over a single channel of the input tensor, so the depthwise convolution produces one output channel per input channel. Once the depthwise convolution is done, a 1x1 convolution adjusts the channel count.

The following is the structure of MobileNet, where Conv dw denotes a depthwise convolution, each followed by a 1x1 convolution for channel adjustment.
(figure: MobileNet network structure)
After the feature extraction part has extracted the features of the input image, global average pooling reduces the feature maps to a feature vector, which we can then fully connect to obtain the final classification result.

The implementation code is as follows:

from keras.models import Model
from keras.layers import DepthwiseConv2D, Input, Activation, Dropout, Reshape, BatchNormalization, GlobalAveragePooling2D, Conv2D
from keras import backend as K

def _conv_block(inputs, filters, alpha, kernel=(3, 3), strides=(1, 1)):
    filters = int(filters * alpha)

    x = Conv2D(filters, kernel,
                      padding='same',
                      use_bias=False,
                      strides=strides,
                      name='conv1')(inputs)
    x = BatchNormalization(name='conv1_bn')(x)
    return Activation(relu6, name='conv1_relu')(x)


def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha,
                          depth_multiplier=1, strides=(1, 1), block_id=1):

    pointwise_conv_filters = int(pointwise_conv_filters * alpha)

    # 3x3 depthwise convolution: one kernel per input channel
    x = DepthwiseConv2D((3, 3),
                        padding='same',
                        depth_multiplier=depth_multiplier,
                        strides=strides,
                        use_bias=False,
                        name='conv_dw_%d' % block_id)(inputs)

    x = BatchNormalization(name='conv_dw_%d_bn' % block_id)(x)
    x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)

    # 1x1 pointwise convolution: adjusts the number of channels
    x = Conv2D(pointwise_conv_filters, (1, 1),
               padding='same',
               use_bias=False,
               strides=(1, 1),
               name='conv_pw_%d' % block_id)(x)
    x = BatchNormalization(name='conv_pw_%d_bn' % block_id)(x)
    return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x)

def MobileNet(input_shape=None,
              alpha=1.0,
              depth_multiplier=1,
              dropout=1e-3,
              classes=1000):

    img_input = Input(shape=input_shape)

    # 224,224,3 -> 112,112,32  
    x = _conv_block(img_input, 32, alpha, strides=(2, 2))
    # 112,112,32 -> 112,112,64
    x = _depthwise_conv_block(x, 64, alpha, depth_multiplier, block_id=1)


    # 112,112,64 -> 56,56,128
    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier,
                              strides=(2, 2), block_id=2)
    x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, block_id=3)


    # 56,56,128 -> 28,28,256
    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier,
                              strides=(2, 2), block_id=4)
    x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, block_id=5)
    

    # 28,28,256 -> 14,14,512
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                              strides=(2, 2), block_id=6)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=7)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=8)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=9)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=10)
    x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=11)

    # 14,14,512 -> 7,7,1024
    x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier,
                              strides=(2, 2), block_id=12)
    x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13)

    # 7,7,1024 -> 1,1,1024
    x = GlobalAveragePooling2D()(x)

    shape = (1, 1, int(1024 * alpha))

    x = Reshape(shape, name='reshape_1')(x)
    x = Dropout(dropout, name='dropout')(x)

    x = Conv2D(classes, (1, 1),padding='same', name='conv_preds')(x)
    x = Activation('softmax', name='act_softmax')(x)
    x = Reshape((classes,), name='reshape_2')(x)

    inputs = img_input

    model = Model(inputs, x, name='mobilenet_%0.2f' % (alpha))
    return model

def relu6(x):
    return K.relu(x, max_value=6)

if __name__ == '__main__':
    model = MobileNet(input_shape=(224, 224, 3))
    model.summary()

3. Introduction to ResNet50 network

a. What is a residual network?

Residual network (ResNet):
The output of an earlier layer skips several layers and is fed directly into the input of a later layer.
This means that part of a later feature layer receives a linear contribution from an earlier layer.
Its structure is as follows:
(figure: residual block structure)
Deep residual networks were designed to overcome the problems of low learning efficiency and stagnating accuracy that arise as networks become deeper.
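In Keras the shortcut is simply an element-wise addition of the block's input to its output; a minimal sketch of one residual connection (shapes here are illustrative):

from keras import layers
from keras.layers import Input, Conv2D, Activation

inputs = Input(shape=(56, 56, 64))
x = Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
x = Conv2D(64, (3, 3), padding='same')(x)
x = layers.add([x, inputs])   # skip connection: the earlier output joins the later layer's input
x = Activation('relu')(x)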

b. What is the ResNet50 model?

ResNet50 has two basic blocks, named Conv Block and Identity Block. The input and output dimensions of the Conv Block differ, so Conv Blocks cannot be stacked in series; their role is to change the dimensions of the network. The input and output dimensions of the Identity Block are the same, so Identity Blocks can be stacked in series to deepen the network.
The structure of the Conv Block is shown below. As the figure shows, it can be divided into two parts. The left part is the main branch, with two rounds of convolution, batch normalization and activation followed by one more convolution and normalization; the right part is the residual branch, with a single convolution and normalization. Because the residual branch contains a convolution, the Conv Block can change the width, height and channel count of the output feature layer:
(figure: Conv Block structure)
The implementation code is as follows:

def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
    filters1, filters2, filters3 = filters

    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # 1x1 convolution to reduce the number of channels
    x = Conv2D(filters1, (1, 1), strides=strides,
               name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    # 3x3 convolution
    x = Conv2D(filters2, kernel_size, padding='same',
               name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    # 1x1 convolution to increase the number of channels
    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    # residual branch (shortcut)
    shortcut = Conv2D(filters3, (1, 1), strides=strides,
                      name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)
    return x

The structure of the Identity Block is shown below. As the figure shows, it can likewise be divided into two parts. The left part is the main branch, with two rounds of convolution, normalization and activation followed by one more convolution and normalization; the right part is the residual branch, which connects the input directly to the output. Because the residual branch contains no convolution, the input and output feature layers of the Identity Block have the same shape, so it can be used to deepen the network:
(figure: Identity Block structure)
The implementation code is as follows:

def identity_block(input_tensor, kernel_size, filters, stage, block):
    filters1, filters2, filters3 = filters

    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # 1x1 convolution to reduce the number of channels
    x = Conv2D(filters1, (1, 1), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)
    # 3x3 convolution
    x = Conv2D(filters2, kernel_size,padding='same', name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)
    # 1x1 convolution to increase the number of channels
    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x

Conv Block and Identity Block are both residual network structures.

The overall network structure is as follows:
(figure: overall ResNet50 network structure)
The implementation code is as follows:

from __future__ import print_function

from keras import layers
from keras.layers import Input
from keras.layers import Dense,Conv2D,MaxPooling2D,ZeroPadding2D,AveragePooling2D
from keras.layers import Activation,BatchNormalization,Flatten
from keras.models import Model

def identity_block(input_tensor, kernel_size, filters, stage, block):
    filters1, filters2, filters3 = filters

    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # 1x1 convolution to reduce the number of channels
    x = Conv2D(filters1, (1, 1), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)
    # 3x3 convolution
    x = Conv2D(filters2, kernel_size,padding='same', name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)
    # 1x1 convolution to increase the number of channels
    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x


def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
    filters1, filters2, filters3 = filters

    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # 1x1 convolution to reduce the number of channels
    x = Conv2D(filters1, (1, 1), strides=strides,
               name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    # 3x3 convolution
    x = Conv2D(filters2, kernel_size, padding='same',
               name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    # 1x1 convolution to increase the number of channels
    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    # residual branch (shortcut)
    shortcut = Conv2D(filters3, (1, 1), strides=strides,
                      name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)
    return x


def ResNet50(input_shape=[224,224,3], classes=1000):
    # 224,224,3
    img_input = Input(shape=input_shape)
    x = ZeroPadding2D((3, 3))(img_input)
    # [112,112,64]
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
    x = BatchNormalization(name='bn_conv1')(x)
    x = Activation('relu')(x)

    x = ZeroPadding2D((1, 1))(x)
    # [56,56,64]
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)

    # [56,56,256]
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    # [28,28,512]
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    # [14,14,1024]
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

    # [7,7,2048]
    x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
    x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
    x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')

    # average pooling in place of a fully connected layer
    x = AveragePooling2D((7, 7), name='avg_pool')(x)

    # make the prediction
    x = Flatten()(x)
    x = Dense(classes, activation='softmax', name='fc1000')(x)

    model = Model(img_input, x, name='resnet50')

    return model


if __name__ == '__main__':
    model = ResNet50()
    model.summary()

Training of classification network

1. Introduction to LOSS

Generally speaking, the loss function used by classification networks is the cross-entropy (Cross Entropy) loss. Its formula is as follows.
$$Loss = \frac{1}{N}\sum_{i} L_i = -\frac{1}{N}\sum_{i}\sum_{c=1}^{M} y_{ic}\log(p_{ic})$$

where:

  • $M$ — the number of categories;
  • $y_{ic}$ — the true label (0 or 1): 1 if sample $i$ belongs to class $c$, 0 otherwise;
  • $p_{ic}$ — the predicted probability that sample $i$ belongs to class $c$;
  • $i$ — the sample index, with $N$ samples in total.
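A small numeric example, computed directly with NumPy:

import numpy as np

# One sample, M = 3 classes; the true class is the second one (one-hot encoded)
y = np.array([0, 1, 0])
p = np.array([0.2, 0.7, 0.1])   # softmax output of the network

loss = -np.sum(y * np.log(p))   # = -log(0.7)
print(loss)                     # about 0.357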

2. Use classification network for training

First, download the repository from GitHub, unzip it, and open the folder in your IDE.
Note that the project must be opened at the correct root directory; if the relative paths are wrong, the code will not run.

Be sure that the root directory after opening is the directory where the files are stored.

a. Preparation of data set

The datasets folder stores the training images, divided into two parts: train contains the training images and test contains the test images.
Before training, you need to prepare the dataset. Inside the train and test folders, create one folder per class; the name of each folder is the corresponding class name, and the images inside it are the images of that class.

b. Data set processing

After preparing the dataset, run txt_annotation.py in the root directory to generate the cls_train.txt required for training.

Before running it, modify the classes variable inside so that it matches the classes you want to categorize; a rough sketch of what the script does is given below.
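The details live in txt_annotation.py itself; the following is only a sketch of the idea, assuming the datasets/train/<class_name>/ layout described above. The class names and the exact line format here are illustrative, not the repository's authoritative code:

import os

classes = ["cat", "dog"]  # illustrative; replace with your own class names

# Walk datasets/train/<class_name>/ and write one line per image.
# The exact line format is whatever txt_annotation.py in the repository produces;
# "index;path" is shown here purely for illustration.
with open("cls_train.txt", "w") as f:
    for index, name in enumerate(classes):
        folder = os.path.join("datasets", "train", name)
        for image in os.listdir(folder):
            f.write("%d;%s\n" % (index, os.path.join(folder, image)))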

c. Start network training

Having generated cls_train.txt and cls_test.txt with txt_annotation.py, we can now start training.

train.py has many training parameters; read the comments carefully after downloading the repository. The most important step is to modify cls_classes.txt under the model_data folder so that it, too, corresponds to the classes you want to categorize.
After choosing the network and weights you want in train.py, you can start training! Stripped of the details, the training step boils down to the standard compile-and-fit pattern sketched below.
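A minimal sketch only: train.py in the repository additionally handles data loading, pretrained weights and callbacks. The hyperparameters and dummy data here are illustrative; MobileNet is the function defined earlier in this post:

import numpy as np

num_classes = 2  # illustrative
model = MobileNet(input_shape=(224, 224, 3), classes=num_classes)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Dummy arrays standing in for the images listed in cls_train.txt / cls_test.txt
x = np.random.rand(8, 224, 224, 3).astype('float32')
y = np.eye(num_classes)[np.random.randint(0, num_classes, 8)]
model.fit(x, y, batch_size=4, epochs=1)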


Origin: blog.csdn.net/weixin_44791964/article/details/108586433