Detailed Explanation of the ZF Net Convolutional Neural Network (CNN) Model (Theory)

1. Background
2. ZF Net model structure
3. Advantages and disadvantages of the improvements

1. Background

  ZF Net is named after its authors, Matthew D. Zeiler and Rob Fergus (New York University), who published the paper in 2013.

Original paper: https://arxiv.org/abs/1311.2901

Paper title: Visualizing and Understanding Convolutional Networks

Paper abstract: Large convolutional neural networks show excellent performance on ImageNet. This paper tries to address two questions: why does this type of model perform so well, and how can it be improved? We introduce a novel visualization technique that gives insight into the function of the intermediate feature layers and the operation of the classifier. Used as a diagnostic tool, the visualization led to a model architecture that outperforms AlexNet, and it also revealed how much the different layers of the model contribute to performance.

  The model in this paper won the 2013 ImageNet classification task. Its network structure is essentially unchanged from AlexNet, but after the parameter adjustments its performance improved considerably. ZF Net simply changes the first-layer convolution kernel of AlexNet from 11×11 to 7×7 and its stride from 4 to 2, and changes the 3rd, 4th, and 5th convolutional layers to 384, 384, and 256 kernels. That year's ImageNet competition was relatively quiet, and the champion ZF Net never became as famous as the classic architectures of other years.

2. ZF Net model structure

Supplement: convolution with a multi-channel kernel
The figure below shows a 5×5×3 input convolved with one 3×3×3 kernel, producing a 3×3×1 output.
[Figure: multi-channel convolution example]
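To make the shape arithmetic concrete, here is a minimal sketch (assuming PyTorch; not code from the original post) of exactly this case: a 5×5×3 input and a single 3×3×3 kernel give a 3×3×1 feature map.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 5, 5)                      # batch=1, 3 channels, 5x5 spatial size
conv = nn.Conv2d(in_channels=3, out_channels=1,  # one 3x3x3 kernel
                 kernel_size=3, stride=1, padding=0, bias=False)
y = conv(x)
print(y.shape)  # torch.Size([1, 1, 3, 3]) -> a 3x3x1 output map
```

The single kernel spans all 3 input channels, which is why the channel dimension collapses to 1 in the output.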

Network structure overview
[Figure: ZF Net network architecture]
Description: the paper proposes a new visualization technique that makes it possible to understand the function of the intermediate feature maps and the operation of the classifier.
  Visualizing the first layer of AlexNet shows a heavy mixture of high-frequency (edge) and low-frequency (non-edge) information, with almost no coverage of the mid-frequency range.
  Because the first convolutional layer uses a stride of 4, which is too large, the visualizations show strong aliasing artifacts, and the learned features look much less clean than those of later layers, which capture textures, colors, and so on.
  To address the first problem, the authors reduced the first-layer convolution kernel of AlexNet from 11×11 to 7×7; to address the second, they reduced the stride of the first convolutional layer from 4 to 2. In addition, ZF Net changes the 3rd, 4th, and 5th convolutional layers of AlexNet to 384, 384, and 256 kernels (a sketch of the first-layer change follows below).
  Another difference is that AlexNet splits layers 3, 4, and 5 across two GPUs for training, whereas ZF Net uses a single, more compact structure.
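To illustrate the first-layer change, here is a hedged sketch (assuming PyTorch) comparing AlexNet's conv1 (11×11 kernel, stride 4) with ZF Net's modification (7×7 kernel, stride 2); the 96 output channels come from the layer table below, and zero padding is assumed.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image

alexnet_conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0)
zfnet_conv1   = nn.Conv2d(3, 96, kernel_size=7, stride=2, padding=0)

print(alexnet_conv1(x).shape)  # torch.Size([1, 96, 54, 54])  - coarse feature map
print(zfnet_conv1(x).shape)    # torch.Size([1, 96, 109, 109]) - finer map, less aliasing
```

Note that with no padding the ZF Net first layer yields 109×109 here; the paper's figure (and the table below) reports 110×110, which corresponds to a slightly different cropping/padding convention. The point of the change is the denser spatial sampling, which reduces the aliasing seen in the visualizations.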

Layer-by-layer structure (for each layer: input, convolution parameters, output, pooling, normalization, and the overfitting method used):

Layer 1
  Input: 224×224×3
  Convolution: 96 kernels of 7×7×3, padding 0, stride 2 → output 110×110×96
  Pooling: max pooling 3×3, stride 2 → 55×55×96
  Normalization: LRN (local response normalization), scale 5×5
  Overfitting method: none

Layer 2
  Input: 55×55×96
  Convolution: 256 kernels of 5×5×96, padding 0, stride 2 → output 26×26×256
  Pooling: max pooling 3×3, stride 2 → 13×13×256
  Normalization: LRN (local response normalization), scale 5×5
  Overfitting method: none

Layer 3
  Input: 13×13×256
  Convolution: 384 kernels of 3×3×256, padding 1, stride 1 → output 13×13×384
  Pooling: none
  Normalization: none
  Overfitting method: none

Layer 4
  Input: 13×13×384
  Convolution: 384 kernels of 3×3×384, padding 1, stride 1 → output 13×13×384
  Pooling: none
  Normalization: none
  Overfitting method: none

Layer 5
  Input: 13×13×384
  Convolution: 256 kernels of 3×3×384, padding 1, stride 1 → output 13×13×256
  Pooling: max pooling 3×3, stride 2 → 6×6×256
  Normalization: none
  Overfitting method: none

Fully connected layer 1 (layer 6)
  Input: 6×6×256
  Operation: 4096 kernels of 6×6×256 (equivalent to a fully connected layer) → 1×1×4096, passed through the ReLU activation function
  Pooling: none
  Normalization: none
  Overfitting method: dropout — some neurons of the fully connected layer are randomly deactivated to prevent overfitting; after dropout this layer outputs 4096 values

Fully connected layer 2 (layer 7)
  Input: 4096×1 (4096 neurons)
  Operation: the 4096 inputs are fully connected to the 4096 neurons of this layer and passed through ReLU (relu7), producing 4096 values
  Pooling: none
  Normalization: none
  Overfitting method: dropout (dropout7), outputting 4096 values

Output layer
  Input: 4096×1 (the 4096 outputs of fully connected layer 2)
  Operation: fully connected to the classification output (1000 ImageNet classes, softmax)
  Pooling: none
  Normalization: none
  Overfitting method: none
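As a concrete reference, here is a minimal sketch (assuming PyTorch; not code from the paper) of the structure tabulated above. The channel counts, kernel sizes, strides, LRN, and dropout follow the table; the padding values on the pooling layers and the ReLU placement after each convolution are assumptions chosen so that the spatial sizes match the tabulated ones, and the 1000-way output corresponds to the ImageNet classification setting.

```python
import torch
import torch.nn as nn

class ZFNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            # Layer 1: 96 kernels 7x7, stride 2, then 3x3 max pool (stride 2) + LRN -> 55x55x96
            nn.Conv2d(3, 96, kernel_size=7, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.LocalResponseNorm(size=5),
            # Layer 2: 256 kernels 5x5, stride 2, then 3x3 max pool (stride 2) + LRN -> 13x13x256
            nn.Conv2d(96, 256, kernel_size=5, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.LocalResponseNorm(size=5),
            # Layers 3-5: 384, 384, 256 kernels of 3x3, padding 1, stride 1
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),  # -> 6x6x256
        )
        self.classifier = nn.Sequential(
            # Two fully connected layers with ReLU + dropout, then the class scores
            nn.Flatten(),
            nn.Linear(6 * 6 * 256, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

if __name__ == "__main__":
    model = ZFNet()
    out = model(torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 1000])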

3. Advantages and disadvantages of the improvements

The main points have been covered above; see the ZF Net project practice summary for details.


Source: blog.csdn.net/qq_55433305/article/details/129342538