1. Background
2. ZF Net model structure
3. Advantages and disadvantages of improvement
1. Background
ZF Net is named after its authors, Matthew D. Zeiler and Rob Fergus (New York University), who published the paper in 2013.
The paper is available at https://arxiv.org/abs/1311.2901
Paper title: Visualizing and Understanding Convolutional Networks
Paper abstract: Large convolutional neural networks show excellent performance on ImageNet. The paper addresses two questions: why such models perform so well, and how they can be improved. It introduces a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. Used as a diagnostic tool, the technique led to a model architecture that outperforms AlexNet, and it also revealed how much each layer contributes to overall performance.
The model in the paper won the 2013 ImageNet classification task. Its network structure is essentially unchanged from AlexNet, but after the parameter adjustments its performance improved considerably. ZF-Net simply shrinks AlexNet's first-layer convolution kernel from 11×11 to 7×7, reduces the first layer's stride from 4 to 2, and sets the 3rd, 4th, and 5th convolutional layers to 384, 384, and 256 kernels. The 2013 ImageNet competition was relatively quiet, so ZF-Net's reputation never became as loud as the classic winning architectures of other years.
2. ZF Net model structure
Supplement: convolution with a multi-channel kernel
The figure referenced here (omitted) shows a 5×5×3 input convolved with one 3×3×3 kernel, producing a 3×3×1 feature map: each kernel spans all input channels, so one kernel always yields a single-channel output.
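The multi-channel arithmetic can be checked with a minimal sketch in NumPy (the helper name `conv2d_multichannel` is ours, not from the paper):

```python
import numpy as np

def conv2d_multichannel(x, k):
    """Valid convolution of an H x W x C input with a kh x kw x C kernel.

    Each output position sums the elementwise product over all kernel
    positions and all channels, so one kernel yields one output channel.
    """
    H, W, C = x.shape
    kh, kw, kc = k.shape
    assert C == kc, "kernel depth must match input channels"
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw, :] * k)
    return out

x = np.random.rand(5, 5, 3)   # 5*5*3 input
k = np.random.rand(3, 3, 3)   # one 3*3*3 kernel
print(conv2d_multichannel(x, k).shape)  # (3, 3) -> a 3*3*1 feature map
```

Stacking N such kernels would give an H'×W'×N output, which is how a convolutional layer produces many channels.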
Network structure walkthrough
Description: the paper proposes a new visualization technique that reveals the function of the intermediate feature maps and the operation of the classifier.
Visualizing the first layer of AlexNet shows a mixture of very high-frequency (edge) and very low-frequency (non-edge) information, with almost no coverage of the mid frequencies.
In addition, because the first convolutional layer uses a stride of 4, which is too large, the feature maps show heavy aliasing artifacts; the learned first-layer features are far less informative than those of later layers, which capture recognizable textures, colors, and so on.
To address the first problem, the authors reduced the first-layer convolution kernel of AlexNet from 11×11 to 7×7; to address the second, they reduced the first layer's stride from 4 to 2. ZFNet also sets the 3rd, 4th, and 5th convolutional layers of AlexNet to 384, 384, and 256 kernels.
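The effect of these kernel/stride changes on the feature-map size follows from the standard output-size formula; a quick check (note: getting the 110×110 quoted for ZFNet's first layer from a 224 input requires assuming one pixel of padding or a 225 input, which the table below glosses over):

```python
def conv_out(n, k, s, p=0):
    # standard convolution output-size formula: floor((n - k + 2p) / s) + 1
    return (n - k + 2 * p) // s + 1

# AlexNet layer 1: 11x11 kernel, stride 4 (on the commonly quoted 227 input)
print(conv_out(227, 11, 4))    # 55
# ZFNet layer 1: 7x7 kernel, stride 2; 110 works out with padding of 1
print(conv_out(224, 7, 2, 1))  # 110
# with padding 0 the same formula gives 109
print(conv_out(224, 7, 2))     # 109
```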
Another difference: AlexNet was trained on two GPUs, so its layers 3, 4, and 5 are split into two sparsely connected groups, whereas ZFNet uses dense connections in a single, tighter structure.
Layer | Input | Convolution (kernels, window, padding, stride) | Output + activation | Pooling (window, stride) | Output + normalization | Anti-overfitting
---|---|---|---|---|---|---
1 | 224×224×3 | 96 kernels of 7×7×3, padding 0, stride 2 | 110×110×96 | max pooling 3×3, stride 2 | 55×55×96, LRN (local response normalization, scale 5×5) | none
2 | 55×55×96 | 256 kernels of 5×5×96, padding 0, stride 2 | 26×26×256 | max pooling 3×3, stride 2 | 13×13×256, LRN (scale 5×5) | none
3 | 13×13×256 | 384 kernels of 3×3×256, padding 1, stride 1 | 13×13×384 | none | none | none
4 | 13×13×384 | 384 kernels of 3×3×384, padding 1, stride 1 | 13×13×384 | none | none | none
5 | 13×13×384 | 256 kernels of 3×3×384, padding 1, stride 1 | 13×13×256 | max pooling 3×3, stride 2 | 6×6×256 | none
FC1 | 6×6×256 | 4096 kernels of 6×6×256 (equivalent to a fully connected layer) | 1×1×4096, ReLU | none | none | Dropout: randomly deactivate some neurons of the fully connected layer to prevent overfitting; outputs 4096 values
FC2 | 4096×1 (4096 neurons) | fully connected to 4096 neurons | 4096 values, ReLU | none | none | Dropout; outputs 4096 values
Output | 4096×1 (4096 neurons) | none | 4096 neuron results through ReLU | none | none | none
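The spatial sizes in the table can be verified with a small shape calculator; a sketch under the assumption that pooling uses ceil mode (which is what makes the overlapping 3×3/stride-2 pools match the sizes above; the helpers `conv` and `pool` are ours):

```python
import math

def conv(n, k, s, p=0):
    # convolution output size: floor((n - k + 2p) / s) + 1
    return (n - k + 2 * p) // s + 1

def pool(n, k, s):
    # overlapping max pooling; ceil mode reproduces the table's sizes
    return math.ceil((n - k) / s) + 1

n = 110                          # layer-1 conv output (spatial side)
n = pool(n, 3, 2); print(n)      # 55
n = conv(n, 5, 2); print(n)      # 26   (layer 2)
n = pool(n, 3, 2); print(n)      # 13
n = conv(n, 3, 1, 1); print(n)   # 13   (layers 3-5 preserve the size)
n = pool(n, 3, 2); print(n)      # 6    -> 6*6*256 feeds the FC layers
```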