First, the characteristics
1. Improvements over AlexNet: the large convolution kernel and stride of the first layer are replaced with smaller 3x3 kernels and a smaller stride.
2. Multi-scale training: both training and testing use whole images at different scales.
As a result, VGG has a simple structure, strong feature-extraction ability, and a wide range of application scenarios.
Single-scale test comparison results:
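The effect of the first improvement can be seen by counting first-layer weights. A minimal sketch in plain Python; the AlexNet figures (11 × 11 kernel, 96 filters) and the VGG figures (3 × 3 kernel, 64 filters) are those reported in the respective papers:

```python
def conv_params(k, c_in, c_out):
    """Number of weights in one k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

# AlexNet conv1: 11x11 kernel, 3 input channels (RGB), 96 filters
alexnet_conv1 = conv_params(11, 3, 96)   # 34848 weights
# VGG conv1: 3x3 kernel, 3 input channels, 64 filters
vgg_conv1 = conv_params(3, 3, 64)        # 1728 weights
```

A single 3 × 3 layer sees much less of the image than an 11 × 11 one, which is why VGG stacks several small convolutions per stage instead.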
Second, comparison of different structures
VGG provides six versions of the network in total, in order to compare the effect of the different configurations.
A brief analysis of the configuration details of each version:
Structure A: similar to AlexNet; divided into five convolutional stages plus 3 fully connected layers, except that every convolution uses a 3x3 kernel.
Structure A-LRN: keeps AlexNet's LRN operation; otherwise identical to A.
Structure B: adds one 3x3 convolution layer to each of stage1 and stage2, for 10 convolution layers in total.
Structure C: on the basis of B, adds one 1x1 convolution layer to each of stage3, stage4 and stage5; 13 convolution layers, 16 weight layers in total.
Structure D: on the basis of C, replaces the 1x1 convolutions in stage3, stage4 and stage5 with 3x3 convolutions; still 13 convolution layers, 16 weight layers in total.
Structure E: on the basis of D, adds one more 3x3 convolution layer to each of stage3, stage4 and stage5; 16 convolution layers, 19 weight layers in total.
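These layer counts can be checked mechanically. The sketch below uses the compact encoding common in open-source implementations (e.g. torchvision's VGG), where each number is the output-channel count of a 3x3 convolution and "M" marks the max-pooling that ends a stage; C is omitted because its counts match D's, with 1x1 instead of 3x3 convolutions in the last position of stages 3–5:

```python
# VGG configurations as channel lists; 'M' marks a 2x2 max-pool ending a stage.
cfgs = {
    "A": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "B": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

def weight_layers(cfg):
    """Return (conv layers, total weight layers incl. the 3 FC layers)."""
    convs = sum(1 for v in cfg if v != "M")
    return convs, convs + 3

# weight_layers(cfgs["A"]) -> (8, 11)   ... up to
# weight_layers(cfgs["E"]) -> (16, 19), i.e. VGG-19
```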
Comparison of the effects of the different structures:
A vs A-LRN: A-LRN performs no better than A, indicating that LRN brings no benefit.
A vs B, C, D, E: A has the fewest layers of them all, and B, C, D, E all work better than A, indicating that deeper networks perform better.
B vs C: the added 1x1 convolution kernels introduce extra non-linearity and improve performance.
C vs D: 3x3 convolution kernels (structure D) work better than 1x1 ones (structure C). (Note!)
Comparison among C, D and E: multi-scale training improves accuracy.
Third, discussion of the advantages of small convolution kernels
1. Why use 3 × 3 convolution kernels?
(1) A stack of three 3 × 3 convolution kernels has the same receptive field as one 7 × 7 convolution kernel, but activation functions are inserted between the layers, so compared with a single 7 × 7 convolution the network is deeper and has more non-linearity.
(2) It reduces the number of parameters (with C input and C output channels):
3 × (3 × 3 × C × C) = 27C²
7 × 7 × C × C = 49C²
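Both points can be verified numerically. The sketch below computes the receptive field of a stack of stride-1 convolutions and reproduces the 27C² vs. 49C² parameter counts from above:

```python
def receptive_field(kernels):
    """Receptive field of a stack of stride-1 convolutions:
    each k x k layer widens the field by k - 1 pixels."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf

def stack_params(kernels, c):
    """Weights in a stack of k x k convolutions with c channels in and out."""
    return sum(k * k * c * c for k in kernels)

receptive_field([3, 3, 3])   # 7  -> same field of view as one 7x7 conv
stack_params([3, 3, 3], 1)   # 27 -> 27*C^2 in general
stack_params([7], 1)         # 49 -> 49*C^2
```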
2. The role of 1 × 1 convolution kernels (other kernel sizes can achieve these two functions as well, but with more parameters):
(1) adds non-linearity;
(2) raises or reduces the channel dimensionality.
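Since a 1 × 1 convolution is just a per-pixel linear map across channels, both roles are easy to see in a direct, loop-based illustration (the function name and toy data are my own, for demonstration only):

```python
def conv1x1(x, w):
    """1x1 convolution: x is [C_in][H][W], w is [C_out][C_in].
    Each output pixel is a linear mix of that pixel's input channels,
    so the spatial layout is untouched while the channel count changes."""
    c_in, h, wid = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[co][ci] * x[ci][i][j] for ci in range(c_in))
              for j in range(wid)]
             for i in range(h)]
            for co in range(len(w))]

# Reduce 4 channels to 2 on a 2x2 feature map: spatial size stays 2x2.
x = [[[1, 2], [3, 4]]] * 4                     # 4 identical 2x2 channels
w = [[0.25, 0.25, 0.25, 0.25], [1, 0, 0, 0]]   # C_out = 2, C_in = 4
y = conv1x1(x, w)                              # 2 channels, each still 2x2
```

In a network, a non-linearity (e.g. ReLU) applied after this channel mixing is what provides role (1).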
Fourth, training data preprocessing
Step 1: isotropically rescale the image so that its shortest side is 256.
Step 2: randomly crop a 224 × 224 image patch.
Step 3: apply random horizontal flipping and random RGB color shifts to the cropped patch.
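The geometric part of these steps (isotropic rescaling and random cropping) can be sketched in plain Python; `shortest=256` and `size=224` follow the steps above, while the function names are my own:

```python
import random

def isotropic_resize(w, h, shortest=256):
    """Scale so the shorter side equals `shortest`, keeping the aspect ratio."""
    s = shortest / min(w, h)
    return round(w * s), round(h * s)

def random_crop_box(w, h, size=224, rng=random):
    """Corners of a random size x size crop inside a w x h image."""
    x = rng.randrange(w - size + 1)
    y = rng.randrange(h - size + 1)
    return x, y, x + size, y + size

isotropic_resize(640, 480)   # -> (341, 256): shorter side becomes 256
```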
Addendum: you can also use dense evaluation, feeding the uncropped image directly into the network, with the fully connected layers at the back replaced by convolution layers.
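A rough sketch of why this works: after VGG's five max-pools, a 224 × 224 input leaves a 7 × 7 feature map, so the first FC layer can be rewritten as a 7 × 7 convolution; a larger input then simply yields a spatial map of class scores instead of a single one (spatial sizes only, assuming the input side is divisible by 32):

```python
def score_map_size(side):
    """Side length of the class-score map when the FC layers are
    converted to convolutions: five 2x2 max-pools halve the side,
    then the first 'FC' acts as a 7x7 convolution."""
    for _ in range(5):
        side //= 2          # each pooling stage halves the feature map
    return side - 7 + 1     # valid 7x7 convolution over the final map

score_map_size(224)  # 1 -> a single prediction, as during training
score_map_size(256)  # 2 -> a 2x2 map of predictions, pooled at test time
```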