This is a summary of the common CNN basemodels that I put together in July 2018 while preparing for job interviews.
I cover only the base networks I consider most important, with a brief overview of each in chronological order.
If anything here is wrong, corrections are welcome.
LeNet
time
1998
significance
- Marked the real debut of CNNs.
advantage
- Achieved over 99% accuracy on handwritten character recognition.
Shortcoming
- There were no GPUs at the time, so training LeNet was very time-consuming;
- On non-OCR tasks it performed even worse than SVMs.
application
- OCR (such as the US postal system).
AlexNet
time
2012
significance
- Demonstrated that CNNs are effective even as complex models, triggering the "explosion" of the CV field;
- 1st place in ImageNet 2012.
Innovation
- Data level:
  - Used the massive ImageNet dataset (to prevent overfitting);
  - Used data augmentation (to prevent overfitting).
- Network level:
  - Added dropout to prevent overfitting;
  - Used the ReLU activation function to alleviate vanishing gradients;
  - Local response normalization (LRN), which normalizes using nearby activations (phased out after BN appeared in 2015).
- Hardware level:
  - Used GPUs for rapid convergence (the double pipeline in the paper denotes dual-GPU parallelism).
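A toy illustration of why ReLU alleviates vanishing gradients compared with sigmoid: the sigmoid derivative peaks at 0.25, so multiplying it across many layers shrinks the gradient toward zero, while ReLU's derivative is exactly 1 on positive inputs. A minimal sketch (the function names and the 20-layer setup are my own, not from the AlexNet paper):

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid; peaks at 0.25 when x == 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

# Gradient signal after backpropagating through 20 layers, assuming each
# layer contributes one activation derivative at its best case.
layers = 20
sigmoid_signal = sigmoid_grad(0.0) ** layers  # 0.25**20, essentially zero
relu_signal = relu_grad(1.0) ** layers        # 1.0**20, fully preserved
```

The real picture also involves weight matrices, but the activation-derivative factor alone explains why deep sigmoid networks of that era were hard to train.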
advantage
Broke the ceiling on the number of layers, reaching a depth of eight.
application
- R-CNN;
- SPPNet.
ZFNet
time
2013
significance
- Used deconvolution to visualize CNN features, giving insight into the hierarchical representations inside a CNN.
application
- Faster R-CNN.
NiN
time
2013
significance
- Replaced the plain linear convolution layers of earlier CNNs with multilayer perceptrons (stacks of fully connected layers and nonlinearities). Since nonlinearity is the source of all of deep learning's abstraction and representation power, this is equivalent to "inserting an MLP between layers to add nonlinearity."
application
- Referenced by Inception v1.
GoogLeNet
time
2014
significance
- Proposed Inception v1, launching the Inception series;
- Set a new record for network depth;
- First to propose conv1×1 as a dimensionality-reduction method.
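The parameter savings from conv1×1 dimensionality reduction are easy to check by counting weights. A sketch of the arithmetic (the channel counts 256 and 64 are illustrative, not the exact GoogLeNet configuration):

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k*k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

c_in, c_out = 256, 256

# Direct 5x5 convolution on 256 channels.
direct = conv_params(5, c_in, c_out)

# Inception-style: a 1x1 conv first reduces to 64 channels,
# then the 5x5 convolution runs on the reduced tensor.
reduced = conv_params(1, c_in, 64) + conv_params(5, 64, c_out)
```

Here `direct` is 1,638,400 weights while the reduced path needs only 425,984, roughly a 4x saving for this one branch.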
Innovation
- A structure similar to NiN, i.e., each original node is itself a sub-network.
advantage
- Fewer parameters.
Shortcoming
- Complex structure.
variant
- Inception v2/3/4, Xception
application
- YOLOv1.
VGGNet
time
2014
significance
- 1st place in the ImageNet 2014 localization task, 2nd place in the classification task.
Innovation
- Extensive use of small convolution kernels to replace large ones.
advantage
- Extremely simple, highly modular network architecture;
- A stack of small convolution kernels achieves the same receptive field as a large kernel, with fewer parameters;
- Deeper network -> greater network capacity.
Shortcoming
- The model is too large (most parameters sit in the last three FC layers) and inefficient;
- Stacking small kernels means more layers, which increases the total computation of the network;
- More layers raises the risk of vanishing or exploding gradients.
structure
Five convolution stages: the first two stages contain 2 convolution layers each, the last three stages contain 3-4 each; these are followed by three FC layers, ending with a Softmax.
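The small-kernel claim above can be verified by counting: n stacked 3×3 stride-1 convolutions have the receptive field of a (2n+1)×(2n+1) kernel, yet two 3×3 layers cost 18c² weights versus 25c² for one 5×5 layer. A minimal sketch (the channel count 512 is illustrative):

```python
def conv_params(k, c):
    """Weights of a k*k convolution with c input and c output channels."""
    return k * k * c * c

def stacked_rf(n, k=3):
    """Receptive field of n stacked k*k stride-1 convolutions."""
    return n * (k - 1) + 1

c = 512
# Two stacked 3x3 kernels cover the same 5x5 receptive field...
same_field = stacked_rf(2)            # 5
# ...with fewer parameters: 18c^2 vs 25c^2.
large = conv_params(5, c)             # one 5x5 layer
small = 2 * conv_params(3, c)         # two 3x3 layers
```

As a bonus, each extra small layer adds another nonlinearity, which VGG counts toward its increased capacity.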
application
- Fast R-CNN;
- Faster R-CNN;
- SSD.
ResNet
time
2015
significance
- The current mainstream basemodel;
- ImageNet2015 champion;
- CVPR 2016 Best Paper.
Innovation
- Highway Networks were the first to design mappings between different layers; ResNet simplified that design, performing the mapping only between adjacent modules. This once again alleviated the vanishing-gradient problem and once again broke the ceiling on network depth. The design also lets training converge extremely fast in its early stages.
analysis
Why it works:
- Among the structures a model has to learn, there is always a component that is an identity mapping. The original "serial" networks find this mapping hard to learn; ResNet adds a constraint that lets the model learn the identity mapping easily.
- With the identity mapping added, the learning target becomes the residual, which is significantly easier to learn than the original target.
- From the perspective of the identity mapping, a deep ResNet is in fact not that many layers deep.
Think
- ResNet variants with more than 50 layers also use conv1×1; think of it as dimensionality reduction.
variant
- ResNeXt;
- DenseNet;
- DPN.
application
- R-FCN;
- FPN(+Faster R-CNN);
- Mask R-CNN;
- RetinaNet.
DenseNet
time
2016
significance
- CVPR 2017 Best Paper.
Innovation
- Turned ResNet's serial "one-to-one identity mapping" into a "one-to-many identity mapping" (dense connections); the design is fairly simple.
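One concrete consequence of the one-to-many design: because every earlier feature map is concatenated into each later layer's input, the channel width inside a dense block grows linearly with depth. A quick sketch of the arithmetic (the initial width 64 and growth rate 32 are illustrative values, not tied to a specific DenseNet configuration):

```python
def dense_channels(c0, growth_rate, n_layers):
    """Input channels seen after n_layers of a dense block: each layer
    appends growth_rate new channels via concatenation."""
    return c0 + growth_rate * n_layers

# Starting from 64 channels with growth rate k = 32, after 6 layers the
# concatenated input is already 64 + 32 * 6 = 256 channels wide.
channels_after_6 = dense_channels(64, 32, 6)
```

This linear growth is why DenseNet keeps the per-layer growth rate small and inserts transition layers between blocks.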
Think
- Since it merely extends ResNet's identity-mapping idea, which many people could have thought of, the paper is not particularly surprising.
application
- Climbing competition leaderboards;
- Boosting AP in papers.
Xception
time
2016
significance
- One of the most mainstream lightweight basemodels.
Innovation
- Replaced the previously used serial convolutions with parallel group convolutions (the resulting module is called a depthwise separable convolution).
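The parameter savings of a depthwise separable convolution follow directly from the factorization: a depthwise k×k convolution (one filter per channel) followed by a 1×1 pointwise convolution replaces one full k×k convolution. A sketch of the weight counts (channel counts are illustrative):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a regular k*k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k*k conv (one filter per input channel)
    plus a 1x1 pointwise conv that mixes channels."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 256, 256
standard = standard_conv_params(k, c_in, c_out)    # 589,824 weights
separable = separable_conv_params(k, c_in, c_out)  # 67,840 weights
```

The roughly 8-9x reduction here is the source of the "ultra-small model" advantage below, and also hints at the weaker feature extraction: channel mixing and spatial filtering are no longer learned jointly.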
advantage
- Minimal parameters; an ultra-small model.
Shortcoming
- Weaker feature-extraction ability.
variant
- Xception145;
- Xception39.
ResNeXt
time
2016
Innovation
Replaced ResNet's plain "residual structure" with an "Inception-style residual structure."
advantage
Converges faster than ResNet on ImageNet, with slightly better classification results.
Shortcoming
Complex.
application
- Climbing competition leaderboards;
- Boosting AP in papers.
DPN
time
2017
Innovation
- Double pipeline: one branch is ResNet, the other is DenseNet; the two pipelines are claimed to complement each other.
application
- Climbing competition leaderboards;
- Boosting AP in papers.
Summary
Before ResNet appeared, basemodels went from an era dominated by AlexNet to a period where GoogLeNet and VGGNet split the field. After ResNet came out, it became the absolute benchmark basemodel thanks to its simplicity and power.
Today, the general workflow practitioners follow is:
- First use ResNet-50 to verify that the algorithm is effective;
- Once the algorithm works on ResNet-50, if speed is the priority (e.g., deploying to mobile), replace the basemodel with Xception or ShuffleNet; if accuracy is the priority (e.g., boosting AP in a paper or climbing a leaderboard), replace it with ResNet-101 / ResNeXt / DPN.
- At the train stage, one usually loads a pre-trained basemodel directly and fine-tunes it for ten to twenty epochs on one's own dataset.