This is a summary of object detection (Detection) that I put together in July 2018 while looking for a job, reviewing the algorithms I had studied.
I only picked the series of algorithms I consider most important, and give a brief overview of each in chronological order.
If anything is explained incorrectly, please point it out.
R-CNN
Time
2013
Significance
- The founding CNN-based detector;
- Recast the detection task as a classification task;
- Moved mainstream detection from traditional models to CNN-based models;
- CVPR2014.
Innovation
It solved two major problems:
- How to localize objects;
- How to train a detector when detection-specific data is scarce.
Solution
- Use region proposals (RP) for localization, refined by box regression with a purpose-built encode / decode mechanism;
- Pre-train on ImageNet, then transfer to VOC and fine-tune.
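To make the encode / decode idea concrete, here is a minimal sketch of the box parameterization as I understand it from the R-CNN paper. Boxes are assumed to be (cx, cy, w, h), and the function names `encode` and `decode` are my own, not from any library.

```python
import math

def encode(proposal, gt):
    """Express a ground-truth box as regression targets relative to a proposal."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    # Center shifts are normalized by the proposal size; sizes use a log ratio.
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))

def decode(proposal, deltas):
    """Invert encode(): apply predicted deltas to a proposal to get a box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph,
            pw * math.exp(tw), ph * math.exp(th))
```

The network regresses the encoded deltas during training; at test time `decode` turns its predictions back into boxes.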
Shortcomings
- Feature extraction is run separately for every RP, which is far too inefficient.
Backbone network
AlexNet.
Thoughts
- Although R-CNN is a landmark, it clearly builds on the ideas of its predecessors (such as OverFeat).
SPPNet
Time
2014
Significance
- Shared feature extraction; the SPP layer;
- ECCV2014.
Innovation
- Shared feature extraction: feature extraction is no longer the speed bottleneck. Fast R-CNN, published a few months later, absorbed this core idea and improved on it further;
- SPP layer: pools over each proposal, so the detection network can accept images of any size. Because this pooling sits between the input image and the fc layers, the fc layers no longer hard-code the input image size.
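A minimal sketch of why the SPP layer removes the fixed-size constraint: whatever the size of the feature map, pooling over a fixed set of grids always yields a vector of the same length. This is a pure-Python illustration on one 2-D map; the function name `spp` and the pyramid levels (1, 2, 4) are assumptions for the example.

```python
import math

def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map (list of rows) into a fixed-length vector.

    The output length is sum(n * n for n in levels), independent of H and W.
    """
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for i in range(n):
            # floor/ceil boundaries make the n x n bins cover the whole map.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                out.append(max(feature_map[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out
```

A 3 × 5 map and a 7 × 9 map both come out as 21 values (1 + 4 + 16), which is what lets the fc layers accept them.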
Shortcomings
- The pooled tensors are not kept in their original spatial arrangement but simply concatenated end to end, which discards important position information and hurts proposal classification (RoIPooling, proposed in Fast R-CNN, improved on this).
Backbone network
- AlexNet.
Thoughts
- SPPNet is an outstanding contribution that is often overlooked.
Fast R-CNN
Time
2015
Significance
- 4-stage -> 2-stage;
- RoIPooling;
- ICCV2015.
Innovation
- Merged three tasks (feature extraction, classification, regression) into a single CNN, moving detection from the 4-stage era into the 2-stage era;
- RoIPooling: a simplified design of the SPP layer that retains position information better;
- Established mini-batch = 2 as the common practice;
- Regression loss: L2 -> Smooth L1.
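For reference, a small sketch of the Smooth L1 loss next to L2, using the definition from the Fast R-CNN paper (transition point at |x| = 1). The function names are mine.

```python
def smooth_l1(x):
    """Quadratic near zero, linear for large errors (less sensitive to outliers)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def l2(x):
    """Plain squared error, for comparison."""
    return 0.5 * x * x
```

For small errors the two agree; for a large error such as x = 3, Smooth L1 grows linearly while L2 grows quadratically, so a few badly regressed boxes dominate the gradient much less.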
Structure
Second half: two shared fc layers, followed by two parallel single fc layers responsible for classification and regression respectively.
Backbone network
VGG-16.
Faster R-CNN
Time
2015
Significance
- Selective Search (SS) -> RPN + anchor mechanism; the first end-to-end detector;
- NIPS2015;
- COCO2015 1st.
Innovation
- RPN + anchor mechanism: SS is replaced with an RPN, so that the "generate RP" task can also benefit from the GPU. The number of RPs generated per image stays the same, but proposal speed jumps from 0.5fps to 100fps. The RPN is essentially a sliding-window binary classifier (object vs. background), and its additional overhead is only a two-layer network.
- A new target mechanism.
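A rough sketch of how anchors are laid out over the feature map. The 3 scales × 3 aspect ratios and the stride of 16 match the defaults I remember from the Faster R-CNN paper; the function name `make_anchors` and the (cx, cy, w, h) layout are assumptions for illustration.

```python
import math

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (cx, cy, w, h), nine per feature-map cell by default.

    ratio is taken as h / w; every anchor of a given scale keeps area scale ** 2.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Center of this cell, mapped back to image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    anchors.append((cx, cy, s / math.sqrt(r), s * math.sqrt(r)))
    return anchors
```

The RPN then predicts, for every anchor, an objectness score and four box deltas, which is why it is cheap: the anchors themselves cost nothing to generate.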
Shortcomings
- Only the topmost feature map is used (FPN later fixed this).
Backbone network
ZFNet or VGGNet.
Details
- Four losses in total: two for the RPN and two for the detection sub-network.
Thoughts
- Faster R-CNN pushes Fast R-CNN's "put everything into the network" trend to its extreme, achieving "all in one network". All later two-stage methods are variants of it.
YOLOv1
Time
2015
Significance
- The first 1-stage detector;
- The first real-time detector;
- CVPR2016.
Innovation
- No extra stage is needed to generate RPs; classification and regression results are output directly;
- Each grid cell is responsible for predicting only one object, which naturally cuts the number of candidate boxes down to S × S × B and effectively alleviates class imbalance.
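A small sketch of the grid assignment and the resulting box count. S = 7 and B = 2 are the YOLOv1 settings; the function names and the 448 × 448 image in the usage note are assumptions for the example.

```python
def responsible_cell(cx, cy, img_w, img_h, s=7):
    """Return (row, col) of the grid cell that contains an object's center."""
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col

def num_candidate_boxes(s=7, b=2):
    """Total boxes predicted per image: S * S * B."""
    return s * s * b
```

With the defaults this gives only 98 candidate boxes per image. It also shows the limitation: on a 448 × 448 image, objects centered at (200, 200) and (210, 215) both land in cell (3, 3), so one of them cannot be assigned.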
Advantages
- Extremely fast: YOLOv1 runs at 45fps; Fast YOLOv1 at 155fps.
Backbone network
GoogLeNet.
Thoughts
- Making each grid cell responsible for only one object is a simple and crude design that bakes in an obvious prior: most images in the datasets are conventional, with objects that are well spread out and reasonably sized. For the small share of unconventional images (e.g., the centers of two or more objects fall into the same grid cell), objects are simply missed.
- YOLOv1 only analyzes the final 7 × 7 feature map, so it performs poorly on small objects.
- Faced with the trade-off between speed and accuracy, YOLOv1 chose speed, broke out of the 2-stage pattern, and created the 1-stage approach. Its accuracy is not high, but it made real-time detection a reality and pointed out the general direction for real-time detectors.
- The YOLO family generally has low accuracy, and in real deployments it is rarely used. Even autonomous driving, which has strict real-time requirements, uses the FPN (+ Faster R-CNN) setup.
SSD
Time
2015
Significance
- Multi-scale feature maps;
- ECCV2016.
Innovation
- The first to tap into and exploit feature maps of different scales. Since then, detecting on multi-scale feature maps has become common practice.
- Faster R-CNN first proposed anchors; SSD was the first to study them in depth (including the number, size, and aspect ratio of anchors), and YOLOv2 later went further on this basis.
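As an illustration of SSD's anchor (default box) design, here is a sketch of the linear scale rule from the SSD paper, where the k-th of m feature maps gets scale s_min + (s_max - s_min) · k / (m - 1). The defaults (m = 6, s_min = 0.2, s_max = 0.9) are the paper's as far as I recall; the function names and the reduced aspect-ratio set are my own.

```python
import math

def ssd_scales(m=6, s_min=0.2, s_max=0.9):
    """Box scale (relative to image size) for each of the m feature maps."""
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]

def default_box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) per aspect ratio; the area stays scale ** 2."""
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar))
            for ar in aspect_ratios]
```

Shallow, high-resolution maps get the small scales and deep, low-resolution maps get the large ones, which is how the multi-scale feature maps and the anchor design fit together.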
Backbone network
VGGNet.
Thoughts
- Claimed to be the first high-accuracy real-time detector, but its accuracy is actually not that high;
- Later, adding FPN on top of SSD produced DSSD.
R-FCN
Time
2016
Significance
- The first to share the head sub-network;
- NIPS2016.
Innovation
- To address the per-proposal processing that is not shared (i.e., the head sub-network), it proposed position-sensitive score maps, dropped the fully connected layers that followed RoIPooling, and joined the front and back halves of the detection network into a single fully convolutional network.
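A simplified sketch of pooling over position-sensitive score maps, on nested lists. Each of the k × k bins of an RoI reads only from its own group of (C + 1) channels, and the bins then vote by averaging. The channel layout (bin-major), the integer RoI coordinates, and the name `ps_roi_pool` are assumptions of this sketch, not the paper's exact implementation.

```python
import math

def ps_roi_pool(score_maps, roi, k, num_classes):
    """score_maps: [k * k * (num_classes + 1)][H][W] nested lists.
    roi: (x0, y0, x1, y1) in feature-map cells, end-exclusive.
    Returns one score per class (background included).
    """
    x0, y0, x1, y1 = roi
    c1 = num_classes + 1
    w, h = x1 - x0, y1 - y0
    scores = [0.0] * c1
    for i in range(k):                      # bin row
        r0, r1 = y0 + (i * h) // k, y0 + math.ceil((i + 1) * h / k)
        for j in range(k):                  # bin column
            q0, q1 = x0 + (j * w) // k, x0 + math.ceil((j + 1) * w / k)
            cells = [(r, c) for r in range(r0, r1) for c in range(q0, q1)]
            for cls in range(c1):
                # Bin (i, j) only looks at its own dedicated channel.
                fmap = score_maps[(i * k + j) * c1 + cls]
                scores[cls] += sum(fmap[r][c] for r, c in cells) / len(cells)
    # Every bin casts one vote; the RoI score is the average vote.
    return [s / (k * k) for s in scores]
```

Nothing learnable runs per RoI here, which is where the speedup comes from. The price is the channel count: with k = 7 and C = 80, the score maps need 7 × 7 × 81 = 3969 channels.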
Advantages
- Accuracy almost the same as Faster R-CNN, with inference 2.5 to 20 times faster.
Shortcomings
- The head is too thick, with k × k × (C + 1) channels in the score maps, which set the stage for Light-head R-CNN.
YOLOv2
Time
2016
Significance
- The first large-scale detector;
- CVPR2017 Best Paper Honorable Mention.
Innovation
- Lots of tricks, plus a self-designed backbone, DarkNet-19.
- Large-scale: softmaxes nested within softmaxes implement a hierarchical word tree (WordTree). YOLOv2 with this hierarchical tree is called YOLO9000.
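A toy sketch of the nested-softmax idea: a softmax is taken among the siblings under each parent, and the absolute probability of a node is the product of the conditional probabilities along its path to the root. The six-node tree and all names here are hypothetical; the real WordTree is built from WordNet and is far larger.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

# Hypothetical toy tree: node -> parent (None marks a top-level node).
PARENT = {"animal": None, "vehicle": None,
          "dog": "animal", "cat": "animal",
          "car": "vehicle", "bus": "vehicle"}

def tree_probabilities(logits):
    """logits: dict node -> raw score. Returns dict node -> absolute probability."""
    # One softmax per group of siblings gives P(node | parent).
    groups = {}
    for node, parent in PARENT.items():
        groups.setdefault(parent, []).append(node)
    cond = {}
    for siblings in groups.values():
        probs = softmax([logits[n] for n in siblings])
        cond.update(zip(siblings, probs))

    def absolute(node):
        # Multiply conditionals along the path up to the root.
        parent = PARENT[node]
        return cond[node] if parent is None else cond[node] * absolute(parent)

    return {node: absolute(node) for node in PARENT}
```

With all logits equal, "animal" gets 0.5 and "dog" gets 0.5 × 0.5 = 0.25, so the leaf probabilities still sum to 1.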
Shortcomings
- Although it deserves credit as the first large-scale detector, its accuracy is not high, so it is not practical. The later R-FCN-3000 inherited its pioneering idea and went on to achieve high accuracy at large scale.
FPN(+Faster R-CNN)
Time
2016
Significance
- Addresses missed detections of small objects;
- CVPR2017.
Innovation
- Designed a module consisting of a "top-down pathway" and "lateral connections" to fuse top-level feature maps (rich in semantic information but lacking detail) with low-level feature maps (rich in detail but lacking semantic information).
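A bare-bones sketch of one top-down merge step on 2-D lists: the coarser map is upsampled 2× and added element-wise to the finer lateral map. The 1 × 1 convolution on the lateral path and the 3 × 3 convolution after the merge that the real FPN uses are deliberately left out, and the function names are mine.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2-D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def merge(top, lateral):
    """Top-down step: upsample the coarser map, then add the lateral map."""
    up = upsample2x(top)
    return [[u + l for u, l in zip(up_row, lat_row)]
            for up_row, lat_row in zip(up, lateral)]
```

Repeating `merge` from the top level downward gives every level both the semantics of the top and the detail of its own resolution, at the cost of only an upsample and an add per level.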
Advantages
FPN adds little overhead and can detect small objects; it has become a standard component of detection algorithms.
Shortcomings
The topmost feature map gains little from FPN: it is still rich in semantic information but lacks location information, which remains unfavorable for detecting large objects. (This problem set the stage for PAN.)
DSSD
Time
2017
Significance
SSD+FPN.
Thoughts
A filler paper with little substance.
Mask R-CNN
Time
2017
Significance
- RoIPooling -> RoIAlign;
- Adds a third branch for instance segmentation;
- ICCV2017 Best Paper.
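A minimal sketch of the core difference between RoIPooling and RoIAlign at a single sampling point: RoIPooling snaps a fractional coordinate to the grid, while RoIAlign keeps it and interpolates bilinearly. The real RoIAlign averages several such samples per bin; both function names are mine, and coordinates are assumed to lie inside the map.

```python
import math

def bilinear(fmap, y, x):
    """RoIAlign-style: sample a 2-D list at a fractional (y, x) location."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(fmap) - 1)
    x1 = min(x0 + 1, len(fmap[0]) - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_pool_sample(fmap, y, x):
    """RoIPooling-style: quantize the coordinate first, losing the fraction."""
    return fmap[int(y)][int(x)]
```

On `[[0, 10], [20, 30]]` at (0.5, 0.5), the quantized sample returns 0 while the bilinear sample returns 15. That half-cell misalignment is small for boxes but matters for pixel-level masks.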
DCN
Time
2017
Significance
- Deformable modules.
Innovation
- On top of regular (square-grid) convolution and RoIPooling, an extra layer outputs 2-D offsets so that the sampling locations can "deform automatically". By stacking such deformations layer by layer, the CNN can read the semantics of the target more accurately.
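To show what "offsets on the sampling grid" means, here is a 1-D simplification (the paper works with 2-D offsets). Each kernel tap is moved by a fractional offset and the input is read by linear interpolation. In a real DCN the offsets are predicted by an extra conv layer; in this sketch they are simply passed in, and all names are my own.

```python
import math

def sample_1d(signal, pos):
    """Linear interpolation at a fractional position; zero outside the signal."""
    def at(i):
        return signal[i] if 0 <= i < len(signal) else 0.0
    i0 = math.floor(pos)
    frac = pos - i0
    return at(i0) * (1 - frac) + at(i0 + 1) * frac

def deformable_conv_1d(signal, weights, offsets, center):
    """One output value of a 1-D deformable convolution.

    offsets[k] shifts the k-th tap away from its regular grid position.
    """
    half = len(weights) // 2
    return sum(w * sample_1d(signal, center + (k - half) + offsets[k])
               for k, w in enumerate(weights))
```

With all offsets at zero this is an ordinary convolution; non-zero offsets let the same three weights look at a wider or shifted region, which is the "deformation".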
Advantages
- Simple design with few added parameters; supports end-to-end training and generalizes across a variety of complex vision tasks.
Thoughts
- Generally used only in the last few layers: later layers have lost more detail, so they need the deformation operation more to represent the target well.
- While everyone else was rushing to squeeze the remaining value out of feature maps, heads, and proposals, DCN took a different path and operated on the most basic computation, convolution itself. A great insight.
RetinaNet
Time
2017
Significance
- Cross entropy (CE) -> focal loss (FL);
- ICCV2017 Best Student Paper.
Innovation
- Identified the root cause of 1-stage detectors trailing 2-stage detectors in accuracy: the overwhelming majority of anchors are background (bg), which causes "class imbalance". 2-stage detectors are shielded by the RPN and are not affected, while 1-stage detectors have nothing shielding them. Focal loss was designed to give 1-stage detectors that protection.
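A small sketch of focal loss for a single anchor next to plain cross entropy, following the formula FL = -α_t (1 - p_t)^γ log(p_t). γ = 2 and α = 0.25 are the defaults reported in the RetinaNet paper; the function names are mine.

```python
import math

def cross_entropy(p, y):
    """Binary CE. p: predicted foreground probability; y: 1 = fg, 0 = bg."""
    pt = p if y == 1 else 1.0 - p
    return -math.log(pt)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: CE scaled down by how easy the example already is."""
    pt = p if y == 1 else 1.0 - p
    at = alpha if y == 1 else 1.0 - alpha
    return -at * (1.0 - pt) ** gamma * math.log(pt)
```

For an easy background anchor (p = 0.01, y = 0) the (1 - p_t)^γ factor shrinks the loss by roughly four orders of magnitude, while a hard one (p = 0.9, y = 0) keeps most of its loss. The flood of easy background anchors therefore stops dominating training.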
Shortcomings
- More false positives (fp).
Thoughts
- Finding the root of a problem matters far more than solving it;
- RetinaNet is very strong; it has become one of the best detectors available and is widely used in deployment.
MegDet
Time
2017
Significance
- Large mini-batch;
- COCO2017 1st;
- CVPR2018.
Innovation
- An engineering innovation: makes truly large mini-batch training work.
Shortcomings
- Hard to reproduce unless you have 128 GPUs.
Light-head R-CNN
Time
2017
Significance
- Submitted to CVPR but rejected.
Innovation
- Compresses R-FCN's thick head into a very thin one, greatly speeding it up.
Thoughts
- After compressing the model so aggressively, performance actually went up rather than down, which is counter-intuitive. The authors did not give a good explanation, so the CVPR submission was rejected.
SNIP
Time
2017
Significance
- Image Pyramid;
- CVPR2018.
Innovation
- Rediscovered the value of the Image Pyramid and, on top of it, added a valid object-size range for each scale. Thanks to this, the three pipelines can divide the work, each playing to its strengths and avoiding its weaknesses.
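A toy sketch of the "valid range per scale" idea: at each pyramid level, only objects whose resized size falls inside that level's range are used, and the rest are ignored. The scale factors, the pixel ranges, and the names below are hypothetical values chosen for illustration, not the settings from the paper.

```python
# Hypothetical pyramid: (scale factor, valid object size range in resized pixels).
PYRAMID = [
    (2.0, (0, 80)),         # upsampled image: handles small objects
    (1.0, (40, 160)),       # original image: handles medium objects
    (0.5, (120, 10 ** 9)),  # downsampled image: handles large objects
]

def valid_objects(object_sizes, scale, valid_range):
    """Keep the objects whose size after resizing lies in the valid range."""
    lo, hi = valid_range
    return [s for s in object_sizes if lo <= s * scale <= hi]

def assign(object_sizes):
    """Return, for each pyramid level, the objects that level is responsible for."""
    return [valid_objects(object_sizes, scale, rng) for scale, rng in PYRAMID]
```

With objects of size 20, 100, and 400 pixels, each level ends up with exactly one of them, so every pipeline only ever sees objects in a narrow, comfortable size band.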
Shortcomings
- The model is too large; the 11 GB of memory on a 1080 Ti simply cannot hold it, so it is not practical.
Thoughts
- By dividing up the work, the detection difficulty of each pipeline is greatly reduced, achieving "scale invariance" in a somewhat "cheating" way.
Cascade R-CNN
Time
2017
Significance
- The first detector to adopt a cascade approach; 2-stage -> 4-stage;
- CVPR2018.
Innovation
- Inspired by face detection, the authors use a "cascade" approach, refining the boxes round after round to improve localization quality.
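A schematic sketch of the cascade during training: each stage labels a box as positive with a stricter IoU threshold, then refines it for the next stage. The thresholds (0.5, 0.6, 0.7) are the ones I recall from the paper; the `refiners` are stand-in callables for the learned regressors, and the function names are mine.

```python
def iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def run_cascade(box, gt, refiners, thresholds=(0.5, 0.6, 0.7)):
    """Pass one box through successive stages.

    Returns the final box and, per stage, whether the box counted as positive.
    """
    labels = []
    for refine, t in zip(refiners, thresholds):
        labels.append(iou(box, gt) >= t)   # stricter threshold each stage
        box = refine(box)                  # better box for the next stage
    return box, labels
```

The point is that each stage receives better boxes than the previous one produced, so it can afford a higher threshold without running out of positives. At inference there is no ground truth; the box simply flows through the regressors.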
Thoughts
- Faster R-CNN took detection from 4-stage to 2-stage, while Cascade R-CNN takes 2-stage back to 4-stage; this can be seen as spiral progress;
- The first to introduce the cascade idea into object detection.
R-FCN-3000
Time
2017
Significance
- The first effective large-scale detector.
Innovation
- Regression per sub-class -> regression per super-class.
PAN
Time
2018
Significance
- FPN -> PAN;
- COCO2017 2nd.
YOLOv3
Time
2018
Innovation
- Fixed a major pain point of the YOLO family: missed small objects.
DetNet
Time
2018
Significance
- The first backbone designed specifically for detection;
- ECCV2018.
Innovation
- Replaces the original 32x downsampling with 16x downsampling followed by two dilated convolution stages, so the receptive field can still be enlarged by stacking convolutions while avoiding the loss of location information caused by downsampling.
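A quick way to check the trade-off is to compute the receptive field and total stride of a stack of conv layers. The recurrence below is the standard one (each layer adds (kernel - 1) × dilation × current stride); the layer stacks in the usage note are toy examples, not DetNet's actual architecture.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride, dilation) tuples, in order.

    Returns (receptive field in input pixels, total stride).
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf, jump
```

Starting from four stride-2 3 × 3 convs (receptive field 31, stride 16), adding one more stride-2 conv gives (63, 32), while adding a stride-1 conv with dilation 2 gives (95, 16): a larger receptive field with no further loss of resolution.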
Thoughts
- Back in 2016, YOLOv1 had already done this, but never highlighted it in the paper.