Summary of object detection algorithms (Detection)

This is the summary of object detection (Detection) that I put together in July 2018 while looking for a job, based on my own study and review of the algorithms.
I only pick the series of algorithms that I think are the more important ones, and give a brief overview in chronological order.

If anything is explained incorrectly, please kindly point it out.

R-CNN

time

2013

significance

  1. The founding CNN-based detector;
  2. "Converts the detection task into a classification task";
  3. Moved mainstream detection algorithms from traditional models to CNN models;
  4. CVPR 2014.

Innovation

To solve two major problems:

  1. How to localize;
  2. How to carry out the detection task when no detection-specific dataset is available.

solution

  1. Uses region proposals (RP) for localization and box regression, and designs an encode / decode mechanism;
  2. Pre-trains on ImageNet, then transfers to VOC for fine-tuning.
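
To make the encode / decode idea concrete, here is a minimal sketch in Python. It assumes boxes are given as (center x, center y, width, height) and uses the usual center-offset / log-size parameterization; the function names and the sample boxes are my own, not taken from any particular code base.

```python
import math

def encode(proposal, gt):
    """Turn a ground-truth box into regression targets relative to a proposal.

    Boxes are (cx, cy, w, h). Offsets are normalized by the proposal size,
    sizes are compared in log space.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))

def decode(proposal, deltas):
    """Invert encode(): apply predicted deltas to a proposal."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph,
            pw * math.exp(tw), ph * math.exp(th))

# Hypothetical sample boxes, only for illustration.
proposal = (50.0, 50.0, 20.0, 40.0)
gt = (55.0, 48.0, 30.0, 40.0)
deltas = encode(proposal, gt)
restored = decode(proposal, deltas)
```

The network regresses `deltas` during training, and `decode` turns its predictions back into boxes at inference time.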

Shortcoming

  1. Feature extraction is run from scratch for every RP, which is far too inefficient.

Base network

AlexNet.

Thoughts

  1. Although R-CNN is a landmark, we can still see in it a lot of the wisdom of its predecessors (such as OverFeat).

SPPNet

time

2014

significance

  1. Shared feature extraction and the SPP layer;
  2. ECCV 2014.

Innovation

  1. Shared feature extraction: feature extraction time is no longer the bottleneck. Fast R-CNN, a few months later, absorbed the essence of this part and improved on it further;
  2. SPP layer: pools over each proposal, so that the detection network can take input images of any size. Because the proposal pooling sits between the input image and the fc layers, the fc layers no longer hard-code the size of the input image.
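
A rough sketch of why the SPP layer makes the fc input size independent of the image size. This is a NumPy toy version under my own assumptions (max pooling, pyramid levels 1 / 2 / 4, a single (C, H, W) feature map), not the paper's exact configuration.

```python
import numpy as np

def spp(feature, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into n x n bins for each level.

    The output length is C * sum(n * n for n in levels), whatever H and W are.
    """
    c, h, w = feature.shape
    out = []
    for n in levels:
        # Bin edges cover the whole map even when H or W is not divisible by n.
        ys = np.linspace(0, h, n + 1)
        xs = np.linspace(0, w, n + 1)
        for i in range(n):
            for j in range(n):
                y0, y1 = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
                x0, x1 = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
                out.append(feature[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(out)

# Two feature maps of different spatial size give vectors of the same length.
vec_a = spp(np.random.rand(8, 13, 17))
vec_b = spp(np.random.rand(8, 6, 9))
```

Note that the bins are simply concatenated one after another, which is exactly the loss of spatial arrangement criticized under "Shortcoming".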

Shortcoming

  1. The tensors obtained after pooling are not kept in their original positional arrangement; they are simply joined end to end, which directly discards important position information and hurts proposal classification (the RoIPooling proposed in Fast R-CNN improves on this point).

Base network

  1. AlexNet.

Thoughts

  1. SPPNet is an outstanding contribution that has been overlooked.

Fast R-CNN

time

2015

significance

  1. 4-stage -> 2-stage;
  2. RoIPooling;
  3. ICCV 2015.

Innovation

  1. Merges three tasks (feature extraction, classification, regression) into a single CNN, moving detection algorithms from the 4-stage era into the 2-stage era;
  2. RoIPooling: a simplified design of the SPP layer that retains position information better;
  3. Proposes a mini-batch of 2 images, which became the consensus;
  4. Regression loss: L2 -> Smooth L1.
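
For the last item, a small sketch comparing the two losses on a single regression error. I assume the standard Smooth L1 form with the switch point at |x| = 1; the helper names are mine.

```python
import numpy as np

def l2(x):
    """Squared-error loss, 0.5 * x^2."""
    return 0.5 * np.asarray(x, dtype=float) ** 2

def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

small_error = float(smooth_l1(0.5))   # same as L2 in the quadratic zone
large_error = float(smooth_l1(4.0))   # grows linearly
large_error_l2 = float(l2(4.0))       # grows quadratically
```

For a large error the gradient of Smooth L1 stays bounded while that of L2 keeps growing, so outlier boxes disturb training less.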

structure

Second half: 2 shared fc layers, followed by two parallel single fc layers, responsible for classification and regression respectively.

Base network

VGG-16.

Faster R-CNN

time

2015

significance

  1. SS (Selective Search) -> RPN + anchor mechanism, the first end-to-end detector;
  2. NIPS 2015;
  3. COCO 2015 1st.

Innovation

  1. RPN + anchor mechanism: replaces SS with an RPN, so that the task of generating RPs can also benefit from the GPU. The number of RPs generated per image is unchanged, but the speed jumps from 0.5 fps to 100 fps. The RPN is essentially a "sliding-window binary (object vs. background) detector", and its additional overhead is only a two-layer network.
  2. A new target mechanism.
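
The anchor mechanism can be sketched as follows. The stride of 16, the three scales and the three aspect ratios are the values usually quoted for Faster R-CNN; treat them, and the function name, as assumptions of this sketch.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * A, 4) anchors as (x1, y1, x2, y2)."""
    shapes = []
    for s in scales:
        for r in ratios:
            # Keep the area at s * s while setting height / width = r.
            shapes.append((s / np.sqrt(r), s * np.sqrt(r)))
    shapes = np.array(shapes)                              # (A, 2) as (w, h)

    # One anchor center per feature-map cell, mapped back to image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                           # both (H, W)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)   # (H*W, 2)

    half = shapes[None, :, :] / 2                          # (1, A, 2)
    c = centers[:, None, :]                                # (H*W, 1, 2)
    boxes = np.concatenate([c - half, c + half], axis=2)   # (H*W, A, 4)
    return boxes.reshape(-1, 4)

anchors = make_anchors(40, 60)   # a 40 x 60 feature map, 9 anchors per cell
```

The RPN then only has to score each of these fixed boxes as object or background and regress a small correction, which is why it is so cheap.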

Shortcoming

  1. Uses only the top-level feature map (corrected later in FPN).

Base network

ZFNet or VGGNet.

detail

  1. Four losses in total: two for the RPN and two for the detector sub-network.

Thoughts

  1. Faster R-CNN pushes the "move everything into the network" trend of Fast R-CNN to the extreme and achieves "all in one network". All later two-stage methods are its variants.

YOLOv1

time

2015

significance

  1. The first 1-stage detector;
  2. The first real-time detector;
  3. CVPR2016.

Innovation

  1. Does not need an extra stage to generate RPs; it outputs classification and regression results directly;
  2. Each grid cell is responsible for predicting only one object, so the number of candidate boxes naturally drops sharply to S × S × B, which effectively alleviates class imbalance.
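
A small sketch of the grid assignment. S = 7 matches the 7 × 7 map mentioned in this post; B = 2, the 448 × 448 input size and the object centers are assumptions for illustration only.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the object center."""
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col

S, B = 7, 2
num_candidates = S * S * B   # candidate boxes per image

# Hypothetical object centers (x, y) in a 448 x 448 image.
objects = {"dog": (100, 200), "bike": (120, 210), "car": (400, 100)}
cells = {name: responsible_cell(x, y, 448, 448)
         for name, (x, y) in objects.items()}
```

Here "dog" and "bike" land in the same cell, so with one object per cell one of them has to be missed.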

advantage

  1. Very fast: YOLOv1 runs at 45 fps; Fast YOLOv1 at 155 fps.

Base network

GoogLeNet.

Thoughts

  1. Letting each grid cell be responsible for only one object is a simple and blunt design that carries an obvious prior: most images in the datasets are conventional, with objects well spread out and reasonably sized. For the small share of unconventional images (e.g., the centers of two or more objects fall into the same grid cell), the only outcome is a missed detection.
  2. Because YOLOv1 only analyzes the final 7 × 7 feature map, its results on small objects are poor.
  3. In the dilemma between speed and accuracy, YOLOv1 chose speed, broke out of the 2-stage paradigm and created the 1-stage paradigm. Its accuracy is not high, but it made real-time detectors a reality and pointed out the general direction for them.
  4. The YOLO family generally has low accuracy, and in actual deployment it is basically not used. Even autonomous driving, which has strict real-time demands, uses the FPN (+ Faster R-CNN) setup.

SSD

time

2015

significance

  1. Multi-scale feature maps;
  2. ECCV 2016.

Innovation

  1. The first to tap and exploit feature maps of different scales. Since then, using multi-scale feature maps in detection has become commonplace.
  2. Faster R-CNN first proposed the anchor; SSD was the first to study it in depth (including the number, size and aspect ratio of anchors), and the later YOLOv2 went further on this basis.
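
To get a feel for how many anchors the multi-scale design produces, here is a quick count. The layer sizes and boxes per location below are the SSD300 layout as it is commonly quoted; I list them here as an assumption rather than something stated in this post.

```python
# (feature map side length, default boxes per location)
ssd300_layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

# Every location of every feature map contributes its own set of boxes.
per_layer = [side * side * k for side, k in ssd300_layers]
total = sum(per_layer)
```

Most of the boxes come from the largest (38 × 38) map, which is the one looking at the smallest objects.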

Base network

VGGNet.

Thoughts

  1. Claims to be the first high-accuracy real-time detector, but in fact the accuracy is not that high;
  2. FPN was later added on top of SSD, which evolved into DSSD.

R-FCN

time

2016

significance

  1. The first to share computation in the head sub-network;
  2. NIPS 2016.

Innovation

  1. To deal with the "per-proposal computation that is not shared (i.e., the tail sub-network)", proposes position-sensitive score maps, drops the standard RoIPooling, and joins the front half of the detection network with the formerly fully connected back half to form a fully convolutional network.
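
A toy NumPy sketch of position-sensitive pooling over the score maps. The channel layout (one group of C + 1 maps per bin, bins in row-major order), the average pooling and the average voting are assumptions of this sketch; real implementations may order channels differently.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    """score_maps: (k*k*(C+1), H, W); roi: (x1, y1, x2, y2) in feature cells.

    Returns C + 1 class scores for the RoI.
    """
    c1 = num_classes + 1
    maps = score_maps.reshape(k * k, c1, *score_maps.shape[1:])
    x1, y1, x2, y2 = roi
    xs = np.linspace(x1, x2, k + 1)
    ys = np.linspace(y1, y2, k + 1)
    votes = np.zeros((k, k, c1))
    for i in range(k):
        for j in range(k):
            ya, yb = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            xa, xb = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            # Bin (i, j) reads only its own group of C + 1 maps.
            votes[i, j] = maps[i * k + j, :, ya:yb, xa:xb].mean(axis=(1, 2))
    return votes.mean(axis=(0, 1))   # the k * k bins vote by averaging

k, C = 3, 20
score_maps = np.ones((k * k * (C + 1), 20, 20))   # 189 channels in total
scores = ps_roi_pool(score_maps, (2, 2, 11, 11), k, C)
```

Per RoI there is no learned layer left, only pooling and voting, so almost all computation is shared across proposals.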

advantage

  1. Accuracy is almost the same as Faster R-CNN, while inference is 2 to 20 times faster.

Shortcoming

  1. The head is too thick: the score maps have k × k × (C + 1) channels, which set the stage for Light-head R-CNN.

YOLOv2

time

2016

significance

  1. The first large-scale detector;
  2. CVPR 2017 Best Paper Honorable Mention.

Innovation

  1. A lot of tricks, plus a self-designed base model, DarkNet-19.
  2. Large-scale: a softmax nested inside a softmax, which implements a hierarchical class tree. YOLOv2 with the hierarchical class tree is called YOLO9000.
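
The nested softmax can be sketched like this: each group of siblings in the tree gets its own softmax, and the probability of a node is the product of the conditional probabilities along its path to the root. The tiny tree and the all-zero logits are made up for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical class tree: node -> parent (None means directly under the root).
parent = {"animal": None, "vehicle": None,
          "dog": "animal", "cat": "animal",
          "car": "vehicle", "bus": "vehicle"}
nodes = list(parent)

# Siblings (nodes with the same parent) share one softmax.
groups = {}
for n in nodes:
    groups.setdefault(parent[n], []).append(n)

def conditional_probs(logits):
    """Apply one softmax per sibling group: P(node | parent)."""
    cond = {}
    for siblings in groups.values():
        p = softmax(np.array([logits[n] for n in siblings]))
        cond.update(zip(siblings, p))
    return cond

def absolute_prob(node, cond):
    """Multiply conditionals along the path from the node up to the root."""
    p = 1.0
    while node is not None:
        p *= cond[node]
        node = parent[node]
    return p

cond = conditional_probs({n: 0.0 for n in nodes})
p_dog = absolute_prob("dog", cond)   # P(animal) * P(dog | animal)
```

With such a tree, a box can still be labeled "animal" with confidence even when the detector cannot tell the breed apart.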

Shortcoming

  1. Valuable as the first large-scale detector, but not practical because its accuracy is not high. The later R-FCN-3000 inherits its pioneering idea and goes on to achieve high accuracy at large scale.

FPN(+Faster R-CNN)

time

2016

significance

  1. Solves the missed detection of small objects;
  2. CVPR2017.

Innovation

  1. Designs a module made up of a "top-down pathway" and "lateral connections" to fuse "top-level feature maps that are rich in semantic information but lack detail" with "low-level feature maps that are rich in detail but lack semantic information".
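
A minimal NumPy sketch of the top-down pathway with lateral connections. It assumes each level is exactly half the size of the previous one, uses nearest-neighbour upsampling, and leaves out the 3 × 3 smoothing convolution that usually follows the merge; the function names are mine.

```python
import numpy as np

def lateral(feature, weight):
    """1x1 convolution: (C_in, H, W) with (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feature)

def upsample2x(feature):
    """Nearest-neighbour upsampling by 2 along H and W."""
    return feature.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(bottom_up, weights):
    """bottom_up: list of (C_i, H_i, W_i) maps ordered fine -> coarse."""
    # Start from the coarsest (most semantic) map.
    merged = [lateral(bottom_up[-1], weights[-1])]
    # Walk towards finer maps, adding the upsampled coarser result each time.
    for feat, w in zip(bottom_up[-2::-1], weights[-2::-1]):
        merged.append(lateral(feat, w) + upsample2x(merged[-1]))
    return merged[::-1]   # back to fine -> coarse order

rng = np.random.default_rng(0)
feats = [rng.random((8, 16, 16)), rng.random((16, 8, 8)), rng.random((32, 4, 4))]
ws = [rng.random((4, 8)), rng.random((4, 16)), rng.random((4, 32))]
pyramid = fpn_top_down(feats, ws)
```

Notice that the coarsest output is just the lateral projection of the top map and receives nothing from the finer levels.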

advantage

FPN adds little overhead and can detect small objects, so it has become a standard component of detection algorithms.

Shortcoming

The top-level feature map gains little from FPN: it still has plenty of semantic information but lacks location information, which remains unfavorable for detecting large objects. (This problem set the stage for PAN.)

DSSD

time

2017

significance

SSD + FPN.

Thoughts

A filler paper with little substance.

Mask R-CNN

time

2017

significance

  1. RoIPooling -> RoIAlign;
  2. Adds a third branch for instance segmentation;
  3. ICCV 2017 Best Paper.

DCN

time

2017

significance

  1. Deformable modules (the modules themselves become deformable).

Innovation

  1. On top of the regular (square-grid) convolution and RoIPooling, adds a layer that outputs 2-D offsets, so that the sampling can "deform automatically". With such "deformations" stacked layer by layer, the CNN can read the semantics of the target more accurately.
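
A sketch of what one deformable 3 × 3 convolution does at a single location: each of the 9 sampling points is shifted by its own 2-D offset and read with bilinear interpolation. In the real module the offsets are predicted by an extra layer; here they are passed in by hand, and all names are my own.

```python
import numpy as np

def bilinear(img, y, x):
    """Sample a 2-D array at fractional (y, x); points outside count as zero."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += img[yy, xx] * (1 - abs(y - yy)) * (1 - abs(x - xx))
    return val

def deform_conv_at(img, kernel, cy, cx, offsets):
    """Response of a 3x3 deformable convolution centered at (cy, cx).

    offsets: (3, 3, 2) array of (dy, dx) added to the regular sampling grid.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]
            out += kernel[i, j] * bilinear(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

img = np.arange(25.0).reshape(5, 5)
kernel = np.full((3, 3), 1 / 9)

regular = deform_conv_at(img, kernel, 2, 2, np.zeros((3, 3, 2)))

shift_right = np.zeros((3, 3, 2))
shift_right[..., 1] = 0.5            # move every sampling point half a pixel right
shifted = deform_conv_at(img, kernel, 2, 2, shift_right)
```

With zero offsets this is an ordinary convolution, so the module can start from the regular behavior and learn to deform from there.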

advantage

  1. Simple design with few added parameters; supports end-to-end training and generalizes across a variety of complex vision tasks.

Thoughts

  1. Generally used only in the last few layers, because the later layers have lost more detail and need the deformation operation to characterize the target better.
  2. While everyone was rushing to squeeze the remaining value out of feature maps, heads and proposals, DCN took a different path and operated on the most basic computation, the convolution itself. Great insight.

RetinaNet

time

2017

significance

  1. CE (cross entropy) -> FL (focal loss);
  2. ICCV 2017 Best Student Paper.

Innovation

  1. Finds the root cause of why 1-stage systems lose accuracy to 2-stage systems: the overwhelming majority of anchors are background (bg), which leads to "class imbalance". A 2-stage system has the RPN as a shield and is not affected; a 1-stage system has nothing shielding it. Focal loss is therefore designed to shield the 1-stage system.
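
A short sketch of focal loss next to cross entropy for a single binary prediction. The defaults alpha = 0.25 and gamma = 2 are the commonly used values; the sample probabilities are made up.

```python
import numpy as np

def cross_entropy(p, y):
    """Binary cross entropy; p is the predicted probability of the object class."""
    p_t = np.where(y == 1, p, 1 - p)
    return -np.log(p_t)

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Cross entropy scaled down by (1 - p_t)^gamma for well-classified samples."""
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy background anchor (confidently rejected) and a hard one.
easy_ce, easy_fl = cross_entropy(0.01, 0), focal_loss(0.01, 0)
hard_ce, hard_fl = cross_entropy(0.9, 0), focal_loss(0.9, 0)
```

The easy background anchor keeps only a tiny fraction of its cross-entropy loss, while the hard one keeps most of it, so the huge number of easy bg anchors no longer dominates the total loss.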

Shortcoming

  1. More false positives (FP).

Thoughts

  1. Finding the root of a problem matters far more than solving it;
  2. RetinaNet is very strong: it has become one of the best current detectors and is widely deployed.

MegDet

time

2017

significance

  1. Large mini-batch;
  2. COCO 2017 1st;
  3. CVPR 2018.

Innovation

  1. An engineering innovation: a truly large mini-batch.

Shortcoming

  1. Hard to reproduce unless you have 128 GPUs.

Light-head R-CNN

time

2017

significance

  1. Submitted to CVPR and rejected.

Innovation

  1. Compresses the thick R-FCN head into a very thin one, which speeds it up greatly.

Thoughts

  1. The model is compressed so aggressively, yet performance rises instead of falling, which is counter-intuitive. The authors did not give a good explanation, so the CVPR submission was rejected.

SNIP

time

2017

significance

  1. Image pyramid;
  2. CVPR 2018.

Innovation

  1. Rediscovers the value of the image pyramid and, on that basis, adds a valid range for each scale. Thanks to this, the three pipelines can each handle their own share, playing to their strengths and avoiding their weaknesses.
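
The "valid range per scale" idea can be sketched as a simple filter: at each pyramid level, only objects whose resized size falls inside that level's range take part. The resize factors, the ranges and the object sizes below are hypothetical numbers chosen for illustration, not the paper's settings.

```python
def valid_objects(box_sizes, scale, valid_range):
    """Keep objects whose size after resizing by `scale` is inside valid_range."""
    lo, hi = valid_range
    return [s for s in box_sizes if lo <= s * scale < hi]

# Hypothetical pyramid: (resize factor, valid size range in resized pixels).
pyramid = [
    (0.5, (64, float("inf"))),   # small image: only large objects
    (1.0, (32, 128)),            # medium image: medium objects
    (2.0, (0, 64)),              # enlarged image: only small objects
]

sizes = [10, 40, 100, 300]       # object sizes in the original image
assigned = {scale: valid_objects(sizes, scale, rng) for scale, rng in pyramid}
```

Each pipeline therefore only ever sees objects of a comfortable size, which is where the division of labor comes from.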

Shortcoming

  1. The model is too large: a 1080 Ti with 11 GB of memory cannot handle it, so it is simply not practical.

Thoughts

  1. Dividing up the work greatly lowers the detection difficulty for each pipeline, which achieves "scale invariance" in a "cheating" way.

Cascade R-CNN

time

2017

significance

  1. The first cascaded detector, 2-stage -> 4-stage;
  2. CVPR 2018.

Innovation

  1. Inspired by face detection, the authors use a cascade approach: boxes are refined round after round through the cascade to improve localization quality.
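
A toy sketch of the cascade: each stage labels the current box with a stricter IoU threshold and then refines it for the next stage. The thresholds 0.5 / 0.6 / 0.7 are the commonly quoted ones; `refine` is a stand-in for a trained regressor and simply moves the box halfway to the ground truth, which is purely an assumption of this sketch.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def refine(box, gt, step=0.5):
    """Stand-in for a learned regressor: move part of the way to the ground truth."""
    return tuple(b + step * (g - b) for b, g in zip(box, gt))

def cascade(box, gt, thresholds=(0.5, 0.6, 0.7)):
    """Run the box through the stages; record IoU and label at each stage."""
    history = []
    for t in thresholds:
        quality = iou(box, gt)
        history.append((t, quality, quality >= t))   # positive for this stage?
        box = refine(box, gt)
    return box, history

gt = (0.0, 0.0, 100.0, 100.0)
final_box, history = cascade((15.0, 15.0, 115.0, 115.0), gt)
```

Because each stage outputs better boxes than it received, the next stage can afford a stricter threshold without running out of positive samples.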

Thoughts

  1. Faster R-CNN took detection algorithms from 4-stage to 2-stage, while Cascade R-CNN in turn takes 2-stage back to 4-stage; this can be seen as a spiral;
  2. The first to bring the cascade idea into object detection.

R-FCN-3000

time

2017

significance

  1. The first effective large-scale detector.

Innovation

  1. Regression per fine-grained class -> regression per super-class.

PAN

time

2018

significance

  1. FPN -> PAN;
  2. COCO 2017 2nd.

YOLOv3

time

2018

Innovation

  1. Fixes a major pain point of the YOLO family: missed small objects.

DetNet

time

2018

significance

  1. The first backbone customized specifically for detection;
  2. ECCV 2018.

Innovation

  1. Replaces the original 32× downsampling with 16× downsampling plus dilated convolutions, so that the receptive field is expanded just by stacking convolutions, while the loss of location information caused by downsampling is avoided.
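
A small calculation showing the trade-off. It uses the usual receptive-field recurrence for a stack of convolutions; the two layer stacks are toy examples of mine, not DetNet's actual configuration.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride, dilation).

    Returns (receptive field, total stride) of the stack.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump   # growth depends on the stride accumulated so far
        jump *= s
    return rf, jump

# Downsample, then convolve: large receptive field, but resolution is halved.
strided = receptive_field([(3, 2, 1), (3, 1, 1)])

# Keep resolution and use a dilated convolution instead.
dilated = receptive_field([(3, 1, 1), (3, 1, 2)])
```

Both stacks reach the same receptive field, but only the dilated one keeps the full spatial resolution, which is what preserves the location information.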

Thoughts

  1. YOLOv1 had already done this back in 2016, but did not highlight it in the paper.