A comprehensive overview of rotated object detection in one article: papers, methods, and code

First, the official website of the DOTA dataset (http://captain.whu.edu.cn/DOTAweb/index.html). The site provides submission interfaces for both the horizontal and the oriented (rotated) detection tasks, and you can see a real-time leaderboard of detection results (http://captain.whu.edu.cn/DOTAweb/results.html). At the moment the top five entries come from Guisong Xia's team at Wuhan University, PCA Lab at Nanjing University of Science and Technology, Cyber Company, the Institute of Electronics of the Chinese Academy of Sciences, and Alibaba iDST. Click the plus sign in front of an entry to see a short introduction of some of the teams.

(Figure: real-time leaderboard of the DOTA oriented bounding box track, 2019-12-22)

The following methods are introduced in the order of submission time.

 

1. RRPN (two-stage text detection, Xiang Bai's group at HUST)

Time: 3 Mar 2017

Title: Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Link: https://arxiv.org/abs/1703.01086

Innovation:

Probably the first work to introduce rotated proposals on top of the RPN architecture for arbitrary-oriented scene text detection. Rotated anchors are used to obtain rotated RoIs, from which the corresponding features are extracted, and the results are decent (a rough anchor-enumeration sketch is given after the figures).

                                                               

(Figure: RRPN pipeline)

(Figure: pre-defined rotated anchors)
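To make the rotated-anchor idea concrete, here is a minimal sketch of enumerating rotated anchors at a single feature-map location. The scales, aspect ratios, and angles below are illustrative placeholders, not RRPN's exact settings.

```python
import itertools

import numpy as np


def rotated_anchors(cx, cy,
                    scales=(8, 16, 32),
                    ratios=(0.5, 1.0, 2.0),
                    angles=(-60, -30, 0, 30, 60, 90)):
    """Enumerate rotated anchors (cx, cy, w, h, angle in degrees) at one location.

    Every (scale, ratio, angle) combination yields one anchor, so the anchor
    count per location grows multiplicatively -- the extra computation that
    later works such as RoI Transformer try to avoid.
    """
    anchors = []
    for s, r, a in itertools.product(scales, ratios, angles):
        w = s * np.sqrt(r)
        h = s / np.sqrt(r)
        anchors.append((cx, cy, w, h, a))
    return np.array(anchors)


# 3 scales x 3 ratios x 6 angles = 54 anchors at this single location.
print(rotated_anchors(32.0, 32.0).shape)  # (54, 5)
```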

2. EAST (single-stage text detection, Megvii Technology)

Time: 11 Apr 2017

Title: EAST: An Efficient and Accurate Scene Text Detector

Link: https://arxiv.org/pdf/1704.03155.pdf

Zhihu interpretation: https://zhuanlan.zhihu.com/p/37504120

Innovation:

  • Proposes a single-stage detection framework (Figure 3) and a new way to parameterize a rotated target: each feature point predicts its distances to the four sides of the rotated box plus an angle, as shown in sub-figures (c), (d) and (e), which predict the four distance maps and the angle map respectively (a rough decoding sketch is given after this list)

  • It can be regarded as an early attempt at anchor-free rotated target detection. The rotated ground-truth box is shrunk inward to obtain the green region shown in sub-figure (a) of the paper, and only feature points falling inside this green region are taken as positive samples. The 2019 anchor-free horizontal-box detector FoveaBox (arxiv.org/abs/1904.0379) is based on a somewhat similar idea

  • Proposes locality-aware NMS to speed up the NMS step
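A minimal sketch of how the "four distances plus angle" parameterization can be decoded back into a rotated rectangle. The sign conventions and axis orientation here are my own assumptions for illustration, not necessarily EAST's exact ones.

```python
import numpy as np


def decode_rbox(px, py, d_top, d_right, d_bottom, d_left, theta):
    """Decode one feature point's prediction into the 4 corners of a rotated box.

    (px, py): feature-point location in image coordinates.
    d_*:      predicted distances from the point to the four sides of the box.
    theta:    predicted rotation angle in radians.
    """
    # Corners relative to the feature point, in the box's own (unrotated) frame:
    # the point lies d_left from the left edge, d_top from the top edge, etc.
    rel = np.array([
        [-d_left,  -d_top],     # top-left
        [ d_right, -d_top],     # top-right
        [ d_right,  d_bottom],  # bottom-right
        [-d_left,   d_bottom],  # bottom-left
    ])
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return rel @ rot.T + np.array([px, py])


# A point 10 px from the top/left and 30/5 px from the right/bottom, box tilted ~15 deg.
print(decode_rbox(100.0, 50.0, 10, 30, 5, 10, np.deg2rad(15)))
```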

 

3. R2CNN (two-stage text detection, Samsung China)

Time: 29 Jun 2017

Title: R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Link: https://arxiv.org/ftp/arxiv/papers/1706/1706.09579.pdf

Zhihu interpretation: https://zhuanlan.zhihu.com/p/41662351

Innovation:

  • Proposes a new way to parameterize a rotated target: predict the first two corner points (x1, y1, x2, y2) of the four clockwise-ordered corners, plus the height of the rectangle

  • The overall framework is Faster R-CNN. To handle the extreme aspect ratios of text boxes, two extra pooled sizes, 3x11 and 11x3, are added during RoI pooling in addition to the usual 7x7: 3x11 captures horizontal features better and suits boxes that are wider than tall, while 11x3 captures vertical features better and suits boxes that are taller than wide (a pooling sketch follows below).
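A minimal PyTorch sketch of pooling the same RoIs at three sizes and concatenating the results, in the spirit of R2CNN's 7x7 / 3x11 / 11x3 pooling. The channel count and the use of torchvision's `roi_pool` are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool


class MultiSizeRoIPool(nn.Module):
    """Pool each RoI at 7x7, 3x11 and 11x3 and concatenate the flattened features."""

    def __init__(self, spatial_scale=1.0 / 16):
        super().__init__()
        self.sizes = [(7, 7), (3, 11), (11, 3)]  # (height, width)
        self.spatial_scale = spatial_scale

    def forward(self, feats, rois):
        # feats: (N, C, H, W) feature map; rois: (K, 5) as (batch_idx, x1, y1, x2, y2)
        pooled = [roi_pool(feats, rois, output_size=sz, spatial_scale=self.spatial_scale)
                  for sz in self.sizes]
        return torch.cat([p.flatten(1) for p in pooled], dim=1)  # (K, C * (49 + 33 + 33))


feats = torch.randn(1, 256, 64, 64)
rois = torch.tensor([[0, 10.0, 20.0, 400.0, 60.0]])  # one wide, text-like box
print(MultiSizeRoIPool()(feats, rois).shape)  # torch.Size([1, 29440])
```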


4. RR-CNN (two-stage ship detection, Institute of Automation, Chinese Academy of Sciences)

Time: Sept. 2017

Title: Rotated Region Based CNN for Ship Detection

Link: https://ieeexplore.ieee.org/document/8296411

Innovation:

  • Proposes an RRoI pooling layer to extract features of rotated targets

  • Directly regresses rotated target boxes

  • Traditional NMS treats all detections alike; this paper proposes a multi-task NMS that operates per category (a class-wise NMS sketch is given after the figures)

                                                                   

(Figure: rotated RoI pooling)

(Figure: multi-task NMS)
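A minimal sketch of class-wise NMS in the spirit described above: suppression is applied independently within each category. The IoU function is injected so a rotated-box IoU (as RR-CNN needs) or a plain horizontal IoU can be plugged in; the threshold is illustrative.

```python
import numpy as np


def per_class_nms(boxes, scores, labels, iou_fn, thresh=0.3):
    """Run greedy NMS independently within each class and return kept indices.

    iou_fn can be a rotated-box IoU (e.g. polygon based) or a plain horizontal IoU.
    """
    keep = []
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        order = idx[np.argsort(-scores[idx])]  # this class only, highest score first
        while order.size > 0:
            i = order[0]
            keep.append(int(i))
            rest = order[1:]
            if rest.size == 0:
                break
            ious = np.array([iou_fn(boxes[i], boxes[j]) for j in rest])
            order = rest[ious < thresh]
    return sorted(keep)
```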

5. DRBox (single-stage target detection, Institute of Electronics, Chinese Academy of Sciences)

Time: 26 Nov 2017

Title: Learning a Rotation Invariant Detector with Rotatable Bounding Box

Link: https://arxiv.org/pdf/1711.09405.pdf

Innovation:

  • Regarding the network pipeline: the paper is relatively early and does not spell out the exact network structure; judging from other papers, DRBox is similar in structure to an RPN

  • One of the earlier works to spell out the problems of using horizontal boxes to detect rotated targets (a rotated-IoU sketch illustrating the issue follows below)
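A small sketch that makes the horizontal-box problem concrete: for a thin, tilted object, the IoU between the true rotated boxes can be near zero even when the IoU between their horizontal bounding boxes is high, which is exactly the mismatch DRBox points at. Shapely is used here for polygon intersection; the box-to-polygon corner convention is my own choice.

```python
import numpy as np
from shapely.geometry import Polygon


def rbox_to_polygon(cx, cy, w, h, angle_deg):
    """Convert a rotated box (cx, cy, w, h, angle in degrees) to a shapely Polygon."""
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    corners = np.array([[-w / 2, -h / 2], [w / 2, -h / 2], [w / 2, h / 2], [-w / 2, h / 2]])
    return Polygon(corners @ rot.T + [cx, cy])


def rotated_iou(box1, box2):
    p1, p2 = rbox_to_polygon(*box1), rbox_to_polygon(*box2)
    return p1.intersection(p2).area / (p1.union(p2).area + 1e-9)


# Two thin, ship-like boxes at 45 degrees lying side by side: their horizontal
# bounding boxes overlap heavily, but the rotated boxes barely overlap at all.
a = (50, 50, 100, 10, 45)
b = (60, 40, 100, 10, 45)
print(rotated_iou(a, b))  # ~0.0
```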

6. TextBoxes++ (single-stage text detection, Xiang Bai's group at HUST)

Time: 9 Jan 2018

Title: TextBoxes++: A Single-Shot Oriented Scene Text Detector

Link: https://arxiv.org/pdf/1801.02765.pdf

Zhihu interpretation: https://zhuanlan.zhihu.com/p/33723456

Innovation:

  • Detects both horizontal and rotated boxes based on SSD

  • Irregular convolution kernels

    TextBoxes++ uses 3x5 convolution kernels to better fit text with large aspect ratios.

  • OHEM strategy

    Training adopts an OHEM strategy that differs from the traditional one: training is split into two stages, with a positive-to-negative sample ratio of 1:3 in stage 1 and 1:6 in stage 2.

  • Multi-scale training

    Because TextBoxes++ is fully convolutional, it accepts inputs of different sizes; multi-scale training is used to adapt to targets at different scales.

  • Cascaded NMS

    Since computing the IoU of slanted text boxes is time-consuming, the authors use a cascaded NMS to speed up the IoU computation. First compute the IoU of the minimum axis-aligned bounding rectangles of all boxes and run NMS with a threshold of 0.5 to eliminate part of the boxes; then, on the survivors, run a second NMS with a threshold of 0.2 using the IoU of the slanted boxes themselves (see the sketch below).
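A minimal sketch of the cascaded NMS described above: a cheap first pass on the minimum axis-aligned bounding rectangles with threshold 0.5, then an exact polygon-IoU pass on the survivors with threshold 0.2. The thresholds follow the text; everything else (shapely for polygon IoU, the quad format) is an illustrative assumption.

```python
import numpy as np
from shapely.geometry import Polygon


def aabb_iou(q1, q2):
    """IoU of the minimum axis-aligned bounding rectangles of two quads ((4, 2) arrays)."""
    (x1a, y1a), (x2a, y2a) = q1.min(0), q1.max(0)
    (x1b, y1b), (x2b, y2b) = q2.min(0), q2.max(0)
    iw = max(0.0, min(x2a, x2b) - max(x1a, x1b))
    ih = max(0.0, min(y2a, y2b) - max(y1a, y1b))
    inter = iw * ih
    union = (x2a - x1a) * (y2a - y1a) + (x2b - x1b) * (y2b - y1b) - inter
    return inter / (union + 1e-9)


def poly_iou(q1, q2):
    p1, p2 = Polygon(q1), Polygon(q2)
    return p1.intersection(p2).area / (p1.union(p2).area + 1e-9)


def greedy_nms(quads, scores, iou_fn, thresh):
    order = list(np.argsort(-scores))
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou_fn(quads[i], quads[j]) < thresh]
    return keep


def cascaded_nms(quads, scores):
    # Stage 1: cheap NMS on axis-aligned bounding rectangles (threshold 0.5).
    keep1 = greedy_nms(quads, scores, aabb_iou, 0.5)
    # Stage 2: exact polygon NMS on the survivors only (threshold 0.2).
    keep2 = greedy_nms(quads[keep1], scores[keep1], poly_iou, 0.2)
    return [keep1[i] for i in keep2]
```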

 

7. RoI Transformer (two-stage oriented object detection, CVPR 2019, Guisong Xia's team at Wuhan University)

Time: 1 Dec 2018

Title: Learning RoI Transformer for Oriented Object Detection in Aerial Images

Link to the paper: https://arxiv.org/abs/1812.00155

Innovation:

  • Starting from horizontal anchors, a fully connected layer in the RPN stage learns to transform the horizontal RoI into a rotated RoI (unlike RRPN, which lays down many rotated anchors; learning the rotated RoI from horizontal anchors reduces the amount of computation). Features are then extracted from the rotated RoI for localization and classification (a sketch of such a learner follows after this list)

  • Rotated Position Sensitive RoI Align

    Extracts RoI features aligned with the rotated box
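A minimal sketch of the light RRoI-learner idea: a fully connected head, fed with features pooled inside the horizontal RoI, regresses five offsets (dx, dy, dw, dh, dθ) that turn the horizontal RoI into a rotated RoI. The layer sizes, decoding formulas, and the (cx, cy, w, h, θ) RoI layout are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class RRoILearner(nn.Module):
    """Regress rotated-RoI offsets from features pooled inside a horizontal RoI."""

    def __init__(self, in_channels=256, pooled_size=7):
        super().__init__()
        self.fc = nn.Linear(in_channels * pooled_size * pooled_size, 5)  # dx, dy, dw, dh, dtheta

    def forward(self, roi_feats, hrois):
        # roi_feats: (N, C, 7, 7) features pooled from the horizontal RoIs
        # hrois:     (N, 5) horizontal RoIs as (cx, cy, w, h, theta=0)
        d = self.fc(roi_feats.flatten(1))
        cx = hrois[:, 0] + d[:, 0] * hrois[:, 2]   # shift center by a fraction of width/height
        cy = hrois[:, 1] + d[:, 1] * hrois[:, 3]
        w = hrois[:, 2] * torch.exp(d[:, 2])       # rescale width and height
        h = hrois[:, 3] * torch.exp(d[:, 3])
        theta = hrois[:, 4] + d[:, 4]              # add the predicted rotation
        return torch.stack([cx, cy, w, h, theta], dim=1)  # (N, 5) rotated RoIs
```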

8. R2PN (two-stage)

Time: August 2018

Title: Toward Arbitrary-Oriented Ship Detection With Rotated Region Proposal and Discrimination Networks

Link: https://www.researchgate.net/publication/327096241_Toward_Arbitrary-Oriented_Ship_Detection_With_Rotated_Region_Proposal_and_Discrimination_Networks

Innovation:

  • It is quite similar to RRPN: rotated anchors are fed to the RPN to obtain rotated RoIs, features are extracted from the rotated RoIs, and localization and classification follow. The difference from RoI Transformer is that this work starts from rotated anchors, whereas RoI Transformer starts from horizontal anchors and therefore needs less computation.

9. R2CNN++ (SCRDet) (two-stage, Institute of Electronics, Chinese Academy of Sciences)

Time: 17 Nov 2018

Title: SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects

Link: https://arxiv.org/abs/1811.07126

Adds feature fusion plus spatial and channel attention mechanisms. Starting from horizontal anchors, the RPN predicts coarse RoIs, and the detection head then regresses the arbitrary-angle coordinates (x, y, w, h, θ) of the target. The pipeline is shown below:

(Figure: SCRDet pipeline)

Innovation:

  • SF-Net: a customized fusion of feature maps from two different layers to detect small targets more effectively

(Figure: SF-Net)

  • MDA-Net: uses channel attention and pixel-level attention mechanisms to better detect dense and small targets (a rough attention sketch is given after this list)

(Figure: MDA-Net)

  • Proposes an improved version of the smooth L1 loss to handle the discontinuous jump in the angle of a rotated target at the vertical boundary (from 0° to -90°)
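A rough sketch of combining SE-style channel attention with a pixel-level attention map, in the spirit of MDA-Net. The exact SCRDet design (including its supervised two-channel saliency map) is simplified here, and the layer shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ChannelPixelAttention(nn.Module):
    """Re-weight channels with an SE-style branch, then suppress background pixels."""

    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.channel = nn.Sequential(            # channel attention (squeeze-and-excitation style)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.pixel = nn.Sequential(              # pixel attention: 2-channel foreground/background map
            nn.Conv2d(channels, 2, kernel_size=3, padding=1),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        x = x * self.channel(x)                  # (N, C, H, W) re-weighted per channel
        saliency = self.pixel(x)[:, :1]          # keep the foreground probability map
        return x * saliency                      # emphasize object pixels, damp clutter


print(ChannelPixelAttention()(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```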

10. CAD-Net (two-stage)

Time: 3 Mar 2019

Title: CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery

Link: https://arxiv.org/pdf/1903.00857.pdf

Innovation:

  • Proposes GCNet (global context network) to integrate global, scene-level context information into detection (a minimal fusion sketch is given after the figures)

  • Proposes PLCNet (pyramid local context network), which introduces spatial attention to learn co-occurrence relationships between targets

                                               

(Figure: network pipeline)

(Figure: PLCNet structure)

(Figure: spatial attention)
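A minimal sketch of injecting a scene-level (global) context descriptor into per-RoI features, in the spirit of the global context idea above; the actual GCNet/PLCNet designs differ, and the shapes and fusion layer here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GlobalContextFusion(nn.Module):
    """Concatenate a pooled whole-image descriptor onto every RoI feature map."""

    def __init__(self, channels=256):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                   # scene-level descriptor
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feature_map, roi_feats):
        # feature_map: (1, C, H, W) whole-image features; roi_feats: (N, C, 7, 7)
        g = self.pool(feature_map)                            # (1, C, 1, 1)
        g = g.expand(roi_feats.size(0), -1, roi_feats.size(2), roi_feats.size(3))
        return self.fuse(torch.cat([roi_feats, g], dim=1))    # (N, C, 7, 7)


fused = GlobalContextFusion()(torch.randn(1, 256, 64, 64), torch.randn(8, 256, 7, 7))
print(fused.shape)  # torch.Size([8, 256, 7, 7])
```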

11. R3Det (single-stage rotated object detection, SJTU & NJUST & Megvii)

Time: Aug 2019

Title: R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object

Link to the paper: https://arxiv.org/abs/1908.05612

Code: https://github.com/SJTU-Thinklab-Det/R3Det_Tensorflow

Interpretation link: https://ming71.github.io/R3Det

Innovation:

  • In rotated target detection (and in horizontal detection as well) there can be a mismatch between the receptive field of the feature point an anchor sits on and the actual position and shape of the target. In the upper-left part of the paper's figure, the green box is the anchor; the feature point it sits on only sees part of the ship, so regressing directly from that point's feature, from the anchor to the ground truth (red box), is not necessarily accurate. The paper therefore splits regression into two steps: the first step predicts a rotated box (orange) from the anchor, marked 1->2 in red in the figure; at this point the orange box is already very close to the real target. Features are then re-extracted according to the orange box (roughly analogous to RoI-pooling-style feature extraction) and used to regress to the ground truth, marked 2->3 in red.


  • The network follows the RetinaNet structure and introduces a feature refinement module that can be stacked multiple times (a minimal re-sampling sketch is given after the figures)

                                                      

(Figure: the network backbone follows the RetinaNet structure)

(Figure: feature refinement module)
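A rough sketch of the re-sampling idea behind the feature refinement step: after the first regression, sample the feature map at the refined box centers with bilinear interpolation, so the second regression works on features aligned with the refined boxes. R3Det's actual feature refinement module also samples the four corner points and reconstructs a whole feature map; this shows only the core interpolation step, and the normalization convention is an assumption.

```python
import torch
import torch.nn.functional as F


def sample_at_centers(feature_map, centers):
    """Bilinearly sample per-box feature vectors at refined box centers.

    feature_map: (1, C, H, W); centers: (N, 2) as (x, y) in feature-map coordinates.
    Returns an (N, C) tensor of re-aligned features.
    """
    _, _, H, W = feature_map.shape
    grid = centers.clone().float()
    grid[:, 0] = 2 * grid[:, 0] / (W - 1) - 1          # x -> [-1, 1]
    grid[:, 1] = 2 * grid[:, 1] / (H - 1) - 1          # y -> [-1, 1]
    grid = grid.view(1, -1, 1, 2)                      # (1, N, 1, 2) sampling grid
    sampled = F.grid_sample(feature_map, grid, align_corners=True)  # (1, C, N, 1)
    return sampled.squeeze(-1).squeeze(0).t()          # (N, C)


feats = torch.randn(1, 256, 100, 100)
refined_centers = torch.tensor([[12.3, 40.7], [55.0, 9.5]])
print(sample_at_centers(feats, refined_centers).shape)  # torch.Size([2, 256])
```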

 


 

 
