First, here is the official website of the DOTA dataset (http://captain.whu.edu.cn/DOTAweb/index.html). The site provides a submission interface for both horizontal and rotated object detection, and shows a real-time ranking of the detection results (http://captain.whu.edu.cn/DOTAweb/results.html). The current top five are Xia Guisong's team at Wuhan University, pca_lab at Nanjing University of Science and Technology, Cyber Company, the Institute of Electronics of the Chinese Academy of Sciences, and Ali idst. Click the plus sign in front of an entry to see the introduction some teams provide.
Real-time ranking of the DOTA rotated object track (2019-12-22)
The following methods are introduced in the order of submission time.
1. RRPN (two-stage text detection, Xiang Bai's group at HUST)
Time: 3 Mar 2017
Title: Arbitrary-Oriented Scene Text Detection via Rotation Proposals
Link: https://arxiv.org/abs/1703.01086
Innovation:
This is probably the first work to introduce rotated proposals into the RPN architecture to detect scene text in any direction. Rotated anchors are used to obtain rotated RoIs, the corresponding features are then extracted from them, and the results are reasonably good.
pipeline
Pre-defined anchor
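To make the idea of pre-defined rotated anchors concrete, here is a minimal sketch of how such anchors could be enumerated at one feature-map location. The function name `rotated_anchors` and the scale, ratio and angle values are illustrative assumptions, not settings taken from the text above.

```python
import itertools
import math

def rotated_anchors(cx, cy, scales, ratios, angles):
    """Return (cx, cy, w, h, theta) anchors centred on one feature-map location."""
    anchors = []
    for scale, ratio, theta in itertools.product(scales, ratios, angles):
        # Keep the area at scale**2 while setting w / h = ratio.
        w = scale * math.sqrt(ratio)
        h = scale / math.sqrt(ratio)
        anchors.append((cx, cy, w, h, theta))
    return anchors

# Illustrative values only (assumed for this sketch).
scales = (8, 16, 32)
ratios = (2, 5, 8)
angles = tuple(math.radians(a) for a in (-30, 0, 30, 60, 90, 120))

anchors = rotated_anchors(0.0, 0.0, scales, ratios, angles)
print(len(anchors))  # 3 scales * 3 ratios * 6 angles = 54
```

With the same scales and ratios a horizontal RPN would place 9 anchors per location, so the angle dimension multiplies the anchor count, which is where the extra computation of rotated anchors comes from.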
2. EAST (single-stage text detection, Megvii Technology)
Time: 11 Apr 2017
Title: EAST: An Efficient and Accurate Scene Text Detector
Link: https://arxiv.org/pdf/1704.03155.pdf
Zhihu interpretation: https://zhuanlan.zhihu.com/p/37504120
Innovation:
-
Proposes a single-stage detection framework (Figure 3). It also proposes a new way to represent a rotated box: the distances from a feature point to the four sides of the rotated box, plus the angle. This is shown in panel (c); panels (d) and (e) show the predicted four distances and the predicted angle, respectively.
-
This can be regarded as an early attempt to detect rotated objects with an anchor-free method. The rotated ground-truth box is shrunk inward to give the green box in panel (a), at the upper left of the figure below, and feature points that fall inside this green box are treated as positive samples. FoveaBox, an anchor-free horizontal-box detector from 2019, uses a somewhat similar idea (arxiv.org/abs/1904.0379).
-
Proposes Locality-Aware NMS to speed up the NMS step.
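The distance-plus-angle representation described above can be turned back into a rotated box as follows. This is a minimal sketch under assumed conventions: image coordinates with y pointing down, distances measured along the box's own axes, and `theta` as the rotation of those axes in radians. The function name `decode_rbox` is hypothetical.

```python
import math

def decode_rbox(x, y, top, right, bottom, left, theta):
    """Corners of the rotated box predicted at feature point (x, y).

    top/right/bottom/left are distances from the point to the four sides,
    measured in the box's own (rotated) frame.
    """
    # Corners in the box frame, with the feature point at the origin.
    local = [(-left, -top), (right, -top), (right, bottom), (-left, bottom)]
    c, s = math.cos(theta), math.sin(theta)
    # Rotate each corner by theta, then translate to the feature point.
    return [(x + c * px - s * py, y + s * px + c * py) for px, py in local]

# With theta = 0 the result is an ordinary axis-aligned box.
print(decode_rbox(10, 20, top=2, right=3, bottom=4, left=5, theta=0.0))
```

Whatever the angle, the decoded box has width `left + right` and height `top + bottom`, so every positive feature point can describe the full box on its own without any anchor.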
3. R2CNN (two-stage text detection, Samsung China)
Time: 29 Jun 2017
Title: R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
Link: https://arxiv.org/ftp/arxiv/papers/1706/1706.09579.pdf
Zhihu interpretation: https://zhuanlan.zhihu.com/p/41662351
Innovation:
-
Proposes a new way to represent a rotated box: the coordinates of the first two of the four corner points in clockwise order (x1, y1, x2, y2), plus the height of the rectangle.
-
The overall framework is Faster R-CNN. Because some text boxes have a large gap between width and height, RoI pooling uses two extra pooled sizes, 3x11 and 11x3, in addition to 7x7. The 3x11 size captures horizontal features better and suits boxes that are wider than tall, while 11x3 captures vertical features better and suits boxes that are taller than wide.
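The (x1, y1, x2, y2, h) representation can be expanded into four corners with a little vector arithmetic. The sketch below assumes image coordinates (y pointing down) and that the remaining two corners lie on the clockwise side of the edge from the first corner to the second. The function name `r2cnn_corners` is hypothetical.

```python
import math

def r2cnn_corners(x1, y1, x2, y2, h):
    """Expand (x1, y1, x2, y2, h) into four corners in clockwise order."""
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("the first two corners must be distinct")
    # Unit normal of the first edge, pointing to the clockwise side
    # when the y axis points down.
    nx, ny = -dy / length, dx / length
    p3 = (x2 + h * nx, y2 + h * ny)
    p4 = (x1 + h * nx, y1 + h * ny)
    return [(x1, y1), (x2, y2), p3, p4]

print(r2cnn_corners(0, 0, 4, 0, 2))
```

Five numbers are enough because the rectangle constraint fixes the direction of the remaining sides, so only their length (the height) has to be predicted.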
4. RR-CNN (two-stage ship detection, Institute of Automation, Chinese Academy of Sciences)
Time: Sept. 2017
Title: Rotated Region Based CNN for Ship Detection
Link: https://ieeexplore.ieee.org/document/8296411
Innovation:
-
Proposes an RRoI pooling layer to extract the features of rotated objects.
-
Adds a regression model for rotated bounding boxes.
-
Traditional NMS is applied among objects of the same class; this paper proposes multi-task NMS that works across multiple classes.
RRoI pooling
Multi-task NMS
5. DRBOX (two-stage object detection, Institute of Electronics, Chinese Academy of Sciences)
Time: 26 Nov 2017
Title: Learning a Rotation Invariant Detector with Rotatable Bounding Box
Link: https://arxiv.org/pdf/1711.09405.pdf
Innovation:
-
The network pipeline is shown below. The paper is relatively early and does not say specifically which network structure it uses; judging from other papers, DRBOX is similar to the RPN structure.
-
It was one of the earlier papers to explain the problems of using horizontal boxes to detect rotated objects.
6. TextBoxes++ (single-stage, Xiang Bai's group at HUST)
Time: 9 Jan 2018
Title: TextBoxes++: A Single-Shot Oriented Scene Text Detector
Link: https://arxiv.org/pdf/1801.02765.pdf
Zhihu interpretation: https://zhuanlan.zhihu.com/p/33723456
Innovation:
-
Detects both horizontal boxes and rotated boxes, based on SSD.
-
Uses irregular convolution kernels:
TextBoxes++ uses 3x5 convolution kernels to better fit text with a large aspect ratio.
-
Uses an OHEM strategy
Training uses OHEM, but differs from traditional OHEM in that it is split into two stages: the ratio of positive to negative samples is 1:3 in stage 1 and 1:6 in stage 2.
-
Multi-scale training
Because TextBoxes++ is fully convolutional, it accepts inputs of different scales. Multi-scale training is used so that the model handles objects of different scales.
-
Cascaded NMS
Computing the IoU of tilted text boxes is time-consuming, so the authors use cascaded NMS to speed it up. First, compute the IoU of the minimum horizontal bounding rectangles of all boxes and run NMS with a threshold of 0.5 to eliminate part of the boxes. Then compute the IoU of the remaining tilted boxes and run NMS with a threshold of 0.2.
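The two-pass procedure described above can be sketched in pure Python. This is an illustration under stated assumptions rather than the authors' implementation: boxes are convex quadrilaterals given as lists of (x, y) corners, the polygon overlap is computed with Sutherland-Hodgman clipping (a technique chosen here for self-containment), and all function names are hypothetical.

```python
def signed_area(poly):
    """Shoelace formula; positive when the vertices run counter-clockwise."""
    nxt = poly[1:] + poly[:1]
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(poly, nxt))

def ccw(poly):
    return poly if signed_area(poly) >= 0 else poly[::-1]

def clip(subject, clipper):
    """Sutherland-Hodgman: clip a convex polygon by a convex CCW polygon."""
    out = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        inp, out = out, []
        for p, q in zip(inp, inp[1:] + inp[:1]):
            sp = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
            sq = (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0])
            if sp >= 0:          # p is on the inner side of edge a->b
                out.append(p)
            if sp * sq < 0:      # segment p->q crosses the clip line
                t = sp / (sp - sq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def polygon_iou(p, q):
    p, q = ccw(p), ccw(q)
    inter_poly = clip(p, q)
    inter = abs(signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(signed_area(p)) + abs(signed_area(q)) - inter
    return inter / union if union > 0 else 0.0

def bounding_rect(poly):
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)

def rect_iou(r, s):
    iw = min(r[2], s[2]) - max(r[0], s[0])
    ih = min(r[3], s[3]) - max(r[1], s[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = (r[2] - r[0]) * (r[3] - r[1]) + (s[2] - s[0]) * (s[3] - s[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(order, iou, threshold):
    """Keep a box unless it overlaps an already kept box by more than threshold."""
    keep = []
    for i in order:
        if all(iou(i, j) <= threshold for j in keep):
            keep.append(i)
    return keep

def cascaded_nms(polys, scores, rect_thr=0.5, poly_thr=0.2):
    rects = [bounding_rect(p) for p in polys]
    order = sorted(range(len(polys)), key=lambda i: -scores[i])
    # Pass 1: cheap IoU on horizontal bounding rectangles.
    stage1 = greedy_nms(order, lambda i, j: rect_iou(rects[i], rects[j]), rect_thr)
    # Pass 2: exact polygon IoU, only on the survivors.
    return greedy_nms(stage1, lambda i, j: polygon_iou(polys[i], polys[j]), poly_thr)

def square(x, y, side=10):
    return [(x, y), (x + side, y), (x + side, y + side), (x, y + side)]

polys = [square(0, 0), square(1, 0), square(6, 0), square(30, 30)]
scores = [0.9, 0.8, 0.7, 0.6]
print(cascaded_nms(polys, scores))  # [0, 3]
```

In the toy input, box 1 is removed by the rectangle pass, box 2 survives it but is removed by the polygon pass, and the expensive polygon IoU is never evaluated for box 1, which is the point of the cascade.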
7. Learning RoI Transformer for Oriented Object Detection in Aerial Images (two-stage, CVPR 2019, Xia Guisong's team at Wuhan University)
Time: 1 Dec 2018
Title: Learning RoI Transformer for Oriented Object Detection in Aerial Images
Link to the paper: https://arxiv.org/abs/1812.00155
Innovation:
-
Starting from horizontal anchors, fully connected layers in the RPN stage learn the rotated RoI. This differs from RRPN, which sets many rotated anchors: because this paper learns the rotated RoI from horizontal anchors, it reduces the amount of computation. Features are then extracted based on the rotated RoI, followed by localization and classification.
-
Rotated Position Sensitive RoI Align
Extracts RoI features based on the rotated box.
8. R2PN (two-stage)
Time: August 2018
Title: Toward Arbitrary-Oriented Ship Detection With Rotated Region Proposal and Discrimination Networks
Link: https://www.researchgate.net/publication/327096241_Toward_Arbitrary-Oriented_Ship_Detection_With_Rotated_Region_Proposal_and_Discrimination_Networks
Innovation:
-
It is much like RRPN. Starting from rotated anchors, the RPN produces rotated RoIs, features are extracted based on the rotated RoIs, and then localization and classification are performed. The difference between this paper and Learning RoI Transformer is that the former uses rotated anchors while the latter uses horizontal anchors, which require less computation.
9. R2CNN++ (SCRDet) (two-stage, Institute of Electronics, Chinese Academy of Sciences)
Time: 17 Nov 2018
Title: SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects
Link: https://arxiv.org/abs/1811.07126
Adds feature fusion plus spatial and channel attention mechanisms. Starting from horizontal anchors, the RPN predicts coarse RoIs, and the detection head then predicts the coordinates (x, y, w, h, θ) of the object at an arbitrary angle. The pipeline is as follows:
pipeline
Innovation:
-
SF-Net: fuses two feature maps from different layers in a customized way to detect small objects effectively
SF-Net
-
MDA-Net: uses channel attention and pixel-level attention mechanisms to detect dense objects and small objects
MDA-Net
-
Proposes an improved version of the smooth L1 loss to solve the problem that the angle of a rotated box changes discontinuously when the box is near vertical (jumping from 0° to -90°).
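The angle discontinuity mentioned in the last item can be seen with a small numeric check. The sketch below assumes the angle is restricted to a range between -90° and 0°, so that a box rotated slightly past 0° has to swap its width and height and jump to an angle near -90°. The two boxes and the helper names are made up for illustration.

```python
import math

def corners(cx, cy, w, h, theta_deg):
    """Corners of a box whose side of length w points along theta_deg."""
    t = math.radians(theta_deg)
    ux, uy = math.cos(t), math.sin(t)   # direction of the w side
    vx, vy = -uy, ux                    # direction of the h side
    return [(cx + sw * w / 2 * ux + sh * h / 2 * vx,
             cy + sw * w / 2 * uy + sh * h / 2 * vy)
            for sw, sh in ((1, 1), (1, -1), (-1, -1), (-1, 1))]

def max_corner_gap(box_a, box_b):
    """Largest distance from a corner of box_a to its nearest corner of box_b."""
    cb = corners(*box_b)
    return max(min(math.dist(p, q) for q in cb) for p in corners(*box_a))

# Two boxes that differ by a physical rotation of only 1 degree.
box_a = (0.0, 0.0, 100.0, 20.0, -0.5)
box_b = (0.0, 0.0, 20.0, 100.0, -89.5)

param_gap = sum(abs(a - b) for a, b in zip(box_a, box_b))
print(param_gap)                      # 249.0
print(max_corner_gap(box_a, box_b))   # under one pixel
```

The two boxes almost coincide in the image, yet their (x, y, w, h, θ) vectors are far apart, so a plain smooth L1 loss on the parameters produces a large loss for a nearly correct prediction at this boundary.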
10. CAD-Net (two-stage)
Time: 3 Mar 2019
Title: CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery
Link: https://arxiv.org/pdf/1903.00857.pdf
Innovation:
-
Proposes GCNet (Global Context Network) to integrate global context information into object detection.
-
Proposes PLCNet (Pyramid Local Context Network), which introduces spatial attention to learn the cooperative relationships between objects.
Network pipeline
PLCNet structure
Spatial attention
11. R3Det (single-stage rotated object detection, SJTU, NJUST and Megvii)
Time: Aug 2019
Title: R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object
Link to the paper: https://arxiv.org/abs/1908.05612
Code: https://github.com/SJTU-Thinklab-Det/R3Det_Tensorflow
Interpretation link: https://ming71.github.io/R3Det
Innovation:
-
Rotated object detection (and horizontal object detection too) can suffer from a mismatch between the receptive field of the feature point where the anchor sits and the position and shape of the object. In the upper left of the figure below, the green box is the anchor, and the feature point it sits on sees only part of the ship, so regressing directly from this point's feature to fit the ground truth (red box) is not necessarily accurate. The paper therefore works in two steps. The first step predicts a rotated box (orange box) from the anchor, marked 1->2 in red in the figure. At this point the orange box is very close to the real object. Features are then extracted according to the orange box (I understand this as similar to RoI pooling feature extraction), and the ground truth is regressed from these features, marked 2->3 in red.
-
The network follows the RetinaNet structure and adds a feature refinement module, which can be stacked multiple times.
The network backbone uses the retinanet structure
feature refinement module
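One way to picture "extracting features according to the refined box" is to sample the feature map at a few points of the box with bilinear interpolation and combine the samples. The sketch below assumes five sampling points (the center and the four corners), a single-channel feature map stored as a list of rows, and a plain sum as the combination; these choices and all names are assumptions made for illustration, not a description of the released code.

```python
import math

def bilinear(feat, x, y):
    """Bilinearly interpolate a 2D feature map (list of rows) at (x, y)."""
    rows, cols = len(feat), len(feat[0])
    x = min(max(x, 0.0), cols - 1.0)   # clamp to the map
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    ax, ay = x - x0, y - y0
    top = feat[y0][x0] * (1 - ax) + feat[y0][x1] * ax
    bottom = feat[y1][x0] * (1 - ax) + feat[y1][x1] * ax
    return top * (1 - ay) + bottom * ay

def sample_points(cx, cy, w, h, theta):
    """Center and four corners of a rotated box (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    pts = [(cx, cy)]
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        dx, dy = sx * w / 2, sy * h / 2
        pts.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
    return pts

def refined_feature(feat, box):
    """Sum of the features sampled at the five points of the refined box."""
    return sum(bilinear(feat, x, y) for x, y in sample_points(*box))

# Toy feature map whose value at (x, y) is x + 10 * y.
feat = [[x + 10 * y for x in range(8)] for y in range(8)]
box = (3.5, 3.5, 2.0, 1.0, 0.3)
value = refined_feature(feat, box)
print(value)
```

Because the sampled locations follow the refined box instead of the original anchor, the feature used in the next regression step is aligned with the object, which is the mismatch the first item describes.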