Remote Sensing Object Detection Research

Before starting the digital image processing course project, I am organizing and backing up some notes here.

In recent years, deep learning based object detection has gone through wave after wave of rapid iteration and development. Two-stage methods, including the R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN series, together with techniques such as FPN, FCN, and Focal Loss, still play an important role in a wide range of detection tasks thanks to their superior accuracy, including in the field of remote sensing imagery. In contrast, one-stage detection methods such as SSD and the YOLO series, owing to their higher efficiency, have shown great vitality in deployed applications.

The classic detection algorithms above, the R-CNN and YOLO families, work well on common benchmarks such as PASCAL VOC and COCO, but in the remote sensing domain they need a great deal of targeted optimization before they can detect targets in remote sensing images well. Compared with natural-scene images, remote sensing images have the following characteristics:

  1. They are shot from the sky, at angles of roughly 30 to 90 degrees.
  2. Because each image covers a broad area, small targets account for a large proportion and are densely distributed.
  3. Rotation invariance matters (ships point in every direction, whereas the trees in ImageNet are always upright).
  4. Contextual information from the surrounding environment is more important.
  5. Training data is scarce, yet a single image may contain an enormous number of pixels (some datasets are quite small).
  6. Images may carry various extra information, such as spectral bands and geographic coordinates.

These characteristics of remote sensing images make accurate object detection on them very challenging. The rest of this article analyzes several important papers related to remote sensing object detection, and briefly introduces other related literature.

The first is the DOTA dataset [1] (DOTA2018). Our group plans to work on an updated version of this dataset (DOTA2019): survey the existing object detection algorithms for remote sensing imagery, and build a deep learning framework around the characteristics of the dataset and of remote sensing images, aiming for high detection accuracy on remote sensing images.

1. DOTA: A large-scale dataset for object detection in aerial images

1.1 The DOTA dataset

This paper introduces the DOTA dataset and reports baselines on YOLOv2, SSD, Faster R-CNN, and several other commonly used object detection frameworks.

DOTA is an aerial-image object detection dataset presented at CVPR 2018. The authors hope it can play the role in aerial imagery that ImageNet and COCO play in natural-scene image classification and object detection: by releasing a high-quality detection dataset, they aim to promote the development of remote sensing object detection and to accelerate both academic research and the practical deployment of high-quality models.

The figure below shows part of DOTA's annotations. Many dense small targets are marked out, and the annotation quality of the dataset is relatively high.

1.2 Dataset Overview

The DOTA dataset contains a total of 2806 remote sensing images, collected from a variety of sensors and platforms to ensure data diversity. Each image measures roughly 4000 × 4000 pixels and contains targets of different scales, shapes, and orientations. The images cover 15 common target categories: plane, ship, storage tank, baseball diamond, tennis court, basketball court, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field, and swimming pool. In total the dataset contains 188,282 instances. Unlike natural-scene datasets such as COCO, and to keep object orientation from distorting the position annotation, DOTA is labeled with oriented bounding boxes. Besides the four-point coordinates of the oriented boxes, the authors also release horizontal bounding boxes given by two points (upper left and lower right).
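As a small illustration of how these annotations can be consumed, the sketch below parses one annotation line and derives a horizontal box from the four oriented-box corners. It assumes the commonly used DOTA text format (eight coordinates, a category name, and a difficulty flag per line); treat the format details as an assumption rather than a specification.

```python
from typing import List, Tuple

def parse_dota_line(line: str) -> Tuple[List[Tuple[float, float]], str, int]:
    """Parse one DOTA annotation line.

    Assumed format: x1 y1 x2 y2 x3 y3 x4 y4 category difficult
    """
    parts = line.split()
    coords = [float(v) for v in parts[:8]]
    quad = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    category, difficult = parts[8], int(parts[9])
    return quad, category, difficult

def quad_to_hbb(quad: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Axis-aligned box (xmin, ymin, xmax, ymax) enclosing the oriented quad."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)

quad, cat, diff = parse_dota_line("10 10 60 20 55 80 5 70 small-vehicle 0")
print(cat, quad_to_hbb(quad))  # small-vehicle (5.0, 10.0, 60.0, 80.0)
```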

The chart below shows that most instances are under 500 pixels in size, but the dataset also contains many medium-sized targets of 500-1000 pixels and large targets above 1000 pixels, so the pixel-size distribution is quite reasonable. On the other hand, target orientations cover almost the full range of angles (-π to π), which again separates aerial images from ordinary ones. In an ordinary image, trees, vehicles, people, and other targets stand almost perpendicular to the ground and rarely appear at other angles; in aerial remote sensing images, because of varying shooting angles, every target may appear at any orientation, which poses a big challenge for detection.

The paper also compares DOTA with other aerial remote sensing datasets such as UCAS-AOD [2], HRSC2016 [3], and NWPU VHR-10 [4]. Overall, DOTA covers many target categories, contains a huge number of annotated instances, draws on a wide range of sources, and offers strong diversity, making it a highly research-worthy and challenging dataset.

1.3 Evaluation Method

The paper performs detection with two kinds of boxes, horizontal bounding boxes (HBB) and oriented bounding boxes (OBB). It selects the commonly used deep detection networks YOLOv2, R-FCN, Faster R-CNN, and SSD, and has them predict axis-aligned rectangles and oriented boxes respectively. Detection quality is measured with mAP, computed in the same way as in PASCAL VOC.
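For reference, here is a minimal sketch of the all-point PASCAL VOC style AP computation for one class, given detections already matched to ground truth upstream; it illustrates the metric and is not the DOTA evaluation code.

```python
import numpy as np

def voc_ap(tp: np.ndarray, num_gt: int) -> float:
    """PASCAL-VOC-style average precision for one class.

    tp: 1/0 flags for detections sorted by descending confidence, where 1
        means the detection matched an unmatched ground truth with IoU >= 0.5
        (the matching step is assumed to be done upstream).
    """
    fp = 1 - tp
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-12)
    # Append sentinels, then take the monotone precision envelope.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the area under the envelope where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

print(voc_ap(np.array([1, 1, 0, 1, 0]), num_gt=4))  # 0.6875
```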

Training models on DOTA's HBB annotations, the common networks obtain the results shown in the figure below.

Here FR-H denotes Faster R-CNN trained on horizontal bounding boxes. The comparison shows that one-stage detection networks such as YOLO and SSD achieve lower accuracy; SSD in particular performs very poorly on dense small targets such as SV (small vehicle) and SH (ship). Faster R-CNN, the classic two-stage detection network, reaches an mAP of 60.46, which is already a fairly good result.

The figure below shows results trained on HBB but evaluated on OBB (FR-O is trained on OBB). Every mAP is lower than on HBB, which shows that predicting OBB, the more precise kind of box, is indeed considerably harder. Even Faster R-CNN trained on OBB reaches an mAP of only 54.13, behind the 60.46 obtained on HBB.

Since Faster R-CNN outputs rectangular boxes, the paper makes a simple modification for these experiments: after the RPN outputs an RoI, it is expanded into the coordinates of four points (x1 = x4 = xmin; x2 = x3 = xmax; y1 = y2 = ymin; y3 = y4 = ymax) and then regressed by the R-CNN head, with the four corner points of the OBB as ground truth. The regression targets are the formula shown below, optimized step by step so that the txi and tyi shrink.
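The original formula image is not reproduced here. Based on the setup described above, the per-corner targets are presumably normalized offsets of the following form (my reconstruction from the paper's description, not a verbatim copy):

$$ t_{x_i} = \frac{x_i^{*} - x_i}{w}, \qquad t_{y_i} = \frac{y_i^{*} - y_i}{h}, \qquad i = 1, 2, 3, 4, $$

where $(x_i^{*}, y_i^{*})$ are the ground-truth OBB corners, $(x_i, y_i)$ the expanded RoI corners, and $w$, $h$ the RoI's width and height.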

 

2. R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object [5]

2.1 Network Introduction

This paper is mainly about how to predict rotated targets. Detection of rotated targets rarely comes up in ordinary natural-scene images, but in the remote sensing field it is an important direction. The paper proposes an end-to-end refined single-stage rotation detector that localizes targets quickly and accurately.

The paper designs a feature refinement module (FRM), which uses feature interpolation to obtain refined anchors and re-builds the feature map to achieve feature alignment. The structure of the proposed Refined Rotation Single-Stage Detector (R3Det) is shown in the figure below.

R3Det is an improvement built on RetinaNet [6]. RetinaNet uses FPN [7] as its backbone, followed by classification and regression subnets, and proposes focal loss to handle class imbalance.

To perform rotation detection on top of RetinaNet, the paper represents an arbitrarily oriented rectangle with five parameters (x, y, w, h, θ). As panels (a) and (b) of the figure below show, a small change in the predicted box's angle can already cause a large loss of IoU.
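To make this angle sensitivity concrete, the sketch below computes the IoU of two rotated rectangles as polygons (using shapely here purely as an illustrative tool; it is not what the paper uses) and shows how a modest angle offset on an elongated box already costs a lot of IoU.

```python
import math
from shapely.geometry import Polygon

def rot_rect(cx: float, cy: float, w: float, h: float, theta: float) -> Polygon:
    """Corner polygon of a rectangle (cx, cy, w, h) rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    pts = []
    for dx, dy in [(-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2)]:
        pts.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return Polygon(pts)

def rotated_iou(a: Polygon, b: Polygon) -> float:
    inter = a.intersection(b).area
    return inter / (a.area + b.area - inter)

base = rot_rect(0, 0, 100, 20, 0.0)          # elongated box, like a ship
for deg in (5, 10, 15):
    off = rot_rect(0, 0, 100, 20, math.radians(deg))
    print(deg, round(rotated_iou(base, off), 3))
# Even a 15-degree error drops the IoU well below typical match thresholds.
```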

Therefore, as shown in panel (c) of the figure, the authors propose re-encoding the position information of the predicted boxes and re-building the whole feature map to achieve feature alignment. (The authors do not explain in detail how this feature alignment is concretely implemented, which I consider a weakness of the paper; it is currently only on arXiv and not yet formally published.) To compute feature values at precise positions, the authors use bilinear feature interpolation.
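As a reference for what bilinear feature interpolation does (a generic sketch, not the paper's FRM code), sampling a feature map at a fractional location blends the four surrounding pixels:

```python
import numpy as np

def bilinear_sample(feat: np.ndarray, x: float, y: float) -> np.ndarray:
    """Sample feat of shape (H, W, C) at fractional coordinates (x, y)."""
    h, w = feat.shape[:2]
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feat[y0, x0] + ax * feat[y0, x1]
    bot = (1 - ax) * feat[y1, x0] + ax * feat[y1, x1]
    return (1 - ay) * top + ay * bot

feat = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
print(bilinear_sample(feat, 1.5, 2.5))  # [11.5] -- blend of the 4 neighbors
```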

2.2 Experiments

The paper runs many experiments to demonstrate that the FRM effectively improves rotated-box detection, evaluating it on three datasets: DOTA, HRSC2016, and ICDAR2015. Since the paper focuses on DOTA, and DOTA is also this article's main concern, the following concentrates on the FRM's results on DOTA.

The authors adopt a ResNet-FPN backbone, cut the original images into 600 × 600 patches, train R3Det on them, and compare with RetinaNet. R3Det clearly detects the DOTA targets better. (Note that the mAP in the table below is measured on the DOTA2018 test set; our team's course project will be evaluated on DOTA2019, so the mAP here is only a reference.)

The table also shows that introducing the FRM raises R3Det's mAP from 63 to 65, and data augmentation plus a stronger backbone lift it to 72.

Compared with the SOTA methods on DOTA2018, the R3Det detectors with ResNet101 and ResNet152 already surpass many two-stage methods such as ICN, RoI-Transformer, and SCRDet.

2.3 Personal Comments

In my view, the biggest takeaways from this paper are its backbone and data augmentation choices for DOTA, and a better understanding of some issues with rotated versus axis-aligned boxes. As for the proposed FRM module, by the paper's own description it does not help that much (63.1% to 65.7%); the larger gains come instead from the new backbone and from data augmentation. Moreover, the paper does not describe the FRM in detail, which leaves its exposition rather unclear. Even so, it undoubtedly offers considerable reference value for our follow-up research on the DOTA dataset and on refining detection boxes.

 

3. Objects as Points [8]

This paper presents an anchor-free object detector, published in 2019. We plan to use this network as our baseline and validate it on the DOTA dataset, so I will walk through the paper in detail.

CenterNet is not only for object detection; it can also be used for human pose estimation, 3D object detection, and more. Its structure is simple: as the title says, it treats objects as points, detecting a target by predicting the position of its center point and then regressing the target's width and height, without drawing on the anchors of earlier methods to delimit regions. As the figure below shows, the red dot marks an object's center point; the model treats every object as a point and regresses the width and height from the center point.

3.1 Detection Method

Let the input image be $I \in R^{W \times H \times 3}$. We ultimately locate targets through a keypoint heatmap $\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$. To obtain the heatmaps, a fully convolutional network is appended after the backbone, finally producing C heatmaps (C is the number of classes in the dataset, e.g. 80 for COCO).

Given these heatmaps, how do we train against ground truth? First, the center point $p = \left(\frac{x_1+x_2}{2}, \frac{y_1+y_2}{2}\right)$ is computed from the ground-truth box coordinates. Because the network downsamples during computation, the corresponding low-resolution center point is $\tilde{p} = \lfloor p/R \rfloor$, where R is the downsampling factor (4 in this paper). Then, on the downsampled map, the ground-truth center is splatted onto the feature map with the Gaussian kernel $Y_{xyc} = \exp\left(-\frac{(x-\tilde{p}_x)^2 + (y-\tilde{p}_y)^2}{2\sigma_p^2}\right)$. In this way, pixels at target locations on the feature map take large values while pixels with no target take small values, forming the ground truth for the C heatmaps.
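A minimal numpy sketch of this splatting step (σ here is picked by hand; CenterNet derives an object-size-adaptive radius, which is omitted for brevity):

```python
import numpy as np

def splat_gaussian(heatmap: np.ndarray, cx: int, cy: int, sigma: float) -> None:
    """Splat a Gaussian peak for one object center onto heatmap (H, W) in place.

    Overlapping Gaussians are merged with an element-wise max, as in CenterNet.
    """
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)

hm = np.zeros((128, 128), dtype=np.float32)  # one class channel, stride-4 map
splat_gaussian(hm, cx=40, cy=25, sigma=3.0)
splat_gaussian(hm, cx=43, cy=25, sigma=3.0)  # nearby object: peaks merged by max
print(hm[25, 40], hm.max())                  # ~1.0 at the centers
```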

3.2 Loss Functions

3.2.1 Heatmap Loss

CenterNet's heatmap loss is a variant of focal loss. For center points that are easy to detect, their weight in training is reduced, i.e. their loss is scaled down; for center points that are hard to detect, the loss is scaled up.
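Here is a sketch of that penalty-reduced pixel-wise focal loss in PyTorch, following the formula in the paper with α = 2 and β = 4 (the clamping epsilon is my own numerical-stability addition):

```python
import torch

def center_focal_loss(pred: torch.Tensor, gt: torch.Tensor,
                      alpha: float = 2.0, beta: float = 4.0) -> torch.Tensor:
    """Penalty-reduced focal loss over heatmaps of shape (B, C, H, W).

    gt is the Gaussian-splatted ground truth in [0, 1]; pixels with gt == 1
    are positive centers, everything else is a penalty-reduced negative.
    """
    pred = pred.clamp(1e-6, 1 - 1e-6)
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = pos * (1 - pred) ** alpha * torch.log(pred)
    neg_loss = neg * (1 - gt) ** beta * pred ** alpha * torch.log(1 - pred)
    num_pos = pos.sum().clamp(min=1.0)  # normalize by the number of centers
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

loss = center_focal_loss(torch.rand(2, 15, 128, 128), torch.zeros(2, 15, 128, 128))
```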

3.2.2 Offset Loss

Because the image is downsampled by a factor of 4 when generating the heatmaps, mapping the feature map back to the original image introduces a quantization error. The paper therefore predicts a local offset $\hat{O} \in R^{\frac{W}{R} \times \frac{H}{R} \times 2}$ to compensate every center point, with target $O_{\tilde{p}} = \frac{p}{R} - \tilde{p}$. As this formula shows, the offset consists of only 2 feature maps, i.e. all classes share the same offset; it is trained with an L1 loss.

3.2.3 Size Loss

Let object k, of category $c_k$, have box $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$; its center point is $p_k = \left(\frac{x_1^{(k)}+x_2^{(k)}}{2}, \frac{y_1^{(k)}+y_2^{(k)}}{2}\right)$. The network predicts all center points with the keypoint estimate $\hat{Y}$, and then regresses each object k's size toward $s_k = (x_2^{(k)} - x_1^{(k)}, y_2^{(k)} - y_1^{(k)})$, again with an L1 loss. The size feature map is $\hat{S} \in R^{\frac{W}{R} \times \frac{H}{R} \times 2}$.

The network's overall loss is the sum of the center-point loss, the offset loss, and the size loss, with each term given its corresponding weight.

Each coordinate location thus produces C + 2 + 2 values: the probability of each class, the width and height, and the offset.

3.3 Inference

With the model trained, detection first downsamples an input image, then predicts each class's heatmap on the downsampled feature map and extracts each class's hot spots. The extraction checks whether a point's value is no smaller than that of its 8 surrounding neighbors, and keeps the top 100 such peaks.

After these peaks are extracted, each one's position is given by integer coordinates, and its heatmap value $\hat{Y}_{x_i y_i c}$ is used as the confidence of the point. The predicted offset is then added and the regressed size attached to form the final boxes.
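A compact PyTorch sketch of this decoding step (the 3×3 max-pooling trick is the standard CenterNet implementation detail; the tensor layouts here are my assumptions):

```python
import torch
import torch.nn.functional as F

def decode_centers(heat: torch.Tensor, k: int = 100):
    """Extract top-k center candidates from heatmaps of shape (B, C, H, W).

    A point survives only if it equals the max of its 3x3 neighborhood,
    which plays the role of NMS; its heatmap value is the confidence.
    """
    peaks = F.max_pool2d(heat, kernel_size=3, stride=1, padding=1)
    heat = heat * (peaks == heat).float()          # suppress non-peak points
    b, c, h, w = heat.shape
    scores, idx = heat.view(b, -1).topk(k)         # flatten C*H*W, take top-k
    cls = idx // (h * w)
    ys, xs = (idx % (h * w)) // w, (idx % (h * w)) % w
    return scores, cls, xs, ys                     # add offsets/sizes downstream

scores, cls, xs, ys = decode_centers(torch.rand(1, 15, 128, 128))
```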

3.4 Summary

The model introduced in this paper is structurally very simple. It is an anchor-free method whose core idea is to treat objects as points and detect a target by locating its center point. This idea is not limited to object detection: as the paper reports, it also works well for 3D detection and human pose estimation. I find the idea well worth borrowing from; it differs greatly from previous detection methods and is very flexible.

4. Learning RoI Transformer for Detecting Oriented Objects in Aerial Images [16]

4.1 Model Introduction

Remote sensing images contain many dense small targets whose orientations all differ, unlike targets in natural scenes, so detecting them with axis-aligned rectangles is severely inaccurate. As the figure below shows, rotated rectangles can localize dense targets much more precisely.

This paper therefore proposes a module called the RoI Transformer, dedicated to detecting densely packed targets of arbitrary orientation. It has two parts. The first is the RRoI Learner, which converts horizontal RoIs (HRoIs) into rotated RoIs (RRoIs). The second is Rotated Position Sensitive RoI Align, which extracts rotation-invariant features from the RRoIs for the subsequent classification and box regression.

The structure of the RoI Transformer is shown in the figure below. Each HRoI passes through an RRoI Learner, a fully connected network that regresses the offsets between the HRoI and the rotated ground truth (RGT). A box decoder at the end of the RRoI Learner takes the HRoI and the offsets as input and outputs the decoded RRoIs. The feature map and the RRoIs then go through RRoI warping, which extracts rotation-invariant features. The RRoI Learner and RRoI warping together form one RoI Transformer.

Rotated Position Sensitive RoI Align is the structure attached after the RoI Transformer; it extracts rotation-invariant features within a single network. RPS RoI pooling divides an RRoI into K × K bins and outputs a feature map of dimension K × K × C.

The K × K × C feature map is computed by pooling over each bin through the coordinate transform Tθ. The paper's equation images are not reproduced here; a reconstruction follows.
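In my reconstruction (an approximation based on the paper's definitions, not a verbatim copy), the pooling and the transform take the form

$$ D_c(i,j) = \frac{1}{n_{ij}} \sum_{(x,y)\,\in\,\mathrm{bin}(i,j)} F_{i,j,c}\big(T_\theta(x,y)\big), \qquad T_\theta \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} x_r \\ y_r \end{pmatrix}, $$

where $(x_r, y_r, w_r, h_r, \theta)$ parameterizes the RRoI, $(x, y)$ ranges over the sampling points of bin $(i, j)$ in the RRoI's local frame, $n_{ij}$ is the number of sampling points in the bin, and feature values are read with bilinear interpolation.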

The RRoI Learner and RPS RoI Align together form the core modules of this paper.

4.2 Experiments

The paper evaluates the RoI Transformer on the DOTA2018 dataset. The original images are first cut into 1024 × 1024 patches with a stride of 824. The table below reports the oriented-box results of different methods on DOTA2018; with FPN, the mAP reaches 69.56.
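Such patch cropping is easy to reproduce; here is a minimal sketch (the 1024/824 numbers come from the paper, while the zero-padding policy at image borders is my own choice):

```python
import numpy as np

def crop_patches(img: np.ndarray, size: int = 1024, stride: int = 824):
    """Yield (x0, y0, patch) tiles covering an (H, W, C) image.

    Patches overlap by size - stride pixels; border patches are zero-padded
    so every tile has the same shape. The origins (x0, y0) let detections
    be mapped back to full-image coordinates afterwards.
    """
    h, w = img.shape[:2]
    ys = list(range(0, max(h - size, 0) + 1, stride))
    xs = list(range(0, max(w - size, 0) + 1, stride))
    if ys[-1] + size < h:
        ys.append(h - size)          # extra row so the bottom edge is covered
    if xs[-1] + size < w:
        xs.append(w - size)          # extra column for the right edge
    for y0 in ys:
        for x0 in xs:
            patch = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
            tile = img[y0:y0 + size, x0:x0 + size]
            patch[:tile.shape[0], :tile.shape[1]] = tile
            yield x0, y0, patch

tiles = list(crop_patches(np.zeros((4000, 4000, 3), dtype=np.uint8)))
print(len(tiles))  # 25 tiles for a 4000 x 4000 DOTA-sized image
```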

Visualized detection results are shown below.

In summary, this paper proposes a structure called the RoI Transformer that extracts rotation-invariant features in order to generate rotated boxes more accurately, and the experiments on DOTA confirm the module's effectiveness.

 

5. Overview of Other Papers

Deep Learning in Remote Sensing: A Review [9] is a survey published in 2017 that recounts the development of deep learning since 2012 and its applications to remote sensing imagery.

Starting from the perceptron, the article first traces the development of deep learning, through multi-layer perceptrons to fully connected networks and CNNs. Covering the rapid rise of convolutional neural networks triggered by ImageNet in 2012, it introduces AlexNet [10], VGGNet [11], ResNet [12], and FCN [13], network structures that each mark a milestone for CNNs in image processing.

Deep learning's applications in remote sensing mainly appear in hyperspectral image analysis, SAR image interpretation, high-resolution satellite image analysis, 3D reconstruction, and so on. Deep learning holds a clear advantage over traditional remote sensing image processing methods in both performance and accuracy. Consequently, in areas such as high-resolution satellite image analysis, which covers object detection and scene classification, the current SOTA methods are all deep-learning based, e.g. ResNet, Faster R-CNN [14], and FPN [15].

Faster R-CNN (Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks) is a classic algorithm in object detection and still plays an important role in all kinds of scenarios today. Proposed in 2015 as part of the R-CNN family, it is a classic two-stage detection algorithm. Faster R-CNN integrates feature extraction, proposal generation, bounding box regression, and classification into a single network, enabling end-to-end training and a large gain in overall performance, above all in inference speed. As the network structure in the figure shows, a backbone (such as ResNet50) first extracts high-level features from the image; these are fed into the RPN to generate proposals, and the proposals are then mapped back onto the high-level features and sent to the regression and classification heads. The RPN plays the key role in this process: proposal generation no longer relies on methods external to the network, as before, but is folded into the same network for end-to-end training.
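For completeness, here is a minimal inference sketch using torchvision's re-implementation of Faster R-CNN (an off-the-shelf illustration of the pipeline above, not the authors' original code; the pretrained weights are COCO, not remote sensing):

```python
import torch
import torchvision

# ResNet50-FPN backbone -> RPN proposals -> RoI heads, end to end.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 800, 800)            # stand-in for a real image tensor
with torch.no_grad():
    (pred,) = model([image])               # one dict per input image
print(pred["boxes"].shape, pred["labels"].shape, pred["scores"].shape)
```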

 

 

References

  1. Xia G S, Bai X, Ding J, et al. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3974-3983, 2018.
  2. H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J. Jiao. Orientation robust object detection in aerial images using deep convolutional neural network. In ICIP, pages 3735-3739, 2015.
  3. Z. Liu, H. Wang, L. Weng, and Y. Yang. Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds. IEEE Geoscience and Remote Sensing Letters, 13(8):1074-1078, 2016.
  4. G. Cheng, P. Zhou, and J. Han. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 54(12):7405-7415, 2016.
  5. Yang X, Liu Q, Yan J, et al. R3Det: Refined single-stage detector with feature refinement for rotating object. arXiv preprint arXiv:1908.05612, 2019.
  6. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2980-2988, 2017.
  7. T.-Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie. Feature pyramid networks for object detection. In CVPR, 2017.
  8. Zhou X, Wang D, Krähenbühl P. Objects as points. arXiv preprint arXiv:1904.07850, 2019.
  9. Zhu, Xiao Xiang, et al. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geoscience and Remote Sensing Magazine, 5(4):8-36, 2017.
  10. A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2012.
  11. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2015.
  12. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  13. J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  14. Ren, Shaoqing, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1137-1149, 2017.
  15. Lin, Tsung-Yi, et al. Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  16. Ding J, Xue N, Long Y, et al. Learning RoI Transformer for detecting oriented objects in aerial images. arXiv preprint arXiv:1812.00155, 2018.