NeurIPS 2023 | Southeast University & Shanghai Jiao Tong University propose H2RBox-v2: a new network for oriented object detection


Author: Huo Muyu (Source: Zhihu, reposted with permission) | Editor: CVer public account

https://zhuanlan.zhihu.com/p/620884206


1. Introduction

H2RBox-v2 has been released. As the new version of H2RBox, it introduces symmetry-aware learning: by using the symmetry of targets as a self-supervised learning objective, it significantly improves performance over H2RBox.

The research topic of this paper is learning oriented object detection from horizontal box annotations (weakly supervised learning): when manual annotations contain only horizontal boxes and no rotation angles, can a neural network still learn the rotation angle of each target?


H2RBox-v2: Incorporating Symmetry for Boosting Horizontal Box Supervised Oriented Object Detection
Paper: https://arxiv.org/abs/2304.04403
Code (open source): https://github.com/open-mmlab/mmrotate

The paper compares against several existing solutions, including methods based on SAM (Segment Anything Model).


2. Inspiration

Symmetry is a natural attribute that exists widely in various scenes. Taking the DOTA dataset as an example, many categories such as airplanes, stadiums, cars, and ships are visibly symmetric. When people manually annotate oriented boxes, the symmetry of the target is also an important consideration: annotators usually take the direction of the symmetry axis as the target orientation. It is therefore theoretically feasible to use symmetry to learn target orientation under horizontal-box weak supervision.

Principle analysis

The definition of axial symmetry: if a plane figure coincides with itself after being flipped about a straight line, then the figure is axially symmetric, and that line is its axis of symmetry.


So the core theory driving H2RBox-v2 was born: if an axially symmetric image is fed to the network, then training the network to satisfy flip consistency and rotation consistency is equivalent to supervising the network with the symmetry of the input image. If the network successfully learns flip consistency and rotation consistency, its output will be exactly the direction of the symmetry axis of the image.

Although the above conclusion is derived for an input image containing only a single, perfectly symmetric target, experiments show that even in multi-target detection, and even when targets are not perfectly symmetric, a direction approximating the symmetry axis can still be learned. In other words, this principle applies to almost all elongated targets, which fits the application scenario of oriented object detection very well.

As an aside, the periodicity of rotation is omitted above for ease of understanding. The rigorous conclusion is that there are two solutions: the direction of the symmetry axis, or the direction perpendicular to it. This does not actually affect training: when the self-supervised branch learns the direction perpendicular to the symmetry axis, the weakly supervised branch automatically swaps the long and short sides, so the final output is always correct.
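To make the two consistencies concrete, here is a minimal numerical sketch (not the paper's code; the angle conventions are my assumption): under a vertical flip the predicted orientation should negate, and under a rotation by R it should shift by R, both modulo the period pi.

```python
import math

PI = math.pi

def angle_diff(a, b):
    """Smallest distance between two angles with period pi."""
    d = (a - b) % PI
    return min(d, PI - d)

def flip_consistency_loss(theta, theta_flip):
    # A vertical flip negates the orientation of the symmetry axis.
    return angle_diff(theta_flip, -theta)

def rotate_consistency_loss(theta, theta_rot, rot):
    # Rotating the image by `rot` shifts the orientation by the same amount.
    return angle_diff(theta_rot, theta + rot)

# A network that has learned the symmetry axis (say theta = 0.3 rad)
# incurs zero loss on both views.
theta = 0.3
assert flip_consistency_loss(theta, -0.3) < 1e-9
assert rotate_consistency_loss(theta, 0.3 + 0.5, rot=0.5) < 1e-9
```

When the predictions on the two views violate these relations, the losses are positive, which is exactly the self-supervision signal provided by symmetry.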

3. Method introduction

Let me briefly review the old version first. H2RBox-v1 consists of a weakly supervised branch and a self-supervised branch. The size and angle of the target are mainly learned by the weakly supervised branch; in essence, the angle is learned through the geometric constraints of the circumscribed horizontal box. The angle has several symmetric solutions, and the wrong ones are eliminated by the angle constraint provided by the self-supervised branch. This approach demands higher horizontal-box annotation accuracy and a larger amount of training data.

H2RBox-v2 likewise consists of a weakly supervised branch and a self-supervised branch. The weakly supervised branch is slightly improved over H2RBox-v1, while the self-supervised branch adopts the new symmetry-aware learning paradigm: it can work independently of the weakly supervised branch and learns the orientation of the target directly from the image, based on the target's symmetry.

Overall structure diagram (see the paper).

Self-supervised branch

The self-supervised branch of H2RBox-v2 first generates two views of the original image: 1) a vertically flipped view, and 2) a randomly rotated view. In this branch the network predicts only one angle value (it first outputs a vector of length 3 for each pixel, then decodes the angle via PSC angle encoding).

This angle encoding method is derived from:

https://zhuanlan.zhihu.com/p/620775646
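As a rough sketch of how a length-3 output can encode a single angle (my paraphrase of the phase-shifting idea, not the official PSC implementation): the angle theta, which has period pi, is mapped to a phase phi = 2*theta, encoded as three cosines with phase shifts of 2*pi/3, and decoded with an arctangent.

```python
import math

N = 3  # number of phase-shifting steps

def psc_encode(theta):
    """Encode an angle with period pi into N phase-shifted cosine values."""
    phi = 2 * theta  # map period pi -> period 2*pi
    return [math.cos(phi + 2 * math.pi * n / N) for n in range(N)]

def psc_decode(x):
    """Recover the angle from N phase-shifted cosines."""
    s = sum(xn * math.sin(2 * math.pi * n / N) for n, xn in enumerate(x))
    c = sum(xn * math.cos(2 * math.pi * n / N) for n, xn in enumerate(x))
    phi = math.atan2(-s, c)
    return phi / 2  # back to period pi, in (-pi/2, pi/2]

# Round trip: encode then decode reproduces the angle.
theta = 0.4
assert abs(psc_decode(psc_encode(theta)) - theta) < 1e-9
```

The redundancy of the three values is what makes the encoding boundary-free: unlike a raw angle, each component varies smoothly across the period boundary.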


Weakly supervised branch

The weakly supervised branch of H2RBox-v2 has two core improvements:

1) Since the self-supervised branch of H2RBox-v2 works independently, the angle no longer needs to be learned from the weakly supervised side. The weakly supervised branch therefore passes no gradient to the angle-regression subnet; `detach` is used in the code to cut the gradient.

2) In the weakly supervised branch of H2RBox-v1, the rotated box must be converted into its horizontal circumscribed box before the IoU loss is computed. However, once random-rotation data augmentation is applied, the label box is no longer horizontal, so H2RBox-v1 cannot use random-rotation augmentation. To solve this, H2RBox-v2 proposes the CircumIoU Loss, which directly computes an IoU loss between two boxes in a circumscribing relationship.

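For reference, the v1-style conversion of a rotated box to its horizontal circumscribed box, which CircumIoU Loss avoids, is a standard formula (a sketch, not code from the paper):

```python
import math

def circumscribed_hbox(cx, cy, w, h, theta):
    """Axis-aligned box circumscribing a rotated box (cx, cy, w, h, theta)."""
    W = w * abs(math.cos(theta)) + h * abs(math.sin(theta))
    H = w * abs(math.sin(theta)) + h * abs(math.cos(theta))
    return cx, cy, W, H

# A 4x2 box rotated by 90 degrees is circumscribed by a 2x4 horizontal box.
bx = circumscribed_hbox(0, 0, 4, 2, math.pi / 2)
assert abs(bx[2] - 2) < 1e-9 and abs(bx[3] - 4) < 1e-9
```

Because this conversion discards the angle, v1's loss only works when the label is truly horizontal; computing IoU directly in the rotated frame is what lets v2 keep rotation augmentation.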

The final loss is simply the sum of the losses of the weakly supervised branch and the self-supervised branch.
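Schematically, the combination looks like the following (the weight names and the default flip weight of 0.05 are my assumption based on the ablation in Table 5, not taken from the paper's code):

```python
def total_loss(l_ws, l_flip, l_rot, w_flip=0.05, w_rot=1.0):
    """Weakly supervised loss plus weighted self-supervised consistency losses."""
    return l_ws + w_flip * l_flip + w_rot * l_rot

# The flip term is down-weighted relative to the rotation term.
assert total_loss(1.0, 2.0, 0.5) == 1.0 + 0.05 * 2.0 + 0.5
```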

Inference

The two branches above are only used to compute the loss during training; no extra views need to be generated at inference time. Since the branches share parameters, only one forward pass (backbone + angle/regression/classification/centerness heads) is required. Compared with the base detector or H2RBox-v1, inference only adds one PSC decoding step, so inference speed is almost unchanged.

4. Experiments

Ablation experiments

Tables 3-4: First, the new losses in the self-supervised and weakly supervised branches are validated. Both the PSC encoder and the Snap Loss (the ls column in the table) are necessary; without them, training becomes unstable due to the boundary problem. CircumIoU Loss (the lp column) indeed solves H2RBox-v1's inability to use rotation augmentation. The reason performance on DOTA does not improve with rotation augmentation is probably that the 1x training schedule used in the comparison is insufficient for rotation augmentation.


Table 5: Next, the weights of the flip and rotation consistency losses. The flip weight should be smaller than the rotation weight; 0.05 or 0.1 works best, and larger values prevent convergence.


Table 7: Finally, the padding experiment. Padding matters a lot in H2RBox-v1: black borders cannot be left, and reflection padding must be used. In H2RBox-v2, however, the impact of black borders is very small, only slightly lower (random variation cannot be ruled out, and there may actually be no impact).


Comparative experiments

H2RBox-v2 is evaluated on three versions of DOTA (v1.0/1.5/2.0), as well as on the HRSC and FAIR1M datasets. Most experiments are done on DOTA-v1.0, where H2RBox-v2 improves over v1 by 2.26%. The v1 baseline compared here uses the new MMRotate 1.0 code with retuned hyperparameters such as learning rate, and is itself about 2 points higher than the original v1 paper.


On DOTA-v1.5/2.0 there are similar improvements; the average gain over the three DOTA versions is 2.32%.

Because the HRSC dataset is relatively small, H2RBox-v1 basically fails to train on it, whereas H2RBox-v2 works equally well on such small datasets. Worth mentioning here is KCR (CVPR 2023), a transfer-learning method: it trains on DOTA annotated with oriented boxes and transfers to HRSC annotated with horizontal boxes, reaching 79.10%, the previous SOTA. H2RBox-v2, using only horizontal-box annotation on HRSC, reaches 89.66%.

Finally, FAIR1M. Compared with DOTA, this dataset contains more targets with very pronounced symmetry, such as airplanes, cars, and stadiums. H2RBox-v2 therefore performs very well on it, improving by 6.33% over v1, and even 1.02% over fully supervised FCOS (H2RBox-v2's detector is also based on FCOS, so this comparison illustrates the gap between weak supervision and full supervision).


5. Summary

The main innovation of H2RBox-v2 is symmetry-based self-supervised learning, which obtains orientation information directly from the symmetry of the target. It achieves very good results on categories such as Plane, Ship, and Vehicle, but some categories, such as Baseball-diamond, are not well suited to it. If symmetric categories were learned with the v2 method while targets with poor symmetry kept v1's circumscribed-rectangle geometric constraint, even higher performance might be achievable.

Origin blog.csdn.net/amusi1994/article/details/133398377