New Inner-IoU loss function! Effectively improving detection results by calculating IoU through auxiliary bounding boxes

Summary

1 Introduction

2 Methods

2.1 Bounding box regression mode analysis

2.2 Inner-IoU loss

3 Experiments

3.1 Simulation experiment

3.2 Comparative experiment

3.2.1 YOLOv7 on PASCAL VOC

3.2.2 YOLOv5 on AI-TOD

4 Reference


Summary

With the rapid development of detectors, the bounding box regression (BBR) loss function is constantly updated and optimized. However, existing IoU-based BBR still focuses on accelerating convergence by adding new loss terms, ignoring the limitations of the IoU loss term itself. Although theoretically, IoU loss can effectively describe the state of bounding box regression, in practical applications, it cannot be adaptively adjusted according to different detectors and detection tasks, and does not have strong generalization capabilities.

Based on the above, the author first analyzes the BBR mode and concludes that distinguishing between regression samples and using auxiliary bounding boxes of different scales to calculate the loss can effectively accelerate bounding box regression. For high-IoU samples, using smaller auxiliary bounding boxes to calculate the loss speeds up convergence, while larger auxiliary bounding boxes suit low-IoU samples. The author then proposes the Inner-IoU loss, which calculates the IoU loss through auxiliary bounding boxes. For different datasets and detectors, a scaling factor, ratio, controls the scale of the auxiliary bounding box used to calculate the loss. Finally, Inner-IoU is integrated into existing IoU-based loss functions for simulation and comparative experiments.

Experimental results show that the proposed method further improves detection performance, verifying the effectiveness and generalization ability of the Inner-IoU loss.

1 Introduction

Object detection is a basic task in computer vision, including target classification and localization. The bounding box regression loss function is an important part of the detector positioning branch. The positioning accuracy of the detector depends largely on bounding box regression, which plays an irreplaceable role in current detectors.

In BBR, IoU loss can accurately describe the degree of matching between the predicted bounding box and the GT box, ensuring that the model can learn the location information of the target during the training process. As a basic part of the existing mainstream bounding box regression loss function, IoU is defined as follows:
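Written out in the usual set notation (a standard form, with |·| denoting area), the definition reads:

```latex
IoU = \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|}
```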

B and B^{gt} represent prediction boxes and GT boxes respectively. After defining IoU, the corresponding loss can be defined as follows:
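In its standard form, the IoU loss is simply the complement of the IoU:

```latex
L_{IoU} = 1 - IoU
```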

So far, IoU-based loss functions have gradually become mainstream and dominant. Most existing methods build on IoU and add new loss terms. For example, GIoU was proposed to solve the vanishing gradient problem that arises when the overlap area between the Anchor box and the GT box is 0. GIoU is defined as follows, where C is the smallest box covering B and B^{gt}:
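The form below follows the original GIoU formulation, with C the smallest enclosing box of the two boxes; the notation in the source article may differ slightly:

```latex
GIoU = IoU - \frac{|C \setminus (B \cup B^{gt})|}{|C|}, \qquad L_{GIoU} = 1 - GIoU
```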

Compared with GIoU, the DIoU function adds a new distance loss term based on IoU, which mainly achieves faster convergence and better performance by minimizing the normalized distance between the center points of the two bounding boxes. It is represented as follows:
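In the standard DIoU notation, the loss can be written as:

```latex
L_{DIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}}
```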

where b and b^{gt} are the center points of B and B^{gt} respectively, \rho(\cdot) denotes the Euclidean distance, and c is the diagonal length of the smallest box enclosing the two boxes.

CIoU further considers shape loss and adds a shape loss term based on DIoU loss. It is represented as follows:
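In the standard CIoU notation, the shape term \alpha\upsilon is added to the DIoU loss:

```latex
L_{CIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha \upsilon
```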

where \alpha is a positive trade-off parameter:
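As given in the original CIoU formulation, the parameter is:

```latex
\alpha = \frac{\upsilon}{(1 - IoU) + \upsilon}
```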

and \upsilon measures the consistency of the aspect ratio:
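As given in the original CIoU formulation, the aspect-ratio term is:

```latex
\upsilon = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}
```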

w^{gt} and h^{gt} represent the width and height of the target box respectively, and w and h represent the width and height of the prediction box. When the target box and the prediction box have the same aspect ratio, \upsilon = 0 and CIoU degenerates into DIoU.

Compared with DIoU, EIoU directly minimizes the normalized difference between the width and height of the target box and the anchor box, as well as the center position. EIoU is defined as follows:
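The form below follows the original EIoU formulation, adding separate width and height distance terms to the center-distance term:

```latex
L_{EIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \frac{\rho^{2}(w, w^{gt})}{(w^{c})^{2}} + \frac{\rho^{2}(h, h^{gt})}{(h^{c})^{2}}
```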

w^{c} and h^{c} are the width and height of the minimum bounding box covering the target box and prediction box respectively.

The more recent SIoU considers the effect of the angle between the Anchor box and the GT box on bounding box regression and introduces an angle loss into the bounding box regression loss function. It is defined as follows:
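The form below follows the original SIoU formulation, where \Delta is the distance cost and \Omega the shape cost:

```latex
L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}
```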

The angle loss represents the minimum angle between the center point connection of the GT box and the Anchor box:
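A commonly used form of the angle cost is sketched below. The small constant \epsilon, which avoids division by zero, is an assumption, as is the exact notation. It evaluates to 1 at 45° and to 0 when the centers are axis-aligned:

```latex
\Lambda = \sin\left( 2 \sin^{-1} \frac{\min(|x_{c}^{gt} - x_{c}|,\ |y_{c}^{gt} - y_{c}|)}{\sqrt{(x_{c}^{gt} - x_{c})^{2} + (y_{c}^{gt} - y_{c})^{2}} + \epsilon} \right)
```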

This term is designed to move the Anchor box toward the nearest coordinate axis, giving priority to the X-axis or the Y-axis depending on the angle. When the angle is 45°, Λ=1. When the center points are aligned along the X-axis or the Y-axis, Λ=0.

After considering the angle cost, the distance loss is redefined as follows:
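The distance cost below follows the original SIoU formulation; the source article may use a different normalization constant. Here w^{c} and h^{c} are the width and height of the smallest enclosing box:

```latex
\Delta = \sum_{t = x, y} \left( 1 - e^{-\gamma \rho_{t}} \right), \qquad \gamma = 2 - \Lambda

\rho_{x} = \left( \frac{x_{c}^{gt} - x_{c}}{w^{c}} \right)^{2}, \qquad \rho_{y} = \left( \frac{y_{c}^{gt} - y_{c}}{h^{c}} \right)^{2}
```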

Shape loss mainly describes the size difference between the GT box and the Anchor box, which is defined as follows:
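The shape cost below follows the original SIoU formulation; the source article may use a different normalization constant:

```latex
\Omega = \sum_{t = w, h} \left( 1 - e^{-\omega_{t}} \right)^{\theta}

\omega_{w} = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \qquad \omega_{h} = \frac{|h - h^{gt}|}{\max(h, h^{gt})}
```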

The value of θ determines the importance of shape cost. This parameter ranges from 2 to 6.

Although the above bounding box regression loss functions can accelerate convergence and improve detection performance by adding new geometric constraints to the IoU loss function, they do not consider the rationality of the IoU loss itself, which determines the quality of the detection results. To make up for this shortcoming, the author proposes the Inner-IoU loss to speed up regression by using auxiliary bounding boxes without adding any new loss terms.

The main contributions of this article are as follows:

  • The author analyzed the process and mode of bounding box regression and, based on the inherent characteristics of the problem, proposed calculating the loss with a smaller auxiliary bounding box during training. This has a positive effect on the regression of high-IoU samples and the opposite effect on low-IoU samples.

  • The author proposed the Inner-IoU loss, which introduces a scale factor to generate auxiliary bounding boxes of different scales for calculating the loss. Applying it to existing IoU-based loss functions yields faster and more efficient regression.

  • The author conducted a series of simulation and comparison experiments. The results show that the proposed method is superior to existing methods in detection performance and generalization, and achieves SOTA results on datasets with objects of different pixel sizes.

2 Methods

2.1 Bounding box regression mode analysis

The IoU loss function is widely used in computer vision tasks. During the bounding box regression process, not only can the regression status be evaluated, but the convergence can also be accelerated by calculating the regression loss. Here, the author discusses the relationship between IoU changes and bounding box size, analyzes the essential characteristics of the bounding box regression problem, and explains the feasibility of the method proposed in this article.

Figure 3a shows the IoU-deviation curve, with the horizontal and vertical axes representing the deviation and the IoU value respectively. The three curves of different colors show how the IoU changes for bounding boxes of different scales. A, B, C, D and E represent five different positional relationships between the Anchor box and the GT box. The red box represents the Anchor box, with a length and width of 10, and the corresponding GT box is shown in black.

Figure 3b shows the ABS(Grad)-deviation curve. Unlike Figure 3a, the vertical axis of Figure 3b represents the absolute value of the IoU gradient. The author assumes that the actual bounding box size is 10 and uses bounding boxes of size 8 and 12 as auxiliary bounding boxes. In Figure 3, A and E correspond to the regression status of low-IoU samples, while B and D correspond to that of high-IoU samples. The following conclusions can be drawn from Figure 3:

  • Since the auxiliary bounding box differs from the actual bounding box only in scale, the trend of its IoU value during regression is consistent with that of the actual bounding box, so it can reflect the quality of the actual regression result.
  • For high-IoU samples, the absolute value of the IoU gradient of the smaller-scale auxiliary bounding box is larger than that of the actual bounding box.
  • For low-IoU samples, the absolute value of the IoU gradient of the larger-scale auxiliary bounding box is larger than that of the actual bounding box.

Based on the above analysis, using smaller-scale auxiliary bounding boxes to calculate IoU loss can help improve the regression speed of high-IoU samples and accelerate convergence. On the contrary, using larger scale auxiliary bounding boxes to calculate the IoU loss can speed up the regression process of low IoU samples.
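The trend can be checked numerically with a toy model. The sketch below is not the author's experiment: it assumes two equal squares, one shifted diagonally by the same deviation in x and y, and estimates the absolute IoU gradient by central differences. The names `square_iou` and `abs_grad` are hypothetical.

```python
def square_iou(size, deviation):
    """IoU of two equal axis-aligned squares of side `size`.

    Assumption: the second square is shifted by `deviation` along both x and y.
    """
    overlap = max(0.0, size - abs(deviation))  # overlap length per axis
    inter = overlap * overlap
    union = 2.0 * size * size - inter
    return inter / union


def abs_grad(size, deviation, eps=1e-4):
    """Absolute IoU gradient w.r.t. the deviation (central difference)."""
    hi = square_iou(size, deviation + eps)
    lo = square_iou(size, deviation - eps)
    return abs(hi - lo) / (2.0 * eps)


# Sizes 8 and 12 play the role of auxiliary boxes for an actual box of size 10.
for deviation in (1.0, 8.0):  # small deviation = high IoU, large = low IoU
    grads = {size: round(abs_grad(size, deviation), 4) for size in (8, 10, 12)}
    print(f"deviation={deviation}: {grads}")
```

Under these assumptions, the size-8 box has the largest gradient at the small deviation, and the size-12 box has the largest gradient at the large deviation, matching the conclusions above.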

2.2 Inner-IoU loss

In order to make up for the shortcomings of the existing IoU loss function's weak generalization ability and slow convergence speed in different detection tasks, the author proposed to use auxiliary bounding boxes to calculate the loss to accelerate the bounding box regression process. In Inner-IoU, the author introduces the scale factor ratio, which can control the scale of the auxiliary bounding box. By using auxiliary bounding boxes of different scales for different datasets and detectors, the limitations of existing methods in generalization capabilities can be overcome.

The GT box and the Anchor box are denoted B^{gt} and B respectively, as shown in Figure 1. The GT box and the inner GT box share the center point (x_{c}^{gt}, y_{c}^{gt}), and the Anchor box and the inner Anchor box share the center point (x_{c}, y_{c}). The width and height of the GT box are w^{gt} and h^{gt}, and those of the Anchor box are w and h. The variable ratio is the scaling factor, usually in the range [0.5, 1.5].
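With these symbols, the auxiliary boxes and the resulting loss can be written as follows. This is a reconstruction from the definitions above, with w^{gt} the GT width; the edge names b_{l}, b_{r}, b_{t}, b_{b} (left, right, top, bottom) follow common usage and may not match the article's lost formulas exactly.

```latex
b_{l}^{gt} = x_{c}^{gt} - \frac{w^{gt} \cdot ratio}{2}, \qquad b_{r}^{gt} = x_{c}^{gt} + \frac{w^{gt} \cdot ratio}{2}

b_{t}^{gt} = y_{c}^{gt} - \frac{h^{gt} \cdot ratio}{2}, \qquad b_{b}^{gt} = y_{c}^{gt} + \frac{h^{gt} \cdot ratio}{2}

b_{l} = x_{c} - \frac{w \cdot ratio}{2}, \qquad b_{r} = x_{c} + \frac{w \cdot ratio}{2}

b_{t} = y_{c} - \frac{h \cdot ratio}{2}, \qquad b_{b} = y_{c} + \frac{h \cdot ratio}{2}

inter = \left( \min(b_{r}^{gt}, b_{r}) - \max(b_{l}^{gt}, b_{l}) \right) \cdot \left( \min(b_{b}^{gt}, b_{b}) - \max(b_{t}^{gt}, b_{t}) \right)

union = (w^{gt} \cdot h^{gt}) \cdot ratio^{2} + (w \cdot h) \cdot ratio^{2} - inter

IoU^{inner} = \frac{inter}{union}, \qquad L_{Inner-IoU} = 1 - IoU^{inner}
```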

Inner-IoU loss inherits some characteristics of IoU loss and has its own characteristics. The range of Inner-IoU loss is the same as that of IoU loss, namely [0, 1]. Since the auxiliary bounding box differs from the actual bounding box only in scale, the loss is calculated in the same way, and the Inner-IoU-deviation curve is similar to the IoU-deviation curve.

Compared with the IoU loss, when the ratio is less than 1 and the auxiliary bounding box size is smaller than the actual bounding box, the effective range of the regression is smaller than the IoU loss, but the absolute value of the gradient is larger than the gradient obtained from the IoU loss, which can accelerate the convergence of high IoU samples. On the contrary, when the ratio is greater than 1, the larger-scale auxiliary bounding box expands the effective range of regression and has an enhanced effect on the regression of low IoU samples.
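A minimal pure-Python sketch of the idea, not the author's implementation. It assumes boxes are given as (x_center, y_center, width, height) with positive sizes, and it clamps negative overlaps to zero. The function names `inner_iou` and `inner_iou_loss` are hypothetical.

```python
def inner_iou(box, gt, ratio=1.0):
    """IoU computed on auxiliary boxes scaled by `ratio` around the same centers.

    Assumption: `box` and `gt` are (x_center, y_center, width, height) tuples.
    """
    x, y, w, h = box
    xg, yg, wg, hg = gt

    # Edges of the auxiliary Anchor box (centers unchanged, sides scaled).
    left, right = x - w * ratio / 2, x + w * ratio / 2
    top, bottom = y - h * ratio / 2, y + h * ratio / 2

    # Edges of the auxiliary GT box.
    left_g, right_g = xg - wg * ratio / 2, xg + wg * ratio / 2
    top_g, bottom_g = yg - hg * ratio / 2, yg + hg * ratio / 2

    # Overlap per axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(right, right_g) - max(left, left_g))
    inter_h = max(0.0, min(bottom, bottom_g) - max(top, top_g))
    inter = inter_w * inter_h

    union = (w * h + wg * hg) * ratio ** 2 - inter
    return inter / union


def inner_iou_loss(box, gt, ratio=1.0):
    """Inner-IoU loss; with ratio=1.0 this is the plain IoU loss."""
    return 1.0 - inner_iou(box, gt, ratio)


anchor, target = (0, 0, 10, 10), (1, 1, 10, 10)
for r in (0.8, 1.0, 1.2):
    print(r, round(inner_iou_loss(anchor, target, r), 4))
```

With ratio greater than 1, two boxes that do not overlap can still produce a non-zero auxiliary IoU, which is the expanded effective range described above.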

Inner-IoU loss (L_{Inner-IoU}) can be applied to existing IoU-based bounding box regression loss functions, giving L_{Inner-GIoU}, L_{Inner-DIoU}, L_{Inner-CIoU}, L_{Inner-EIoU} and L_{Inner-SIoU}, as follows:
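In each case the original loss keeps its extra terms and only the IoU term is replaced by the auxiliary-box IoU. The block below is reconstructed on that basis, with IoU^{inner} denoting the IoU computed on the auxiliary boxes:

```latex
L_{Inner-GIoU} = L_{GIoU} + IoU - IoU^{inner}

L_{Inner-DIoU} = L_{DIoU} + IoU - IoU^{inner}

L_{Inner-CIoU} = L_{CIoU} + IoU - IoU^{inner}

L_{Inner-EIoU} = L_{EIoU} + IoU - IoU^{inner}

L_{Inner-SIoU} = L_{SIoU} + IoU - IoU^{inner}
```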

3 Experiments

3.1 Simulation experiment

As shown in Figure 5, this paper analyzes the bounding box regression process in two different scenarios through simulation experiments. In Figure 5a and Figure 5b, 7 different green bounding boxes are set as target boxes. The center point of each target box is (100, 100), and the aspect ratios are 1:4, 1:3, 1:2, 1:1, 2:1, 3:1 and 4:1 respectively. In Figure 5a, the Anchor boxes are placed at 2,000 randomly sampled points, distributed within a circle centered on (100, 100) with a radius of 3. For each point, the area of the Anchor box is set to 0.5, 0.67, 0.75, 1, 1.33, 1.5 and 2.

For a given point and scale, 7 aspect ratios are used, with the same settings as the target boxes (i.e. 1:4, 1:3, 1:2, 1:1, 2:1, 3:1 and 4:1). The Anchor box distribution in Figure 5b differs from Figure 5a: the points are centered at (100, 100) and lie at a radius of 6 to 9. Scales and aspect ratios are the same as in Figure 5a. In summary, in each experiment, 2,000 × 7 × 7 Anchor boxes are fitted to each target box.

Therefore, there are a total of 686,000 = 7 × 7 × 7 × 2,000 regression cases. The results of the simulation experiment are shown in Figure 7, where Figure 7a shows the convergence results in the high-IoU regression sample scenario. In order to accelerate the regression of high-IoU samples, the scale factor ratio is set to 0.8. Figure 7b shows the convergence results in the low-IoU regression sample scenario, with the ratio set to 1.2. As the dotted lines in the figure show, the author's method converges faster than existing methods.
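The case count can be verified with a few lines of arithmetic. This is only a bookkeeping sketch of the setup described above, and the variable names are hypothetical.

```python
# Settings taken from the simulation description above.
aspect_ratios = ["1:4", "1:3", "1:2", "1:1", "2:1", "3:1", "4:1"]  # targets and anchors
anchor_scales = [0.5, 0.67, 0.75, 1, 1.33, 1.5, 2]                 # anchor areas
num_points = 2000                                                   # sampled anchor centers

# Every sampled point carries every scale and every aspect ratio.
anchors_per_target = num_points * len(anchor_scales) * len(aspect_ratios)

# Each of the 7 target boxes is regressed from all of those anchors.
total_cases = len(aspect_ratios) * anchors_per_target

print(anchors_per_target, total_cases)
```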

3.2 Comparative experiment

3.2.1 YOLOv7 on PASCAL VOC

This experiment compares against the CIoU and SIoU methods, using YOLOv7-tiny as the detector, VOC2007 trainval and VOC2012 trainval as the training set, and VOC2007 test as the test set. The training set contains 16,551 images and the test set contains 4,952 images, covering 20 categories. The author trained for 150 epochs on the training set to demonstrate the advantages of the proposed method.

The author visualized the training process of the proposed method and the original method, as shown in Figure 8. Figures 8a, 8b and 8c show the training curves of CIoU and Inner-CIoU with the ratio set to 0.7, 0.75 and 0.8 respectively. Figures 8d, 8e and 8f show the training curves of SIoU and Inner-SIoU for the same three ratios.

In Figure 8, the orange curve represents the method proposed in this paper, while the existing methods are represented by the green curve. The proposed method outperforms the existing methods from epoch 50 to epoch 150 of training.

The results of the comparative experiment on the test set are shown in Table 1. It can be seen that after applying the method in this article, the detection effect has been improved, and AP50 and mAP50:95 have both increased by more than 0.5%.

Figures 2 and 6 show the comparison of the tested samples. As can be seen from the figure, compared with existing methods, the proposed method has more accurate positioning and fewer false detections and missed detections.

3.2.2 YOLOv5 on AI-TOD

In order to prove the generalizability of the proposed method, the authors conducted comparative experiments on the AI-TOD dataset, using SIoU as the comparison method.

AI-TOD includes 28,036 aerial images, 8 target types, and 700,621 target instances, of which 14,018 images serve as the training set and the remaining 14,018 images serve as the test set. The average object size in AI-TOD is 12.8 pixels, which is much smaller than in other object detection datasets. The experimental results are shown in Table 2.

In Comparative Experiment 1, setting the ratio between 0.7 and 0.8 (less than 1) generates an auxiliary bounding box smaller than the actual bounding box. The experimental results show that this benefits the regression of high-IoU samples. In Experiment 2, with the ratio greater than 1, larger auxiliary bounding boxes are generated, which accelerates the convergence of low-IoU samples.

In addition, Figure 4 shows the comparison of detection results on the test set, through which the advantages of the proposed method can be seen.

4 Reference

[1]. Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box.

Origin blog.csdn.net/qq_40716944/article/details/134326582