Author|Fengwen, BBuf
Bounding box regression is a key step in object detection. In existing methods, although the ℓn-norm loss is widely adopted for bounding box regression, it is not tailored to the evaluation metric, namely Intersection over Union (IoU).
Recently, IoU loss and generalized IoU (GIoU) loss have been proposed to benefit the IoU metric, but they still suffer from slow convergence and inaccurate regression. In this paper, we propose a Distance-IoU (DIoU) loss that incorporates the normalized distance between the predicted box and the target box, and which converges much faster than IoU and GIoU losses in training.
In addition, this paper summarizes three geometric factors in bounding box regression, namely overlap area, center-point distance and aspect ratio, and on this basis proposes a Complete IoU (CIoU) loss, which promotes faster convergence and better performance.
By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms such as YOLOv3, SSD and Faster R-CNN, we obtain significant performance gains not only on the IoU metric but also on the GIoU metric. Furthermore, DIoU can easily be adopted as a criterion in Non-Maximum Suppression (NMS), further boosting performance. (Note: the IoU and GIoU metrics here refer to aspects such as detection-accuracy measurement (mAP) and the stability of IoU loss computation.)
Object detection is one of the key problems in computer vision and has received extensive research attention for decades (Redmon et al. 2016; Redmon and Farhadi 2018; Ren et al. 2015; He et al. 2017; Yang et al. 2018; Wang et al. 2019; 2018). Generally, existing object detection methods can be divided into:
- Single-stage detection, such as the YOLO family (Redmon et al. 2016; Redmon and Farhadi 2017; 2018) and SSD (Liu et al. 2016; Fu et al. 2017);
- Two-stage detection, such as the R-CNN series (Girshick et al. 2014; Girshick 2015; Ren et al. 2015; He et al. 2017);
- Even multi-stage detection, like Cascade R-CNN (Cai and Vasconcelos 2018).

Despite these different detection frameworks, bounding box regression, which predicts a rectangular box to locate the target object, remains a key step.
Code repository:
https://github.com/Oneflow-Inc/one-yolov5
1. Foreword
This article analyzes and studies IoU, based mainly on the paper Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (https://arxiv.org/pdf/1911.08287.pdf).
IoU
Introduction to IoU
Intersection over Union (IoU)
IoU was already introduced in the metrics-overview section, so we have a preliminary understanding of it (in fact, the yolov5 project does not simply use IoU, but the CIoU introduced later).
Calculation formula:

$$ IoU = \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|} $$

- $B^{gt}$ is the ground-truth box (gt: ground-truth),
- $B$ is the predicted box.
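To make the formula concrete, here is a minimal Python sketch (assuming axis-aligned corner-format `(x1, y1, x2, y2)` boxes; the function name is illustrative and not taken from the project's code):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` gives 1/7: the intersection area is 1 and the union is 4 + 4 - 1 = 7.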
IoU loss
Calculation formula:

$$ \mathcal{L}_{IoU} = 1 - IoU $$
Analysis of the advantages and disadvantages of IoU Loss
IoU loss has an obvious flaw: it only works when the bounding boxes overlap, and provides no moving gradient in non-overlapping cases (movement meaning the predicted box moving toward where the target box lies). Since IoU is fixed at 0 for two completely disjoint boxes, the loss cannot measure how far apart they are. In addition, two predicted boxes of different shapes may produce the same loss (same IoU), as shown on the left and right of the figure below.
GIoU
Introduction to GIoU
GIoU was designed to solve the problem of IoU loss that IoU is always 0 when the predicted box and the ground-truth box do not intersect, leading to the Generalized Intersection over Union (GIoU) loss. On the basis of IoU, GIoU additionally finds the smallest enclosing rectangle of the predicted box and the ground-truth box, and then subtracts the area of the union of the two boxes from the area of this enclosing rectangle. The specific algorithm flow is as follows:
GIoU loss
Calculation formula:

$$ GIoU = IoU - \frac{|C - B \cup B^{gt}|}{|C|} $$

$$ \mathcal{L}_{GIoU} = 1 - GIoU $$

where $C$ is the smallest box covering both $B$ and $B^{gt}$. Due to the introduction of $C$, the predicted box will move toward the target box even in non-overlapping cases.
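The algorithm above can be sketched directly (a sketch assuming corner-format boxes; `giou` is an illustrative name):

```python
def giou(box_a, box_b):
    """GIoU = IoU - |C - (A U B)| / |C|, where C is the smallest enclosing box."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest enclosing box C of the two boxes.
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou - (c_area - union) / c_area
```

For two disjoint unit boxes such as `(0, 0, 1, 1)` and `(2, 0, 3, 1)`, IoU is 0 but GIoU is -1/3, so the loss 1 - GIoU still carries a signal that depends on how far apart the boxes are.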
GIoU Advantages and Disadvantages Analysis
GIoU loss solves the problem of IoU loss in non-overlapping cases. It can serve as an appropriate substitute for IoU in all performance metrics and can achieve higher accuracy in object detection tasks.
Disadvantages: although GIoU alleviates the vanishing-gradient problem in non-overlapping cases, it still has limitations: it cannot distinguish box regression losses when one box contains the other. As shown in the figure below, the three regression boxes have the same GIoU loss, but the third box clearly regresses better.
IoU & GIoU analysis
First, we analyzed the limitations of the original IoU loss and GIoU loss above. Next, the bounding box regression process is further analyzed through simulation experiments.
(Why simulation experiments? Because it is difficult to analyze the bounding box regression process from detection results alone: the regression cases in uncontrolled benchmarks are often not comprehensive, e.g., different distances, scales and aspect ratios are unevenly covered. Simulation experiments instead consider regression comprehensively, so the problems of a given loss function can be analyzed easily.)
Simulation experiments
In the simulation experiments, we try to cover most of the relationships between bounding boxes in terms of distance, scale and aspect ratio, as shown in Figure 3(a). In particular, we choose 7 unit boxes (i.e., each box has an area of 1) with different aspect ratios (i.e., 1:4, 1:3, 1:2, 1:1, 2:1, 3:1 and 4:1) as the target boxes. Without loss of generality, the center points of the 7 target boxes are fixed at (10, 10). The anchor boxes are evenly distributed over 5000 points.
1. Distance: 5000 points are uniformly selected within a circular region centered at (10, 10) with radius 3, and anchor boxes with 7 scales and 7 aspect ratios are placed at each point. These cases include both overlapping and non-overlapping boxes.
2. Scale: For each point, the area of the anchor box is set to 0.5, 0.67, 0.75, 1, 1.33, 1.5 and 2, respectively.
3. Aspect Ratio: For a given point and scale, 7 aspect ratios are used, following the same settings as the target boxes (i.e., 1:4, 1:3, 1:2, 1:1, 2:1, 3:1 and 4:1). All anchor boxes are regressed to each target box. In summary, there are $5000 \times 7 \times 7 \times 7 = 1{,}715{,}000$ regression cases in total.
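The total number of cases follows directly from the settings above:

```python
num_points = 5000        # anchor center locations in the circular region
num_anchor_scales = 7    # areas 0.5, 0.67, 0.75, 1, 1.33, 1.5 and 2
num_anchor_ratios = 7    # aspect ratios 1:4 ... 4:1
num_target_boxes = 7     # one target box per aspect ratio

total = num_points * num_anchor_scales * num_anchor_ratios * num_target_boxes
print(total)  # 1715000
```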
Figure 3: Simulation experiments: (a) 1.715 million regression cases are adopted by considering different distances, scales and aspect ratios. (b) Curves of the total regression error over all cases at each iteration, for the different losses.
Given a loss function $\mathcal{L}$, the bounding box regression process in each case can then be simulated with gradient descent. The predicted box is updated by:

$$ B_t = B_{t-1} + \eta \, (2 - IoU_{t-1}) \, \nabla B_{t-1} $$

where $B_t$ is the predicted box at iteration $t$, $\nabla B_{t-1}$ denotes the gradient of the loss with respect to $B_{t-1}$, and $\eta$ is the learning rate. It is worth noting that in the implementation the gradient is multiplied by $2 - IoU_{t-1}$ to speed up convergence. The bounding box regression performance of each loss function is evaluated with this simulation; the error curves at the final iteration $T$ are shown in the figure.
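The update rule above can be sketched with a numerical gradient (a sketch, not the paper's code; `eta` and the finite-difference step `eps` are illustrative choices, and the gradient is followed downhill):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def step(pred, target, loss, eta=0.1, eps=1e-5):
    """One update B_t = B_{t-1} - eta * (2 - IoU) * dL/dB (numerical gradient)."""
    grad = []
    for i in range(4):
        hi, lo = list(pred), list(pred)
        hi[i] += eps
        lo[i] -= eps
        grad.append((loss(hi, target) - loss(lo, target)) / (2 * eps))
    scale = eta * (2.0 - iou(pred, target))  # (2 - IoU) speeds up convergence
    return [p - scale * g for p, g in zip(pred, grad)]

iou_loss = lambda p, t: 1.0 - iou(p, t)

# An overlapping box moves toward the target, but a disjoint box is stuck:
# every perturbation leaves it disjoint, so the IoU-loss gradient is exactly 0.
moved = step([1, 1, 3, 3], [0, 0, 2, 2], iou_loss)
stuck = step([5, 5, 6, 6], [0, 0, 2, 2], iou_loss)
```

Here `stuck` equals its starting box, which is exactly the failure mode of IoU loss discussed next.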
Limitations of IoU and GIoU loss
In Figure 4, we visualize the final regression error for 5000 scattered points at iteration T.
- It is easy to see from Fig. 4(a) that IoU loss only works for cases that overlap with the target box. Anchor boxes that do not overlap will not move, since $\nabla B$ is always 0 for them.
- By adding a penalty term, GIoU loss better alleviates the non-overlapping cases, as shown in Figure 4(b); GIoU loss significantly enlarges the basin, i.e., the area in which GIoU works. However, cases at horizontal and vertical orientations are still likely to have large errors. This is because the penalty term in GIoU loss is used to minimize $|C - A \cup B|$, but the area of $C - A \cup B$ is often small or 0 (when one box contains the other), in which case GIoU loss almost degrades to IoU loss. GIoU loss would converge to a good solution given enough iterations at an appropriate learning rate, but very slowly. Geometrically speaking, as the regression steps in Figure 1 show, GIoU first tends to enlarge the predicted box to make it overlap with the target box, and only then does the IoU term work to match the two boxes, resulting in very slow convergence.
To sum up, in the non-overlapping case, IoU loss converges poorly, while GIoU loss converges slowly, especially for horizontal and vertical boxes. In the object detection process, neither IoU nor GIoU loss can guarantee the accuracy of regression.
DIoU & CIoU
Through the previous analysis of IoU and GIoU, we will naturally ask the following questions:
- First, is it possible to directly minimize the normalized distance between the predicted and target boxes to achieve faster convergence?
- Second, how can regression be made more accurate and faster when the predicted box overlaps with, or even contains, the target box?
DIoU loss
Distance-IoU loss: a faster and better bounding box regression loss. In general, an IoU-based loss can be defined as:

$$ \mathcal{L} = 1 - IoU + \mathcal{R}(B, B^{gt}) $$

where $\mathcal{R}(B, B^{gt})$ is the penalty term for the predicted box $B$ and the target box $B^{gt}$. By designing appropriate penalty terms, we propose the DIoU loss and the CIoU loss in this section to answer the above two questions.
To answer the first question, we propose to minimize the normalized distance between the center points of the two bounding boxes. The penalty term can be defined as:

$$ \mathcal{R}_{DIoU} = \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2} $$

where $\mathbf{b}$ and $\mathbf{b}^{gt}$ denote the center points of $B$ and $B^{gt}$, $\rho(\cdot)$ is the Euclidean distance, and $c$ is the diagonal length of the smallest enclosing box covering the two boxes. The DIoU loss function can then be defined as:

$$ \mathcal{L}_{DIoU} = 1 - IoU + \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2} $$
As shown in Figure 5, the penalty term of DIoU loss directly minimizes the distance between the two center points, while GIoU loss aims to reduce the area of $C - B \cup B^{gt}$.
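Putting the pieces together (a sketch assuming corner-format boxes; names are illustrative):

```python
def diou_loss(pred, target):
    """DIoU loss = 1 - IoU + rho^2(b, b_gt) / c^2 for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (target[2] - target[0]) * (target[3] - target[1]) - inter)
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two center points (rho^2).
    rho2 = (((pred[0] + pred[2]) - (target[0] + target[2])) ** 2
            + ((pred[1] + pred[3]) - (target[1] + target[3])) ** 2) / 4.0
    # Squared diagonal of the smallest enclosing box (c^2).
    c2 = ((max(pred[2], target[2]) - min(pred[0], target[0])) ** 2
          + (max(pred[3], target[3]) - min(pred[1], target[1])) ** 2)
    return 1.0 - iou + rho2 / c2
```

For the two disjoint boxes `(0, 0, 1, 1)` and `(2, 2, 3, 3)` the loss is 1 + 8/18 ≈ 1.44: even with zero IoU, the center-distance term supplies a nonzero, distance-dependent penalty.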
DIoU and IoU/GIoU loss comparison
The newly proposed DIoU loss inherits some properties of IoU and GIoU losses:
- DIoU loss is still invariant to the scale of the regression problem.
- Similar to GIoU loss, DIoU loss can provide a moving direction for bounding boxes that do not overlap with the target box.
- When two bounding boxes match perfectly, $\mathcal{L}_{DIoU} = \mathcal{L}_{GIoU} = \mathcal{L}_{IoU} = 0$; when the two boxes are far apart, $\mathcal{L}_{DIoU} = \mathcal{L}_{GIoU} \to 2$.
DIoU loss has several advantages over IoU loss and GIoU loss, which can be evaluated by simulation experiments.
- As shown in Figure 1 and Figure 3, DIoU loss can directly minimize the distance between the two boxes, so it converges much faster than GIoU loss.
- For cases where one box contains the other (Fig. 2), or for boxes at horizontal and vertical orientations (Fig. 6), DIoU loss regresses very quickly, while GIoU loss almost degrades to IoU loss, i.e., $\mathcal{L}_{GIoU} \to \mathcal{L}_{IoU}$.
Complete IoU Loss
We then answer the second question, proposing that a good loss for bounding box regression should consider three important geometric factors: overlap area, center-point distance, and aspect ratio. By uniting the coordinates, IoU loss considers the overlap area, and GIoU loss relies heavily on IoU loss. Our proposed DIoU loss simultaneously considers the overlap area and the center-point distance of the bounding boxes. However, the consistency of aspect ratios between bounding boxes is also an important geometric factor. Therefore, based on the DIoU loss, the CIoU loss is proposed by adding an aspect-ratio consistency term:

$$ \mathcal{R}_{CIoU} = \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2} + \alpha v $$
where $\alpha$ is a positive trade-off parameter and $v$ measures the consistency of the aspect ratio:

$$ v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2 $$

$$ \alpha = \frac{v}{(1 - IoU) + v} $$
Then the loss function can be defined as:

$$ \mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(\mathbf{b}, \mathbf{b}^{gt})}{c^2} + \alpha v $$
By the definition of $\alpha$, the overlap area factor is given higher priority for regression, especially in non-overlapping cases. The optimization of the CIoU loss is the same as that of the DIoU loss, except that the gradient of $v$ with respect to $w$ and $h$ should be specified:

$$ \frac{\partial v}{\partial w} = \frac{8}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right) \times \frac{h}{w^2 + h^2} $$

$$ \frac{\partial v}{\partial h} = -\frac{8}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right) \times \frac{w}{w^2 + h^2} $$

The denominator $w^2 + h^2$ is usually a small value for $w$ and $h$ in the range $[0, 1]$, which is likely to produce exploding gradients. Therefore, in the implementation the denominator $w^2 + h^2$ is simply removed, i.e., the step size $\frac{1}{w^2 + h^2}$ is replaced by 1, and the gradient direction remains consistent with the equations above.
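The forward computation of the CIoU loss can be sketched as follows (a sketch, not the one-yolov5 implementation; `alpha` is treated as a constant with respect to the gradient, as in the paper):

```python
import math

def ciou_loss(pred, target):
    """CIoU loss = 1 - IoU + rho^2 / c^2 + alpha * v for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (target[2] - target[0]) * (target[3] - target[1]) - inter)
    iou = inter / union if union > 0 else 0.0
    # Center-distance term rho^2 / c^2, as in the DIoU loss.
    rho2 = (((pred[0] + pred[2]) - (target[0] + target[2])) ** 2
            + ((pred[1] + pred[3]) - (target[1] + target[3])) ** 2) / 4.0
    c2 = ((max(pred[2], target[2]) - min(pred[0], target[0])) ** 2
          + (max(pred[3], target[3]) - min(pred[1], target[1])) ** 2)
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wt, ht = target[2] - target[0], target[3] - target[1]
    # Aspect-ratio consistency term v and trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(wt / ht) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1.0 - iou + rho2 / c2 + alpha * v
```

When the predicted and target boxes differ only in aspect ratio, e.g. `(0, 0, 1, 2)` vs `(0, 0, 2, 1)`, the `alpha * v` term adds a penalty on top of the DIoU terms.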
NMS (Non-Maximum Suppression)
2. Introduction
NMS is the last step in most object detection algorithms, in which redundant detection boxes are removed when their overlap with the highest-scoring box exceeds a threshold. Soft-NMS (Bodla et al. 2017) penalizes the detection scores of neighboring boxes with a continuous function w.r.t. IoU, producing softer and more robust suppression than the original NMS.
IoU-Net (Jiang et al. 2018) proposes a new network branch that predicts localization confidence to guide NMS. Recently, adaptive NMS (Liu, Huang, and Wang 2019) and Softer-NMS (He et al. 2019) have been proposed to study appropriate thresholding strategies and weighted averaging strategies, respectively. In this work, DIoU is simply used as the criterion of the original NMS: when suppressing redundant boxes, both the overlap area and the center-point distance of the bounding boxes are considered simultaneously.
DIoU-NMS
Non-Maximum Suppression using DIoU
In the original NMS, the IoU metric is used to suppress redundant detection boxes, where the overlap area is the only factor; this often produces false suppression in occluded cases. In this work we propose DIoU as a better criterion for NMS, because the suppression criterion should consider not only the overlap area but also the center-point distance. DIoU-NMS is formally defined as:

$$ s_i = \begin{cases} s_i, & IoU - \mathcal{R}_{DIoU}(\mathcal{M}, B_i) < \varepsilon \\ 0, & IoU - \mathcal{R}_{DIoU}(\mathcal{M}, B_i) \ge \varepsilon \end{cases} $$

where a box $B_i$ is removed by simultaneously considering the IoU and the distance between the center points of $B_i$ and the highest-scoring box $\mathcal{M}$; $s_i$ is the classification score and $\varepsilon$ is the NMS threshold. We argue that two boxes whose center points are far apart may locate different objects and should not be removed. Moreover, DIoU-NMS is very flexible and can be integrated into any object detection pipeline with only a few lines of code.
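A greedy version of DIoU-NMS can be sketched in plain Python (a sketch; the box format and names are illustrative, and the default `eps=0.5` is chosen only for the example):

```python
def diou_nms(boxes, scores, eps=0.5):
    """Greedy NMS using IoU minus the DIoU center-distance penalty as criterion."""
    def iou_and_penalty(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        iou = inter / union if union > 0 else 0.0
        # rho^2 / c^2: normalized squared center distance.
        rho2 = (((a[0] + a[2]) - (b[0] + b[2])) ** 2
                + ((a[1] + a[3]) - (b[1] + b[3])) ** 2) / 4.0
        c2 = ((max(a[2], b[2]) - min(a[0], b[0])) ** 2
              + (max(a[3], b[3]) - min(a[1], b[1])) ** 2)
        return iou, rho2 / c2

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, suppressed = [], set()
    for i in order:
        if i in suppressed:
            continue
        keep.append(i)  # i plays the role of the highest-scoring box M
        for j in order:
            if j != i and j not in suppressed:
                iou, penalty = iou_and_penalty(boxes[i], boxes[j])
                if iou - penalty >= eps:  # overlapping AND centers close
                    suppressed.add(j)
    return keep
```

With `boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]` and scores `[0.9, 0.8, 0.7]`, the second box is suppressed (large IoU, nearby center) while the distant third box is kept.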
3. Summary
This article mainly introduces DIoU loss and CIoU loss for bounding box regression, and DIoU-NMS for suppressing redundant detection boxes. By directly minimizing the normalized distance of two center points, DIoU loss can achieve faster convergence than GIoU loss. In addition, the CIoU loss considers three geometric properties (i.e. overlapping area, center point distance and aspect ratio), which promotes faster convergence and better performance.
Reference articles
- https://github.com/Zzh-tju/DIoU/blob/master/README.md#introduction
- IoU: https://arxiv.org/pdf/1608.01471.pdf
- GIoU: https://giou.stanford.edu/GIoU.pdf
- DIoU: https://arxiv.org/pdf/1911.08287.pdf