YOLOv5 comprehensive analysis tutorial ③: Faster and better bounding box regression loss


Author|Fengwen, BBuf

Bounding box regression is a key step in object detection. In existing methods, although the $\ell_n$-norm loss is widely used for bounding box regression, it is not tailored to the evaluation metric, namely Intersection over Union (IoU).

Recently, IoU loss and generalized IoU (GIoU) loss have been proposed to benefit the IoU metric, but they still suffer from slow convergence and inaccurate regression. In this paper, we propose Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster than IoU and GIoU loss in training.

In addition, this paper summarizes three geometric factors in bounding box regression, namely overlap area, center point distance and aspect ratio, and on this basis proposes a Complete IoU (CIoU) loss, promoting faster convergence and better performance.

By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms such as YOLOv3, SSD and Faster R-CNN, we obtain significant performance gains not only in terms of the IoU metric but also the GIoU metric. Furthermore, DIoU can easily be adopted as a criterion in Non-Maximum Suppression (NMS), further boosting performance. (Note: the IoU and GIoU metrics here refer to aspects such as detection accuracy measurement (mAP) and the stability of IoU loss computation.)

Object detection is one of the key problems in computer vision tasks and has received extensive research attention for decades (Redmon et al. 2016; Redmon and Farhadi 2018; Ren et al. 2015; He et al. 2017; Yang et al. 2018; Wang et al. 2019; 2018). Generally, existing object detection methods can be divided into:

  • Single-stage detection, such as the YOLO family (Redmon et al. 2016; Redmon and Farhadi 2017; 2018) and SSD (Liu et al. 2016; Fu et al. 2017);

  • Two-stage detection, such as the R-CNN series (Girshick et al. 2014; Girshick 2015; Ren et al. 2015; He et al. 2017);

  • and even multi-stage detection, like Cascade R-CNN (Cai and Vasconcelos 2018).

Despite these different detection frameworks, bounding box regression, which predicts a rectangular box to locate the target object, remains a key step.

Code repository:

https://github.com/Oneflow-Inc/one-yolov5
 

1 Foreword

This article analyzes and explains IoU mainly based on the paper Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression ( https://arxiv.org/pdf/1911.08287.pdf ).

IoU

Introduction to IoU

Intersection over Union (IoU)

IoU was introduced in the metrics-evaluation overview section, so we already have a preliminary understanding of it (in fact, the yolov5 project does not simply use IoU, but the CIoU introduced later).

Calculation formula:

$$\mathrm{IoU} = \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|}$$

  • $B^{gt}$ is the ground-truth box (gt: ground-truth),

  • $B$ is the predicted box.

IoU loss

Calculation formula:

$$\mathcal{L}_{IoU} = 1 - \mathrm{IoU} = 1 - \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|}$$
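As a minimal sketch of the two formulas above (assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples; the function names are illustrative, not taken from the one-yolov5 code base):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection width/height, clamped to 0 when the boxes are disjoint.
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def iou_loss(box_a, box_b):
    """L_IoU = 1 - IoU."""
    return 1.0 - iou(box_a, box_b)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))       # overlapping boxes: 1/7
print(iou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes: loss saturates at 1
```

Note how the loss saturates at 1 for any pair of disjoint boxes, which is exactly the flaw analyzed below.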

Analysis of the advantages and disadvantages of IoU Loss

IoU loss has an obvious flaw: it only works when the bounding boxes overlap, and provides no moving gradient in non-overlapping cases ("moving" here means the predicted box shifting toward overlap with the target box). That is, IoU cannot measure the loss between two completely disjoint boxes (IoU is fixed at 0). In addition, two predicted boxes of different shapes may produce the same loss (the same IoU), as shown on the left and right of the figure below.

[Figure: predicted boxes of different shapes producing the same IoU loss]

GIoU

Introduction to GIoU

GIoU was designed to solve the problem of IoU loss that IoU is always 0 when the predicted box and the ground-truth box do not intersect, yielding the Generalized Intersection over Union loss. On the basis of IoU, GIoU additionally finds the smallest enclosing rectangle of the predicted box and the ground-truth box, and then subtracts the area of the union of the two boxes from that enclosing rectangle. The specific algorithm flow is as follows:

[Figure: algorithm flow for computing GIoU]

GIoU loss

Calculation formula:

$$\mathcal{L}_{GIoU} = 1 - \mathrm{IoU} + \frac{|C - B \cup B^{gt}|}{|C|}$$

where $C$ is the smallest box covering $B$ and $B^{gt}$. Thanks to the introduction of $C$, the predicted box will move toward the target box even when they do not overlap.
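Under the formula above, a minimal GIoU-loss sketch (boxes as (x1, y1, x2, y2) tuples; the function name and box format are illustrative, not the paper's released code) could be:

```python
def giou_loss(box_a, box_b):
    """L_GIoU = 1 - IoU + |C - A∪B| / |C| for boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    # Area of the smallest enclosing box C of the two boxes.
    c_area = ((max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])) *
              (max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])))
    return 1.0 - inter / union + (c_area - union) / c_area

# Disjoint boxes: the |C - A∪B| / |C| term still gives a signal (loss > 1).
print(giou_loss((0, 0, 1, 1), (2, 2, 3, 3)))
# Containment: C - A∪B = 0, so GIoU degenerates to plain IoU loss.
print(giou_loss((0, 0, 4, 4), (1, 1, 2, 2)))
```

The second call illustrates the limitation discussed next: once one box contains the other, the penalty term vanishes.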

GIoU Advantages and Disadvantages Analysis

GIoU loss solves the problem of IoU loss in disjoint cases. It can serve as an appropriate substitute for IoU in all performance metrics, and it yields higher accuracy in object detection tasks.

Disadvantages: although GIoU can alleviate the vanishing-gradient problem in non-overlapping cases, it still has limitations. It cannot distinguish box regression losses when one box contains the other. As shown in the figure below, the three regression results have the same GIoU loss, but the third box clearly regresses better.

[Figure: three containment cases with the same GIoU loss but different regression quality]

IoU & GIoU analysis

In the earlier part of this article we analyzed the limitations of the original IoU loss and GIoU loss; the bounding box regression process is further analyzed below through simulation experiment results.


(Supplementary explanation: why run simulation experiments? Because it is difficult to analyze the bounding box regression process from detection results alone, since regression cases in uncontrolled benchmarks are often not comprehensive, e.g. different distances, different scales and different aspect ratios. Simulation experiments, by contrast, cover the regression cases comprehensively, so the problems of a given loss function can be analyzed easily.)

Simulation experiments


In the simulation experiments, we try to cover most relationships between bounding boxes in terms of distance, scale and aspect ratio, as shown in Figure 3(a). In particular, we choose 7 unit boxes (i.e., each box has an area of 1) with 7 different aspect ratios (1:4, 1:3, 1:2, 1:1, 2:1, 3:1 and 4:1) as target boxes. Without loss of generality, the center points of the 7 target boxes are fixed at (10, 10). The anchor boxes are evenly scattered over 5000 points.

1. Distance: in a circular area centered at (10, 10) with a radius of 3, 5000 points are selected uniformly, and anchor boxes with 7 scales and 7 aspect ratios are placed at each point. These cases include both overlapping and non-overlapping boxes.

2. Scale: for each point, the areas of the anchor boxes are set to 0.5, 0.67, 0.75, 1, 1.33, 1.5 and 2, respectively.

3. Aspect ratio: for a given point and scale, 7 aspect ratios are used, following the same settings as the target boxes (i.e. 1:4, 1:3, 1:2, 1:1, 2:1, 3:1 and 4:1). All anchor boxes are regressed to each target box. In summary, there are 5000 × 7 × 7 × 7 = 1,715,000 regression cases in total.

[Figure 3: Simulation experiments: (a) 1.715 million regression cases are adopted by considering different distances, scales and aspect ratios; (b) regression error sum curves of the different losses over iterations.]

Then, given a loss function $\mathcal{L}$, we can simulate the bounding box regression procedure for each case with gradient descent. For a predicted box $B_i$, the current prediction is updated by:

$$B_i^{t} = B_i^{t-1} + \eta \left(2 - \mathrm{IoU}_i^{t-1}\right) \nabla B_i^{t-1}$$

where $B_i^{t}$ is the predicted box at iteration $t$, $\nabla B_i^{t-1}$ denotes the gradient of the loss $\mathcal{L}$ w.r.t. $B_i$ at iteration $t-1$, and $\eta$ can be understood as the learning rate. It is worth noting that, in our implementation, the gradient is multiplied by $2 - \mathrm{IoU}_i^{t-1}$ to speed up convergence. The regression performance of each loss function is evaluated by the regression error sum over the simulation cases at each iteration. When the final iteration $T$ is reached, the error curves are as shown in Figure 3(b).
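The zero-gradient problem that this simulation exposes can be illustrated with a small numerical sketch: finite-difference gradients and plain gradient descent (without the 2 − IoU gradient scaling used in the paper); the box format (x1, y1, x2, y2) and all names are illustrative, not the paper's actual code.

```python
def iou(a, b):
    """IoU of axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def giou_loss(a, b):
    """1 - IoU + |C - A∪B| / |C|."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    c = ((max(a[2], b[2]) - min(a[0], b[0])) *
         (max(a[3], b[3]) - min(a[1], b[1])))
    return 1.0 - inter / union + (c - union) / c

def num_grad(loss, box, target, eps=1e-5):
    """Central-difference gradient of the loss w.r.t. the 4 box coordinates."""
    g = []
    for k in range(4):
        hi, lo = list(box), list(box)
        hi[k] += eps
        lo[k] -= eps
        g.append((loss(hi, target) - loss(lo, target)) / (2 * eps))
    return g

target = (9.5, 9.5, 10.5, 10.5)   # unit target box centered at (10, 10)
anchor = [3.0, 3.0, 4.0, 4.0]     # disjoint anchor: IoU = 0

print(num_grad(lambda a, b: 1.0 - iou(a, b), anchor, target))  # all zeros
print(num_grad(giou_loss, anchor, target))                     # nonzero

# Gradient descent on GIoU loss improves even the disjoint case, if slowly.
start = giou_loss(anchor, target)
for _ in range(100):
    g = num_grad(giou_loss, anchor, target)
    anchor = [x - 0.5 * gi for x, gi in zip(anchor, g)]
print(giou_loss(anchor, target) < start)  # True: the loss decreased
```

The first print shows exactly why non-overlapping anchors never move under plain IoU loss, while GIoU's enclosing-box penalty supplies a (small) gradient.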

Limitations of IoU and GIoU losses

[Figure 4: final regression errors of IoU, GIoU and DIoU losses visualized over the 5000 scattered points]

In Figure 4, we visualize the final regression error for 5000 scattered points at iteration T.

  • It is easy to see from Fig. 4(a) that IoU loss only works for cases that overlap with the target box. Since ∇B is always 0 for non-overlapping anchors, those anchor boxes never move.

  • By adding a penalty term (see Equation (3)), GIoU loss can better handle non-overlapping cases, as shown in Figure 4(b): GIoU loss significantly enlarges the basin, i.e., the area in which GIoU works. However, in the cases of horizontal and vertical orientations, large errors remain likely. This is because the penalty term in GIoU loss is used to minimize |C − A ∪ B|, but the area of C − A ∪ B is often small or 0 (when the two boxes have a containment relationship), in which case GIoU loss almost degenerates into IoU loss. GIoU loss does converge to a good solution given enough iterations at a suitable learning rate, but very slowly. Geometrically speaking, as the regression steps in Figure 1 show, GIoU first enlarges the predicted box so that it overlaps the target box, and only then does the IoU term drive the predicted box to match the target box, resulting in very slow convergence.

To sum up, in the non-overlapping case, IoU loss converges poorly, while GIoU loss converges slowly, especially for horizontal and vertical boxes. In the object detection process, neither IoU nor GIoU loss can guarantee the accuracy of regression.

DIoU & CIoU

Through the previous analysis of IoU and GIoU, we will naturally ask the following questions:

  1. First, is it possible to directly minimize the normalized distance between predicted and target boxes to achieve faster convergence?

  2. Second, how to make the regression more accurate and faster when there is overlap or even inclusion with the target box?

DIoU loss

Distance-IoU (DIoU) loss: a faster and better bounding box regression loss. In general, an IoU-based loss can be defined as:

$$\mathcal{L} = 1 - \mathrm{IoU} + \mathcal{R}(B, B^{gt})$$

where $\mathcal{R}(B, B^{gt})$ is the penalty term for the predicted box $B$ and the target box $B^{gt}$. By designing appropriate penalty terms, in this section we propose DIoU loss and CIoU loss to answer the above two questions.

To answer the first question, we propose to minimize the normalized distance between the center points of two bounding boxes, the penalty term can be defined as:

$$\mathcal{R}_{DIoU} = \frac{\rho^2(b, b^{gt})}{c^2}$$

where $b$ and $b^{gt}$ denote the center points of $B$ and $B^{gt}$ respectively, $\rho(\cdot)$ is the Euclidean distance, and $c$ is the diagonal length of the smallest enclosing box covering the two boxes. The DIoU loss function can then be defined as:

$$\mathcal{L}_{DIoU} = 1 - \mathrm{IoU} + \frac{\rho^2(b, b^{gt})}{c^2}$$

As shown in Figure 5, the penalty term of DIoU loss directly minimizes the distance between the two center points, while GIoU loss aims to reduce the area of $C - B \cup B^{gt}$.

[Figure 5: DIoU loss minimizes the normalized distance between the center points of the two boxes]
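A minimal DIoU-loss sketch under the definition above (boxes as (x1, y1, x2, y2); names illustrative, not from the one-yolov5 source):

```python
def diou_loss(box_a, box_b):
    """L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2 for boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou_v = inter / (area_a + area_b - inter)
    # Squared distance between the two center points.
    rho2 = (((box_a[0] + box_a[2]) - (box_b[0] + box_b[2])) ** 2 +
            ((box_a[1] + box_a[3]) - (box_b[1] + box_b[3])) ** 2) / 4.0
    # Squared diagonal of the smallest enclosing box.
    c2 = ((max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])) ** 2 +
          (max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])) ** 2)
    return 1.0 - iou_v + rho2 / c2

print(diou_loss((0, 0, 1, 1), (0, 0, 1, 1)))  # perfect match -> 0.0
# Containment cases that GIoU cannot distinguish get different DIoU losses:
print(diou_loss((0, 0, 4, 4), (1, 1, 2, 2)))          # off-center inner box
print(diou_loss((0, 0, 4, 4), (1.5, 1.5, 2.5, 2.5)))  # centered inner box
```

The last two calls show how the center-distance term differentiates containment cases where the GIoU penalty is identically zero.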

DIoU and IoU/GIoU loss comparison

The newly proposed DIoU loss inherits some properties of IoU and GIoU losses:

  1. DIoU loss is still scale-invariant to regression problems

  2. Similar to GIoU loss, DIoU loss can provide bounding boxes with moving directions when they do not overlap with object boxes.

  3. When the two bounding boxes match perfectly, $\mathcal{L}_{IoU} = \mathcal{L}_{GIoU} = \mathcal{L}_{DIoU} = 0$; when the two boxes are far apart, $\mathcal{L}_{GIoU} = \mathcal{L}_{DIoU} \to 2$.

DIoU loss has several advantages over IoU loss and GIoU loss, which can be evaluated by simulation experiments.

  1. As shown in Figure 1 and Figure 3, DIoU loss can directly minimize the distance between two boxes, so the convergence speed is much faster than GIoU loss.

  2. For the case where one box contains the other (Fig. 2), or in the case of horizontal and vertical directions (Fig. 6), DIoU loss can regress very quickly, while GIoU loss almost degenerates into IoU loss, i.e. $|C - B \cup B^{gt}| \to 0$.

[Figure 6: regression errors of GIoU and DIoU losses for the horizontal and vertical cases]

Complete IoU Loss

We then answer the second question by proposing that a good loss for bounding box regression should take into account three important geometric factors: overlap area, center point distance, and aspect ratio. The overlap area is already covered by the IoU term, on which GIoU loss relies heavily. Our proposed DIoU loss simultaneously considers the overlap area and the center point distance of the bounding boxes. However, the consistency of the aspect ratios of the bounding boxes is also an important geometric factor. Therefore, based on the DIoU loss, the CIoU loss is proposed by adding an aspect ratio consistency term:

$$\mathcal{R}_{CIoU} = \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

where $\alpha$ is a positive trade-off parameter and $v$ measures the consistency of the aspect ratio:

$$v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2$$

Then the loss function can be defined as:

$$\mathcal{L}_{CIoU} = 1 - \mathrm{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

where the trade-off parameter is defined as $\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}$, which gives higher priority to the overlap area factor in the regression, especially for non-overlapping cases. The optimization of the CIoU loss is the same as that of the DIoU loss, except that the gradient of $v$ w.r.t. $w$ and $h$ should be specified:

$$\frac{\partial v}{\partial w} = -\frac{8}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right) \times \frac{h}{w^2 + h^2}, \qquad \frac{\partial v}{\partial h} = \frac{8}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right) \times \frac{w}{w^2 + h^2}$$

The denominator $w^2 + h^2$ is usually a small value for $w$ and $h$ in the range $[0, 1]$, which is likely to produce exploding gradients. Therefore, in our implementation the denominator $w^2 + h^2$ is simply replaced by 1; the gradient direction remains consistent with Equation (12).
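Putting the pieces together, a minimal CIoU-loss sketch might look as follows. It computes α and v directly rather than applying the gradient-level tweak just described; the eps guard for the perfect-match case, the box format and the names are illustrative, not the paper's released code.

```python
import math

def ciou_loss(box_a, box_b, eps=1e-9):
    """L_CIoU = 1 - IoU + rho^2/c^2 + alpha*v for boxes (x1, y1, x2, y2);
    box_b is taken as the ground truth for the aspect-ratio term."""
    w_a, h_a = box_a[2] - box_a[0], box_a[3] - box_a[1]
    w_b, h_b = box_b[2] - box_b[0], box_b[3] - box_b[1]
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    iou_v = inter / (w_a * h_a + w_b * h_b - inter)
    rho2 = (((box_a[0] + box_a[2]) - (box_b[0] + box_b[2])) ** 2 +
            ((box_a[1] + box_a[3]) - (box_b[1] + box_b[3])) ** 2) / 4.0
    c2 = ((max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])) ** 2 +
          (max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])) ** 2)
    # Aspect-ratio consistency term v and trade-off parameter alpha.
    v = (4 / math.pi ** 2) * (math.atan(w_b / h_b) - math.atan(w_a / h_a)) ** 2
    alpha = v / ((1.0 - iou_v) + v + eps)  # eps guards the 0/0 perfect-match case
    return 1.0 - iou_v + rho2 / c2 + alpha * v

print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))              # perfect match -> 0.0
# Same center and area but a 2:1 vs 1:2 aspect ratio: alpha*v adds a penalty.
print(ciou_loss((-1, -0.5, 1, 0.5), (-0.5, -1, 0.5, 1)))
```

Note that the yolov5 project's loss follows this CIoU formulation, as mentioned at the start of the article.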

2 NMS (Non-Maximum Suppression)

Introduction

NMS is the last step of most object detection algorithms, in which redundant detection boxes are removed when their overlap with the highest-scoring box exceeds a threshold. Soft-NMS (Bodla et al. 2017) penalizes the detection score $s_i$ of neighboring boxes with a continuous function w.r.t. IoU, producing softer and more robust suppression than the original NMS.


IoU-Net (Jiang et al. 2018) proposes a new network branch to predict localization confidence to guide NMS. Recently, adaptive NMS (Liu, Huang, and Wang 2019) and Softer-NMS (He et al. 2019) have been proposed to study appropriate threshold strategies and weighted averaging strategies, respectively. In this work, DIoU is simply used as the criterion of the original NMS; when suppressing redundant boxes, both the overlap area and the center point distance of the two bounding boxes are considered.

DIoU-NMS

Non-Maximum Suppression using DIoU

In the original NMS, the IoU metric is used to suppress redundant detection boxes, where the overlap area is the only factor; this often produces false suppression in occlusion cases. In this work, we propose DIoU as a better criterion for NMS, because not only the overlap area but also the center point distance should be considered in the suppression criterion. DIoU-NMS is formally defined as:

$$s_i = \begin{cases} s_i, & \mathrm{IoU} - \mathcal{R}_{DIoU}(\mathcal{M}, B_i) < \varepsilon \\ 0, & \mathrm{IoU} - \mathcal{R}_{DIoU}(\mathcal{M}, B_i) \ge \varepsilon \end{cases}$$

where box $B_i$ is removed by simultaneously considering the IoU and the distance between the center points of the two boxes; $s_i$ is the classification score and $\varepsilon$ is the NMS threshold. We argue that two boxes whose center points are far apart may locate different objects and should not both be removed. Moreover, DIoU-NMS is very flexible and can be integrated into any object detection pipeline with only a few lines of code.
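A greedy DIoU-NMS sketch under these definitions (the threshold value, box format (x1, y1, x2, y2) and names are illustrative):

```python
def diou_nms(boxes, scores, eps_thresh=0.5):
    """Greedy NMS that suppresses B_i when IoU(M, B_i) - R_DIoU(M, B_i) >= eps."""
    def iou_minus_rdiou(m, b):
        iw = max(0.0, min(m[2], b[2]) - max(m[0], b[0]))
        ih = max(0.0, min(m[3], b[3]) - max(m[1], b[1]))
        inter = iw * ih
        union = ((m[2] - m[0]) * (m[3] - m[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        # DIoU penalty: squared center distance over squared enclosing diagonal.
        rho2 = (((m[0] + m[2]) - (b[0] + b[2])) ** 2 +
                ((m[1] + m[3]) - (b[1] + b[3])) ** 2) / 4.0
        c2 = ((max(m[2], b[2]) - min(m[0], b[0])) ** 2 +
              (max(m[3], b[3]) - min(m[1], b[1])) ** 2)
        return inter / union - rho2 / c2

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)            # highest-scoring remaining box M
        keep.append(m)
        order = [i for i in order
                 if iou_minus_rdiou(boxes[m], boxes[i]) < eps_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))  # [0, 2]: box 1 is suppressed by box 0
```

Because the center-distance penalty is subtracted from IoU, a neighbor with the same overlap but a distant center scores lower against the threshold and is more likely to survive, which is exactly the occlusion-friendly behavior described above.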

3 Summary

This article mainly introduces DIoU loss and CIoU loss for bounding box regression, and DIoU-NMS for suppressing redundant detection boxes. By directly minimizing the normalized distance of two center points, DIoU loss can achieve faster convergence than GIoU loss. In addition, the CIoU loss considers three geometric properties (i.e. overlapping area, center point distance and aspect ratio), which promotes faster convergence and better performance.

Reference articles

  • https://github.com/Zzh-tju/DIoU/blob/master/README.md#introduction


  • IoU: https://arxiv.org/pdf/1608.01471.pdf

  • GIoU: https://giou.stanford.edu/GIoU.pdf

  • DIoU: https://arxiv.org/pdf/1911.08287.pdf



Origin blog.csdn.net/OneFlow_Official/article/details/128859966