[Object detection] IoU, GIoU, DIoU, CIoU and the YOLOv5 loss function

1 Summary of common IoU losses

  • Classification loss: error in the predicted class probabilities
  • Localization loss: error between the predicted bounding box and the ground truth (GT)
  • Confidence loss: objectness of the box

The total loss function: classification loss + localization loss + confidence loss

YOLOv5 uses binary cross-entropy to compute the class probability loss and the object confidence loss, and CIoU Loss for bounding box regression.
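Putting this together, the total loss has the form below, where the λ weights correspond to the hyperparameters hyp['box'], hyp['obj'] and hyp['cls'] used by the code in Section 2:

$$Loss = \lambda_{box} L_{CIoU} + \lambda_{obj} L_{obj} + \lambda_{cls} L_{cls}$$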

Multi-label classification:

  • Most classifiers assume that the output labels are mutually exclusive, which holds when the target classes really are exclusive; earlier YOLO versions therefore applied a softmax to convert the scores into probabilities that sum to 1. YOLOv3/v4/v5 instead use multi-label classification: the output labels could be "pedestrian" and "child", which are not mutually exclusive, so the sum of the outputs can be greater than 1.
  • YOLOv3/v4/v5 replace the softmax with multiple independent logistic classifiers to estimate the likelihood of the input belonging to each label.
  • When computing the classification loss for training, YOLOv3/v4/v5 use a binary cross-entropy loss for each label. This also reduces computational complexity by avoiding the softmax; a minimal sketch of this per-label setup is shown right after this list.
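As a minimal sketch of this per-label BCE setup (a made-up three-label example, not YOLOv5's actual detection head):

import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.5, -3.0]])   # raw scores for the labels [pedestrian, child, car]
targets = torch.tensor([[1.0, 1.0, 0.0]])   # "pedestrian" and "child" are both true

bce = nn.BCEWithLogitsLoss()                # sigmoid + binary cross-entropy, applied independently per label
loss = bce(logits, targets)

probs = logits.sigmoid()                    # independent per-label probabilities; their sum may exceed 1
print(f"loss={loss.item():.4f}, sum of probabilities={probs.sum().item():.4f}")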

1.1 IoU (Intersection over Union)

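For reference, the standard definition (which the bbox_iou code in Section 2 implements) is:

$$IoU = \frac{|A \cap B|}{|A \cup B|}, \qquad L_{IoU} = 1 - IoU$$

where A is the predicted box and B is the ground-truth box.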

There are two problems here:

  • If two objects do not overlap, the IoU is zero and does not reflect how far apart the two shapes are; and if IoU is used as a loss, its gradient is zero and the prediction cannot be optimized.
  • IoU cannot distinguish different alignments between two objects. (As shown below)

[Figure: examples of different alignments between two boxes that IoU cannot distinguish]

1.2 GIoU (Generalized-IoU)

C is the smallest enclosing box that contains both A and B.

When GIoU is used as a loss function, Loss = 1 - GIoU. When the two boxes A and B do not intersect, |A∪B| stays constant, so maximizing GIoU amounts to minimizing the enclosing box C, which keeps pulling the two boxes closer together.
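For reference, the standard GIoU definition and loss (matching the GIoU branch of bbox_iou in Section 2) are:

$$GIoU = IoU - \frac{|C \setminus (A \cup B)|}{|C|}, \qquad L_{GIoU} = 1 - GIoU$$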


1.3 DIoU (Distance-IoU)

Figure: bounding box regression steps with GIoU loss (first row) and DIoU loss (second row). Green and black denote the target and anchor boxes, respectively; blue and red denote the predicted boxes for GIoU loss and DIoU loss, respectively. GIoU loss tends to enlarge the predicted box until it overlaps the target box, while DIoU loss directly minimizes the normalized distance between the center points.
Figure: cases where the enclosing box C equals A∪B (one box fully contains the other). Here the GIoU loss degrades to the IoU loss, while the DIoU loss still provides a useful regression signal. Green and red denote the target and predicted boxes, respectively.


Here B^gt = (x^gt, y^gt, w^gt, h^gt) is the ground-truth box and B = (x, y, w, h) is the predicted box.
The DIoU loss for bounding box regression directly minimizes the normalized distance between the center points of the two boxes:
c is the diagonal length of the smallest enclosing box covering the two boxes, and d = ρ(b, b^gt) is the distance between their center points.
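With these symbols, the DIoU loss (matching the DIoU branch of bbox_iou in Section 2) is:

$$L_{DIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2}$$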


Figure: regression error in the simulation experiment. The IoU loss has large errors in the non-overlapping cases, the GIoU loss has large errors in the horizontal and vertical cases, while the DIoU loss keeps the regression error small everywhere.

1.4 CIoU (Complete-IoU)

Figure: bounding boxes after different numbers of iterations when optimized with GIoU loss (first row) and CIoU loss (second row). Green and black denote the target and anchor boxes, respectively; blue and red denote the predicted boxes for GIoU loss and CIoU loss, respectively. GIoU loss only considers the overlapping area and tends to increase GIoU by enlarging the predicted box.

By combining these three geometric factors (overlapping area, center point distance, and aspect ratio), the normalized center-point distance term lets the CIoU loss converge quickly, while the overlap and aspect-ratio terms help the predicted box match the target box more closely.
As the simulation shows, the IoU loss has a large error in the non-overlapping cases and the GIoU loss has large errors in the horizontal and vertical cases, while the CIoU loss keeps the regression error small in every scenario.
CIoU Loss is defined as:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

where the aspect-ratio consistency term and its weight are

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$

1.5 Summary of the loss functions

IoU series(IoU, GIoU, DIoU, CIoU)

To recap the comparison above:

  • IoU: zero when the two boxes do not overlap, so it gives no gradient as a loss and cannot distinguish different alignments.
  • GIoU: adds a penalty based on the smallest enclosing box C, so non-overlapping boxes are still pulled together, but it degrades to IoU when one box contains the other and tends to enlarge the predicted box.
  • DIoU: directly minimizes the normalized distance between the box centers, which converges faster than GIoU.
  • CIoU: adds an aspect-ratio consistency term on top of DIoU, keeping the regression error small in all of the simulated cases.

2 YOLOv5 loss function

import math

import torch
import torch.nn as nn  # nn is used by ComputeLoss further below


def bbox_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7):
    # Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)

    # Get the coordinates of bounding boxes
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, 1), box2.chunk(4, 1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, 1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, 1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1

    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)

    # Union Area
    union = w1 * h1 + w2 * h2 - inter + eps

    # IoU
    iou = inter / union
    if CIoU or DIoU or GIoU:
        cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)  # convex (smallest enclosing box) width
        ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)  # convex height
        if CIoU or DIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center dist ** 2
            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps)), 2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                return iou - (rho2 / c2 + v * alpha)  # CIoU
            return iou - rho2 / c2  # DIoU
        c_area = cw * ch + eps  # convex area
        return iou - (c_area - union) / c_area  # GIoU https://arxiv.org/pdf/1902.09630.pdf
    return iou  # IoU
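A quick sanity check of bbox_iou on two made-up boxes (xywh format, arbitrary values):

pred = torch.tensor([[50.0, 50.0, 40.0, 40.0]])  # hypothetical predicted box: cx, cy, w, h
gt = torch.tensor([[60.0, 60.0, 40.0, 40.0]])    # hypothetical ground-truth box
print(bbox_iou(pred, gt))             # plain IoU
print(bbox_iou(pred, gt, CIoU=True))  # CIoU, the variant YOLOv5 uses for the box loss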

# Note: smooth_BCE, FocalLoss and de_parallel used below are YOLOv5 helper
# functions (defined in utils/loss.py and utils/torch_utils.py of the repo).
class ComputeLoss:
    sort_obj_iou = False

    # Compute losses
    def __init__(self, model, autobalance=False):
        device = next(model.parameters()).device  # get model device
        h = model.hyp  # hyperparameters

        # Define criteria
        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))
        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))

        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets

        # Focal loss
        g = h['fl_gamma']  # focal loss gamma
        if g > 0:
            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

        m = de_parallel(model).model[-1]  # Detect() module
        self.balance = {3: [4.0, 1.0, 0.4]}.get(m.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7
        self.ssi = list(m.stride).index(16) if autobalance else 0  # stride 16 index
        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance
        self.na = m.na  # number of anchors
        self.nc = m.nc  # number of classes
        self.nl = m.nl  # number of layers
        self.anchors = m.anchors
        self.device = device

    def __call__(self, p, targets):  # predictions, targets
        lcls = torch.zeros(1, device=self.device)  # class loss
        lbox = torch.zeros(1, device=self.device)  # box loss
        lobj = torch.zeros(1, device=self.device)  # object loss
        tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets

        # Losses
        for i, pi in enumerate(p):  # layer index, layer predictions
            b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
            tobj = torch.zeros(pi.shape[:4], dtype=pi.dtype, device=self.device)  # target obj

            n = b.shape[0]  # number of targets
            if n:
                # pxy, pwh, _, pcls = pi[b, a, gj, gi].tensor_split((2, 4, 5), dim=1)  # faster, requires torch 1.8.0
                pxy, pwh, _, pcls = pi[b, a, gj, gi].split((2, 2, 1, self.nc), 1)  # target-subset of predictions

                # Regression
                pxy = pxy.sigmoid() * 2 - 0.5  # decoded xy offset inside the grid cell, range (-0.5, 1.5)
                pwh = (pwh.sigmoid() * 2) ** 2 * anchors[i]  # decoded wh, range (0, 4) x anchor size
                pbox = torch.cat((pxy, pwh), 1)  # predicted box
                iou = bbox_iou(pbox, tbox[i], CIoU=True).squeeze()  # iou(prediction, target)
                lbox += (1.0 - iou).mean()  # iou loss

                # Objectness
                iou = iou.detach().clamp(0).type(tobj.dtype)
                if self.sort_obj_iou:
                    j = iou.argsort()
                    b, a, gj, gi, iou = b[j], a[j], gj[j], gi[j], iou[j]
                if self.gr < 1:
                    iou = (1.0 - self.gr) + self.gr * iou
                tobj[b, a, gj, gi] = iou  # iou ratio

                # Classification
                if self.nc > 1:  # cls loss (only if multiple classes)
                    t = torch.full_like(pcls, self.cn, device=self.device)  # targets
                    t[range(n), tcls[i]] = self.cp
                    lcls += self.BCEcls(pcls, t)  # BCE

                # Append targets to text file
                # with open('targets.txt', 'a') as file:
                #     [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]

            obji = self.BCEobj(pi[..., 4], tobj)
            lobj += obji * self.balance[i]  # obj loss
            if self.autobalance:
                self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / obji.detach().item()

        if self.autobalance:
            self.balance = [x / self.balance[self.ssi] for x in self.balance]
        lbox *= self.hyp['box']
        lobj *= self.hyp['obj']
        lcls *= self.hyp['cls']
        bs = tobj.shape[0]  # batch size

        return (lbox + lobj + lcls) * bs, torch.cat((lbox, lobj, lcls)).detach()
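Roughly how ComputeLoss is wired into a training step (a simplified sketch; the real loop in YOLOv5's train.py also handles warmup, mixed precision and gradient accumulation, and model, train_loader and optimizer are assumed to come from the usual YOLOv5 setup):

compute_loss = ComputeLoss(model)                # build the criterion once from model.hyp
for imgs, targets, paths, _ in train_loader:     # targets: (n, 6) = image index, class, x, y, w, h
    pred = model(imgs)                           # one prediction tensor per detection layer
    loss, loss_items = compute_loss(pred, targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()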

Origin: blog.csdn.net/weixin_45751396/article/details/127150065