1 Summary of common IoU losses
| Loss term | Description |
|---|---|
| Classification loss | Error in the predicted class probabilities |
| Localization loss | Error between the predicted bounding box and the ground truth (GT) |
| Confidence loss | Error in the objectness score (whether the box contains an object) |
The total loss is the sum of the three terms: classification loss + localization loss + confidence loss.
YOLOv5 uses binary cross-entropy (BCE) to compute the loss for both the class probabilities and the objectness (confidence) scores.
YOLOv5 uses the CIoU loss for bounding-box regression.
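In YOLOv5 the three terms are not added with equal weight: the ComputeLoss code in Section 2 scales each one by a hyperparameter gain (`hyp['box']`, `hyp['obj']`, `hyp['cls']`) and multiplies the sum by the batch size (the objectness term is additionally weighted per detection layer by `self.balance` before this step). The sketch below shows only that final step; `total_loss` is a hypothetical helper, and the gain and loss values are made-up illustration values, not YOLOv5 defaults taken from this article.

```python
# A minimal sketch of the final weighting step in YOLOv5's ComputeLoss.
# The gains and loss values below are made-up illustration values.


def total_loss(lbox, lobj, lcls, batch_size, hyp):
    """Scale each loss term by its gain, then multiply the sum by the batch size."""
    lbox = lbox * hyp["box"]  # localization (CIoU) loss
    lobj = lobj * hyp["obj"]  # confidence (objectness) loss
    lcls = lcls * hyp["cls"]  # classification loss
    return (lbox + lobj + lcls) * batch_size


hyp = {"box": 0.05, "obj": 1.0, "cls": 0.5}  # assumed gains, for illustration only
print(total_loss(lbox=0.8, lobj=0.3, lcls=0.4, batch_size=16, hyp=hyp))
```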
Multi-label classification:
- Most classifiers assume that the output labels are mutually exclusive, which holds when the target classes are mutually exclusive. In that case a softmax function converts the scores into probabilities that sum to 1, which is what earlier YOLO versions do. YOLOv3/v4/v5, however, use multi-label classification. For example, the output labels could be "pedestrian" and "child", which are not mutually exclusive (so the sum of the outputs can be greater than 1).
- YOLOv3/v4/v5 replace the softmax function with multiple independent logistic classifiers, one per class, to calculate the probability that an input belongs to each label.
- When computing the classification loss for training, YOLOv3/v4/v5 use a binary cross-entropy loss for each label. This also reduces computational complexity by avoiding the use of the softmax function.
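The difference between softmax and independent logistic classifiers can be seen with a small pure-Python sketch. The label names and logits are hypothetical; `multilabel_bce` averages one binary cross-entropy term per label, which matches what PyTorch's `nn.BCEWithLogitsLoss` computes with its default `reduction='mean'` (PyTorch uses a numerically stabler formulation internally).

```python
import math


def softmax(logits):
    """Mutually exclusive classes: probabilities compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_bce(logits, targets):
    """Independent logistic classifiers: one BCE term per label, averaged.

    Naive formula for readability; fine for moderate logits only.
    """
    losses = []
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)


# Hypothetical logits for the labels ["pedestrian", "child", "car"]
logits = [2.0, 1.5, -3.0]
probs_softmax = softmax(logits)               # sums to 1
probs_sigmoid = [sigmoid(z) for z in logits]  # independent, may sum to more than 1
print(sum(probs_softmax), sum(probs_sigmoid))
print(multilabel_bce(logits, [1, 1, 0]))      # "pedestrian" and "child" are both correct
```

With softmax, raising the "child" score necessarily lowers "pedestrian"; with independent sigmoids both can be close to 1 at the same time.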
1.1 IoU (Intersection over Union)
IoU has two problems:
- If two objects do not overlap, the IoU value is zero and does not reflect how far apart the two shapes are; and if IoU is used as a loss, its gradient is zero, so it cannot be optimized.
- IoU cannot distinguish between different alignments of two objects: boxes that overlap in different ways can have the same IoU (as shown below).
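IoU is defined as IoU = |A ∩ B| / |A ∪ B|. A minimal pure-Python sketch reproduces both problems; boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` corner tuples, and the function name `iou` is chosen here for illustration.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


target = (0.0, 0.0, 2.0, 2.0)

# Problem 1: non-overlapping boxes give IoU = 0 regardless of distance
near = (3.0, 0.0, 5.0, 2.0)    # 1 unit to the right of the target
far = (10.0, 0.0, 12.0, 2.0)   # 8 units to the right of the target
print(iou(target, near), iou(target, far))

# Problem 2: differently aligned boxes can share the same IoU
shift_x = (1.0, 0.0, 3.0, 2.0)  # shifted right by 1
shift_y = (0.0, 1.0, 2.0, 3.0)  # shifted down by 1
print(iou(target, shift_x), iou(target, shift_y))
```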
1.2 GIoU (Generalized IoU)
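For two boxes A and B, let C be the smallest box that contains both A and B. GIoU subtracts from IoU the fraction of C that is not covered by A ∪ B:

$$\text{GIoU} = \text{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}$$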
When GIoU is used as a loss function, Loss = 1 - GIoU. When the two boxes A and B do not intersect, IoU = 0 and |A ∪ B| = |A| + |B| stays unchanged as the boxes move (for fixed box sizes), so maximizing GIoU amounts to minimizing C, which keeps pulling the two boxes closer together.
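A minimal pure-Python sketch of GIoU, assuming the same `(x1, y1, x2, y2)` box format; the function name `giou` is chosen here for illustration. It shows that, unlike IoU, GIoU keeps decreasing (and the loss 1 - GIoU keeps increasing) as two non-overlapping boxes move apart.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # C: smallest axis-aligned box enclosing both A and B
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (c_area - union) / c_area


target = (0.0, 0.0, 2.0, 2.0)
near = (3.0, 0.0, 5.0, 2.0)    # IoU = 0, C has area 10
far = (10.0, 0.0, 12.0, 2.0)   # IoU = 0, C has area 24
for pred in (near, far):
    g = giou(target, pred)
    print(pred, "GIoU =", g, "loss =", 1 - g)
```

For `near`, |A ∪ B| = 8 and |C| = 10, so GIoU = 0 - 2/10 = -0.2; for `far`, GIoU = 0 - 16/24 ≈ -0.67, giving the optimizer a signal to move the prediction closer.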
1.3 DIoU (Distance-IoU)
Bounding-box regression steps under the GIoU loss (first row) and the DIoU loss (second row). Green and black represent the target and anchor boxes, respectively; blue and red represent the predicted boxes for the GIoU loss and the DIoU loss, respectively. The GIoU loss generally enlarges the predicted box until it overlaps the target box, whereas the DIoU loss directly minimizes the normalized distance between the center points.
In these cases (C = A ∪ B, e.g. when one box contains the other), the GIoU loss degrades to the IoU loss, while the DIoU loss can still distinguish between them. Green and red represent the target and predicted boxes, respectively.
The DIoU loss for bounding-box regression directly minimizes the normalized distance between the center points of the two boxes:

$$L_{DIoU} = 1 - \text{IoU} + \frac{\rho^2(b, b^{gt})}{c^2}$$

where $B^{gt} = (x^{gt}, y^{gt}, w^{gt}, h^{gt})$ is the ground-truth box and $B = (x, y, w, h)$ is the predicted box, $b$ and $b^{gt}$ are their center points, $d = \rho(b, b^{gt})$ is the Euclidean distance between the two center points, and $c$ is the diagonal length of the smallest box covering both boxes.
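A minimal pure-Python sketch of DIoU = IoU - ρ²(b, b^gt) / c², assuming `(x1, y1, x2, y2)` boxes; the function name `diou` is chosen here for illustration. Both predictions below lie inside the target, so C = A ∪ B and GIoU reduces to IoU = 0.25 for both, but DIoU still penalizes the off-center one.

```python
def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union
    # rho^2: squared Euclidean distance between the two box centers
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - rho2 / c2


target = (0.0, 0.0, 4.0, 4.0)
centered = (1.0, 1.0, 3.0, 3.0)  # inside the target, same center
corner = (0.0, 0.0, 2.0, 2.0)    # inside the target, pushed into a corner
print(diou(target, centered), diou(target, corner))
```

Here the centered prediction has DIoU = 0.25, while the corner one has DIoU = 0.25 - 2/32 = 0.1875, so its DIoU loss is larger.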
The IoU loss has large errors in the non-overlapping cases, the GIoU loss has large errors in the horizontal and vertical cases, and the DIoU loss has very small regression errors everywhere.
1.4 CIoU (Complete-IoU)
Bounding-box updates over successive iterations when optimizing the GIoU loss (first row) and the CIoU loss (second row). Green and black represent the target and anchor boxes, respectively; blue and red represent the predicted boxes for the GIoU loss and the CIoU loss, respectively. The GIoU loss only considers the overlap area and tends to increase GIoU by enlarging the predicted box.
The CIoU loss accounts for three geometric factors: center-point distance, overlap area, and aspect ratio. Minimizing the normalized center-point distance makes the algorithm converge quickly, while the overlap-area and aspect-ratio consistency terms help the two boxes match better.
For the non-overlapping cases, the IoU loss has large errors; for the horizontal and vertical cases, the GIoU loss has large errors; the CIoU loss has small regression errors in every scenario.
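CIoU Loss is defined as:

$$L_{CIoU} = 1 - \text{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

where $\rho$, $b$, $b^{gt}$ and $c$ are as in DIoU, $v$ measures the consistency of the aspect ratios, and $\alpha$ is a positive trade-off weight:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \qquad \alpha = \frac{v}{(1 - \text{IoU}) + v}$$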
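A minimal pure-Python sketch of CIoU = IoU - (ρ²(b, b^gt) / c² + αv), with v = (4/π²)(arctan(w^gt/h^gt) - arctan(w/h))² and α = v / ((1 - IoU) + v), assuming `(x1, y1, x2, y2)` boxes where `a` is the prediction and `b` the ground truth; the function name `ciou` and the small `eps` are choices made here for illustration. The two predictions share the target's center and have the same IoU (0.25), so IoU, GIoU and DIoU all rate them equally; only the aspect-ratio term separates them.

```python
import math


def ciou(a, b, eps=1e-7):
    """Complete-IoU of prediction a and ground truth b, both (x1, y1, x2, y2)."""
    w1, h1 = a[2] - a[0], a[3] - a[1]
    w2, h2 = b[2] - b[0], b[3] - b[1]
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    iou = inter / (w1 * h1 + w2 * h2 - inter)
    # Normalized center distance, as in DIoU
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    # v: aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    # alpha: trade-off weight (YOLOv5 computes it under torch.no_grad(), i.e. as a constant)
    alpha = v / ((1 - iou) + v + eps)
    return iou - (rho2 / c2 + alpha * v)


target = (0.0, 0.0, 4.0, 2.0)             # w = 4, h = 2 (ratio 2:1)
pred_same_ratio = (1.0, 0.5, 3.0, 1.5)    # w = 2, h = 1, same center, ratio 2:1
pred_other_ratio = (1.5, 0.0, 2.5, 2.0)   # w = 1, h = 2, same center, ratio 1:2
print(ciou(pred_same_ratio, target), ciou(pred_other_ratio, target))
```

The matching-ratio prediction keeps CIoU = 0.25 (v = 0), while the mismatched one drops to roughly 0.219.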
1.5 Summary of loss functions
IoU series (IoU, GIoU, DIoU, CIoU)
The figure below is quoted from the blog mentioned above, which summarizes the series very well.
2 YOLOv5 loss function
```python
import math

import torch


def bbox_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7):
    # Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)
    # box1 and box2 are 2-D tensors (chunk(4, 1) splits along dim 1); they broadcast
    # against each other, so in ComputeLoss two (n, 4) tensors are compared element-wise.

    # Get the coordinates of bounding boxes
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, 1), box2.chunk(4, 1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, 1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, 1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1

    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)

    # Union Area
    union = w1 * h1 + w2 * h2 - inter + eps

    # IoU
    iou = inter / union
    if CIoU or DIoU or GIoU:
        cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)  # convex (smallest enclosing box) width
        ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)  # convex height
        if CIoU or DIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center dist ** 2
            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                # v: aspect-ratio consistency term
                v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps)), 2)
                with torch.no_grad():  # alpha is treated as a constant weight (no gradient)
                    alpha = v / (v - iou + (1 + eps))
                return iou - (rho2 / c2 + v * alpha)  # CIoU
            return iou - rho2 / c2  # DIoU
        c_area = cw * ch + eps  # convex area
        return iou - (c_area - union) / c_area  # GIoU https://arxiv.org/pdf/1902.09630.pdf
    return iou  # IoU
```
```python
import torch
import torch.nn as nn

from utils.torch_utils import de_parallel  # YOLOv5 helper

# smooth_BCE() and FocalLoss are defined earlier in YOLOv5's utils/loss.py (not reproduced here);
# bbox_iou() is the function shown above.


class ComputeLoss:
    sort_obj_iou = False

    # Compute losses
    def __init__(self, model, autobalance=False):
        device = next(model.parameters()).device  # get model device
        h = model.hyp  # hyperparameters

        # Define criteria
        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))
        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))

        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets

        # Focal loss
        g = h['fl_gamma']  # focal loss gamma
        if g > 0:
            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

        m = de_parallel(model).model[-1]  # Detect() module
        # Objectness-loss weight per output layer (P3, P4, P5 when nl == 3)
        self.balance = {3: [4.0, 1.0, 0.4]}.get(m.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7
        self.ssi = list(m.stride).index(16) if autobalance else 0  # stride 16 index
        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance
        self.na = m.na  # number of anchors
        self.nc = m.nc  # number of classes
        self.nl = m.nl  # number of layers
        self.anchors = m.anchors
        self.device = device

    def __call__(self, p, targets):  # predictions, targets
        lcls = torch.zeros(1, device=self.device)  # class loss
        lbox = torch.zeros(1, device=self.device)  # box loss
        lobj = torch.zeros(1, device=self.device)  # object loss
        tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets

        # Losses
        for i, pi in enumerate(p):  # layer index, layer predictions
            b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
            tobj = torch.zeros(pi.shape[:4], dtype=pi.dtype, device=self.device)  # target obj

            n = b.shape[0]  # number of targets
            if n:
                # pxy, pwh, _, pcls = pi[b, a, gj, gi].tensor_split((2, 4, 5), dim=1)  # faster, requires torch 1.8.0
                pxy, pwh, _, pcls = pi[b, a, gj, gi].split((2, 2, 1, self.nc), 1)  # target-subset of predictions

                # Regression
                pxy = pxy.sigmoid() * 2 - 0.5  # center offset in (-0.5, 1.5) cells
                pwh = (pwh.sigmoid() * 2) ** 2 * anchors[i]  # width/height in (0, 4) x anchor
                pbox = torch.cat((pxy, pwh), 1)  # predicted box
                iou = bbox_iou(pbox, tbox[i], CIoU=True).squeeze()  # iou(prediction, target)
                lbox += (1.0 - iou).mean()  # iou loss

                # Objectness: the target for matched cells is the detached CIoU, clamped at 0
                iou = iou.detach().clamp(0).type(tobj.dtype)
                if self.sort_obj_iou:
                    j = iou.argsort()
                    b, a, gj, gi, iou = b[j], a[j], gj[j], gi[j], iou[j]
                if self.gr < 1:  # blend the target toward 1.0
                    iou = (1.0 - self.gr) + self.gr * iou
                tobj[b, a, gj, gi] = iou  # iou ratio

                # Classification
                if self.nc > 1:  # cls loss (only if multiple classes)
                    t = torch.full_like(pcls, self.cn, device=self.device)  # targets
                    t[range(n), tcls[i]] = self.cp
                    lcls += self.BCEcls(pcls, t)  # BCE

                # Append targets to text file
                # with open('targets.txt', 'a') as file:
                #     [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]

            obji = self.BCEobj(pi[..., 4], tobj)
            lobj += obji * self.balance[i]  # obj loss
            if self.autobalance:
                self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / obji.detach().item()

        if self.autobalance:
            self.balance = [x / self.balance[self.ssi] for x in self.balance]
        lbox *= self.hyp['box']
        lobj *= self.hyp['obj']
        lcls *= self.hyp['cls']
        bs = tobj.shape[0]  # batch size

        return (lbox + lobj + lcls) * bs, torch.cat((lbox, lobj, lcls)).detach()

    # build_targets(self, p, targets) is also part of ComputeLoss but is omitted from this excerpt.
```
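The two "Regression" lines in `ComputeLoss` decode the raw network outputs before the CIoU is computed. The scalar pure-Python sketch below mirrors them for a single prediction; `decode_box` and the anchor values are hypothetical illustration choices, and the anchor is assumed to be expressed in grid-cell units.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(tx, ty, tw, th, anchor_w, anchor_h):
    """Mirror of the 'Regression' lines in ComputeLoss for one prediction.

    Returns the (x, y) center offset relative to the matched grid cell and
    the (w, h) size in the same units as the anchor.
    """
    x = sigmoid(tx) * 2 - 0.5               # offset in (-0.5, 1.5)
    y = sigmoid(ty) * 2 - 0.5
    w = (sigmoid(tw) * 2) ** 2 * anchor_w   # width in (0, 4) x anchor_w
    h = (sigmoid(th) * 2) ** 2 * anchor_h
    return x, y, w, h


# Raw outputs of 0 decode to a box centered in the cell with exactly the anchor size
print(decode_box(0.0, 0.0, 0.0, 0.0, anchor_w=1.25, anchor_h=1.625))
```

Because the sigmoid is bounded, the decoded width and height can never exceed 4 times the anchor, unlike the exp-based width/height decoding used in YOLOv2/v3.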