YOLOv5-6.1 source code: a detailed explanation of the loss function (loss.py)

Table of contents

1 Measuring the accuracy of object detection results

2 YOLOv5-6.1 loss function

2.1 classification category loss

2.2 confidence loss

2.3 localization positioning loss

3 YOLOv5-6.1 loss function loss.py code analysis

3.1 class ComputeLoss

3.1.1 __init__

3.1.2 build_targets

3.1.3 __call__

3.2 smooth_BCE

3.3 BCEBlurWithLogitsLoss

3.4 FocalLoss

References 


1 Measuring the accuracy of object detection results

The object detection task has three main purposes:

  • Detect the location of each target in the image; the same image may contain multiple targets;
  • Detect the size of each target, usually as a rectangular box that exactly encloses it;
  • Identify and classify the detected targets.

Accuracy is therefore judged against these three goals:

  • First, define the ideal case: every location where a target actually exists is detected. The closer the detection result is to this ideal, i.e. the fewer missed detections and false detections, the more accurate the result;
  • Similarly, in the ideal case the detected rectangular box exactly encloses the target. The closer the detected box is to this ideal, the more accurate the result;
  • The detected targets are identified and classified. The more consistent the classification results are with the actual classes of the targets, the more accurate the result.

As shown in the figure below, the people and the bus are the detection targets. We need to detect the positions of all people and the bus, find the smallest rectangular box enclosing each of them, and identify which boxes contain a person and which box contains the bus.
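Since the second criterion asks how well a box encloses a target, the standard measure is IoU (Intersection over Union). Below is a minimal, self-contained sketch, not from loss.py and with illustrative names, of IoU between two boxes in (x1, y1, x2, y2) corner format:

def box_iou_xyxy(box1, box2):
    # intersection rectangle
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # union = area1 + area2 - intersection
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter + 1e-9)

print(box_iou_xyxy([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ≈ 0.1429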

2 YOLOv5-6.1 loss function

By reading the loss.py file of the YOLOv5-6.1 release, we can see that the YOLOv5 loss function includes:

  • classification loss: the error of the predicted class scores;
  • localization loss: the error between the predicted box and the ground-truth box;
  • confidence (objectness) loss: how confident the network is that the box contains a target.

All three parts are computed over the matched positive-sample pairs. Each output feature map is handled independently, and the per-layer losses are summed to give each part's final value. The overall calculation formula is:

$$Loss = \lambda_{box} L_{box} + \lambda_{obj} L_{obj} + \lambda_{cls} L_{cls}$$

where $\lambda_{box}$, $\lambda_{obj}$, $\lambda_{cls}$ are the box, obj, and cls gains set in hyp.yaml and applied at the end of __call__.

2.1 classification category loss

The class loss is similar to the confidence loss: it is computed from the predicted boxes' class scores and the one-hot encoding of the target boxes' classes. The formula is as follows:

$$L_{cls} = -\sum_{i}\sum_{c} \left[ y_{i,c} \log \sigma(p_{i,c}) + (1 - y_{i,c}) \log\left(1 - \sigma(p_{i,c})\right) \right]$$

Both the objectness loss and the class loss use BCEWithLogitsLoss, binary cross-entropy with a built-in sigmoid. To use Focal Loss instead, it only needs to be wrapped around these criteria, as shown after the snippet below.

BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))
lcls += self.BCEcls(ps[:, 5:], t)  # ps: matched predictions, t: (smoothed) one-hot targets
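As the following lines from __init__ (section 3.1.1) show, switching to Focal Loss only requires setting fl_gamma > 0 in hyp.yaml, which wraps both criteria:

g = h['fl_gamma']  # focal loss gamma, e.g. 1.5
if g > 0:
    BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)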

2.2 confidence loss

The objectness loss is computed from the matched positive-sample pairs. The first term is the predicted objectness score $p_o$ of the predicted box; the second is the IoU between the predicted box and its matched target box, which serves as the ground truth. Binary cross-entropy between the two gives the final objectness loss. The formula is as follows:

$$L_{obj} = -\sum_{i} \left[ \mathrm{IoU}_i \log \sigma(p_{o,i}) + (1 - \mathrm{IoU}_i) \log\left(1 - \sigma(p_{o,i})\right) \right]$$

BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))
obji = self.BCEobj(pi[..., 4], tobj)
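In __call__ (section 3.1.3) the objectness ground truth tobj is built exactly this way: every cell starts as background (0), and each matched cell receives the detached, clamped IoU (since model.gr = 1 in train.py):

tobj = torch.zeros_like(pi[..., 0], device=device)  # [bs, na, ny, nx], all background
tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)  # iou ratio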

2.3 localization positioning loss

YOLOv5 uses CIoU Loss for localization. For a detailed analysis of CIoU Loss, see the CSDN blog post "Optimizing and improving the YOLOv5 algorithm by adding GIoU, DIoU, CIoU, EIoU, and Wise-IoU modules (super detailed)". The positive-sample matching strategy and bbox regression in YOLOv5 are shown in the figure below.

iou_term = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True)
lbox += (1.0 - iou_term).mean()
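For reference, here is a self-contained sketch of the CIoU formula for two boxes in (cx, cy, w, h) format; the actual bbox_iou implementation lives in utils/metrics.py and additionally handles batched tensors and the other IoU variants:

import math
import torch

def ciou_xywh(box1, box2, eps=1e-7):
    # corner coordinates of both boxes
    b1x1, b1y1, b1x2, b1y2 = box1[0] - box1[2] / 2, box1[1] - box1[3] / 2, box1[0] + box1[2] / 2, box1[1] + box1[3] / 2
    b2x1, b2y1, b2x2, b2y2 = box2[0] - box2[2] / 2, box2[1] - box2[3] / 2, box2[0] + box2[2] / 2, box2[1] + box2[3] / 2
    # plain IoU
    inter = (torch.min(b1x2, b2x2) - torch.max(b1x1, b2x1)).clamp(0) * \
            (torch.min(b1y2, b2y2) - torch.max(b1y1, b2y1)).clamp(0)
    union = box1[2] * box1[3] + box2[2] * box2[3] - inter + eps
    iou = inter / union
    # squared diagonal of the smallest enclosing box
    cw = torch.max(b1x2, b2x2) - torch.min(b1x1, b2x1)
    ch = torch.max(b1y2, b2y2) - torch.min(b1y1, b2y1)
    c2 = cw ** 2 + ch ** 2 + eps
    # squared distance between box centers
    rho2 = (box1[0] - box2[0]) ** 2 + (box1[1] - box2[1]) ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(box2[2] / box2[3]) - torch.atan(box1[2] / box1[3])) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v

print(ciou_xywh(torch.tensor([5., 5., 4., 4.]), torch.tensor([6., 6., 4., 4.])))  # ≈ 0.3513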

3 YOLOv5-6.1 loss function loss.py code analysis

3.1 class ComputeLoss

3.1.1 __init__

The code and comments are as follows:

  def __init__(self, model, autobalance=False):
        super(ComputeLoss, self).__init__()
        device = next(model.parameters()).device  # get model device
        h = model.hyp  # hyperparameters
		
		'''
		Define the classification and objectness losses as binary cross-entropy
		with sigmoid built in (BCEWithLogitsLoss): the input is passed through a
		sigmoid before BCELoss is computed.
		pos_weight weights the loss of positive samples.
		'''
        # Define criteria
        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))
        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))

		'''
		Label smoothing: eps=0 means no smoothing, giving the defaults cp=1, cn=0.
        Positives are later assigned cp and negatives cn.
		'''
        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets
		
		'''
		If the hyperparameter g > 0, wrap the criteria with FocalLoss.
		'''
        # Focal loss
        g = h['fl_gamma']  # focal loss gamma
        if g > 0:
            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)
		
		'''
		Get the Detect layer.
		'''
        det = model.module.model[-1] if is_parallel(model) else model.model[-1]  # Detect() module
        '''
        Per-layer loss weights, from shallow to deep layers: 4.0 for P3, 1.0 for P4,
        0.4 for P5. If the model does not have 3 output layers,
        [4.0, 1.0, 0.25, 0.06, .02] is used instead, covering 1-5 output layers P3-P7.
        '''
        self.balance = {3: [4.0, 1.0, 0.4]}.get(det.nl, [4.0, 1.0, 0.25, 0.06, .02])  # P3-P7
        '''
        autobalance defaults to False and is not currently used in YOLOv5, so ssi = 0.
        '''
        self.ssi = list(det.stride).index(16) if autobalance else 0  # stride 16 index
        '''
        Assign the remaining parameters. gr sets the weight of the IoU value in the
        objectness label, used as:
		tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)  # iou ratio
        In the train.py source, model.gr = 1, i.e. the CIoU between the label box and
        the predicted box is used directly as that prediction's objectness label.
        '''
        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, model.gr, h, autobalance
        for k in 'na', 'nc', 'nl', 'anchors':
            setattr(self, k, getattr(det, k))
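A quick sanity check of what __init__ sets up, assuming default hyperparameters (label_smoothing=0.0, fl_gamma=0.0) and a standard 3-layer Detect head:

cp, cn = smooth_BCE(eps=0.0)  # (1.0, 0.0): no label smoothing by default
balance = {3: [4.0, 1.0, 0.4]}.get(3, [4.0, 1.0, 0.25, 0.06, .02])  # nl = 3
print(cp, cn, balance)  # 1.0 0.0 [4.0, 1.0, 0.4]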

3.1.2 build_targets

The code and comments are as follows:

def build_targets(self, p, targets):
        # Build targets for compute_loss(), input targets(image,class,x,y,w,h)
        '''
        na = 3, the number of anchors per prediction layer.
        targets holds every label in the batch: image index, class, x, y, w, h:
        targets = [[image1,class1,x1,y1,w1,h1],
        		   [image2,class2,x2,y2,w2,h2],
        		   ...
        		   [imageN,classN,xN,yN,wN,hN]]
        nt is the number of labels in the batch.
        '''
        na, nt = self.na, targets.shape[0]  # number of anchors, targets
        tcls, tbox, indices, anch = [], [], [], []
        '''
        gain is used to keep the labels' grid coordinates inside the grid coordinate
        system, so they do not go out of range. Its 7 entries correspond to:
        image class x y w h ai, but later only the x y w h slots are assigned:
        x,y,w,h = nx,ny,nx,ny, where nx and ny are the grid size of the current
        output layer.
        '''
        gain = torch.ones(7, device=targets.device)  # normalized to gridspace gain
        '''
        ai.shape = [na,nt]
        ai = [[0,0,0,.....],
        	  [1,1,1,...],
        	  [2,2,2,...]]
        This gives each target an extra attribute: the index of the anchor it
        belongs to.
        '''
        ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)
        '''
        targets.repeat(na, 1, 1).shape = [na,nt,6]
        ai[:, :, None].shape = [na,nt,1] (None inserts a dimension of size 1)
        ai[:, :, None] = [[[0],[0],[0],.....],
        	  			  [[1],[1],[1],...],
        	  	  		  [[2],[2],[2],...]]
        After cat:
        targets.shape = [na,nt,7]
        targets = [[[image1,class1,x1,y1,w1,h1,0],
        			[image2,class2,x2,y2,w2,h2,0],
        			...
        			[imageN,classN,xN,yN,wN,hN,0]],
        			[[image1,class1,x1,y1,w1,h1,1],
        			 [image2,class2,x2,y2,w2,h2,1],
        			...],
        			[[image1,class1,x1,y1,w1,h1,2],
        			 [image2,class2,x2,y2,w2,h2,2],
        			...]]
        This records the anchor each label corresponds to.
        '''
        targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)  # append anchor indices
		
		'''
		Define the offset for each grid cell; labels will be offset according to
		their relative position inside the cell.
		'''
        g = 0.5  # bias
        '''
        [0, 0] is the cell itself (no offset).
		[1, 0] * g = [0.5, 0] offsets half a cell to the left, and [0, 1] * g = [0, 0.5] half a cell upward, matching j,k in the code below.
		[-1, 0] * g = [-0.5, 0] offsets half a cell to the right, and [0, -1] * g = [0, -0.5] half a cell downward, matching l,m in the code below.
		The underlying principle is explained after the code.
        '''
        off = torch.tensor([[0, 0],
                            [1, 0], [0, 1], [-1, 0], [0, -1],  # j,k,l,m
                            # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm
                            ], device=targets.device).float() * g  # offsets

        for i in range(self.nl):
        	'''
        	The anchors loaded from the yaml have shape [3,6], but Detect in yolo.py
        	has already reshaped them via:
        	a = torch.tensor(anchors).float().view(self.nl, -1, 2)
        	self.register_buffer('anchors', a)
        	so self.anchors.shape = [3,3,2]
        	and anchors.shape = [3,2]
        	'''
            anchors = self.anchors[i]
            '''
            p.shape = [nl,bs,na,nx,ny,no]
            p[i].shape = [bs,na,nx,ny,no]
            gain = [1,1,nx,ny,nx,ny,1]
            '''
            gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain

            # Match targets to anchors
            '''
            The targets are normalized to a w = 1, h = 1 coordinate system; map them
            into the current output layer's w = nx, h = ny coordinate system.
            '''
            t = targets * gain
            if nt:
                # Matches
                '''
                t[:, :, 4:6].shape = [na,nt,2] = [3,nt,2], the labels' w and h
                anchors[:,None].shape = [3,1,2]
                r.shape = [3,nt,2], the w/h ratios between the labels and this layer's anchors
                '''
                r = t[:, :, 4:6] / anchors[:, None]  # wh ratio
                '''
                torch.max(r, 1. / r) takes the larger of each ratio and its inverse, shape = [3,nt,2].
                .max(2) then takes the larger of the w-ratio and h-ratio for each label.
                Note that torch.max over a dimension (rather than comparing two tensors)
                returns both values and indices:
                	torch.return_types.max(
						values=tensor([...]),
						indices=tensor([...]))
                so index [0] is needed to get the values:
                torch.max(r, 1. / r).max(2)[0].shape = [3,nt].
                This is compared against the anchor_t hyperparameter in hyp.yaml; labels
                below the threshold are considered to match the current layer's anchors.
                j = [[bool,bool,....],[bool,bool,...],[bool,bool,...]]
                j.shape = [3,nt]
                '''
                j = torch.max(r, 1. / r).max(2)[0] < self.hyp['anchor_t']  # compare
                # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))
                '''
                t.shape = [na,nt,7]
                j.shape = [3,nt]
                If j contains NTrue True values, then
                t[j].shape = [NTrue,7]:
                of the na*nt labels, those that match the current layer's anchors.
                '''
                t = t[j]  # filter

                # Offsets
                '''
                The code below is easier to follow together with the figure after the code.
                t.shape = [NTrue,7]
                7: image,class,x,y,w,h,ai
                gxy.shape = [NTrue,2] holds x,y: the distance from each label to the
                left and top borders of the grid coordinate system
                gxi.shape = [NTrue,2] holds nx-x,ny-y: the distance from each label to
                the right and bottom borders
                '''
                gxy = t[:, 2:4]  # grid xy
                gxi = gain[[2, 3]] - gxy  # inverse
                '''
                Each grid cell has unit size, and there are nx*ny cells in total.
                gxy % 1 gives the label's position inside cell gxy.long(), relative to
                the cell's top-left corner;
                gxi % 1 gives the position relative to the cell's bottom-right corner.
                The two lines below select:
                labels whose center is less than 0.5 from the left/top of its cell, with center coordinate > 1;
                labels whose center is less than 0.5 from the right/bottom of its cell, with center coordinate > 1.
                j.shape = [NTrue], j = [bool,bool,...]
                k.shape = [NTrue], k = [bool,bool,...]
                l.shape = [NTrue], l = [bool,bool,...]
                m.shape = [NTrue], m = [bool,bool,...]
                '''
                j, k = ((gxy % 1. < g) & (gxy > 1.)).T 
                l, m = ((gxi % 1. < g) & (gxi > 1.)).T  
                '''
                After stacking, j.shape = [5,NTrue].
                t.repeat gives shape [5,NTrue,7];
                indexing with j gives t.shape = [NOff,7], where NOff = NTrue + (number of True values in j,k,l,m).
                torch.zeros_like(gxy)[None].shape = [1,NTrue,2]
                off[:, None].shape = [5,1,2]
                Their sum has shape [5,NTrue,2];
                indexing with j gives offsets.shape = [NOff,2].
                In effect, a label in the left half of its cell is also assigned an
                offset of 0.5 cell to the left; top, right, and bottom work the same way.
                '''
                j = torch.stack((torch.ones_like(j), j, k, l, m))
                t = t.repeat((5, 1, 1))[j]
                offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]
            else:
                t = targets[0]
                offsets = 0
			
            # Define
            
            '''
            t.shape = [NOff,7], (image,class,x,y,w,h,ai)
            '''
            b, c = t[:, :2].long().T  # image, class
            gxy = t[:, 2:4]  # grid xy
            gwh = t[:, 4:6]  # grid wh
            '''
            offsets.shape = [NOff,2]
            gxy - offsets gives the offset coordinates;
            gij = (gxy - offsets).long() gives the grid cell each offset coordinate falls in
            '''
            gij = (gxy - offsets).long()
            gi, gj = gij.T  # grid xy indices

            # Append
            '''
            a: anchor index of each label, shape = [NOff]
            b: image index of each label, shape = [NOff]
            gj.clamp_(0, gain[3] - 1) clamps each label's grid y into [0, ny-1]
            gi.clamp_(0, gain[2] - 1) clamps each label's grid x into [0, nx-1]
            indices = [image, anchor, gridy, gridx], final shape = [nl,4,NOff]
            tbox holds each label's xy relative to its assigned cell (in (-0.5, 1.5)
            because of the offsets) together with its wh; final shape = [nl,NOff,4]
            anch holds the matched anchors, final shape = [nl,NOff,2]
            tcls holds each label's class, final shape = [nl,NOff]
            '''
            a = t[:, 6].long()  # anchor indices
            indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices
            tbox.append(torch.cat((gxy - gij, gwh), 1))  # box
            anch.append(anchors[a])  # anchors
            tcls.append(c)  # class

        return tcls, tbox, indices, anch
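A toy example (made-up numbers) of the wh-ratio filter used above: a label matches an anchor only when max(w/aw, aw/w, h/ah, ah/h) is below the anchor_t hyperparameter (4.0 by default):

import torch

anchors = torch.tensor([[10., 13.], [16., 30.], [33., 23.]])    # one layer's 3 anchors
twh = torch.tensor([[12., 20.]]).repeat(3, 1)[:, None]          # [na, nt=1, 2]: one label's wh in grid units
r = twh / anchors[:, None]                                      # wh ratios, shape [3,1,2]
j = torch.max(r, 1. / r).max(2)[0] < 4.0                        # anchor_t comparison
print(j.squeeze(1))  # tensor([True, True, True]): this label matches all 3 anchors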

The figure below illustrates the label-offset part of the code above:

The label in cell B lies in its upper-right half, so the label is also offset by 0.5 grid into E; the same applies to the other neighboring cells A, C, and D. That is, each cell predicts not only the targets whose center lies inside it, but also targets whose center lies in the nearby halves of neighboring cells. Taking the upper-left corner of E as (Cx, Cy), this gives bx ∈ [Cx-0.5, Cx+1.5] and by ∈ [Cy-0.5, Cy+1.5], while bw ∈ [0, 4pw] and bh ∈ [0, 4ph] limit the predicted size relative to the anchor.
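These ranges come from YOLOv5's box decoding equations (the same transform appears in 3.1.3 and in Detect in yolo.py):

$$b_x = 2\sigma(t_x) - 0.5 + C_x, \qquad b_y = 2\sigma(t_y) - 0.5 + C_y$$

$$b_w = p_w \cdot \left(2\sigma(t_w)\right)^2, \qquad b_h = p_h \cdot \left(2\sigma(t_h)\right)^2$$

Since $\sigma(\cdot) \in (0, 1)$, we get $b_x \in (C_x - 0.5, C_x + 1.5)$ and $b_w \in (0, 4p_w)$, matching the ranges above.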

3.1.3 __call__

The code and comments are as follows:

def __call__(self, p, targets):  # predictions, targets, model
        device = targets.device
        lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)
        '''
        build_targets constructs the targets from the labels, returning tcls, tbox, indices, anchors:
        tcls = [[cls1,cls2,...],[cls1,cls2,...],[cls1,cls2,...]]
        tcls.shape = [nl,N]
        tbox = [[[gx1,gy1,gw1,gh1],[gx2,gy2,gw2,gh2],...],
                ...]
        indices = [[image indices1,anchor indices1,gridj1,gridi1],
        		   [image indices2,anchor indices2,gridj2,gridi2],
        		   ...]
        anchors = [[aw1,ah1],[aw2,ah2],...]
        '''
        tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets

        # Losses
        '''
		p.shape = [nl,bs,na,nx,ny,no]
		nl: number of prediction layers, usually 3
		na: number of anchors per layer, usually 3
		nx,ny: the grid's w and h
		no: number of outputs, 5 + nc (5: x,y,w,h,obj; nc: number of classes)
		'''
        for i, pi in enumerate(p):  # layer index, layer predictions
            '''
            a: anchor indices
            b: image indices of the labels
            gj: grid y of each label, in [0, ny-1]
            gi: grid x of each label, in [0, nx-1]
            '''
            b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
            '''
            pi.shape = [bs,na,nx,ny,no]
            tobj.shape = [bs,na,nx,ny]
            '''
            tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj

            n = b.shape[0]  # number of targets
            if n:
            	'''
            	ps is the output at row gj, column gi of anchor a in image b of the batch.
            	ps.shape = [N,5+nc], where N is the number of labels matched to anchors
            	'''
                ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets

				'''
				xy is predicted in the range -0.5 to 1.5;
                wh is predicted in the range 0 to 4 times the anchor's w and h.
                The principle is explained after the code.
				'''
                # Regression
                pxy = ps[:, :2].sigmoid() * 2. - 0.5
                pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
                pbox = torch.cat((pxy, pwh), 1)  # predicted box
                '''
                The CIoU is only computed because CIoU=True is passed; otherwise
                bbox_iou returns its default IoU variant.
                '''
                iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True)  # iou(prediction, target)
                lbox += (1.0 - iou).mean()  # iou loss

                # Objectness
                '''
                gr sets the weight of the IoU value in the objectness label.
                '''
                tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)  # iou ratio

                # Classification
                if self.nc > 1:  # cls loss (only if multiple classes)
                    '''
               		ps[:, 5:].shape = [N,nc]; fill a [N,nc] tensor with self.cn.
               		self.cn comes from smooth_BCE label smoothing, so negatives become 0.5 * eps instead of 0.
                	'''
                    t = torch.full_like(ps[:, 5:], self.cn, device=device)  # targets
                    '''
                    self.cp comes from smooth_BCE label smoothing, so positives become 1.0 - 0.5 * eps instead of 1.
                    '''
                    t[range(n), tcls[i]] = self.cp 
                    '''
                    Compute the classification loss with sigmoid + BCE.
                    '''
                    lcls += self.BCEcls(ps[:, 5:], t)  # BCE

                # Append targets to text file
                # with open('targets.txt', 'a') as file:
                #     [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]
			'''
			pi[..., 4] stores the predicted objectness.
			'''
            obji = self.BCEobj(pi[..., 4], tobj)
            '''
			self.balance[i] is the weight of output layer i, introduced in __init__;
			each layer's loss is multiplied by its weight to obtain the obj loss.
			'''
            lobj += obji * self.balance[i]  # obj loss
            if self.autobalance:
                self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / obji.detach().item()

        if self.autobalance:
            self.balance = [x / self.balance[self.ssi] for x in self.balance]
        '''
        hyp.yaml sets the weight of each loss component; multiply each loss by its gain.
        '''
        lbox *= self.hyp['box']
        lobj *= self.hyp['obj']
        lcls *= self.hyp['cls']
        bs = tobj.shape[0]  # batch size

        loss = lbox + lobj + lcls
        return loss * bs, torch.cat((lbox, lobj, lcls, loss)).detach()
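For context, this is roughly how train.py consumes ComputeLoss during training (paraphrased from the v6.1 training loop):

compute_loss = ComputeLoss(model)                       # built once, after the model
pred = model(imgs)                                      # forward pass: list of nl feature maps
loss, loss_items = compute_loss(pred, targets.to(device))
# loss = (lbox + lobj + lcls) * bs, used for backward()
# loss_items = detached [lbox, lobj, lcls, total] for logging
loss.backward()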

During bounding-box regression, xywh is decoded from the network outputs as follows:

 # Regression
 pxy = ps[:, :2].sigmoid() * 2. - 0.5
 pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]

This is consistent with the code in yolo.py Detect:

y = x[i].sigmoid()
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
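A quick numeric check of the two ranges, since sigmoid maps any logit into (0, 1):

import torch

t = torch.tensor([-10., 0., 10.])          # extreme and middle logits
print(t.sigmoid() * 2. - 0.5)              # ≈ [-0.5, 0.5, 1.5]: xy offset range (-0.5, 1.5)
print((t.sigmoid() * 2) ** 2)              # ≈ [0.0, 1.0, 4.0]: wh multiplier range (0, 4)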

3.2 smooth_BCE

smooth_BCE implements label smoothing, a trick that aims to prevent overfitting. It changes the original positive and negative targets from 1 and 0 to 1.0 - 0.5 * eps and 0.5 * eps respectively. It is called in ComputeLoss.__init__, and the smoothed targets are used in __call__.

def smooth_BCE(eps=0.1):  # https://github.com/ultralytics/yolov3/issues/238#issuecomment-598028441
    # return positive, negative label smoothing BCE targets
    return 1.0 - 0.5 * eps, 0.5 * eps
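For example, with the hyperparameter label_smoothing set to 0.1, positives become 0.95 and negatives 0.05 instead of the hard 1 and 0:

cp, cn = smooth_BCE(eps=0.1)
print(cp, cn)  # 0.95 0.05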

3.3 BCEBlurWithLogitsLoss

BCEBlurWithLogitsLoss is an alternative to the standard BCE function; it can directly replace the traditional BCE criteria in ComputeLoss.__init__.

class BCEBlurWithLogitsLoss(nn.Module):
    # BCEwithLogitLoss() with reduced missing label effects.
    def __init__(self, alpha=0.05):
        super().__init__()
        self.loss_fcn = nn.BCEWithLogitsLoss(reduction='none')  # must be nn.BCEWithLogitsLoss()
        self.alpha = alpha

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)
        pred = torch.sigmoid(pred)  # prob from logits
        # dx = pred - true ∈ [-1, 1]. When pred=1 and true=0 (the network predicts an
        # object where the ground truth has none), dx=1 => alpha_factor=0 => loss=0.
        # Such false positives are often just missing labels, so they should not be
        # penalized heavily -> loss=0.
        dx = pred - true  # reduce only missing label effects
        # Using the absolute value instead would also dampen the effect of large
        # pred/gt differences in the other direction:
        # dx = (pred - true).abs()  # reduce missing label and false label effects
        alpha_factor = 1 - torch.exp((dx - 1) / (self.alpha + 1e-4))
        loss *= alpha_factor
        return loss.mean()
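A sketch of the drop-in replacement described above (note this sketch drops the pos_weight argument, which BCEBlurWithLogitsLoss does not expose):

# in ComputeLoss.__init__, instead of the two nn.BCEWithLogitsLoss criteria:
BCEcls = BCEBlurWithLogitsLoss(alpha=0.05)
BCEobj = BCEBlurWithLogitsLoss(alpha=0.05)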

3.4 FocalLoss

The main idea of the FocalLoss loss function is to make hard examples contribute more to the loss, so that the network is more inclined to learn from them. This prevents the many easy examples from dominating the loss function. Its advantages:

  • It addresses the imbalance between positive and negative samples (foreground and background) in one-stage object detection;
  • It reduces the weight of easy samples so the loss function pays more attention to hard samples.

The code and comments are as follows:
class FocalLoss(nn.Module):
    # Wraps focal loss around existing loss_fcn(), i.e. criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)
    def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):
        super().__init__()
        self.loss_fcn = loss_fcn  # must be nn.BCEWithLogitsLoss()
        self.gamma = gamma  # gamma down-weights the contribution of easy samples to the loss
        self.alpha = alpha  # alpha balances the unequal numbers of positive and negative samples
        self.reduction = loss_fcn.reduction
        # the wrapped BCE must use reduction='none' so FL can be applied element-wise
        self.loss_fcn.reduction = 'none'  # required to apply FL to each element

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)    # plain BCE loss: loss = -log(p_t)
        # p_t = torch.exp(-loss)
        # loss *= self.alpha * (1.000001 - p_t) ** self.gamma  # non-zero power for gradient stability

        # TF implementation https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/losses/focal_loss.py
        pred_prob = torch.sigmoid(pred)  # prob from logits
        p_t = true * pred_prob + (1 - true) * (1 - pred_prob)
        alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha)
        modulating_factor = (1.0 - p_t) ** self.gamma
        # the FocalLoss formula: alpha_t * (1 - p_t)^gamma * BCE
        loss *= alpha_factor * modulating_factor

        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        else:  # 'none'
            return loss
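A toy demonstration of the modulating factor: with gamma=1.5, an easy, well-classified positive (high logit) is down-weighted far more than a hard one near the decision boundary:

import torch
import torch.nn as nn

fl = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)
easy = fl(torch.tensor([4.0]), torch.tensor([1.0]))   # sigmoid(4.0) ≈ 0.982, already confident
hard = fl(torch.tensor([0.0]), torch.tensor([1.0]))   # sigmoid(0.0) = 0.5, uncertain
print(easy.item(), hard.item())  # ≈ 1e-5 vs ≈ 0.06: easy examples barely contribute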

References

  • YOLOv5-6.x source code analysis (8): loss.py, CSDN blog
  • YOLOv5 object detection network: how the loss function is calculated, Zhihu
  • Detailed explanation of the YOLOv5 loss function calculation, Script Home
