Object detection: a detailed explanation of the SIoU bounding box regression loss, principle and code implementation

Bounding box regression loss function

1. SIoU

1.1 Principle

Existing IoU-based losses such as GIoU, DIoU and CIoU do not take into account the direction of the offset between the ground-truth box and the predicted box, which leads to slow convergence. To address this, SIoU introduces the angle of the vector connecting the centers of the two boxes and redefines the loss. It consists of four parts:
(1) Angle cost, defined as follows:
$$\Lambda = 1 - 2\sin^2\left(\arcsin\left(\frac{c_h}{\sigma}\right) - \frac{\pi}{4}\right) = \cos\left(2\left(\arcsin\left(\frac{c_h}{\sigma}\right) - \frac{\pi}{4}\right)\right)$$

where $c_h$ is the height difference between the centers of the ground-truth box and the predicted box, and $\sigma$ is the distance between the two centers. In fact, $\arcsin(c_h/\sigma)$ is exactly the angle $\alpha$ between the line joining the two centers and the horizontal axis, since:

$$\frac{c_h}{\sigma} = \sin(\alpha)$$

$$\sigma = \sqrt{(b_{c_x}^{gt} - b_{c_x})^2 + (b_{c_y}^{gt} - b_{c_y})^2}$$

$$c_h = \max(b_{c_y}^{gt}, b_{c_y}) - \min(b_{c_y}^{gt}, b_{c_y})$$

Here $(b_{c_x}^{gt}, b_{c_y}^{gt})$ are the center coordinates of the ground-truth box and $(b_{c_x}, b_{c_y})$ are the center coordinates of the predicted box. Note that the angle cost is 0 when $\alpha$ is $\frac{\pi}{2}$ or 0, i.e. when the two centers are aligned vertically or horizontally. During training, if $\alpha < \frac{\pi}{4}$, $\alpha$ is minimized; otherwise $\beta = \frac{\pi}{2} - \alpha$ is minimized.
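Using $\cos(2x) = 1 - 2\sin^2 x$, the definition simplifies to $\Lambda = \cos(2\alpha - \frac{\pi}{2}) = \sin(2\alpha)$, which is why the cost vanishes at $\alpha = 0$ and $\alpha = \frac{\pi}{2}$ and peaks at 1 for $\alpha = \frac{\pi}{4}$. The following is a minimal pure-Python sketch of the formula for two box centers; the function name `angle_cost` is hypothetical, and it assumes the two centers differ (otherwise $\sigma = 0$ and the ratio is undefined).

```python
import math


def angle_cost(gt_center, pred_center):
    """Hypothetical helper: Lambda = 1 - 2*sin^2(arcsin(c_h / sigma) - pi/4).

    gt_center, pred_center: (x, y) box centers. Assumes the centers differ,
    otherwise sigma == 0 and c_h / sigma is undefined.
    """
    dx = gt_center[0] - pred_center[0]
    dy = gt_center[1] - pred_center[1]
    sigma = math.hypot(dx, dy)      # distance between the two centers
    c_h = abs(dy)                   # max(y_gt, y) - min(y_gt, y)
    alpha = math.asin(c_h / sigma)  # angle to the horizontal axis
    return 1 - 2 * math.sin(alpha - math.pi / 4) ** 2


# Horizontal or vertical offsets cost (close to) nothing; a 45° offset costs the most.
print(angle_cost((10, 0), (0, 0)))  # alpha = 0, close to 0
print(angle_cost((0, 10), (0, 0)))  # alpha = pi/2, close to 0
print(angle_cost((5, 5), (0, 0)))   # alpha = pi/4, close to 1
```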

(2) Distance cost:
$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right) = 2 - e^{-\gamma\rho_x} - e^{-\gamma\rho_y}$$

where:

$$\rho_x = \left(\frac{b_{c_x}^{gt} - b_{c_x}}{c_w}\right)^2, \quad \rho_y = \left(\frac{b_{c_y}^{gt} - b_{c_y}}{c_h}\right)^2, \quad \gamma = 2 - \Lambda$$

Note: here $(c_w, c_h)$ are the width and height of the smallest rectangle enclosing both the ground-truth box and the predicted box, not the center offset $c_h$ used in the angle cost.
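A minimal pure-Python sketch of the distance cost for two boxes in (x1, y1, x2, y2) format. The function name `distance_cost` is hypothetical, and it assumes the angle cost $\Lambda$ has already been computed and is passed in as `lam`. Note how a larger $\Lambda$ lowers $\gamma$ and therefore lowers the distance cost for the same offset.

```python
import math


def distance_cost(gt_box, pred_box, lam):
    """Hypothetical helper: Delta = 2 - exp(-gamma*rho_x) - exp(-gamma*rho_y), gamma = 2 - Lambda.

    Boxes are (x1, y1, x2, y2); lam is the angle cost Lambda, in [0, 1].
    """
    # Width and height of the smallest rectangle enclosing both boxes.
    c_w = max(gt_box[2], pred_box[2]) - min(gt_box[0], pred_box[0])
    c_h = max(gt_box[3], pred_box[3]) - min(gt_box[1], pred_box[1])
    # Center offsets, normalized by the enclosing box and squared.
    dx = (gt_box[0] + gt_box[2] - pred_box[0] - pred_box[2]) / 2
    dy = (gt_box[1] + gt_box[3] - pred_box[1] - pred_box[3]) / 2
    rho_x = (dx / c_w) ** 2
    rho_y = (dy / c_h) ** 2
    gamma = 2 - lam
    return 2 - math.exp(-gamma * rho_x) - math.exp(-gamma * rho_y)


gt, pred = (0, 0, 10, 10), (5, 0, 15, 10)  # pred shifted right by 5, so alpha = 0 and Lambda = 0
print(distance_cost(gt, pred, 0.0))  # equals 1 - exp(-2/9)
```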

(3) Shape cost:
$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^\theta = \left(1 - e^{-\omega_w}\right)^\theta + \left(1 - e^{-\omega_h}\right)^\theta$$

where:

$$\omega_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad \omega_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}$$

$(w, h)$ and $(w^{gt}, h^{gt})$ are the width and height of the predicted box and the ground-truth box, respectively. $\theta$ controls how much attention is paid to the shape cost. To avoid over-emphasizing the shape cost, which would restrict the movement of the predicted box, the author used a genetic algorithm to search for $\theta$ and found the optimum to be close to 4, so the author sets the range of $\theta$ to [2, 6].
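A minimal sketch of the shape cost with $\theta$ as a parameter; the name `shape_cost` is hypothetical. Because each base $1 - e^{-\omega_t}$ lies in $[0, 1)$, raising $\theta$ shrinks the cost, which is how $\theta$ controls the attention paid to shape.

```python
import math


def shape_cost(gt_wh, pred_wh, theta=4.0):
    """Hypothetical helper: Omega = sum over t in {w, h} of (1 - exp(-omega_t)) ** theta."""
    cost = 0.0
    for pred, gt in zip(pred_wh, gt_wh):
        omega = abs(pred - gt) / max(pred, gt)  # relative size mismatch, in [0, 1)
        cost += (1 - math.exp(-omega)) ** theta
    return cost


# Same width, predicted height twice the ground truth: omega_w = 0, omega_h = 0.5.
for theta in (2, 4, 6):
    print(theta, shape_cost((10, 10), (10, 20), theta))  # decreases as theta grows
```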

(4) IoU cost
$$IoU = \frac{|A \cap B|}{|A \cup B|}$$

where $A$ and $B$ are the predicted box and the ground-truth box: IoU is the area of their intersection divided by the area of their union.
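For reference, a minimal sketch of plain IoU for two axis-aligned boxes in (x1, y1, x2, y2) format; the function name `iou` is hypothetical and it assumes both boxes have positive area.

```python
def iou(box_a, box_b):
    """Hypothetical helper: IoU of two (x1, y1, x2, y2) boxes, intersection area / union area."""
    # Overlap along each axis, clipped at 0 for disjoint boxes.
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # overlap 50, union 150
```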

In summary, the final SIoU loss function is defined as follows:
$$Loss_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}$$
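Putting the four parts together, here is a minimal scalar sketch of the full loss for one pair of (x1, y1, x2, y2) boxes. It is not the YOLOv6 code: the name `siou_loss`, the use of plain floats instead of tensors, and the small `eps` guard are assumptions made for illustration; `theta` defaults to 4, the value hard-coded in the YOLOv6 excerpt.

```python
import math


def siou_loss(pred, gt, theta=4.0, eps=1e-7):
    """Hypothetical scalar sketch: 1 - IoU + (Delta + Omega) / 2 for (x1, y1, x2, y2) boxes."""
    # IoU cost
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w * h + w_gt * h_gt - inter + eps)

    # Angle cost: Lambda = 1 - 2 sin^2(arcsin(c_h / sigma) - pi/4)
    dx = (gt[0] + gt[2] - pred[0] - pred[2]) / 2
    dy = (gt[1] + gt[3] - pred[1] - pred[3]) / 2
    sigma = math.hypot(dx, dy) + eps  # assumed eps keeps the ratio defined when centers coincide
    lam = 1 - 2 * math.sin(math.asin(abs(dy) / sigma) - math.pi / 4) ** 2

    # Distance cost, gamma = 2 - Lambda, normalized by the enclosing box
    c_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    c_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    gamma = 2 - lam
    delta = 2 - math.exp(-gamma * (dx / c_w) ** 2) - math.exp(-gamma * (dy / c_h) ** 2)

    # Shape cost
    omega_w = abs(w - w_gt) / max(w, w_gt)
    omega_h = abs(h - h_gt) / max(h, h_gt)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (delta + shape) / 2


print(siou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes, close to 0
print(siou_loss((5, 0, 15, 10), (0, 0, 10, 10)))  # horizontal shift of 5
```

For the shifted pair, IoU is 1/3, $\Lambda$ is about 0 (so $\gamma \approx 2$), the shape cost is 0, and the loss is about $\frac{2}{3} + \frac{1 - e^{-2/9}}{2}$.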

1.2 Code implementation

The SIoU implementation below comes from Meituan's YOLOv6:

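# Excerpt from the IoU loss class in YOLOv6. Computed earlier in the same method (not shown):
# the corners b1_* / b2_* of the predicted and ground-truth boxes, their widths and heights
# w1, h1, w2, h2, the enclosing-box size cw, ch, the plain IoU `iou`, and a small constant self.eps.
elif self.iou_type == 'siou':
	# SIoU Loss https://arxiv.org/pdf/2205.12740.pdf
	'''
	Predicted and ground-truth boxes are in xyxy format, i.e. the coordinates of two
	opposite corners (bottom-left and top-right, or top-left and bottom-right).
	'''
	s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + self.eps  # horizontal offset between the box centers (eps avoids sigma == 0)
	s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + self.eps  # vertical offset between the box centers
	sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)  # distance between the box centers
	sin_alpha_1 = torch.abs(s_cw) / sigma  # sin(beta), beta = angle to the vertical axis
	sin_alpha_2 = torch.abs(s_ch) / sigma  # sin(alpha), alpha = angle to the horizontal axis
	threshold = pow(2, 0.5) / 2  # sin(45°)
	sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)  # alpha < 45°: optimize alpha, otherwise optimize beta
	angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)  # angle cost Lambda
	rho_x = (s_cw / cw) ** 2
	rho_y = (s_ch / ch) ** 2
	gamma = angle_cost - 2  # equals -(2 - Lambda), so exp(gamma * rho) = exp(-(2 - Lambda) * rho)
	distance_cost = 2 - torch.exp(gamma * rho_x) - torch.exp(gamma * rho_y)  # distance cost Delta
	omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)
	omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)
	shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)  # shape cost Omega, theta = 4
	iou = iou - 0.5 * (distance_cost + shape_cost)  # SIoU

loss = 1.0 - iou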
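A side observation on the `torch.where` line: since $\sin(2\beta) = \sin(\pi - 2\alpha) = \sin(2\alpha)$, both branches give the same angle cost, namely $2|s_{cw}||s_{ch}|/\sigma^2$. The sketch below mirrors the excerpt's angle-cost lines with plain Python floats (the helper names are hypothetical) and compares them with that closed form. One plausible reason for keeping the branch, which is our reading rather than something stated in YOLOv6, is numerical: it keeps the argument of arcsin at most $\sqrt{2}/2$, away from 1, where the derivative of arcsin is unbounded.

```python
import math


def angle_cost_yolov6_style(s_cw, s_ch):
    """Hypothetical scalar mirror of the excerpt's angle-cost lines."""
    sigma = math.hypot(s_cw, s_ch)
    sin_beta = abs(s_cw) / sigma   # sin_alpha_1 in the excerpt
    sin_alpha = abs(s_ch) / sigma  # sin_alpha_2 in the excerpt
    threshold = math.sqrt(2) / 2
    s = sin_alpha if sin_beta > threshold else sin_beta
    return math.cos(math.asin(s) * 2 - math.pi / 2)


def angle_cost_no_branch(s_cw, s_ch):
    """Same quantity without the branch: sin(2 * alpha) = 2 * |s_cw| * |s_ch| / sigma**2."""
    return 2 * abs(s_cw) * abs(s_ch) / (s_cw ** 2 + s_ch ** 2)


for s_cw, s_ch in [(3.0, 1.0), (1.0, 3.0), (2.0, 2.0), (-4.0, 0.5)]:
    print(s_cw, s_ch, angle_cost_yolov6_style(s_cw, s_ch), angle_cost_no_branch(s_cw, s_ch))
```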


Source: blog.csdn.net/qq_56749449/article/details/125753992