Analysis of YOLOv3 algorithm for target detection

Fundamentals

Feature network
[Figure: YOLOv3 feature extraction network]

Input and output
The input is a 416×416×3 image (the size is not unique, but the side length must be a multiple of 32). The output consists of feature maps at 3 scales: 13×13×255, 26×26×255, and 52×52×255; that is, the image is divided into 13×13, 26×26, and 52×52 grid cells.
Each grid cell generates 3 anchors, and each anchor corresponds to one prediction box. Each prediction box has 5 + 80 parameters: {(x, y, w, h, c), 80 kinds of class}.
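The channel count of 255 follows directly from this layout, as a quick sanity check shows:

```python
# Each anchor predicts (x, y, w, h, c) plus 80 class scores (COCO classes).
num_anchors_per_cell = 3
num_classes = 80
params_per_anchor = 5 + num_classes        # 85 parameters per prediction box
channels = num_anchors_per_cell * params_per_anchor
print(channels)  # 255 -> matches the 13x13x255 / 26x26x255 / 52x52x255 outputs
```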

Output analysis
(This figure is taken from a Zhihu blogger.)
[Figure: structure of the three output feature maps]

The 13×13×255, 26×26×255, and 52×52×255 maps predict large, medium, and small objects respectively.
The 13×13×255 map is the feature obtained by 32× downsampling;
the 26×26×255 map is obtained by combining 16× downsampling with a 2× upsampling of the 13×13 features;
the 52×52×255 map is obtained by combining 8× downsampling with a 2× upsampling of the 26×26 features.
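The fusion step above can be sketched with NumPy. This is a minimal illustration of nearest-neighbor upsampling followed by channel concatenation; the channel counts (256 and 512) are assumed for illustration, not the exact Darknet-53 values:

```python
import numpy as np

# Sketch of YOLOv3's scale fusion: the coarser 13x13 feature map is
# upsampled 2x (nearest neighbor) and concatenated channel-wise with
# the 26x26 map before the detection head.
feat_13 = np.zeros((256, 13, 13))   # from 32x downsampling (channels assumed)
feat_26 = np.zeros((512, 26, 26))   # from 16x downsampling (channels assumed)

up = feat_13.repeat(2, axis=1).repeat(2, axis=2)   # 13x13 -> 26x26
fused = np.concatenate([feat_26, up], axis=0)      # concatenate along channels
print(fused.shape)  # (768, 26, 26)
```

The same pattern repeats once more to produce the 52×52 branch.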

Positive and negative samples:
positive samples are anchors whose IoU with the ground-truth box exceeds the specified threshold and is the maximum IoU;
negative samples are anchors whose IoU with the ground-truth box is below the specified threshold.
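A minimal sketch of this assignment rule, with hypothetical anchors, ground-truth box, and a threshold of 0.3 chosen purely for illustration (anchors above the threshold that are not the maximum are typically ignored during training):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Illustrative values: the anchor with the maximum IoU against the
# ground-truth box becomes the positive sample; anchors whose IoU is
# below the threshold become negatives.
anchors = [(0, 0, 10, 10), (0, 0, 20, 20), (0, 0, 40, 40)]
gt = (0, 0, 22, 22)
ious = [iou(a, gt) for a in anchors]
positive = ious.index(max(ious))                       # index 1
negatives = [i for i, v in enumerate(ious) if v < 0.3] # index 0
```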

The loss function
consists of coordinate loss, confidence loss, and class loss for positive samples, plus confidence loss for negative samples:
\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}1_{i,j}^{obj}\left[(b_x-\hat{b}_x)^2+(b_y-\hat{b}_y)^2+(b_w-\hat{b}_w)^2+(b_h-\hat{b}_h)^2\right]\\+\sum_{i=0}^{S^2}\sum_{j=0}^{B}1_{i,j}^{obj}\left[-\log(p_c)+\sum_{k=1}^{n}BCE(c_k,\hat{c}_k)\right]\\+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}1_{i,j}^{noobj}\left[-\log(1-p_c)\right]
Here S^2 is the total number of grid cells and B is the number of anchors in each grid cell.
The first line computes the coordinate loss between each positive sample and its ground-truth box.
The second line computes the confidence and class losses of the positive samples; 1_{i,j}^{obj} indicates whether anchor j of cell i is a positive sample. For -\log(p_c), the closer p_c is to 1, the closer -\log(p_c) is to 0. For the class loss, a binary cross-entropy is computed for each of the 80 detected classes.
The third line is the confidence loss of the negative samples: the closer p_c is to 0, the closer -\log(1-p_c) is to 0.
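The per-box terms above can be sketched in plain Python. This is a single-box illustration (the full loss sums over all S² cells and B anchors); the function name and the λ values (5.0 and 0.5, borrowed from the YOLO convention) are assumptions for the sketch:

```python
import math

def yolo_box_loss(pred, target, is_positive, lambda_coord=5.0, lambda_noobj=0.5):
    """Loss of one prediction box, following the three lines of the formula.

    pred/target are dicts with keys x, y, w, h, pc (objectness) and cls
    (per-class probabilities). Lambda values are illustrative.
    """
    if not is_positive:
        # Third line: negatives contribute only a confidence term.
        return lambda_noobj * -math.log(1 - pred["pc"])
    # First line: squared-error coordinate loss against the ground truth.
    coord = sum((pred[k] - target[k]) ** 2 for k in ("x", "y", "w", "h"))
    # Second line: confidence term plus per-class binary cross-entropy.
    conf = -math.log(pred["pc"])
    bce = lambda p, t: -(t * math.log(p) + (1 - t) * math.log(1 - p))
    cls = sum(bce(p, t) for p, t in zip(pred["cls"], target["cls"]))
    return lambda_coord * coord + conf + cls
```

As the prose notes, a confident positive (p_c near 1) and a confident negative (p_c near 0) both drive their terms toward zero.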

Performance
[Figure: YOLOv3 performance comparison]

Origin blog.csdn.net/qq_44116998/article/details/128433551