YOLO V3 Personal Notes

Understanding the YOLO v3 algorithm

Building on YOLO v1 and v2, YOLO v3 makes the following improvements to raise detection performance:

Anchor box at multiple scales

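    YOLO v3 uses 9 anchor boxes in total, 3 at each of three different scales.
  As in YOLO v2, the anchor boxes are produced with the k-means algorithm, here
  applied across the three scales to give 9 anchor boxes altogether.
  Having anchor boxes of more sizes helps with detecting small objects.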
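
To make the anchor clustering concrete, here is a minimal sketch of k-means over box widths and heights. It assumes the distance 1 - IoU used in YOLO v2, and the function names (`wh_iou`, `kmeans_anchors`, `split_by_scale`) are hypothetical helpers written for these notes, not part of any library. Assigning the three smallest anchors to the finest grid follows the usual YOLO v3 convention.

```python
import random

def wh_iou(box, anchor):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors; the closest anchor is the one with highest IoU."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initial anchors: k randomly chosen boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(b[0] for b in members) / len(members)
                h = sum(b[1] for b in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(anchors[i])  # empty cluster: keep the old anchor
        if new_anchors == anchors:  # converged
            break
        anchors = new_anchors
    # Sort by area so the anchors can be split 3/3/3 across the three scales.
    return sorted(anchors, key=lambda a: a[0] * a[1])

def split_by_scale(anchors):
    """Smallest 3 anchors go to stride 8 (finest grid), largest 3 to stride 32."""
    return {8: anchors[0:3], 16: anchors[3:6], 32: anchors[6:9]}
```

In practice `boxes` would be the (width, height) pairs of all ground-truth boxes in the training set, scaled to the network input size.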

feature extraction

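   Unlike YOLO v2, which extracts features with the Darknet-19 network, v3 uses a
 network with 53 convolutional layers, which the author calls Darknet-53. The network
 does not require a fixed input image size; the size only needs to be a multiple of 32.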
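
The multiple-of-32 constraint can be checked in a couple of lines. This is only an illustrative sketch; `check_input_size` and `round_up_to_multiple` are hypothetical helper names, not functions from Darknet.

```python
def check_input_size(width, height, stride=32):
    """Return True when both sides are positive multiples of the stride."""
    return (width > 0 and height > 0
            and width % stride == 0 and height % stride == 0)

def round_up_to_multiple(size, stride=32):
    """Smallest multiple of stride that is greater than or equal to size."""
    return ((size + stride - 1) // stride) * stride
```

For example, a 250-pixel side is not valid as is, and `round_up_to_multiple(250)` gives 256 as the nearest valid size above it.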

Combining these two ideas, the prediction boxes are obtained during training as follows:

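     When an image is fed in, bounding boxes are predicted at three scales, where Darknet
   has downsampled the image by 8x, 16x and 32x. For example, with an input image of
   size 256*256, the 32x-downsampled feature map is 8*8, and from it we get an output of
   size 8*8*3*(5+80):
       8*8 is the number of grid cells, so each cell covers a 32*32 region of the original image (its receptive field in these notes);
       3 means that 3 bounding boxes are predicted per cell;
       5 stands for the bbox's tx, ty, tw, th and the confidence c;
       80 stands for the conditional probability predictions for the 80 classes.
   Next, the 8*8 feature map is upsampled to 16*16 and fused with the feature map that
   the network obtained by downsampling the input 16x, which gives a 16*16*3*(5+80)
   prediction. Repeating the operation once more gives a 32*32*255 output, i.e. 32*32*3*(5+80).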
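
The output sizes in this walkthrough follow directly from the input size and the three strides. Here is a small sketch that reproduces them; `prediction_shapes` is a hypothetical helper, and the defaults (3 boxes per cell, 80 classes) are the values used in these notes.

```python
def prediction_shapes(input_size, num_anchors=3, num_classes=80,
                      strides=(32, 16, 8)):
    """Return the (grid, grid, boxes per cell, 5 + classes) shape for each scale."""
    if input_size % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    shapes = []
    for stride in strides:
        grid = input_size // stride  # number of grid cells per side
        shapes.append((grid, grid, num_anchors, 5 + num_classes))
    return shapes

shapes = prediction_shapes(256)
# Channels per grid cell when the last two axes are flattened: 3 * (5 + 80)
channels = shapes[0][2] * shapes[0][3]
```

For a 256*256 input, `shapes` holds the 8*8, 16*16 and 32*32 grids in that order, and `channels` is 255, which is where the 32*32*255 figure comes from.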

From the above, we can see that for an input image of size 256*256, the number of predicted bboxes is 8*8*3 + 16*16*3 + 32*32*3, and the region of the original image covered by each grid cell ranges from 32*32 down to 8*8, so the network works well for small-object detection and multi-object detection.
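
As a worked check of the box count, write S for the side of a square input (a symbol introduced here for illustration), with 3 boxes per grid cell at strides 32, 16 and 8:

```latex
\begin{aligned}
N(S) &= 3\left[\left(\tfrac{S}{32}\right)^2 + \left(\tfrac{S}{16}\right)^2 + \left(\tfrac{S}{8}\right)^2\right]
      = 3\left(\tfrac{S}{32}\right)^2 (1 + 4 + 16)
      = 63\left(\tfrac{S}{32}\right)^2 \\
N(256) &= 8\cdot 8\cdot 3 + 16\cdot 16\cdot 3 + 32\cdot 32\cdot 3
        = 192 + 768 + 3072 = 4032
\end{aligned}
```

The finest grid alone contributes 3072 of the 4032 boxes, which is why the extra scales increase the number of candidates for small objects so much.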

loss function

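     First, recall that in YOLO v1 and v2 the label for each bbox is the IoU between the
 bbox and the ground truth (gt). In YOLO v3 we instead follow a logistic-regression idea:
 the label is either 0 or 1. All predicted bboxes are divided into the following three
 groups:
           In the grid cell responsible for predicting the object, the bbox with the largest IoU with the gt is a positive sample, with label 1.
           Bboxes with IoU > 0.5 that are not the maximum-IoU bbox above are ignored.
           Bboxes with IoU < 0.5 are negative samples, with label 0.
           In addition, when computing the prediction error for positive samples, we compute the
           binary cross-entropy loss between the prediction and the label for each class separately.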
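
The three-way split and the per-class binary cross-entropy can be sketched as follows. This is a simplified illustration, not the original Darknet code: it assumes the IoUs of the candidate boxes against a single gt box are already computed, it treats an IoU of exactly 0.5 as negative (the notes leave that case open), and the function names are hypothetical.

```python
import math

def assign_objectness_labels(ious, ignore_thresh=0.5):
    """Label each candidate box: 1 = positive, 0 = negative, None = ignored."""
    best = max(range(len(ious)), key=lambda i: ious[i])
    labels = []
    for i, iou in enumerate(ious):
        if i == best:
            labels.append(1)      # largest IoU with the gt: positive sample
        elif iou > ignore_thresh:
            labels.append(None)   # high IoU but not the best: ignored
        else:
            labels.append(0)      # assumption: IoU <= threshold is negative
    return labels

def binary_cross_entropy(pred, target, eps=1e-7):
    """BCE for one probability; pred is clipped to avoid log(0)."""
    pred = min(max(pred, eps), 1 - eps)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

def class_loss(class_probs, true_class):
    """Sum of per-class BCE for one positive sample with a one-hot target."""
    return sum(
        binary_cross_entropy(p, 1.0 if c == true_class else 0.0)
        for c, p in enumerate(class_probs)
    )
```

Ignored boxes (label `None`) would simply be left out of the confidence loss, while positive and negative boxes contribute a binary cross-entropy term against their 1 or 0 label.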

In summary, the resulting loss function of the YOLO v3 algorithm is shown in the figure below (the picture comes from Tongji Zihao on Bilibili).
[Figure: YOLO v3 loss function]

To sum up, the improvements in YOLO v3 lie mainly in its multi-scale design: anchors of different sizes and predictions on different layers of the network yield bboxes at very different scales, which greatly increases the number of bboxes. The other changes are how these bboxes are treated as positive and negative samples, and how the loss function is built from them.


Origin blog.csdn.net/qq_45836365/article/details/120939386