Deep Learning Notes (40): YOLO

Disclaimer: This is an original article by the blogger, licensed under CC 4.0 BY-SA. If you reproduce it, please include a link to the original source and this statement.
Link to this article: https://blog.csdn.net/qq_32618327/article/details/99200411


1. YOLO object detection algorithm

We have already covered most of the components of object detection algorithms:

Post | Technique | Purpose
Deep Learning Notes (32) Object Localization | Object localization | Mark the location of the object
Deep Learning Notes (33) Landmark Detection | Landmark (feature point) detection | Locate feature points
Deep Learning Notes (34) Object Detection | Sliding-window object detection | Detect objects in the image
Deep Learning Notes (35) Convolutional Implementation of Sliding Windows | Convolution | Reduce computational cost
Deep Learning Notes (36) Bounding Box Predictions | YOLO | Assign each object to the grid cell containing its midpoint and output a more accurate bounding box
Deep Learning Notes (37) Intersection over Union | IoU | Evaluate object detection algorithms
Deep Learning Notes (38) Non-Max Suppression | Non-max suppression | Ensure the algorithm detects each object only once
Deep Learning Notes (39) Anchor Boxes | Anchor boxes | Let a single grid cell detect multiple objects

Now we put all of these components together to form the YOLO object detection algorithm.


2. Constructing the training set

First, let's look at how to construct the training set.
Suppose you want to train an algorithm to detect three kinds of objects: pedestrians, cars, and motorcycles.
You also need to account explicitly for the background, i.e. cells that contain none of these objects.

There are three class labels. If two anchor boxes are used,
the output y has size 3 × 3 × 2 × 8,
where 3 × 3 is the grid, 2 is the number of anchor boxes, and 8 is the dimension of each anchor's vector:
5 values (p_c, b_x, b_y, b_h, b_w) plus the number of classes, 3 (c_1, c_2, c_3).
It can be viewed either as 3 × 3 × 2 × 8 or as 3 × 3 × 16.
To construct the training set, go through each of the 9 grid cells and build its target vector y.
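To make the shapes concrete, here is a minimal Python sketch of the target tensor for this example, assuming the 3 × 3 grid, 2 anchor boxes and 3 classes described above. The names `S`, `B`, `C`, `D`, `empty_cell_target` and the use of `None` for a "don't care" entry are illustrative choices, not part of YOLO itself.

```python
# Illustrative sketch: target tensor for a 3x3 grid, 2 anchors, 3 classes.
# None stands for a "don't care" entry (an assumption made for readability).

S = 3          # grid is S x S cells
B = 2          # anchor boxes per cell
C = 3          # classes: c_1 pedestrian, c_2 car, c_3 motorcycle
D = 5 + C      # per-anchor vector: p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3

def empty_cell_target():
    """Target for a cell with no object: p_c = 0 for every anchor, the rest don't care."""
    target = []
    for _ in range(B):
        target += [0] + [None] * (D - 1)
    return target

# Start with every cell empty; cells that contain an object midpoint get overwritten later.
y = [[empty_cell_target() for _ in range(S)] for _ in range(S)]

print(len(y), len(y[0]), len(y[0][0]))  # 3 3 16
```

Each cell holds the two 8-value anchor slots back to back, which is why the same data can be read as 3 × 3 × 16 or 3 × 3 × 2 × 8.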

In the figure, anchor box 1 is marked No. 4 and anchor box 2 is marked No. 5.

First, look at the first grid cell (No. 1).
It contains nothing of interest: none of the three classes (pedestrian, car, motorcycle) appears,
so the target for this cell is y = [0 ? ? ? ? ? ? ? 0 ? ? ? ? ? ? ?]^T.
The p_c of the first anchor box is 0, because nothing is associated with the first anchor box;
the p_c of the second anchor box is also 0, and all the remaining values are don't cares.

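Most cells in the grid are empty.
But the cell there (No. 2) will have this target vector: y = [0 ? ? ? ? ? ? ? 1 b_x b_y b_h b_w 0 1 0]^T.
Suppose that in the training set the car has this bounding box (No. 3), which is a bit wider than it is tall.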

The red box has a higher IoU with anchor box 2 than with anchor box 1,
so the car is associated with the lower half of the vector.

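Note that the p_c associated with anchor box 1 is 0, and its remaining components are don't cares.
The second p_c is 1, and (b_x, b_y, b_h, b_w) specify the position of the red bounding box.
Finally, the correct class is class 2 (c_1 = 0, c_2 = 1, c_3 = 0): it is a car.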
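The encoding of the car can be sketched as follows. This is a hypothetical helper, assuming anchor shapes are given as (height, width) and that an object is assigned to the anchor whose shape has the highest IoU with the ground-truth box when both are centered on the same point; the anchor sizes and box values in the example are made up.

```python
# Hypothetical sketch: encode one ground-truth object into a cell's target vector.
# Anchors are (height, width); the object goes to the anchor whose shape has the
# highest IoU with the ground-truth box when both share the same center.

def shape_iou(h1, w1, h2, w2):
    """IoU of two boxes centered on the same point."""
    inter = min(h1, h2) * min(w1, w2)
    union = h1 * w1 + h2 * w2 - inter
    return inter / union

def encode_cell(anchors, box, class_id, num_classes=3):
    """Return (target vector, chosen anchor index) for a cell holding one object.

    box = (b_x, b_y, b_h, b_w); class_id is 0-based (pedestrian 0, car 1, motorcycle 2).
    """
    b_x, b_y, b_h, b_w = box
    per_anchor = 5 + num_classes
    ious = [shape_iou(b_h, b_w, a_h, a_w) for a_h, a_w in anchors]
    best = max(range(len(anchors)), key=lambda k: ious[k])

    target = []
    for k in range(len(anchors)):
        if k == best:
            one_hot = [1 if c == class_id else 0 for c in range(num_classes)]
            target += [1, b_x, b_y, b_h, b_w] + one_hot
        else:
            target += [0] + [None] * (per_anchor - 1)  # p_c = 0, rest don't care
    return target, best

# Made-up values: anchor 1 is tall (pedestrian-like), anchor 2 is wide (car-like).
anchors = [(0.9, 0.3), (0.3, 0.9)]
car_box = (0.5, 0.6, 0.4, 1.0)  # a box wider than it is tall
target, best = encode_cell(anchors, car_box, class_id=1)
print(best + 1)  # 2
print(target)    # [0, None, None, None, None, None, None, None, 1, 0.5, 0.6, 0.4, 1.0, 0, 1, 0]
```

Because the wide car box overlaps the wide anchor much more, only the lower half of the vector is filled in, giving the same pattern as the target y for cell No. 2.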

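Going through all 9 cells of the 3 × 3 grid in this way gives a 16-dimensional vector for each cell,
so the final output size is 3 × 3 × 16.
As before, a 3 × 3 grid is used here for simplicity; in practice it might be 19 × 19 × 16,
or, with more anchor boxes such as 5, it would be 19 × 19 × 5 × 8, i.e. 19 × 19 × 40.
That is the training set. You then train a convolutional network whose input is an image, perhaps 100 × 100 × 3,
and whose final output size is, in this example, 3 × 3 × 16 (equivalently 3 × 3 × 2 × 8).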
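The output-size arithmetic, and the fact that 3 × 3 × 16 and 3 × 3 × 2 × 8 hold the same numbers, can be checked with a small sketch. `yolo_output_shape` and `split_by_anchor` are hypothetical helper names used only for illustration.

```python
# Sketch: each anchor contributes 5 + num_classes numbers per grid cell.

def yolo_output_shape(grid, num_anchors, num_classes):
    """Return (grid, grid, num_anchors * (5 + num_classes))."""
    return (grid, grid, num_anchors * (5 + num_classes))

def split_by_anchor(cell_vector, num_anchors):
    """View one cell's flat vector (e.g. 16 values) as num_anchors rows of 5 + C values."""
    per_anchor = len(cell_vector) // num_anchors
    return [cell_vector[k * per_anchor:(k + 1) * per_anchor] for k in range(num_anchors)]

print(yolo_output_shape(3, 2, 3))   # (3, 3, 16)
print(yolo_output_shape(19, 2, 3))  # (19, 19, 16)
print(yolo_output_shape(19, 5, 3))  # (19, 19, 40)
```

For example, `split_by_anchor(list(range(16)), 2)` returns the first 8 values as the anchor 1 slot and the last 8 as the anchor 2 slot.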


3. Prediction

Next, let's see how the algorithm makes predictions.

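Given an input image, the neural network outputs a 3 × 3 × 2 × 8 volume, so each of the 9 grid cells has its own vector.
For the upper-left cell (No. 1), there is no object,
so we want the network to output 0 for the first p_c and 0 for the second p_c.
It still outputs values for the other components: a neural network cannot output question marks or don't cares, so it outputs some numbers.
These numbers are essentially ignored, because the network is telling you there is nothing there,
so it does not matter whether they describe a bounding box of some class.
They are basically just noise (the output y is shown in No. 3).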

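For the lower-left cell (No. 2), by contrast, we hope the output y (shown in No. 4) looks like this.
For bounding box 1, p_c is 0, followed by a set of numbers
that are just noise: anchor box 1 corresponds to the pedestrian shape,
and there is no pedestrian in this cell, so p_c = 0 and b_x, b_y, b_h, b_w, c_1, c_2, c_3 are don't cares.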

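For bounding box 2, we hope the algorithm outputs numbers
that specify a fairly accurate bounding box for the car; anchor box 2 corresponds to the car shape.
This cell contains a car, so p_c = 1, followed by b_x, b_y, b_h, b_w and c_1 = 0, c_2 = 1, c_3 = 0.
That is how the neural network makes predictions.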
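Reading a prediction can be sketched as below. The conventions are assumptions: b_x and b_y are offsets inside the cell (0 to 1), b_h and b_w are measured in units of the cell size (so they may exceed 1), and the predicted class is the largest of c_1, c_2, c_3. `decode_cell` and the sample output values are hypothetical.

```python
# Hypothetical sketch: decode one cell of a 3x3x16 output into two candidate
# boxes in image coordinates (fractions of the image width and height).

CLASSES = ["pedestrian", "car", "motorcycle"]

def decode_cell(cell_output, row, col, grid=3, num_anchors=2):
    per_anchor = len(cell_output) // num_anchors  # 8 = 5 + 3
    candidates = []
    for k in range(num_anchors):
        p_c, b_x, b_y, b_h, b_w, *scores = cell_output[k * per_anchor:(k + 1) * per_anchor]
        best_class = max(range(len(scores)), key=lambda i: scores[i])
        candidates.append({
            "anchor": k + 1,
            "p_c": p_c,
            "class": CLASSES[best_class],
            "x": (col + b_x) / grid,  # box center, relative to the whole image
            "y": (row + b_y) / grid,
            "h": b_h / grid,          # box size, relative to the whole image
            "w": b_w / grid,
        })
    return candidates

# Made-up output for the lower-left cell (row 2, col 0): anchor 1 is noise, anchor 2 is a car.
output = [0.02, 0.3, 0.1, 0.8, 0.4, 0.2, 0.5, 0.3,
          0.97, 0.5, 0.6, 0.4, 1.0, 0.01, 0.98, 0.01]
for cand in decode_cell(output, row=2, col=0):
    print(cand["anchor"], cand["p_c"], cand["class"])
```

Note that the anchor 1 slot still produces a class and a box; only its low p_c tells you to ignore them.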


4. Non-max suppression

Finally, run non-max suppression.
Consider a new test image;
here is how non-max suppression is applied to it.

With two anchor boxes,
each of the 9 grid cells outputs two predicted bounding boxes, some of which have a very low probability p_c.
In total, the nine cells give 18 predicted bounding boxes.

For example, you might get the bounding boxes shown in the figure. Note that some bounding boxes can extend beyond the height and width of the grid cell that produced them (No. 1).
Next, discard the low-probability predictions, i.e. the boxes for which the neural network is saying there is probably nothing there
(shown in No. 2).
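Discarding low-probability predictions amounts to a simple filter. The 0.6 threshold, the dictionary format, and the name `discard_low_confidence` are illustrative assumptions; the notes only say to drop boxes with low p_c.

```python
# Hypothetical helper: keep only predictions whose p_c reaches the threshold.
# threshold=0.6 is an illustrative value, not prescribed by the notes.

def discard_low_confidence(predictions, threshold=0.6):
    return [p for p in predictions if p["p_c"] >= threshold]

# Made-up example: most of the 18 boxes look like the last two entries,
# i.e. the network is saying "probably nothing here".
predictions = [
    {"class": "car", "p_c": 0.97},
    {"class": "pedestrian", "p_c": 0.02},
    {"class": "motorcycle", "p_c": 0.10},
]
kept = discard_low_confidence(predictions)
print([p["class"] for p in kept])  # ['car']
```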

Finally, if you are detecting three classes (pedestrians, cars, and motorcycles),
run non-max suppression separately for each class, using the bounding boxes predicted for that class:
once for the pedestrian class, once for the car class, and once for the motorcycle class.
Running it three times gives the final predictions.
Ideally, the algorithm's output detects all the cars and all the pedestrians in the image (No. 3).
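Per-class non-max suppression can be sketched as follows, on boxes given as center and size fractions of the image. The IoU threshold of 0.5, the dictionary format, and the helper names are assumptions; the point being illustrated is that a box is only ever compared against boxes of the same class.

```python
# Hypothetical sketch of per-class non-max suppression.
# Boxes are dicts with "class", "p_c", center "x", "y" and size "h", "w".

def iou(a, b):
    """Intersection over union of two center/size boxes."""
    ax1, ay1 = a["x"] - a["w"] / 2, a["y"] - a["h"] / 2
    ax2, ay2 = a["x"] + a["w"] / 2, a["y"] + a["h"] / 2
    bx1, by1 = b["x"] - b["w"] / 2, b["y"] - b["h"] / 2
    bx2, by2 = b["x"] + b["w"] / 2, b["y"] + b["h"] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a["w"] * a["h"] + b["w"] * b["h"] - inter
    return inter / union if union > 0 else 0.0

def nms_per_class(boxes, iou_threshold=0.5):
    """Run non-max suppression separately for each class."""
    kept = []
    for cls in sorted({b["class"] for b in boxes}):
        # Highest p_c first within this class only.
        remaining = sorted((b for b in boxes if b["class"] == cls),
                           key=lambda b: b["p_c"], reverse=True)
        while remaining:
            best = remaining.pop(0)
            kept.append(best)
            # Suppress same-class boxes that overlap the kept box too much.
            remaining = [b for b in remaining if iou(best, b) < iou_threshold]
    return kept

# Made-up boxes: two overlapping car detections and one pedestrian.
boxes = [
    {"class": "car", "p_c": 0.9, "x": 0.5, "y": 0.5, "h": 0.2, "w": 0.4},
    {"class": "car", "p_c": 0.7, "x": 0.52, "y": 0.5, "h": 0.2, "w": 0.4},
    {"class": "pedestrian", "p_c": 0.8, "x": 0.5, "y": 0.5, "h": 0.4, "w": 0.1},
]
result = nms_per_class(boxes)
print([(b["class"], b["p_c"]) for b in result])  # [('car', 0.9), ('pedestrian', 0.8)]
```

The duplicate car box is suppressed because it overlaps the stronger car box heavily, while the pedestrian box is handled in its own pass.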

That is the YOLO object detection algorithm, one of the most effective object detection algorithms;
it brings together many of the best ideas from the computer vision literature on object detection.


Reference:

"Neural networks and deep learning" video course




Thank you!

