IoU, YOLO v1, v2, v3: Study Notes

Object detection algorithms fall into two broad categories: one-stage methods such as YOLO and SSD, and two-stage methods such as R-CNN, Fast R-CNN and Faster R-CNN. This article covers the one-stage YOLO family: YOLO v1, YOLO v2 and YOLO v3.

 

1. What is IoU

IoU (Intersection over Union) is the overlap ratio between a candidate bound and the ground-truth bound. It is computed as:

IoU(A, B) = |A ∩ B| / |A ∪ B|

where A and B are two sets: the IoU is the size of their intersection divided by the size of their union.

1) For one-dimensional data, suppose set A is the closed interval [x1, x2] and set B is the closed interval [y1, y2], illustrated below:

Green is set A, blue is set B, red is A ∩ B, and yellow is A ∪ B.

Lower bound of the intersection: z1 = max(x1, y1)

Upper bound of the intersection: z2 = min(x2, y2)

If z2 − z1 ≤ 0, sets A and B have no intersection; if z2 − z1 > 0, they do intersect.
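The interval case above can be sketched directly in code (the function name is illustrative):

```python
def iou_1d(a, b):
    """IoU of two closed intervals a = [x1, x2], b = [y1, y2]."""
    z1 = max(a[0], b[0])               # lower bound of the intersection
    z2 = min(a[1], b[1])               # upper bound of the intersection
    inter = max(0.0, z2 - z1)          # no overlap when z2 - z1 <= 0
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```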


2) For two-dimensional data, let A be [x, y, w, h] and B be [xx, yy, ww, hh], where x, y is the center of rectangle A and w, h its width and height, and xx, yy is the center of rectangle B with width ww and height hh, illustrated below:

Width of the intersection: w_and = min(x + w/2, xx + ww/2) − max(x − w/2, xx − ww/2)

Height of the intersection: h_and = min(y + h/2, yy + hh/2) − max(y − h/2, yy − hh/2)

A and B intersect only when w_and and h_and are both greater than 0.
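Putting the two formulas together gives a box IoU in a few lines (a minimal sketch; the function name is illustrative):

```python
def iou_2d(a, b):
    """IoU of two boxes given as [cx, cy, w, h] (center plus width/height)."""
    x, y, w, h = a
    xx, yy, ww, hh = b
    w_and = min(x + w/2, xx + ww/2) - max(x - w/2, xx - ww/2)
    h_and = min(y + h/2, yy + hh/2) - max(y - h/2, yy - hh/2)
    if w_and <= 0 or h_and <= 0:       # the boxes do not overlap
        return 0.0
    inter = w_and * h_and
    union = w * h + ww * hh - inter
    return inter / union
```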

2. YOLO v1

Quoting the paper:

"Our system divides the input image into an S×S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object." [The image is divided into S×S cells; each cell is responsible for detecting the objects whose center falls inside it.]

"Each bounding box consists of 5 predictions: x, y, w, h and confidence. Each grid cell also predicts C conditional class probabilities." [Each bounding box carries five values: x, y, w, h and a confidence; each cell additionally predicts C class probabilities.]

The network structure is shown below (note it still uses fully connected layers):

YOLO divides the image into 7×7 cells, and each cell predicts two boxes, so there are 7×7×2 = 98 bounding boxes in total. Each cell outputs a 30-dimensional vector: the positions of the two bounding boxes (2×4 values), the two bounding-box confidences, and 20 class probabilities. The confidence reflects both whether the box contains an object and how accurately it is localized. Finally, non-maximum suppression (NMS) is applied to select among the predicted bounding boxes.
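The layout of that 30-dimensional cell vector can be sketched as follows (the channel ordering is illustrative; implementations may arrange it differently):

```python
import numpy as np

# Stand-in 7x7x30 output tensor of YOLO v1 (random values instead of a real network).
S, B, C = 7, 2, 20
pred = np.random.rand(S, S, B * 5 + C)   # 30 = 2*5 + 20

cell = pred[3, 4]                 # the 30-dim vector of one grid cell
box1  = cell[0:4]                 # x, y, w, h of the first bounding box
conf1 = cell[4]                   # its confidence
box2  = cell[5:9]
conf2 = cell[9]
class_probs = cell[10:30]         # 20 conditional class probabilities

# Class-specific confidence score = box confidence * conditional class probability.
scores1 = conf1 * class_probs
```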

The loss is computed as follows:

The loss has three parts: the localization error, the confidence error, and the classification error.
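For the predictor responsible for an object, the three parts can be sketched as below, using the paper's weights λ_coord = 5 and λ_noobj = 0.5 (the function and variable names are illustrative, and the no-object confidence term for empty cells is omitted for brevity):

```python
import numpy as np

lambda_coord, lambda_noobj = 5.0, 0.5   # weights from the YOLO v1 paper

def yolo_v1_loss_terms(pred_box, true_box, pred_conf, iou, pred_cls, true_cls):
    """Loss terms for one cell whose predictor is responsible for an object.
    pred_box/true_box are (x, y, w, h); pred_cls/true_cls are length-C arrays."""
    x, y, w, h = pred_box
    tx, ty, tw, th = true_box
    # Localization: squared error on the centers, square roots on width/height
    # so that small boxes are not dominated by large ones.
    loc = lambda_coord * ((x - tx)**2 + (y - ty)**2
                          + (np.sqrt(w) - np.sqrt(tw))**2
                          + (np.sqrt(h) - np.sqrt(th))**2)
    # Confidence: the target is the IoU with the ground-truth box.
    conf = (pred_conf - iou)**2
    # Classification: squared error over the C class probabilities.
    cls = np.sum((pred_cls - true_cls)**2)
    return loc, conf, cls
```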

Related code on GitHub: https://github.com/xiongzihua/pytorch-YOLO-v1

3. YOLO v2

YOLO v2 builds on YOLO v1 by adding a number of tricks, briefly summarized below.

1) Batch Normalization: BN is added after the convolutional layers.

2) High Resolution Classifier: the classifier is trained at a higher resolution, with 448×448 input.

3) Convolutional with Anchor Boxes: anchor boxes are introduced, following Faster R-CNN.

4) Dimension Clusters: because YOLO v1's localization was inaccurate, v2 adopts Faster R-CNN's anchor boxes, but rather than setting them by hand it initializes them with k-means clustering. The k-means here does not use Euclidean distance; its metric is d(box, centroid) = 1 − IoU(box, centroid).

5) Direct Location Prediction: box centers are predicted as sigmoid offsets relative to the grid cell, constraining each center to stay inside its cell.

6) Fine-Grained Features: a passthrough layer merges finer-grained features from an earlier layer.

7) Multi-Scale Training: the network is trained on inputs of varying sizes.
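Two of these tricks can be made concrete in a short sketch (function names are illustrative): the Dimension Clusters distance is 1 − IoU, and Direct Location Prediction decodes the network outputs tx, ty, tw, th against the cell's corner (cx, cy) and the anchor prior's width/height (pw, ph):

```python
import math

def kmeans_distance(iou):
    """YOLO v2 clustering distance: d(box, centroid) = 1 - IoU(box, centroid)."""
    return 1.0 - iou

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Direct location prediction: the sigmoid keeps the center inside its cell."""
    sigmoid = lambda t: 1.0 / (1.0 + math.exp(-t))
    bx = cx + sigmoid(tx)          # center x, offset from the cell corner (cx, cy)
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)         # width/height scale the anchor prior (pw, ph)
    bh = ph * math.exp(th)
    return bx, by, bw, bh
```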

Related code on GitHub: https://github.com/ruiminshen/yolo2-pytorch

4. YOLO v3

YOLO v3 optimizes further on top of YOLO v2:

1) Class prediction: each box predicts the classes it may contain with multi-label classification, using independent logistic classifiers rather than a softmax.

2) Predictions across scales and feature extractor: features are extracted at three scales, in the spirit of a feature pyramid, and the backbone network is Darknet-53.

The network structure is shown below. The 13×13 feature map has the larger receptive field and uses the large anchor boxes; the 52×52 map has the smaller receptive field and uses the small anchors.
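As a rough sketch of the three detection heads (the numbers are illustrative: 3 anchors per scale and 80 classes as in COCO):

```python
num_anchors, num_classes = 3, 80          # illustrative: COCO-style settings
depth = num_anchors * (num_classes + 5)   # 5 = x, y, w, h, objectness

# For a 416x416 input, strides 32/16/8 give grids of 13, 26 and 52.
shapes = [(g, g, depth) for g in (13, 26, 52)]
print(shapes)   # the 13x13 head gets the large anchors, the 52x52 head the small ones
```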

(Image from: https://blog.csdn.net/leviopku/article/details/82660381)

Related code on GitHub: https://github.com/qqwweee/keras-yolo3

YOLO v3 model code:

from functools import reduce, wraps

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D, LeakyReLU
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.regularizers import l2

def compose(*funcs):
    # Left-to-right function composition: compose(f, g)(x) == g(f(x))
    return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)

@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    # Conv2D with L2 weight regularization; 'valid' padding when downsampling
    darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides') == (2,2) else 'same'
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)

# DBL
def DarknetConv2D_BN_Leaky(*args, **kwargs):
    no_bias_kwargs = {'use_bias':False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        LeakyReLU(alpha=0.1)
    )

# residual stage: downsample once, then num_blocks residual units (DBL + DBL + Add)
def restblock_body(x, num_filters, num_blocks):
    x = ZeroPadding2D(((1, 0), (1, 0)))(x)  # pad top/left so the strided conv halves the size
    x = DarknetConv2D_BN_Leaky(num_filters, (3, 3), strides=(2, 2))(x)
    for i in range(num_blocks):
        y = compose(
            DarknetConv2D_BN_Leaky(num_filters//2, (1, 1)),
            DarknetConv2D_BN_Leaky(num_filters, (3, 3))
        )(x)
        x = Add()([x, y])
    return x

# Darknet-53 backbone: a stem conv followed by 1, 2, 8, 8, 4 residual units
def darknet_body(x):
    x = DarknetConv2D_BN_Leaky(32, (3,3))(x)
    x = restblock_body(x, 64, 1)
    x = restblock_body(x, 128, 2)
    x = restblock_body(x, 256, 8)
    x = restblock_body(x, 512, 8)
    x = restblock_body(x, 1024, 4)
    return x

def make_last_layers(x, num_filters, out_filters):
    # 5 DBL blocks, alternating 1x1 and 3x3 convolutions
    x = compose(
        DarknetConv2D_BN_Leaky(num_filters, (1,1)),
        DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
        DarknetConv2D_BN_Leaky(num_filters, (1,1)),
        DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
        DarknetConv2D_BN_Leaky(num_filters, (1,1))
    )(x)
    # one DBL plus a 1x1 conv producing the detection output
    y = compose(
        DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
        DarknetConv2D(out_filters, (1,1))
    )(x)
    return x, y

def yolo_body(inputs, num_anchors, num_classes):
    darknet = Model(inputs, darknet_body(inputs))
    x, y1 = make_last_layers(darknet.output, 512, num_anchors*(num_classes+5))

    # upsample and concatenate with the 26x26 feature map
    x = compose(
        DarknetConv2D_BN_Leaky(256, (1,1)),
        UpSampling2D(2)
    )(x)
    x = Concatenate()([x, darknet.layers[152].output])
    x, y2 = make_last_layers(x, 256, num_anchors*(num_classes+5))

    # upsample and concatenate with the 52x52 feature map
    x = compose(
        DarknetConv2D_BN_Leaky(128, (1,1)),
        UpSampling2D(2)
    )(x)
    x = Concatenate()([x, darknet.layers[92].output])
    x, y3 = make_last_layers(x, 128, num_anchors*(num_classes+5))
    return Model(inputs, [y1, y2, y3])


Reposted from blog.csdn.net/jin__9981/article/details/99623112