How YOLO v2 Works

Article link: https://blog.csdn.net/cjnewstar111/article/details/94037110

Object detection series
How YOLO v1 works: https://blog.csdn.net/cjnewstar111/article/details/94035842
How YOLO v2 works: https://blog.csdn.net/cjnewstar111/article/details/94037110
How YOLO v3 works: https://blog.csdn.net/cjnewstar111/article/details/94037828
How SSD works: https://blog.csdn.net/cjnewstar111/article/details/94038536
FoveaBox: https://blog.csdn.net/cjnewstar111/article/details/94203397
FCOS: https://blog.csdn.net/cjnewstar111/article/details/94021688
FSAF: https://blog.csdn.net/cjnewstar111/article/details/94019687

Basic idea:

YOLO v2 mainly aims to fix two weaknesses of YOLO v1: low detection accuracy and low recall.

It adopts the following strategies to do so.

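Feature extraction network:

Redesigned as Darknet-19, using BN (batch normalization) + Leaky ReLU

The classification training of the feature extraction network uses high-resolution input (448*448 in the YOLO9000 paper, instead of the usual 224*224)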
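As a rough illustration of the BN + Leaky ReLU step, the sketch below applies inference-time batch normalization and then a leaky ReLU to the values of a single channel. The function names, the epsilon, and the sample numbers are assumptions made for illustration; the 0.1 negative slope is the value Darknet uses for its leaky activation, not something stated in this article.

```python
import math

def batch_norm(values, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Inference-time batch norm for one channel: normalize, then scale and shift."""
    scale = gamma / math.sqrt(var + eps)
    return [(v - mean) * scale + beta for v in values]

def leaky_relu(values, negative_slope=0.1):
    """Keep positive values; multiply the rest by a small slope."""
    return [v if v > 0 else negative_slope * v for v in values]

# Hypothetical convolution outputs for one channel, with assumed running statistics.
conv_out = [-2.0, 0.0, 2.0]
activated = leaky_relu(batch_norm(conv_out, mean=0.0, var=4.0))
```

With these numbers the negative input is squashed to roughly -0.1 while the positive input stays near 1.0, which is the behaviour the activation is chosen for.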

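Detection network:

A larger S*S grid (7*7 in YOLO v1, 13*13 in v2)

Convolutional prediction with anchor boxes, replacing v1's fully connected prediction layers

Anchors derived by clustering statistics over the training set

Direct coordinates for the box center

Concatenation of multiple feature maps

Every bounding box predicted from an anchor has its own class probability distribution, instead of one distribution per cell as in YOLO v1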
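To make the last point concrete, here is a small sketch that compares the size of the output tensor in the two versions. The default values (20 classes as in Pascal VOC, 2 boxes per cell for v1, 5 anchors and a 416*416 input with stride 32 for v2) are assumptions taken from the usual paper settings, and the function names are hypothetical.

```python
def yolo_v1_output_shape(grid=7, boxes_per_cell=2, num_classes=20):
    # Each box carries x, y, w, h, confidence; the class distribution is shared per cell.
    return (grid, grid, boxes_per_cell * 5 + num_classes)

def yolo_v2_output_shape(input_size=416, stride=32, num_anchors=5, num_classes=20):
    # Each anchor carries x, y, w, h, confidence plus its own class distribution.
    grid = input_size // stride
    return (grid, grid, num_anchors * (5 + num_classes))

v1_shape = yolo_v1_output_shape()   # (7, 7, 30)
v2_shape = yolo_v2_output_shape()   # (13, 13, 125)
```

The class term moves inside the per-anchor factor in v2, which is why two anchors in the same cell can predict different classes.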

Network structure

The Darknet-19 classification network and the YOLO v2 detection network:

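Implementation details

Implementing multi-feature-map concatenation

This is done with a reorg layer and a route layer. The reorg layer takes the 26*26*512 tensor and splits its 26*26 spatial grid into four 13*13 sub-maps (each 2*2 neighborhood contributes one pixel to each sub-map), then stacks them along the channel axis, so the original 512 channels become 2048. The route layer is simply a concat layer: it concatenates the reorg output with the backbone output along the channel axis.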
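The reorg and route steps can be sketched in plain Python on nested lists shaped [height][width][channels]. This is only an illustration of the space-to-depth idea: the function names are hypothetical, the channel ordering inside each output pixel is an assumption (Darknet's own reorg layer orders channels differently), and the 1024-channel backbone output is assumed from Darknet-19 rather than stated in the article.

```python
def reorg(feature_map, stride=2):
    """Space-to-depth: fold each stride*stride neighborhood into the channel axis."""
    height, width = len(feature_map), len(feature_map[0])
    assert height % stride == 0 and width % stride == 0
    out = []
    for i in range(0, height, stride):
        row = []
        for j in range(0, width, stride):
            channels = []
            for di in range(stride):
                for dj in range(stride):
                    channels.extend(feature_map[i + di][j + dj])
            row.append(channels)
        out.append(row)
    return out

def route(a, b):
    """Concatenate two maps of equal height and width along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def shape(feature_map):
    return (len(feature_map), len(feature_map[0]), len(feature_map[0][0]))

fine = [[[0] * 512 for _ in range(26)] for _ in range(26)]     # 26*26*512 passthrough map
coarse = [[[0] * 1024 for _ in range(13)] for _ in range(13)]  # assumed 13*13*1024 backbone output
merged = route(reorg(fine), coarse)
merged_shape = shape(merged)   # (13, 13, 3072): 2048 reorg channels + 1024 backbone channels
```

In an array library the same reorg is usually a reshape followed by a transpose; the explicit loops are used here only to show which pixels end up in which channels.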

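What the anchors mean

Anchor values are expressed as multiples of the cell side length. For example, the anchor [1.08 1.19] is 1.08 cell side lengths wide and 1.19 cell side lengths high. Suppose the input image is 64*64, so the final feature map after 32x downsampling is 2*2. Each cell then covers 32*32 pixels, and this anchor is 34.56 pixels wide and 38.08 pixels high.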
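The conversion from cell units to pixels is a single multiplication. A minimal sketch using the 64*64 input and 2*2 feature map from the example (the function name is hypothetical):

```python
def anchor_in_pixels(anchor, input_size, grid_size):
    """Convert an anchor given in cell units to pixels on the input image."""
    cell_side = input_size / grid_size       # pixels covered by one cell
    return (anchor[0] * cell_side, anchor[1] * cell_side)

width_px, height_px = anchor_in_pixels((1.08, 1.19), input_size=64, grid_size=2)
```

Here each cell covers 32 pixels, so the anchor comes out at about 34.56 by 38.08 pixels.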

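How the anchor mechanism differs from Faster R-CNN's

Choosing the anchors:

YOLO v2 uses clustering statistics (k-means) to derive 5 anchors automatically from the training set, whereas Faster R-CNN uses hand-designed priors.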
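A minimal sketch of this clustering, assuming the 1 - IoU distance described in the YOLO9000 paper. The function names, the mean-based centroid update, and the toy boxes are assumptions for illustration; a real run would use every ground-truth (width, height) pair from the training set with k = 5.

```python
import random

def iou_wh(box, centroid):
    """IoU of two boxes given only (w, h), assuming they share the same center."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (w, h) pairs; the nearest centroid is the one with the highest IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:                      # keep the old centroid if a cluster empties
                new_centroids.append(centroids[i])
                continue
            mean_w = sum(b[0] for b in members) / len(members)
            mean_h = sum(b[1] for b in members) / len(members)
            new_centroids.append((mean_w, mean_h))
        if new_centroids == centroids:           # assignments have stopped changing
            break
        centroids = new_centroids
    return sorted(centroids)

# Toy (w, h) boxes in cell units: two small and two large.
toy_boxes = [(1.0, 1.0), (1.2, 1.1), (5.0, 5.0), (5.2, 4.8)]
anchors = kmeans_anchors(toy_boxes, k=2)
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering, since the measure depends on shape overlap rather than absolute size.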

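Representing the center position:

In Faster R-CNN, the center is encoded as (predicted box center - anchor center) / anchor width or height. This offset is unconstrained, so a predicted box can land anywhere in the image no matter which cell predicted it, which makes early training unstable.

So the authors keep the same approach as YOLO v1 and express the center as an offset relative to the top-left corner of the cell. This is called direct location prediction.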
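Direct location prediction can be sketched as follows, assuming the sigmoid-constrained form from the YOLO9000 paper, where the raw outputs t_x and t_y are squashed into (0, 1) and added to the cell's top-left corner. The function names and the stride of 32 are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_center(tx, ty, cell_x, cell_y, stride=32):
    """Map raw outputs to a box center in pixels; the center cannot leave its cell."""
    bx = (cell_x + sigmoid(tx)) * stride
    by = (cell_y + sigmoid(ty)) * stride
    return bx, by

# Raw outputs of zero place the center in the middle of cell (6, 6).
center = decode_center(0.0, 0.0, cell_x=6, cell_y=6)
# Even extreme raw outputs stay within the bounds of cell (0, 0).
extreme = decode_center(50.0, -50.0, cell_x=0, cell_y=0)
```

Because the sigmoid bounds the offset, each cell is responsible only for boxes centered inside it, in contrast to the unbounded anchor-relative offsets used by Faster R-CNN.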

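Computing the loss:

The loss largely follows the YOLO v1 loss. Width and height, however, are no longer represented as in YOLO v1 (which regressed their square roots). They use the same representation as the RPN instead: the log of the ratio between the predicted box's width or height and the anchor's width or height.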
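The width and height encoding and its inverse can be written as a pair of small helpers. This is a sketch of the log-ratio form described above; the function names and the sample box are hypothetical, and all sizes are in cell units.

```python
import math

def encode_size(box_w, box_h, anchor_w, anchor_h):
    """Regression targets: log of the box size relative to the anchor size."""
    return math.log(box_w / anchor_w), math.log(box_h / anchor_h)

def decode_size(tw, th, anchor_w, anchor_h):
    """Inverse mapping: scale the anchor by exp of the network outputs."""
    return anchor_w * math.exp(tw), anchor_h * math.exp(th)

# A box twice as wide as the anchor [1.08, 1.19] and exactly as high.
tw, th = encode_size(2.16, 1.19, anchor_w=1.08, anchor_h=1.19)
w, h = decode_size(tw, th, anchor_w=1.08, anchor_h=1.19)
```

A target of 0 means the box matches the anchor in that dimension, and the exponential guarantees that decoded sizes are always positive.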

Data augmentation:

While keeping v1's data augmentation strategies, v2 adds 180° image flipping and multi-scale training.
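Multi-scale training can be sketched as a schedule that re-draws the input resolution at a fixed interval. The specifics here (a new size every 10 batches, chosen from the multiples of 32 between 320 and 608) are assumptions taken from the YOLO9000 paper's description rather than from this article, and the generator name is hypothetical.

```python
import random

def multiscale_sizes(num_batches, period=10, min_size=320, max_size=608,
                     stride=32, seed=0):
    """Yield one input resolution per batch, re-drawn every `period` batches."""
    rng = random.Random(seed)
    choices = list(range(min_size, max_size + 1, stride))   # 320, 352, ..., 608
    size = rng.choice(choices)
    for batch in range(num_batches):
        if batch > 0 and batch % period == 0:
            size = rng.choice(choices)
        yield size

schedule = list(multiscale_sizes(30))
```

Every size is a multiple of 32 because the network downsamples by 32, so the output grid stays an integer number of cells (10*10 up to 19*19 for these sizes).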

