Anchor-free object detection: keypoint-based detection (the latest networks that surpass YOLOv3 overall)

Object detection has recently taken a new direction: detecting objects through keypoints. The representative algorithms are CornerNet and CenterNet. My own work has fairly strict real-time requirements, so I mostly use YOLOv3 and its variants. This afternoon I learned that CornerNet-Squeeze, an improved version of CornerNet, actually beats YOLOv3 in both speed and accuracy. I was pretty excited, so I am taking the opportunity to study the principles behind this family of detectors.

CornerNet paper: https://arxiv.org/pdf/1808.01244.pdf
GitHub: https://github.com/umich-vl/CornerNet
CenterNet paper: https://arxiv.org/abs/1904.08189
GitHub: https://github.com/Duankaiwen/CenterNet
CornerNet-Lite paper: https://arxiv.org/abs/1904.08900
GitHub: https://github.com/princeton-vl/CornerNet-Lite

Keypoint-based object detection uses a one-stage network to detect an object's bounding box as a pair of keypoints (the top-left and bottom-right corners of the box). Detecting objects as paired keypoints removes the set of anchors that existing one-stage detectors need, which fits the recent interest in anchor-free methods. Below I briefly introduce the two keypoint-based detectors CornerNet and CenterNet, and finish with a short look at CornerNet-Squeeze.

1. CornerNet 【ECCV 2018】

The overall idea of CornerNet is to extract features with an Hourglass Network backbone and then feed them into two modules, top-left corner pooling and bottom-right corner pooling, which extract keypoint features. After corner pooling, each branch classifies the corner keypoints (heatmaps), finds the matching pair of corners for each object (embeddings), and reduces the error introduced when keypoint coordinates are mapped back to the input image (offsets). The overall structure of the network is shown in the paper's overview figure.

Clearly, CornerNet has four core parts:

  • Corner pooling (two modules)
    The paper's figure illustrates top-left corner pooling. An object lies entirely below and to the right of its top-left corner, so the corner location itself carries little evidence. To let the corner feature capture that evidence, the authors propose the following strategy: to compute the top-left corner feature at a location, take the maximum over everything to its right in the same row and the maximum over everything below it in the same column, then add the two maxima. The sum is the top-left corner feature at that location.
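The pooling rule above can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: the function name `top_left_corner_pool` is made up for this post, it works on single-channel 2-D maps, and the same toy map is passed for both inputs, whereas the real network pools two separate learned feature maps.

```python
import numpy as np

def top_left_corner_pool(f_t, f_l):
    """Top-left corner pooling on two 2-D feature maps of shape (H, W).

    f_t is pooled downwards (max over the pixels below, same column),
    f_l is pooled rightwards (max over the pixels to the right, same row).
    """
    # Flip, take a running maximum, flip back: each entry becomes the
    # maximum of itself and everything below it in its column.
    t = np.maximum.accumulate(f_t[::-1, :], axis=0)[::-1, :]
    # Same trick along rows: maximum of itself and everything to its right.
    l = np.maximum.accumulate(f_l[:, ::-1], axis=1)[:, ::-1]
    return t + l

f = np.array([[0., 2., 0.],
              [1., 0., 0.],
              [0., 0., 3.]])
pooled = top_left_corner_pool(f, f)
print(pooled)
# [[3. 4. 3.]
#  [2. 0. 3.]
#  [3. 3. 6.]]
```

Bottom-right corner pooling is the mirror image: drop the flips so the running maximum covers the pixels above and to the left.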

  • Heatmaps module
    Through the heatmaps module, the network predicts which class each keypoint belongs to.

The loss for the predicted corners (the heatmaps) is a modified focal loss: L_det = -(1/N) Σ_c Σ_i Σ_j of (1 - p_cij)^α log(p_cij) where y_cij = 1, and of (1 - y_cij)^β (p_cij)^α log(1 - p_cij) everywhere else. Here p_cij is the predicted heatmap score at position (i, j) in channel c (class c), y_cij is the ground truth at that position, and N is the number of objects. When y_cij = 1 the loss is the ordinary focal loss, and α controls how much easy samples are down-weighted relative to hard ones. Any other value of y_cij means that (i, j) is not a corner of class c. Strictly speaking y_cij should then be 0, which is how most algorithms handle it, but here it is computed from a Gaussian centered on the ground-truth corner, so the closer (i, j) is to a ground-truth corner, the closer y_cij is to 1. The (1 - y_cij)^β factor, controlled by β, reduces the penalty at those locations, and this is the difference from the standard focal loss. Why weight negative locations differently? Because a wrongly placed corner that is still close to the ground-truth corner can form a box that overlaps the ground-truth box heavily, as the paper's figure shows.
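To make the two branches of the loss concrete, here is a minimal NumPy sketch. The function name `corner_focal_loss` is hypothetical, α = 2 and β = 4 are the values I recall from the CornerNet paper, and N is approximated by the number of positive locations, which is an assumption of this sketch.

```python
import numpy as np

def corner_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """Modified focal loss over a heatmap.

    pred: predicted scores in (0, 1); gt: Gaussian-smoothed ground truth,
    exactly 1 at true corners. Both are arrays of the same shape.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pos = gt == 1.0
    # Assumption: N is taken as the number of positive locations.
    n = max(int(pos.sum()), 1)
    # Positive locations: ordinary focal loss term.
    pos_term = ((1 - pred[pos]) ** alpha * np.log(pred[pos] + eps)).sum()
    # Negative locations: penalty reduced by (1 - y)^beta near a true corner.
    neg_term = ((1 - gt[~pos]) ** beta
                * pred[~pos] ** alpha
                * np.log(1 - pred[~pos] + eps)).sum()
    return -(pos_term + neg_term) / n

pred = np.array([[0.9, 0.3], [0.1, 0.1]])
gt_soft = np.array([[1.0, 0.8], [0.0, 0.0]])  # neighbour of the corner gets y = 0.8
gt_hard = np.array([[1.0, 0.0], [0.0, 0.0]])  # same layout without the Gaussian
loss_soft = corner_focal_loss(pred, gt_soft)
loss_hard = corner_focal_loss(pred, gt_hard)
```

With the soft target, the score of 0.3 next to the true corner is penalized far less than with the hard target, which is exactly the effect the β term is meant to have.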

In that figure, the solid red box is the ground truth. The orange circles are drawn around its top-left and bottom-right corners, with the radius chosen so that any box formed by corners inside the circles has an IoU with the ground truth greater than 0.7. The values inside each circle follow a two-dimensional Gaussian centered on the corner. The white dashed box is a prediction. Its two corners do not coincide with the ground-truth corners, yet the box still frames the object, so it is a useful prediction and its corners should be penalized less. That is why negative locations receive different weights in the loss.
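A small sketch of how such a Gaussian target could be painted onto a heatmap. The helper name `draw_gaussian` is made up here, the radius is simply passed in rather than derived from the IoU condition, and σ = radius / 3 follows my reading of the paper; treat all three as assumptions.

```python
import numpy as np

def draw_gaussian(heatmap, cx, cy, radius):
    """Paint an unnormalized 2-D Gaussian around corner (cx, cy), in place."""
    sigma = radius / 3.0  # assumption: sigma is one third of the radius
    h, w = heatmap.shape
    ys, xs = np.ogrid[:h, :w]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    g = np.exp(-d2 / (2 * sigma ** 2))
    g[d2 > radius ** 2] = 0.0  # nothing is painted outside the circle
    # Keep the larger value where Gaussians of nearby corners overlap.
    np.maximum(heatmap, g, out=heatmap)
    return heatmap

hm = draw_gaussian(np.zeros((7, 7)), cx=3, cy=3, radius=3)
```

The corner itself gets the value 1, its neighbours get values that decay with distance, and everything outside the circle stays 0, so it is treated as an ordinary negative.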

  • Embeddings module
    The heatmaps module predicts keypoint classes but cannot tell which two keypoints form one object. Finding the two keypoints that belong to the same object is the job of the embeddings module.

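The embeddings are trained with two loss functions. e_tk is the embedding vector of the top-left corner of object k, e_bk is the embedding vector of the bottom-right corner of object k, and e_k is the mean of e_tk and e_bk. Equation 4 of the paper (the pull loss), L_pull = (1/N) Σ_k [(e_tk - e_k)^2 + (e_bk - e_k)^2], shrinks the distance between the embeddings of two corners that belong to the same object. Equation 5 (the push loss), L_push = (1/(N(N-1))) Σ_k Σ_{j≠k} max(0, Δ - |e_k - e_j|), enlarges the distance between the embeddings of corners that belong to different objects.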
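The two losses can be sketched as follows, assuming one-dimensional embeddings and a margin Δ = 1 (both are what I recall from the paper; the function name `pull_push_loss` is made up for illustration).

```python
import numpy as np

def pull_push_loss(e_tl, e_br, delta=1.0):
    """Pull and push losses for N objects with 1-D corner embeddings.

    e_tl[k], e_br[k]: embeddings of the top-left and bottom-right corner
    of object k.
    """
    e_tl = np.asarray(e_tl, dtype=float)
    e_br = np.asarray(e_br, dtype=float)
    n = len(e_tl)
    e_mean = (e_tl + e_br) / 2.0
    # Pull: both corners of an object move towards their common mean.
    pull = ((e_tl - e_mean) ** 2 + (e_br - e_mean) ** 2).sum() / n
    if n < 2:
        return pull, 0.0
    # Push: means of different objects should be at least delta apart.
    dist = np.abs(e_mean[:, None] - e_mean[None, :])
    margin = np.maximum(0.0, delta - dist)
    np.fill_diagonal(margin, 0.0)  # skip the j == k terms
    push = margin.sum() / (n * (n - 1))
    return pull, push

# Two well separated objects: small pull loss, zero push loss.
pull_a, push_a = pull_push_loss([0.0, 2.0], [0.2, 2.2])
# Two objects whose means are only 0.5 apart: the push loss becomes 0.5.
pull_b, push_b = pull_push_loss([0.0, 0.5], [0.0, 0.5])
```

At test time, corners whose embeddings are close enough are then grouped into one box.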

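  • Offsets module
    This module compensates for the precision lost when keypoint positions on the downsampled feature map are mapped back to the original image. A location (x, y) in the image maps to (floor(x/n), floor(y/n)) on the feature map, where n is the downsampling factor, and the rounding down loses precision. The network therefore predicts the offset o_k = (x_k/n - floor(x_k/n), y_k/n - floor(y_k/n)), and the authors train it with a smooth L1 loss to reduce this error.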
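A short worked example of the offset target and the loss, as a sketch only. The helper names are made up, and the stride of 4 is just an example value for the downsampling factor n.

```python
import numpy as np

def corner_offset(x, y, stride):
    """Fractional part lost when (x, y) is mapped to the feature map."""
    return (x / stride - np.floor(x / stride),
            y / stride - np.floor(y / stride))

def smooth_l1(pred, target):
    """Smooth L1 loss summed over all elements."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()

# A corner at x = 102, y = 51 in the input image, feature map stride 4.
ox, oy = corner_offset(102, 51, 4)            # (0.5, 0.75)
x_naive = np.floor(102 / 4) * 4               # 100.0, two pixels off
x_fixed = (np.floor(102 / 4) + ox) * 4        # 102.0, exact again
```

Two pixels sounds small, but for a small object it can change the IoU with the ground truth noticeably, which is why the offsets are predicted at all.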

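Finally, the paper's figure of the prediction module shows the structure of the upper branch only; the complete network is made up of two such branches.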

2. CenterNet 【ICCV 2019】

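CenterNet is a keypoint-based detector designed around the problems of CornerNet. At the time of writing it achieves the highest mAP among one-stage detectors. Its authors observe that CornerNet localizes an object by detecting its top-left and bottom-right corners, but corner pooling only captures features along the object's edges, so CornerNet produces many false detections. CenterNet therefore uses a keypoint triplet (the center point, the top-left corner and the bottom-right corner) instead of a pair of keypoints to determine an object, which lets the network use features from inside the object. The CornerNet paper itself notes that keypoint detection is the main bottleneck for its performance, so CenterNet proposes center pooling and cascade corner pooling to extract the three keypoints better.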

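  • Triplet prediction
    The network uses cascade corner pooling to predict the classes of the top-left and bottom-right corner keypoints, and center pooling to predict the class of the center keypoint. Offsets then map the three keypoint positions back to the input image as precisely as possible, and finally the embeddings are used to group keypoints that belong to the same object.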

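    When the center keypoints are used, a central region is defined for each predicted box, and the box is checked for a center keypoint inside that region. If there is one, the box is kept and its confidence becomes the average of the confidences of the center, top-left and bottom-right keypoints; if not, the box is removed. The central region clearly has to adapt to the size of the box: a region that is too small lowers the recall for small boxes, while a region that is too large lets wrong large boxes survive. The authors therefore propose a scale-aware central region. For a box with top-left corner (tlx, tly) and bottom-right corner (brx, bry), the central region has the corners ctlx = ((n+1) tlx + (n-1) brx) / (2n), ctly = ((n+1) tly + (n-1) bry) / (2n), cbrx = ((n-1) tlx + (n+1) brx) / (2n) and cbry = ((n-1) tly + (n+1) bry) / (2n), where n is an odd number (3 for boxes with scale below 150 and 5 for larger ones).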

    As the figure shows, a large predicted box gets a proportionally smaller central region, and a small predicted box gets a proportionally larger one.
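The filtering step for the triplet can be sketched like this. Both function names are hypothetical, n is passed in by the caller (the paper, as far as I recall, uses 3 for boxes with scale below 150 and 5 above), and taking the highest-scoring center when several fall inside the region is my own assumption.

```python
def central_region(tlx, tly, brx, bry, n):
    """Corners of the scale-aware central region of a box; n is odd."""
    ctlx = ((n + 1) * tlx + (n - 1) * brx) / (2 * n)
    ctly = ((n + 1) * tly + (n - 1) * bry) / (2 * n)
    cbrx = ((n - 1) * tlx + (n + 1) * brx) / (2 * n)
    cbry = ((n - 1) * tly + (n + 1) * bry) / (2 * n)
    return ctlx, ctly, cbrx, cbry

def rescore_box(box, corner_scores, centers, n):
    """Return the averaged confidence of a box, or None if it is removed.

    box: (tlx, tly, brx, bry); corner_scores: (top-left, bottom-right);
    centers: list of (x, y, score) center keypoints of the same class.
    """
    ctlx, ctly, cbrx, cbry = central_region(*box, n)
    inside = [s for (x, y, s) in centers
              if ctlx <= x <= cbrx and ctly <= y <= cbry]
    if not inside:
        return None  # no center keypoint in the central region
    # Assumption: use the best center if several fall inside the region.
    return (corner_scores[0] + corner_scores[1] + max(inside)) / 3.0

small = central_region(0, 0, 60, 60, 3)      # (20.0, 20.0, 40.0, 40.0)
large = central_region(0, 0, 300, 300, 5)    # (120.0, 120.0, 180.0, 180.0)
kept = rescore_box((0, 0, 60, 60), (0.8, 0.7), [(30, 30, 0.9)], 3)
dropped = rescore_box((0, 0, 60, 60), (0.8, 0.7), [(5, 5, 0.9)], 3)
```

With n = 3 the region covers a third of each side of the box, with n = 5 only a fifth, which matches the statement that larger boxes get a proportionally smaller central region.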

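  • Center pooling
    Building on the idea of corner pooling, the authors propose center pooling so that the extracted center keypoint features represent the object better.

The center of an object does not necessarily carry strong semantic information that distinguishes it from other classes. A person's head, for example, carries such information, but the center of a person is usually somewhere in the middle of the body. The authors propose center pooling to enrich the center keypoint features. As the figure illustrates, center pooling takes the maximum value along the horizontal direction and along the vertical direction of the center point and adds the two, which gives the center point information from beyond its own location. This gives the center point a chance to pick up semantic information that is easier to tell apart from other classes. Center pooling can be implemented by combining corner pooling in different directions: a horizontal maximum is obtained by chaining left pooling and right pooling, and likewise a vertical maximum is obtained by chaining top pooling and bottom pooling, as shown in Figure 6.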
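The following NumPy sketch shows center pooling on a single 2-D map and checks the claim that chaining left and right pooling yields the horizontal maximum. The function names are made up, and the real network pools separate learned feature maps for the two directions.

```python
import numpy as np

def left_pool(f):
    """Each entry becomes the max of itself and everything to its right."""
    return np.maximum.accumulate(f[:, ::-1], axis=1)[:, ::-1]

def right_pool(f):
    """Each entry becomes the max of itself and everything to its left."""
    return np.maximum.accumulate(f, axis=1)

def center_pool(f):
    """Row maximum plus column maximum at every location."""
    row_max = f.max(axis=1, keepdims=True)  # shape (H, 1)
    col_max = f.max(axis=0, keepdims=True)  # shape (1, W)
    return row_max + col_max                # broadcasts to (H, W)

f = np.array([[0., 2., 0.],
              [1., 0., 0.],
              [0., 0., 3.]])
pooled = center_pool(f)
print(pooled)
# [[3. 4. 5.]
#  [2. 3. 4.]
#  [4. 5. 6.]]

# Chaining left and right pooling spreads the row maximum over the whole row.
chained = right_pool(left_pool(f))
```

The vertical maximum works the same way with the axes swapped, using top and bottom pooling.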

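  • Cascade corner pooling
    Building on the idea of corner pooling, the authors also propose cascade corner pooling so that the extracted corner keypoint features represent the object better.

Corners usually lie outside the object, at locations that carry no semantic information about it, which makes corners hard to detect. Panel (b) of the figure shows the traditional approach, corner pooling. It takes the maximum values along the object's boundary and adds them, so it only provides semantic information from the edges of the object and can hardly capture the richer information inside it. Panel (c) shows the principle of cascade corner pooling: it first finds the maximum along the boundary, then continues from the location of that boundary maximum into the object (along the dashed line in the figure) to find an internal maximum, and adds it to the boundary maximum. This gives the corner features richer semantic information about the object. Cascade corner pooling can also be implemented by combining corner pooling in different directions, as shown in Figure 8, which illustrates cascade left corner pooling.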

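In the end, CenterNet adds center point prediction on top of CornerNet and changes how the keypoint features are extracted. This greatly reduces false detections and gives the best results among one-stage detectors at the time of writing.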

3. CornerNet-Lite

On April 19, Princeton University proposed two more efficient keypoint-based detectors, CornerNet-Saccade and CornerNet-Squeeze. Combining the two is what they call CornerNet-Lite.

As the figure shows, CornerNet-Squeeze focuses on speed, yet it surpasses YOLOv3 in both accuracy and speed, while CornerNet-Saccade focuses on accuracy.


As the figure shows, CornerNet-Saccade and CornerNet-Squeeze really are excellent.

Below is CVer's introduction to the two networks. I think it is very well written, so I will not reinvent the wheel:


Finally, the network I am most interested in, CornerNet-Squeeze, is compared with YOLOv3, with the results shown in the figure.

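However, while studying and writing this summary, I realized that CornerNet-Squeeze is an improvement of CornerNet. Given the weaknesses of CornerNet mentioned above in the CenterNet section, I suspect CornerNet-Squeeze may not be that good when it comes to false detections. So the next step is to read the source code, and I hope CornerNet-Squeeze lives up to my expectations.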

References:
https://mp.weixin.qq.com/s/lk268kc55Lgz1d_21zg26A
https://blog.csdn.net/u014380165/article/details/83032273
https://mp.weixin.qq.com/s/xy1WWl2rNvGAXnqIJCy-Mg

Origin www.cnblogs.com/yumoye/p/11022800.html