CornerNet (Detecting Objects as Paired Keypoints): the architecture in detail

Paper: CornerNet: Detecting Objects as Paired Keypoints

Code: https://github.com/princeton-vl/CornerNet

NOTE: Tensor shapes are used throughout this post. They follow the PyTorch convention, where the channel axis is dimension 1 (NCHW); TensorFlow places the channel axis differently (NHWC), so be careful to distinguish the two.

1. The paper's overall model architecture

 

 

 The figure comes from the original paper. The backbone is the Hourglass network (presumably named because its structure resembles an hourglass). The hourglass structure as used in this paper will be expanded in more detail later, together with the figure from the hourglass paper.

CornerNet model architecture

 

 

 

 Here heat (passed through a sigmoid before the loss is computed) is the corner heatmap; bluntly put, it is a probability distribution over locations being corners. tag corresponds to the embedding in the original paper and is responsible for matching each top-left corner with its bottom-right corner, while off is the offset. The heat output has 80 channels because the dataset has 80 categories, and it is easy to see why off has 2 channels, but why does tag have only 1? The key idea here is the same as in associative embedding for human pose estimation, except that the embedding vector here is 1-D. The associated losses are computed as follows:

$$L_{pull} = \frac{1}{N}\sum_{k=1}^{N}\left[(e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2\right]$$

$$L_{push} = \frac{1}{N(N-1)}\sum_{k=1}^{N}\sum_{\substack{j=1\\ j\neq k}}^{N}\max\big(0,\ \Delta - |e_k - e_j|\big)$$

  

 

   Here $e_k$ is the average of the two embeddings: $e_{t_k}$ is the top-left embedding and $e_{b_k}$ the bottom-right embedding. $L_{pull}$ pulls the two corners of the same object together, $L_{push}$ pushes corners of different objects apart, and we set $\Delta = 1$. The formulas sum over the $N$ objects, but our embedding vector is only one-dimensional, so over which dimension of the output tensors is the sum actually taken?

  

 

  As shown above, $e_k$ is computed as a mean, and the loss sums are taken only over the masked region, i.e. at the ground-truth corner locations.
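A minimal PyTorch sketch of these two losses (the function name and calling convention are illustrative, not the repo's; it assumes the 1-D embeddings have already been gathered from the tag maps at the N ground-truth corner locations, i.e. the masked region):

```python
import torch

def ae_loss(tl_emb, br_emb, delta=1.0):
    # tl_emb, br_emb: [N] embeddings of the N ground-truth top-left /
    # bottom-right corners of one image, gathered from the [batch,1,H,W]
    # tag maps at the masked (ground-truth) corner locations.
    ek = (tl_emb + br_emb) / 2                        # e_k: per-object mean
    # L_pull: draw both corners of each object toward their mean
    pull = ((tl_emb - ek) ** 2 + (br_emb - ek) ** 2).mean()
    # L_push: hinge loss pushing the means of different objects apart
    dist = (ek.unsqueeze(0) - ek.unsqueeze(1)).abs()  # [N, N] pairwise |e_k - e_j|
    n = ek.numel()
    hinge = torch.relu(delta - dist)
    # exclude the diagonal (j == k) from the double sum
    push = (hinge.sum() - hinge.diagonal().sum()) / (n * (n - 1))
    return pull, push
```

For two objects whose corner embeddings agree within each object and whose means are at least Δ apart, both losses are zero, which is exactly the configuration the training pushes toward.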

####################################

"#" As references, the reference source blog CornerNet reading notes (a) ------ decoding process

The tensor shapes in this reference section also follow the PyTorch dimension ordering.

1. Theory review

CornerNet has two branches, top-left and bottom-right, and each branch produces three outputs: the probability that each point is a corner, the offsets, and the embedding scores that indicate whether two points belong to the same object.

Top-left branch:

heat map: tl_heat  [batch, C, H, W]

offset: tl_regr  [batch, 2, H, W]

embedding: tl_tag  [batch, 1, H, W]

Bottom-right branch:

heat map: br_heat  [batch, C, H, W]

offset: br_regr  [batch, 2, H, W]

embedding: br_tag  [batch, 1, H, W]

In total there are six outputs: tl_heat, br_heat, tl_tag, br_tag, tl_regr, br_regr.
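To make the shapes above concrete, the six outputs can be mocked up as dummy tensors (assumed sizes: COCO's C = 80 classes, a 128x128 output map, and an arbitrary batch size; the values are random placeholders, not real predictions):

```python
import torch

batch, C, H, W = 2, 80, 128, 128       # assumed COCO-style sizes
tl_heat = torch.randn(batch, C, H, W)  # top-left corner heatmap (pre-sigmoid)
br_heat = torch.randn(batch, C, H, W)  # bottom-right corner heatmap
tl_tag  = torch.randn(batch, 1, H, W)  # top-left 1-D embeddings
br_tag  = torch.randn(batch, 1, H, W)  # bottom-right 1-D embeddings
tl_regr = torch.randn(batch, 2, H, W)  # top-left (x, y) offsets
br_regr = torch.randn(batch, 2, H, W)  # bottom-right (x, y) offsets
```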

Decoding procedure:

1. Select the 100 highest-scoring points from each of the top-left and bottom-right heatmaps.

2. Match these 100 points pairwise, generating 100*100 candidate boxes.

3. Filter out most of the bboxes using a few conditions: the two matched points must belong to the same class, their embedding scores must be close enough, and their spatial positions must be consistent (the top-left corner's coordinates must be smaller than the bottom-right's). In the end, 1000 candidate boxes are kept as the output.

Note: the final 1000 candidate boxes should still be run through NMS, but the author did not put the NMS step inside the decode function.
Original post: https://blog.csdn.net/goodxin_ie/article/details/90453802
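The three decode steps above can be sketched in PyTorch roughly as follows (a simplified illustration: offsets and NMS are omitted, and the function name and thresholds are hypothetical, not the repo's API):

```python
import torch

def simple_decode(tl_heat, br_heat, tl_tag, br_tag,
                  K=100, ae_threshold=0.5, num_dets=1000):
    """Sketch of the decode steps: top-K corners, pair, filter.

    tl_heat/br_heat: [batch, C, H, W] sigmoid corner heatmaps
    tl_tag/br_tag:   [batch, 1, H, W] embedding maps
    """
    batch, C, H, W = tl_heat.shape

    # 1. pick the K highest-scoring points in each heatmap
    tl_scores, tl_inds = tl_heat.view(batch, -1).topk(K)
    br_scores, br_inds = br_heat.view(batch, -1).topk(K)
    tl_cls, br_cls = tl_inds // (H * W), br_inds // (H * W)
    tl_ys, tl_xs = (tl_inds % (H * W)) // W, tl_inds % W
    br_ys, br_xs = (br_inds % (H * W)) // W, br_inds % W

    # gather the 1-D embeddings at the selected locations
    tl_e = tl_tag.view(batch, -1).gather(1, tl_inds % (H * W))
    br_e = br_tag.view(batch, -1).gather(1, br_inds % (H * W))

    # 2. pair every top-left with every bottom-right: K*K candidates
    scores = (tl_scores.unsqueeze(2) + br_scores.unsqueeze(1)) / 2  # [batch,K,K]

    # 3. filter: same class, close embeddings, tl above-left of br
    cls_ok  = tl_cls.unsqueeze(2) == br_cls.unsqueeze(1)
    emb_ok  = (tl_e.unsqueeze(2) - br_e.unsqueeze(1)).abs() < ae_threshold
    geom_ok = ((tl_xs.unsqueeze(2) <= br_xs.unsqueeze(1))
               & (tl_ys.unsqueeze(2) <= br_ys.unsqueeze(1)))
    scores = scores.masked_fill(~(cls_ok & emb_ok & geom_ok), -1)

    # keep the best num_dets boxes by paired score
    scores, inds = scores.view(batch, -1).topk(min(num_dets, K * K))
    return scores, inds
```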

####################################

2. The first convolution stage reduces the resolution of the input by 4x

  

 

  The input passes through a 7x7 convolution layer with 128 filters, stride 2, and padding 3, giving an output of 256x256x128; per the paper, a following stride-2 residual block then completes the 4x reduction.
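This stem can be written directly in PyTorch (assuming an RGB input at the paper's 511x511 resolution; the variable names are mine):

```python
import torch
import torch.nn as nn

# 7x7 convolution, 128 filters, stride 2, padding 3: halves the resolution
stem = nn.Conv2d(3, 128, kernel_size=7, stride=2, padding=3)

x = torch.randn(1, 3, 511, 511)   # dummy RGB input
y = stem(x)                       # -> [1, 128, 256, 256], matching the text
```

The output size follows from floor((511 + 2*3 - 7) / 2) + 1 = 256.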

3. The backbone of this paper is the hourglass network, and the main building block inside the hourglass is the residual module
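Before the figure, here is a minimal sketch of such a residual module (based on the standard residual design: two 3x3 conv-BN layers plus a skip connection, with a 1x1 projection when the shape changes; the exact layer ordering in the repo may differ slightly):

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    def __init__(self, inp, out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(inp, out, 3, stride=stride, padding=1, bias=False)
        self.bn1   = nn.BatchNorm2d(out)
        self.conv2 = nn.Conv2d(out, out, 3, padding=1, bias=False)
        self.bn2   = nn.BatchNorm2d(out)
        # 1x1 projection on the skip branch when resolution/channels change
        self.skip  = (nn.Sequential(nn.Conv2d(inp, out, 1, stride=stride, bias=False),
                                    nn.BatchNorm2d(out))
                      if stride != 1 or inp != out else nn.Identity())
        self.relu  = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return self.relu(y + self.skip(x))
```

A stride-2 instance of this module, e.g. `Residual(128, 256, stride=2)`, is what completes the 4x reduction after the 7x7 stem.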

  

 

 4. The hourglass architecture as used in this paper
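The recursive down-up structure behind the figures can be sketched as follows (an assumed minimal version, not the repo's code: plain 3x3 convs stand in for the residual modules, and downsampling uses stride-2 convolutions, matching CornerNet's replacement of max pooling):

```python
import torch
import torch.nn as nn

class Hourglass(nn.Module):
    # Recurse: keep a full-resolution skip branch, downsample, process
    # (or recurse one level deeper), then upsample and add the skip.
    def __init__(self, depth, ch):
        super().__init__()
        self.skip  = nn.Conv2d(ch, ch, 3, padding=1)            # full-res branch
        self.down  = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # stride-2 downsample
        self.inner = (Hourglass(depth - 1, ch) if depth > 1
                      else nn.Conv2d(ch, ch, 3, padding=1))     # recurse or bottom out
        self.post  = nn.Conv2d(ch, ch, 3, padding=1)
        self.up    = nn.Upsample(scale_factor=2, mode='nearest')

    def forward(self, x):
        low = self.post(self.inner(self.down(x)))
        return self.skip(x) + self.up(low)
```

Because each level halves and then restores the resolution, the module's output has the same shape as its input, which is why two hourglass stacks can be chained back to back.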

  

  The original architecture, before modification, was:

  

 

 5. Structure of tl_models
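The tl_models branch applies the paper's top-left corner pooling before its output heads. A compact sketch of that pooling (an assumed implementation using flipped running maxima; the repo implements it as a custom op):

```python
import torch

def top_left_pool(x):
    # Top-left corner pooling: at each location, take the max over all
    # pixels to the RIGHT in the same row, plus the max over all pixels
    # BELOW in the same column.  x: [batch, C, H, W]
    horiz = x.flip(3).cummax(dim=3).values.flip(3)  # running max over [x_i, W)
    vert  = x.flip(2).cummax(dim=2).values.flip(2)  # running max over [y_i, H)
    return horiz + vert
```

For a 2x2 map [[1, 2], [3, 4]], the top-left cell gets max-right(1, 2) + max-below(1, 3) = 5, while the bottom-right cell sees only itself twice, giving 8.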

  

 

 6. Structure of br_models

  

 

 7. Structure of the model output outs

  

 

   For clarity, many tensors are written out individually here; in the source code they are grouped, e.g. tl_heats = [tl_heat1, tl_heat2], and the other outputs follow the same pattern.

 

If needed, the resources above can be downloaded from Baidu Cloud:

  Link: https://pan.baidu.com/s/1PNXtkU1GkepH2NFTWy7Fxw
  Extraction code: 1wpr


Source: www.cnblogs.com/dan-baishucaizi/p/12165865.html