Learning the Faster R-CNN code: roi_pooling (Part 3)

This post looks on its own at the C code in roi_pooling/src/roi_pooling.c:

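Note
I searched around but found little useful information. Even searching Baidu for #include <TH/TH.h> turned up almost nothing, let alone how THFloatTensor_data and THFloatTensor_size are meant to be used. Maybe I just didn't find the right sources. Below is my own understanding, and I can't guarantee it is correct.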

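1. The header file TH/TH.h
#include <TH/TH.h> pulls in the declarations of the data structures and functions of TH (THFloatTensor, THFloatStorage and so on), the C tensor library underneath PyTorch. It is PyTorch's low-level C interface.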

2. Parameters of roi_pooling_forward

int roi_pooling_forward(int pooled_height, int pooled_width, float spatial_scale,
                        THFloatTensor * features, THFloatTensor * rois, THFloatTensor * output)

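pooled_height: height of the output after pooling;
pooled_width: width of the output after pooling;
spatial_scale: the ratio of the feature map's size to the input image's size, where the feature map is the input of the RoI pooling layer. Multiplying an image coordinate by it gives the corresponding feature-map coordinate (e.g. 1/16 for a backbone with a total stride of 16);
features: the feature map produced by the convolutional backbone; judging from how the code indexes it, its layout is (batch, height, width, channels);
rois: all regions of interest, one row [batch_index, x1, y1, x2, y2] per ROI, in input-image coordinates;
output: the pooled result; judging from how the code indexes it, its layout is (num_rois, channels, pooled_height, pooled_width).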

3. Variables inside the function

    // Grab the input tensor
    float * data_flat = THFloatTensor_data(features);
    float * rois_flat = THFloatTensor_data(rois);

    float * output_flat = THFloatTensor_data(output);

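These lines get raw pointers to the data of the three tensors. THFloatTensor_data does not copy or allocate anything: it returns a float * pointing at the first element of the tensor's existing storage. The rest of the function walks that memory as a flat one-dimensional array, which only works if the tensors are contiguous.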

    // Number of ROIs
    int num_rois = THFloatTensor_size(rois, 0);
    int size_rois = THFloatTensor_size(rois, 1);

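So rois is a 2-D tensor: num_rois is its first dimension, the number of ROIs, and size_rois is its second dimension, the number of values stored per ROI. According to the comment in the code ([batch_index x1 y1 x2 y2]) that is 5. size_rois is therefore not the ROI's area or size on the image; it is the stride between consecutive ROIs in rois_flat.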
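To make the stride concrete, here is a small stand-alone sketch in plain C. The Roi struct and read_roi helper are hypothetical names for illustration, not part of roi_pooling.c; the sketch assumes, as the code comment says, that each ROI row is [batch_index, x1, y1, x2, y2].

```c
#include <stddef.h>

/* Hypothetical view of one ROI row: [batch_index, x1, y1, x2, y2]. */
typedef struct {
    int batch_index;
    float x1, y1, x2, y2;
} Roi;

/* Read ROI n from a flat rois buffer. size_rois is the number of floats
 * per ROI (5 here), i.e. the stride between consecutive ROIs. */
static Roi read_roi(const float *rois_flat, int size_rois, int n)
{
    const float *row = rois_flat + (size_t)n * size_rois;
    Roi r = { (int)row[0], row[1], row[2], row[3], row[4] };
    return r;
}
```

With two ROIs stored back to back, ROI 1 starts 5 floats after ROI 0, which is exactly what index_roi += size_rois does in the loop.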

    // batch size
    int batch_size = THFloatTensor_size(features, 0);
    if(batch_size != 1)
    {
        return 0;
    }
    // data height
    int data_height = THFloatTensor_size(features, 1);
    // data width
    int data_width = THFloatTensor_size(features, 2);
    // Number of channels
    int num_channels = THFloatTensor_size(features, 3);

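features is read as (batch_size, data_height, data_width, num_channels): batch size, feature-map height, feature-map width and number of channels. Note that this is channels-last order. If the batch size is not 1, the function returns 0 immediately, so this CPU implementation only handles one image at a time.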
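Because the dimensions are read in this order, element (n, h, w, c) of features sits at a fixed offset in data_flat. A minimal sketch of that offset, with a hypothetical helper name; the loop code later computes the same quantity as index_data (the n part) plus index (the h, w, c part).

```c
/* Offset of element (n, h, w, c) in a contiguous NHWC buffer:
 * channels vary fastest, then width, then height, then batch. */
static int nhwc_offset(int n, int h, int w, int c,
                       int height, int width, int channels)
{
    return ((n * height + h) * width + w) * channels + c;
}
```

For a 4×5 map with 3 channels, moving one column right advances the offset by 3 and moving one row down advances it by 15.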

    // Set all element of the output tensor to -inf.
    THFloatStorage_fill(THFloatTensor_storage(output), -1);

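This first initializes every element of the output tensor. The comment says -inf, but the code actually fills with -1. That gives the correct maximum when the features are non-negative (e.g. after a ReLU), but if every value in a bin were below -1 the result would stay -1; filling with -FLT_MAX would match the comment.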
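The effect of the initial value can be checked with a hypothetical stand-alone max reduction that starts from a given value, the way each bin's maximum starts from the pre-filled output:

```c
#include <float.h>

/* Max over n values, starting from `init`. With init = -1 and all inputs
 * below -1, the result stays -1, which is not the true maximum. */
static float bin_max(const float *vals, int n, float init)
{
    float m = init;
    for (int i = 0; i < n; ++i)
        if (vals[i] > m)
            m = vals[i];
    return m;
}
```

For { -5, -3, -2.5 } this returns -1 when started from -1 but -2.5 when started from -FLT_MAX; for non-negative inputs both starting values give the same answer.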

Next comes max pooling over each ROI.

// For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
    int index_roi = 0;
    int index_output = 0;
    int n;
    for (n = 0; n < num_rois; ++n)

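index_roi (the read position in rois_flat) and index_output (the write position in output_flat) both start at 0, and then the loop visits every region of interest.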

        int roi_batch_ind = rois_flat[index_roi + 0];
        int roi_start_w = round(rois_flat[index_roi + 1] * spatial_scale);
        int roi_start_h = round(rois_flat[index_roi + 2] * spatial_scale);
        int roi_end_w = round(rois_flat[index_roi + 3] * spatial_scale);
        int roi_end_h = round(rois_flat[index_roi + 4] * spatial_scale);

This reads the ROI's fields: roi_batch_ind is the batch index, (roi_start_w, roi_start_h) is the ROI's top-left corner and (roi_end_w, roi_end_h) its bottom-right corner.

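For each ROI, the batch index and coordinates are taken from rois_flat. The coordinates are in input-image space; multiplying them by spatial_scale (the feature-map-to-image size ratio) maps them onto the feature map, not onto the original image. The scaled values are generally not integers, so they are rounded to the nearest integer.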

        int roi_height = fmaxf(roi_end_h - roi_start_h + 1, 1);
        int roi_width = fmaxf(roi_end_w - roi_start_w + 1, 1);
        float bin_size_h = (float)(roi_height) / (float)(pooled_height);
        float bin_size_w = (float)(roi_width) / (float)(pooled_width);

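This gives the ROI's height and width on the feature map (both endpoints are included, hence the +1, and fmaxf keeps each at least 1), and then the height and width of each bin after pooling. The bin size is a float and need not be an integer.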
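The same arithmetic for one axis as a small hypothetical helper; the original uses fmaxf to enforce the minimum of 1, and a plain integer comparison gives the same result here:

```c
/* ROI size along one axis on the feature map (both endpoints inclusive,
 * at least 1) and the floating-point bin size for `pooled` bins. */
static void roi_bins(int start, int end, int pooled,
                     int *roi_size, float *bin_size)
{
    int size = end - start + 1;
    if (size < 1)
        size = 1;
    *roi_size = size;
    *bin_size = (float)size / (float)pooled;
}
```

A ROI from row 3 to row 7 is 5 rows tall, so pooling it to 2 rows gives bins of 2.5 rows; a degenerate ROI with end < start is treated as 1 row.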

        int index_data = roi_batch_ind * data_height * data_width * num_channels;
        const int output_area = pooled_width * pooled_height;

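What is index_data? It is the batch index times the feature-map height, width and number of channels, i.e. the offset in data_flat where the feature map of image roi_batch_ind begins. Since the batch size must be 1, it is 0 in practice.
output_area is the number of elements in one channel of one ROI's pooled output. Because the pooled size is fixed, this value does not change, and it serves as the stride between channels in output_flat.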
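Putting index_output, pool_index and output_area together, the code writes element (n, c, ph, pw) of the output at the offset computed below. This hypothetical helper only restates the indexing used in the loops; the layout it implies is [num_rois][num_channels][pooled_height][pooled_width].

```c
/* Offset of output element (ROI n, channel c, bin row ph, bin column pw). */
static int output_offset(int n, int c, int ph, int pw,
                         int num_channels, int pooled_height, int pooled_width)
{
    int output_area = pooled_height * pooled_width;     /* one channel */
    int index_output = n * output_area * num_channels;  /* start of ROI n */
    int pool_index = index_output + ph * pooled_width + pw;
    return pool_index + c * output_area;                /* jump to channel c */
}
```

With 3 channels and a 2×2 pooled size, each ROI occupies 12 consecutive floats, and each channel 4 of them.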

        int c, ph, pw;
        for (ph = 0; ph < pooled_height; ++ph)
        {
            for (pw = 0; pw < pooled_width; ++pw)
            {

These loops visit every bin of the pooled output and pool each one.

        int hstart = (floor((float)(ph) * bin_size_h));
        int wstart = (floor((float)(pw) * bin_size_w));
        int hend = (ceil((float)(ph + 1) * bin_size_h));
        int wend = (ceil((float)(pw + 1) * bin_size_w));

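hstart and wstart are the top-left corner of the bin inside the ROI, and hend and wend are its bottom-right corner (exclusive). floor returns the largest integer not greater than its argument, and ceil the smallest integer not less than it. Rounding the start down and the end up expands every bin outward, so together the bins always cover the whole ROI. This does not make the top-left bins larger than the bottom-right ones; instead, when the ROI size is not a multiple of the pooled size, neighboring bins overlap by one row or column. For example, a ROI 5 rows tall pooled to 2 rows gives bin_size_h = 2.5, so the first bin covers rows [0, 3) and the second rows [2, 5).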

        hstart = fminf(fmaxf(hstart + roi_start_h, 0), data_height);
        hend = fminf(fmaxf(hend + roi_start_h, 0), data_height);
        wstart = fminf(fmaxf(wstart + roi_start_w, 0), data_width);
        wend = fminf(fmaxf(wend + roi_start_w, 0), data_width);

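Adding roi_start_h and roi_start_w converts the bin boundaries from positions inside the ROI to absolute positions on the feature map (not the original image), and fminf/fmaxf clamp them to [0, data_height] and [0, data_width]. After clamping, a bin can become empty (hend <= hstart or wend <= wstart); in that case the full listing writes 0 to every channel of that bin.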
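The shift-and-clamp step along one axis, as a hypothetical helper; plain integer comparisons stand in for the fminf/fmaxf calls and give the same result for these integer values:

```c
/* Shift a ROI-relative boundary by the ROI origin and clamp it to
 * [0, limit], where limit is data_height or data_width. */
static int to_clamped_feature(int rel, int roi_start, int limit)
{
    int v = rel + roi_start;
    if (v < 0)
        v = 0;
    if (v > limit)
        v = limit;
    return v;
}
```

On an 8-row map, a bin [3, 5) inside a ROI starting at row 6 becomes [8, 8) after clamping, which is empty.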

            int h, w, c;
            for (h = hstart; h < hend; ++h)
            {
                for (w = wstart; w < wend; ++w)
                {
                    for (c = 0; c < num_channels; ++c)
                    {
                        const int index = (h * data_width + w) * num_channels + c;
                        if (data_flat[index_data + index] > output_flat[pool_index + c * output_area])
                        {
                            output_flat[pool_index + c * output_area] = data_flat[index_data + index];
                        }
                    }
                }
            }

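These loops run over the bin's rows, then its columns, then the channels, and keep the maximum value of each channel within the bin. pool_index = index_output + (ph * pooled_width + pw), defined in the full listing, is the bin's position in channel 0 of the current ROI's output, and adding c * output_area moves to channel c.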
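A stand-alone sketch of pooling one bin, with a hypothetical helper pool_bin. It assumes a single-image NHWC buffer as described above, and it deliberately starts the maximum from -FLT_MAX rather than the original's -1; empty bins get 0, as in the full listing.

```c
#include <float.h>

/* Max-pool the bin [hstart, hend) x [wstart, wend) of a single-image NHWC
 * feature map. out[c * out_stride] receives channel c, mirroring
 * output_flat[pool_index + c * output_area]. */
static void pool_bin(const float *data, int width, int channels,
                     int hstart, int hend, int wstart, int wend,
                     float *out, int out_stride)
{
    int empty = (hend <= hstart) || (wend <= wstart);
    for (int c = 0; c < channels; ++c)
        out[c * out_stride] = empty ? 0.0f : -FLT_MAX;
    if (empty)
        return;
    for (int h = hstart; h < hend; ++h)
        for (int w = wstart; w < wend; ++w)
            for (int c = 0; c < channels; ++c) {
                float v = data[(h * width + w) * channels + c];
                if (v > out[c * out_stride])
                    out[c * out_stride] = v;
            }
}
```

For a 2×2 map with two channels holding (1,10), (2,20), (3,30), (4,40), pooling the whole map yields 4 and 40, and pooling only the top row yields 2 and 20.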

        // Increment ROI index
        index_roi += size_rois;
        index_output += pooled_height * pooled_width * num_channels;

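After one ROI is done, index_roi and index_output are advanced. Because the data are stored contiguously, index_roi moves forward by size_rois (the number of values per ROI) and index_output by pooled_height * pooled_width * num_channels, the number of output elements one ROI occupies.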

OK, that's the whole function. A figure makes it more intuitive:

The example figures are taken from this blog post: https://blog.csdn.net/auto1993/article/details/78514071
ROI pooling example
Consider an 8×8 feature map, one ROI, and an output size of 2×2. There is only one channel here.

The input feature map, of fixed size

Position of the region proposal after projection (top-left and bottom-right corners): (0, 3), (7, 8).

Divide it into 2×2 sections (because the output size is 2×2).

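Note that the figure only illustrates the idea, and its split need not follow the rounding in this code. In the code, each bin's start is rounded down with floor and its end rounded up with ceil, so neither the top-left nor the bottom-right bin is systematically larger; when the ROI size is not divisible by the pooled size, neighboring bins simply overlap by one row or column.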

Apply max pooling to each section.

Here is the complete code.
roi_pooling/src/roi_pooling.c

#include <TH/TH.h>
#include <math.h>

int roi_pooling_forward(int pooled_height, int pooled_width, float spatial_scale,
                        THFloatTensor * features, THFloatTensor * rois, THFloatTensor * output)
{
    // Grab the input tensor
    float * data_flat = THFloatTensor_data(features);
    float * rois_flat = THFloatTensor_data(rois);

    float * output_flat = THFloatTensor_data(output);

    // Number of ROIs
    int num_rois = THFloatTensor_size(rois, 0);
    int size_rois = THFloatTensor_size(rois, 1);
    // batch size
    int batch_size = THFloatTensor_size(features, 0);
    if(batch_size != 1)
    {
        return 0;
    }
    // data height
    int data_height = THFloatTensor_size(features, 1);
    // data width
    int data_width = THFloatTensor_size(features, 2);
    // Number of channels
    int num_channels = THFloatTensor_size(features, 3);

    // Set all element of the output tensor to -inf.
    THFloatStorage_fill(THFloatTensor_storage(output), -1);

    // For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
    int index_roi = 0;
    int index_output = 0;
    int n;
    for (n = 0; n < num_rois; ++n)
    {
        int roi_batch_ind = rois_flat[index_roi + 0];
        int roi_start_w = round(rois_flat[index_roi + 1] * spatial_scale);
        int roi_start_h = round(rois_flat[index_roi + 2] * spatial_scale);
        int roi_end_w = round(rois_flat[index_roi + 3] * spatial_scale);
        int roi_end_h = round(rois_flat[index_roi + 4] * spatial_scale);
        //      CHECK_GE(roi_batch_ind, 0);
        //      CHECK_LT(roi_batch_ind, batch_size);

        int roi_height = fmaxf(roi_end_h - roi_start_h + 1, 1);
        int roi_width = fmaxf(roi_end_w - roi_start_w + 1, 1);
        float bin_size_h = (float)(roi_height) / (float)(pooled_height);
        float bin_size_w = (float)(roi_width) / (float)(pooled_width);

        int index_data = roi_batch_ind * data_height * data_width * num_channels;
        const int output_area = pooled_width * pooled_height;

        int c, ph, pw;
        for (ph = 0; ph < pooled_height; ++ph)
        {
            for (pw = 0; pw < pooled_width; ++pw)
            {
                int hstart = (floor((float)(ph) * bin_size_h));
                int wstart = (floor((float)(pw) * bin_size_w));
                int hend = (ceil((float)(ph + 1) * bin_size_h));
                int wend = (ceil((float)(pw + 1) * bin_size_w));

                hstart = fminf(fmaxf(hstart + roi_start_h, 0), data_height);
                hend = fminf(fmaxf(hend + roi_start_h, 0), data_height);
                wstart = fminf(fmaxf(wstart + roi_start_w, 0), data_width);
                wend = fminf(fmaxf(wend + roi_start_w, 0), data_width);

                const int pool_index = index_output + (ph * pooled_width + pw);
                int is_empty = (hend <= hstart) || (wend <= wstart);
                if (is_empty)
                {
                    for (c = 0; c < num_channels * output_area; c += output_area)
                    {
                        output_flat[pool_index + c] = 0;
                    }
                }
                else
                {
                    int h, w, c;
                    for (h = hstart; h < hend; ++h)
                    {
                        for (w = wstart; w < wend; ++w)
                        {
                            for (c = 0; c < num_channels; ++c)
                            {
                                const int index = (h * data_width + w) * num_channels + c;
                                if (data_flat[index_data + index] > output_flat[pool_index + c * output_area])
                                {
                                    output_flat[pool_index + c * output_area] = data_flat[index_data + index];
                                }
                            }
                        }
                    }
                }
            }
        }

        // Increment ROI index
        index_roi += size_rois;
        index_output += pooled_height * pooled_width * num_channels;
    }
    return 1;
}

References: https://blog.csdn.net/auto1993/article/details/78514071

https://blog.csdn.net/weixin_43872578/article/details/86628515
