Learning the Faster R-CNN code: roi_pooling (III)

This post looks on its own at the C code in roi_pooling/src/roi_pooling.c:

Description
I searched around but did not find much useful information. Even searching Baidu for #include <TH/TH.h> turned up very little, and I could not find out exactly how THFloatTensor_data and THFloatTensor_size are meant to be used. Since there is so little information, what follows is my own understanding, and I cannot guarantee it is correct.

1. About the header TH/TH.h
#include <TH/TH.h> brings in TH, the C library of tensor data structures and functions that PyTorch is built on, i.e. PyTorch's low-level interface.

2. roi_pooling_forward parameters

int roi_pooling_forward(int pooled_height, int pooled_width, float spatial_scale,
                        THFloatTensor * features, THFloatTensor * rois, THFloatTensor * output)

pooled_height: the height of the output after pooling;
pooled_width: the width of the output after pooling;
spatial_scale: the spatial scale, i.e. the ratio between the feature map and the input image (the feature map here is the input of the ROI pooling layer);
features: the feature map produced by the preceding convolutional network;
rois: all regions of interest;
output: the result of the pooling.
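The indexing inside the function implies the shapes of these tensors: features is [batch, height, width, channels], rois is [num_rois, 5], and output holds num_rois × num_channels × pooled_height × pooled_width floats. As a sketch assuming those shapes, the hypothetical helper below (not part of roi_pooling.c) computes how many floats output must hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in roi_pooling.c): number of floats the output
 * tensor needs, assuming the layout implied by the indexing in
 * roi_pooling_forward: [num_rois][num_channels][pooled_height][pooled_width]. */
size_t roi_pool_output_numel(size_t num_rois, size_t num_channels,
                             size_t pooled_height, size_t pooled_width)
{
    assert(pooled_height > 0 && pooled_width > 0);
    return num_rois * num_channels * pooled_height * pooled_width;
}
```

For example, 3 ROIs over a 512-channel feature map pooled to 7 × 7 need 3 × 512 × 49 = 75264 floats (illustrative numbers).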

3. Variables inside the function

// Grab the input tensor
float * data_flat = THFloatTensor_data(features);
float * rois_flat = THFloatTensor_data(rois);

float * output_flat = THFloatTensor_data(output);

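These lines extract pointers to the tensors' data. In C the data are stored in a contiguous block of memory, and THFloatTensor_data returns a float pointer into that block, so the elements can then be read and written with ordinary array indexing.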
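The TH calls themselves are not reproduced here, since their exact API is not documented in this post. As a purely illustrative stand-in, the hypothetical struct below models a contiguous 4-D float tensor: a "data" accessor returns a pointer to the first element and a "size" accessor returns the extent of one dimension. It only illustrates the idea; it is not THFloatTensor.

```c
#include <assert.h>

/* Hypothetical stand-in for a contiguous 4-D float tensor; this is NOT the
 * TH library's THFloatTensor, only an illustration of the idea that all
 * elements live in one flat, contiguous block of memory. */
typedef struct {
    float *storage;   /* contiguous element buffer */
    long   size[4];   /* extent of each dimension */
} toy_tensor4;

/* Analogue of a "data" accessor: pointer to the first element. */
float *toy_tensor_data(toy_tensor4 *t)
{
    return t->storage;
}

/* Analogue of a "size" accessor: extent of dimension dim (0..3). */
long toy_tensor_size(const toy_tensor4 *t, int dim)
{
    assert(dim >= 0 && dim < 4);
    return t->size[dim];
}
```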

// Number of ROIs
int num_rois = THFloatTensor_size(rois, 0);
int size_rois = THFloatTensor_size(rois, 1);

This code reads num_rois and size_rois from rois: the number of regions of interest, and the size of each ROI record, or more precisely the number of floats each ROI occupies in memory (5 here: [batch_index x1 y1 x2 y2]).
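Because each ROI occupies size_rois consecutive floats, the n-th ROI starts at n * size_rois, which is exactly the value index_roi tracks in the loop. A sketch with a hypothetical struct and function name:

```c
#include <assert.h>

/* One ROI record as laid out in rois_flat: [batch_index x1 y1 x2 y2].
 * The struct is a hypothetical convenience, not part of roi_pooling.c. */
typedef struct {
    int   batch_index;
    float x1, y1, x2, y2;
} roi_record;

/* Read the n-th ROI from a flat array in which every ROI occupies
 * size_rois consecutive floats (5 in roi_pooling.c). */
roi_record read_roi(const float *rois_flat, int size_rois, int n)
{
    assert(size_rois >= 5);
    const float *r = rois_flat + n * size_rois;  /* same offset as index_roi */
    roi_record out = { (int)r[0], r[1], r[2], r[3], r[4] };
    return out;
}
```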

// batch size
int batch_size = THFloatTensor_size(features, 0);
if(batch_size != 1)
{
    return 0;
}
// data height
int data_height = THFloatTensor_size(features, 1);
// data width
int data_width = THFloatTensor_size(features, 2);
// Number of channels
int num_channels = THFloatTensor_size(features, 3);

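This reads the dimensions of features: batch_size, data_height, data_width and num_channels, i.e. the batch size and the height, width and number of channels of the feature map. Note the order: features is laid out as [batch, height, width, channels]. The function only handles a batch size of 1 and returns 0 otherwise.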
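The index arithmetic in the function implies that features is stored contiguously as [batch][height][width][channels] (channels last). As a sketch assuming that layout, the flat offset of one element can be computed as below; roi_pooling_forward splits the same expression into index_data (the batch part) and index (the rest). The function name is hypothetical.

```c
#include <assert.h>

/* Flat offset of element (b, h, w, c) in a contiguous
 * [batch][height][width][channels] tensor, the layout the C code assumes
 * for features. */
long nhwc_offset(int b, int h, int w, int c,
                 int height, int width, int channels)
{
    assert(h >= 0 && h < height && w >= 0 && w < width);
    assert(c >= 0 && c < channels);
    long index_data = (long)b * height * width * channels;  /* batch part */
    long index = ((long)h * width + w) * channels + c;       /* within one image */
    return index_data + index;
}
```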

// Set all elements of the output tensor to -1 (the starting value for the max; not -inf)
THFloatStorage_fill(THFloatTensor_storage(output), -1);

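This initialises every element of output before pooling starts. Note that, despite the "-inf" wording in the original code comment, the value written is -1, not negative infinity; -1 is the starting value each bin's maximum is compared against.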
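The initial value matters because each bin's maximum is found by comparing against what is already stored in output. The hypothetical helper below (not from roi_pooling.c) shows that starting from -1 returns -1 whenever every value is below -1, while starting from -INFINITY returns the true maximum. This is presumably harmless in practice if the features come out of a ReLU and are non-negative, but that is an assumption about the network, not something this C file checks.

```c
#include <assert.h>
#include <math.h>

/* Max over n values, starting the running maximum at init.
 * roi_pooling.c effectively uses init = -1; init = -INFINITY (from
 * <math.h>) gives the correct maximum for any input values. */
float max_from(const float *v, int n, float init)
{
    assert(n >= 0);
    float m = init;
    for (int i = 0; i < n; ++i)
        if (v[i] > m)
            m = v[i];
    return m;
}
```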

Max pooling is then carried out for each ROI.

// For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
int index_roi = 0;
int index_output = 0;
int n;
for (n = 0; n < num_rois; ++n)

index_roi and index_output both start at 0; the loop then iterates over all regions of interest.

int roi_batch_ind = rois_flat[index_roi + 0];
int roi_start_w = round(rois_flat[index_roi + 1] * spatial_scale);
int roi_start_h = round(rois_flat[index_roi + 2] * spatial_scale);
int roi_end_w = round(rois_flat[index_roi + 3] * spatial_scale);
int roi_end_h = round(rois_flat[index_roi + 4] * spatial_scale);

This extracts the ROI information: roi_batch_ind is the batch index, and roi_start_w, roi_start_h, roi_end_w and roi_end_h are the coordinates of the ROI's top-left and bottom-right corners.

For each ROI, its batch index and coordinates are read from rois_flat. The coordinates are given on the input image, so they are multiplied by spatial_scale (the ratio of the feature map to the input image) to map them onto the feature map. The mapped values are generally not integers, so they are rounded to the nearest integer.

int roi_height = fmaxf(roi_end_h - roi_start_h + 1, 1);
int roi_width = fmaxf(roi_end_w - roi_start_w + 1, 1);
float bin_size_h = (float)(roi_height) / (float)(pooled_height);
float bin_size_w = (float)(roi_width) / (float)(pooled_width);

This computes the height and width of the ROI (the corner coordinates are inclusive, hence the + 1, and each extent is at least 1) and the height and width of one pooling bin. The bin size is a float, so it is not necessarily an integer.
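The ROI extent and bin size can be checked in isolation. The sketch below uses hypothetical helper names and replaces fmaxf with an integer comparison, which gives the same result for these integer values.

```c
#include <assert.h>

/* ROI extent on the feature map: corners are inclusive, hence the +1,
 * and the extent is forced to be at least 1 (as fmaxf(..., 1) does). */
int roi_extent(int start, int end)
{
    int extent = end - start + 1;
    return extent > 1 ? extent : 1;
}

/* Size of one pooling bin along an axis; generally not an integer. */
float bin_size(int roi_extent_px, int pooled)
{
    assert(pooled > 0);
    return (float)roi_extent_px / (float)pooled;
}
```

For example, a ROI spanning rows 3 to 9 has a height of 7, and pooling it to 2 rows gives bins 3.5 rows tall.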

int index_data = roi_batch_ind * data_height * data_width * num_channels;
const int output_area = pooled_width * pooled_height;

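What does index_data refer to? It is the offset at which this ROI's image starts in features: the batch index multiplied by the feature map's height, width and number of channels. With batch_size restricted to 1, it is always 0 for valid ROIs.
output_area is the number of values in one pooled output channel, pooled_width * pooled_height; since the pooled size is fixed, it is constant.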
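output_area is what lets the code address one channel at a time in the output. Assuming the layout implied by pool_index + c * output_area (each ROI stores num_channels planes of pooled_height × pooled_width values, i.e. channels first, unlike features), the offset of one output element can be sketched as follows. The function is hypothetical and not part of roi_pooling.c.

```c
#include <assert.h>

/* Flat offset of output element (n, c, ph, pw), assuming the layout
 * [num_rois][num_channels][pooled_height][pooled_width]. */
long output_offset(int n, int c, int ph, int pw,
                   int num_channels, int pooled_height, int pooled_width)
{
    assert(ph >= 0 && ph < pooled_height && pw >= 0 && pw < pooled_width);
    const long output_area = (long)pooled_height * pooled_width;
    const long index_output = (long)n * num_channels * output_area; /* start of ROI n */
    const long pool_index = index_output + (long)ph * pooled_width + pw;
    return pool_index + c * output_area;  /* jump to channel c's plane */
}
```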

int c, ph, pw;
for (ph = 0; ph < pooled_height; ++ph)
{
    for (pw = 0; pw < pooled_width; ++pw)
    {

These loops visit every bin of the pooled output.

int hstart = (floor((float)(ph) * bin_size_h));
int wstart = (floor((float)(pw) * bin_size_w));
int hend = (ceil((float)(ph + 1) * bin_size_h));
int wend = (ceil((float)(pw + 1) * bin_size_w));

hstart and wstart are the top-left corner of each bin, relative to the ROI; hend and wend are its bottom-right corner, exclusive. floor rounds down, and ceil returns the smallest integer not less than its argument. Because the start is rounded down and the end rounded up, hend is always greater than hstart (and wend greater than wstart), so before clipping every bin covers at least one row and one column.

hstart = fminf(fmaxf(hstart + roi_start_h, 0), data_height);
hend = fminf(fmaxf(hend + roi_start_h, 0), data_height);
wstart = fminf(fmaxf(wstart + roi_start_w, 0), data_width);
wend = fminf(fmaxf(wend + roi_start_w, 0), data_width);

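hstart, wstart, hend and wend were positions relative to the ROI; adding roi_start_h and roi_start_w turns them into positions on the feature map, and the fmaxf/fminf calls clip them to [0, data_height] and [0, data_width]. After clipping a bin may be empty (its end not greater than its start); the full listing checks this with is_empty and writes 0 for such bins.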
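The shift-and-clip step on one coordinate can be sketched as follows. The helper name is hypothetical, and it uses integer comparisons instead of fminf/fmaxf, which yields the same values for these integers.

```c
#include <assert.h>

/* Shift a ROI-relative bin coordinate by the ROI's start and clamp it to
 * [0, limit], mirroring fminf(fmaxf(x + roi_start, 0), limit). */
int to_feature_map(int bin_coord, int roi_start, int limit)
{
    assert(limit >= 0);
    int x = bin_coord + roi_start;
    if (x < 0) x = 0;
    if (x > limit) x = limit;
    return x;
}
```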

int h, w, c;
for (h = hstart; h < hend; ++h)
{
    for (w = wstart; w < wend; ++w)
    {
        for (c = 0; c < num_channels; ++c)
        {
            const int index = (h * data_width + w) * num_channels + c;
            if (data_flat[index_data + index] > output_flat[pool_index + c * output_area])
            {
                output_flat[pool_index + c * output_area] = data_flat[index_data + index];
            }
        }
    }
}

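These nested loops run over the bin's rows, then its columns, then the channels, and keep the maximum value found in the bin for each channel. The input is read at index_data + (h * data_width + w) * num_channels + c (channels last), while the result is written at pool_index + c * output_area, where the full listing defines pool_index = index_output + (ph * pooled_width + pw); in the output, each channel is a separate pooled_height × pooled_width plane.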
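A self-contained sketch of the same reduction for one non-empty bin, assuming a single-image, channels-last feature map as in the code. One deliberate change: the running maximum starts from the bin's first element instead of a preset fill value, so the result does not depend on how output was initialised. The function name is hypothetical.

```c
#include <assert.h>

/* Max over one bin [hstart, hend) x [wstart, wend) of a single-image
 * channels-last feature map, for every channel. out[c] receives the
 * maximum for channel c; the bin must be non-empty. */
void max_pool_bin(const float *data, int data_width, int num_channels,
                  int hstart, int hend, int wstart, int wend, float *out)
{
    assert(hend > hstart && wend > wstart);
    /* start from the bin's first element rather than a preset value */
    for (int c = 0; c < num_channels; ++c)
        out[c] = data[(hstart * data_width + wstart) * num_channels + c];
    for (int h = hstart; h < hend; ++h)
        for (int w = wstart; w < wend; ++w)
            for (int c = 0; c < num_channels; ++c) {
                float v = data[(h * data_width + w) * num_channels + c];
                if (v > out[c])
                    out[c] = v;
            }
}
```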

// Increment ROI index
index_roi += size_rois;
index_output += pooled_height * pooled_width * num_channels;

After one ROI has been processed, index_roi and index_output are updated. Because the data live in contiguous memory, index_roi advances by size_rois (the number of floats per ROI record), and index_output advances by the number of floats one pooled ROI occupies, pooled_height * pooled_width * num_channels.

That is the whole function. Now for a more intuitive look, with figures:

The example and its figures come from this blog: https://blog.csdn.net/auto1993/article/details/78514071
ROI Pooling example
Consider an 8×8 feature map, one ROI, and an output size of 2×2, with a single channel.

Fixed size input feature map


Projected position of the region proposal (top-left and bottom-right corner coordinates): (0, 3), (7, 8).


It is divided into 2×2 sections (because the output size is 2×2).

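Above I noted that, because of ceil, a bin's bottom-right corner is never smaller than its top-left corner, yet the figure in this example does not seem to match the code exactly. One difference can be read directly from the code: since the start of a bin uses floor and the end uses ceil, neighbouring bins overlap by one row or column whenever the bin size is not an integer (a width of 7 split into 2 bins gives columns [0, 4) and [3, 7)), whereas the figure divides the ROI into non-overlapping sections. So the figure is best read as an illustration of the idea rather than of this code's exact bin boundaries.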

Max pooling is applied to each section.


The complete code follows:
roi_pooling/src/roi_pooling.c

#include <TH/TH.h>
#include <math.h>

int roi_pooling_forward(int pooled_height, int pooled_width, float spatial_scale,
                        THFloatTensor * features, THFloatTensor * rois, THFloatTensor * output)
{
    // Grab the input tensor
    float * data_flat = THFloatTensor_data(features);
    float * rois_flat = THFloatTensor_data(rois);

    float * output_flat = THFloatTensor_data(output);

    // Number of ROIs
    int num_rois = THFloatTensor_size(rois, 0);
    int size_rois = THFloatTensor_size(rois, 1);
    // batch size
    int batch_size = THFloatTensor_size(features, 0);
    if(batch_size != 1)
    {
        return 0;
    }
    // data height
    int data_height = THFloatTensor_size(features, 1);
    // data width
    int data_width = THFloatTensor_size(features, 2);
    // Number of channels
    int num_channels = THFloatTensor_size(features, 3);

    // Set all elements of the output tensor to -1 (the starting value for the max; not -inf)
    THFloatStorage_fill(THFloatTensor_storage(output), -1);

    // For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
    int index_roi = 0;
    int index_output = 0;
    int n;
    for (n = 0; n < num_rois; ++n)
    {
        // Map the ROI corners from image coordinates onto the feature map
        int roi_batch_ind = rois_flat[index_roi + 0];
        int roi_start_w = round(rois_flat[index_roi + 1] * spatial_scale);
        int roi_start_h = round(rois_flat[index_roi + 2] * spatial_scale);
        int roi_end_w = round(rois_flat[index_roi + 3] * spatial_scale);
        int roi_end_h = round(rois_flat[index_roi + 4] * spatial_scale);
        //      CHECK_GE(roi_batch_ind, 0);
        //      CHECK_LT(roi_batch_ind, batch_size);

        int roi_height = fmaxf(roi_end_h - roi_start_h + 1, 1);
        int roi_width = fmaxf(roi_end_w - roi_start_w + 1, 1);
        float bin_size_h = (float)(roi_height) / (float)(pooled_height);
        float bin_size_w = (float)(roi_width) / (float)(pooled_width);

        // Offset of this ROI's image in features ([batch][height][width][channels])
        int index_data = roi_batch_ind * data_height * data_width * num_channels;
        const int output_area = pooled_width * pooled_height;

        int c, ph, pw;
        for (ph = 0; ph < pooled_height; ++ph)
        {
            for (pw = 0; pw < pooled_width; ++pw)
            {
                // Bin bounds relative to the ROI (end exclusive)
                int hstart = (floor((float)(ph) * bin_size_h));
                int wstart = (floor((float)(pw) * bin_size_w));
                int hend = (ceil((float)(ph + 1) * bin_size_h));
                int wend = (ceil((float)(pw + 1) * bin_size_w));

                // Shift to feature-map coordinates and clip to the map
                hstart = fminf(fmaxf(hstart + roi_start_h, 0), data_height);
                hend = fminf(fmaxf(hend + roi_start_h, 0), data_height);
                wstart = fminf(fmaxf(wstart + roi_start_w, 0), data_width);
                wend = fminf(fmaxf(wend + roi_start_w, 0), data_width);

                // Output layout per ROI: [channel][pooled_height][pooled_width]
                const int pool_index = index_output + (ph * pooled_width + pw);
                int is_empty = (hend <= hstart) || (wend <= wstart);
                if (is_empty)
                {
                    for (c = 0; c < num_channels * output_area; c += output_area)
                    {
                        output_flat[pool_index + c] = 0;
                    }
                }
                else
                {
                    int h, w, c;
                    for (h = hstart; h < hend; ++h)
                    {
                        for (w = wstart; w < wend; ++w)
                        {
                            for (c = 0; c < num_channels; ++c)
                            {
                                const int index = (h * data_width + w) * num_channels + c;
                                if (data_flat[index_data + index] > output_flat[pool_index + c * output_area])
                                {
                                    output_flat[pool_index + c * output_area] = data_flat[index_data + index];
                                }
                            }
                        }
                    }
                }
            }
        }

        // Increment ROI index
        index_roi += size_rois;
        index_output += pooled_height * pooled_width * num_channels;
    }
    return 1;
}

References:
https://blog.csdn.net/auto1993/article/details/78514071
https://blog.csdn.net/weixin_43872578/article/details/86628515


Origin: www.cnblogs.com/wind-chaser/p/11355002.html