In-Depth Series - CNN Convolutional Neural Networks (III): ROI Pooling, ROI Align, and Interpolation

Back to the CNN convolutional neural network series index

Previous: In-Depth Series - CNN Convolutional Neural Networks (II): pooling and unpooling in detail

Next: In-Depth Series - CNN Convolutional Neural Networks (IV): code demo project using a tf CNN for MNIST handwritten digit recognition

 

This section covers ROI pooling, ROI Align, and interpolation in detail.

 

III. About ROI

 

1. ROI pooling (Region of Interest pooling)

(1) A typical object detection architecture can generally be divided into two stages:

   ①. Region proposal

        Given an input image, locate all objects that may be present. The output of this stage is a list of bounding boxes of possible objects, usually called region proposals or regions of interest (ROIs).

   ②. Final classification

         Determine whether each region proposal belongs to one of the target classes or to the background.

 

(2) This architecture has a number of problems:

   ①. Generating a large number of region proposals causes performance problems and makes real-time object detection difficult.

   ②. Processing speed is suboptimal.

   ③. End-to-end training is not possible.

(3) ROI pooling was then proposed. The ROI pooling layer significantly speeds up both training and testing, and it also improves detection accuracy. The layer takes two inputs:

   ①. A fixed-size feature map obtained from a deep convolutional network with several convolution and max pooling layers.

   ②. An  \large N \times 5 matrix representing all the ROIs, where  \large N is the number of ROIs. The first column is the image index, and the remaining four columns are the coordinates of the top-left and bottom-right corners of the region.

 

(4). How ROI pooling works

   ①. Based on the input image, map each ROI to its corresponding position on the feature map.

   ②. Divide the mapped region into sections of roughly equal size. The number of sections matches the dimensions of the output.

   ③. Perform max pooling on each section.

         In this way, regions of different sizes all yield feature maps of the same fixed size. Note that the size of the output feature map does not depend on the size of the ROI or of the convolutional feature map. The biggest advantage of ROI pooling is that it greatly improves processing speed.
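The three steps above can be sketched in NumPy as follows. This is a minimal illustration rather than any framework's implementation: the function name `roi_pool`, the single-channel 2-D feature map, the exclusive bottom-right corner, and the floor-based section boundaries are all assumptions made for this sketch.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size):
    """Max-pool one ROI of a 2-D feature map down to a fixed output size.

    roi is (x1, y1, x2, y2) in integer feature-map coordinates, with the
    bottom-right corner treated as exclusive (an assumption of this sketch).
    The ROI must be at least as large as output_size in both dimensions.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    region = feature_map[y1:y2, x1:x2]        # step 1: ROI on the feature map
    h, w = region.shape

    # Step 2: integer section boundaries. Sections can differ by one
    # row or column when the region does not divide evenly.
    row_edges = np.floor(np.arange(out_h + 1) * h / out_h).astype(int)
    col_edges = np.floor(np.arange(out_w + 1) * w / out_w).astype(int)

    # Step 3: max pooling inside every section.
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            section = region[row_edges[i]:row_edges[i + 1],
                             col_edges[j]:col_edges[j + 1]]
            out[i, j] = section.max()
    return out

# Two ROIs of different sizes give outputs of the same fixed shape.
feature_map = np.arange(64).reshape(8, 8)     # illustrative values only
pooled_a = roi_pool(feature_map, (0, 3, 7, 8), (2, 2))
pooled_b = roi_pool(feature_map, (2, 2, 5, 6), (2, 2))
```

Both `pooled_a` and `pooled_b` have shape (2, 2) even though the two regions differ in size, which is the property described above.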

 

(5) ROI pooling summary:

   ①. It is used for object detection tasks.

   ②. It allows the CNN feature map to be reused.

   ③. It significantly speeds up both training and testing.

   ④. It allows the detection system to be trained end to end.

 

(6). ROI pooling example

   An  \large 8 \times 8 feature map is passed through ROI pooling to produce a  \large 2 \times 2 output feature map.

   ①. The input is an  \large 8 \times 8 feature map.

   ②. The region proposal is projected onto the feature map (top-left and bottom-right coordinates): (0, 3), (7, 8)

   ③. Divide the region into  \large 2 \times 2 sections (because the output size is  \large 2 \times 2). The region does not have to divide evenly, so the sections may differ in size.

   ④. Perform max pooling on each section.
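The example can be traced by hand in a few lines of NumPy. The feature-map values below are hypothetical (the original values are not given in the text), and the coordinates are read as (x, y) with an exclusive bottom-right corner, which is an assumption that makes the region 7 columns wide and 5 rows tall.

```python
import numpy as np

# Hypothetical 8x8 feature map; the values are illustrative only.
feature_map = np.arange(64, dtype=np.float32).reshape(8, 8)

# Region proposal from the example: top-left (0, 3), bottom-right (7, 8).
x1, y1, x2, y2 = 0, 3, 7, 8
region = feature_map[y1:y2, x1:x2]            # 5 rows x 7 columns

# 5 and 7 are odd, so the split is 2 + 3 rows and 3 + 4 columns.
row_split = region.shape[0] // 2
col_split = region.shape[1] // 2

sections = [
    [region[:row_split, :col_split], region[:row_split, col_split:]],
    [region[row_split:, :col_split], region[row_split:, col_split:]],
]

# Shapes of the four sections and the max of each one.
section_shapes = [[s.shape for s in row] for row in sections]
pooled = np.array([[s.max() for s in row] for row in sections])
```

The four sections have different shapes, yet `pooled` is always a 2 x 2 array with one maximum per section.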

        

 

2. ROI Align (Region of Interest Align)

(1) In ROI pooling, ROIs of different sizes are mapped to a fixed-size feature map, and this introduces quantization error twice:

   ①. The x, y, w, h values of a region proposal are usually fractional, but for convenience they are rounded to integers.

   ②. The rounded region is then divided into  \large k \times k cells, and the boundary of each cell is rounded to an integer again.

 

(2) The effect of the two rounding steps:

      After these two rounding steps, the candidate box has already drifted some distance from the position that was originally regressed, and this deviation can hurt the accuracy of detection or segmentation. In the paper, the authors call this the misalignment problem.
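A small one-dimensional calculation makes the size of this deviation concrete. Everything numeric here is hypothetical: the stride of 16 between image and feature map, the ROI from x = 50 to x = 270, the choice of k = 2, and the use of floor as the rounding rule are assumptions chosen only for illustration.

```python
import math

def roi_edges_1d(start, end, stride, k, quantize):
    """Return the k + 1 cell edges of an ROI along one axis, in feature-map units."""
    lo, hi = start / stride, end / stride
    if quantize:
        # First rounding: snap the ROI itself to the integer grid.
        lo, hi = math.floor(lo), math.floor(hi)
    edges = [lo + i * (hi - lo) / k for i in range(k + 1)]
    if quantize:
        # Second rounding: snap every cell boundary to the integer grid.
        edges = [math.floor(e) for e in edges]
    return edges

stride = 16                                   # hypothetical downsampling factor
exact = roi_edges_1d(50, 270, stride, k=2, quantize=False)
quantized = roi_edges_1d(50, 270, stride, k=2, quantize=True)

# Deviation of every edge, in feature-map pixels and in image pixels.
deviation = [abs(e - q) for e, q in zip(exact, quantized)]
max_shift_in_image_pixels = max(deviation) * stride
```

With these numbers, an error of a single feature-map pixel corresponds to a shift of `stride` pixels in the input image, which is why the misalignment matters most for small objects and for pixel-level tasks such as segmentation.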

 

(3) ROI Align was introduced to solve this problem.

     The idea behind ROI Align is simple:

       Remove the quantization steps and use bilinear interpolation (discussed below) to obtain feature values at floating-point coordinates, so that the whole feature aggregation process becomes a continuous operation. In the actual computation, ROI Align does not simply fill in the coordinate points on the boundary of the candidate region and then pool them. Instead, it uses a redesigned procedure:

   ①. Iterate over each candidate region, keeping its floating-point boundaries unquantized.

   ②. Divide the candidate region into  \large k \times k cells, leaving the boundaries of each cell unquantized as well.

   ③. In each cell, compute the coordinates of four fixed sampling points (in the paper, the authors found that four sampling points gave the best performance, and that even a single sampling point performs almost as well), use bilinear interpolation to compute the values at these four positions, and then apply max pooling.
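The procedure above can be sketched as follows. This is a simplified illustration under several assumptions, not the reference implementation: the feature map is a single 2-D channel, feature values are taken to sit at integer coordinates, sampling points are clamped to the map, and the names `bilinear` and `roi_align` are introduced only for this sketch. Max pooling is used because that is what the text describes. The paper also allows averaging, which some libraries use.

```python
import numpy as np

def bilinear(feature_map, y, x):
    """Bilinearly interpolate a 2-D feature map at the float point (y, x)."""
    h, w = feature_map.shape
    y = min(max(y, 0.0), h - 1.0)             # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature_map[y0, x0] + dx * feature_map[y0, x1]
    bottom = (1 - dx) * feature_map[y1, x0] + dx * feature_map[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature_map, roi, k, samples_per_axis=2):
    """ROI Align for one ROI given as float (x1, y1, x2, y2) feature-map coords."""
    x1, y1, x2, y2 = roi
    cell_h = (y2 - y1) / k                    # no rounding of the ROI
    cell_w = (x2 - x1) / k                    # no rounding of the cells
    out = np.empty((k, k), dtype=np.float64)
    for i in range(k):
        for j in range(k):
            values = []
            # 2 x 2 = 4 sampling points: centers of the four sub-squares.
            for sy in range(samples_per_axis):
                for sx in range(samples_per_axis):
                    y = y1 + (i + (sy + 0.5) / samples_per_axis) * cell_h
                    x = x1 + (j + (sx + 0.5) / samples_per_axis) * cell_w
                    values.append(bilinear(feature_map, y, x))
            out[i, j] = max(values)           # max pooling over the samples
    return out

# Illustrative feature map whose value at (y, x) is 10 * y + x.
feature_map = np.add.outer(10.0 * np.arange(8), np.arange(8.0))
aligned = roi_align(feature_map, (0.5, 0.5, 4.5, 4.5), k=2)
```

Because the illustrative feature map is linear in x and y, bilinear interpolation reproduces it exactly, which makes the result easy to check by hand.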

 

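(4). ROI Align example

      In the figure, the dashed grid represents the feature map and the solid line represents the ROI, which is divided here into  \large 2 \times 2 cells. If the number of sampling points is 4, each cell is first split evenly into four smaller squares (shown by the red lines), and the center of each small square is a sampling point. The coordinates of these sampling points are usually floating-point numbers, so the value at each sampling point is obtained by bilinear interpolation (shown by the four arrows). Max pooling over the four sampling points in each cell then gives the final ROI Align result.

        In fact, ROI Align visits fewer sampling points than ROI pooling, yet it achieves better performance. This is mainly because it solves the misalignment problem.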

 

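IV. Interpolation

 

1. Linear interpolation

Given the data points  \large (x_{0}, y_{0}) and  \large (x_{1}, y_{1}), compute the  \large y value on the straight line at some position  \large x in the interval  \large [x_{0}, x_{1}].

The computation is simple. Equating the slopes gives the relationship between  \large y and  \large x: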

           \large \frac{y - y_{0}}{x - x_{0}} = \frac{y - y_{1}}{x - x_{1}}

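           \large y = \frac{x - x_{0}}{x_{1} - x_{0}} \cdot y_{1} + \frac{x_{1} - x}{x_{1} - x_{0}} \cdot y_{0}

Looked at closely, this uses the distances from  \large x to  \large x_{0} and  \large x_{1} as weights (dividing by  \large x_{1} - x_{0} normalizes them) for a weighted combination of  \large y_{0} and  \large y_{1}. This idea is important, because once it is understood, bilinear interpolation becomes very easy to follow.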
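The weighted form of the formula translates directly into code. The following is a minimal sketch. The function name `lerp` and its argument order are choices made for this illustration.

```python
def lerp(x, x0, y0, x1, y1):
    """Linearly interpolate y at x on the line through (x0, y0) and (x1, y1)."""
    if x1 == x0:
        raise ValueError("x0 and x1 must differ")
    w1 = (x - x0) / (x1 - x0)   # weight of y1: grows as x approaches x1
    w0 = (x1 - x) / (x1 - x0)   # weight of y0: grows as x approaches x0
    return w1 * y1 + w0 * y0

# A quarter of the way from x0 = 0 to x1 = 4, between y0 = 0 and y1 = 8.
y = lerp(1.0, 0.0, 0.0, 4.0, 8.0)
```

The two weights always sum to 1, so the result is a weighted average of the two known values, with the nearer point weighted more heavily.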

 

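2. Bilinear interpolation

Bilinear interpolation is essentially linear interpolation performed in two directions.

To interpolate at the point  \large P(x, y), first interpolate linearly in the  \large x direction between  \large Q_{11} and  \large Q_{21} to obtain  \large R_{1}, and similarly between  \large Q_{12} and  \large Q_{22} to obtain  \large R_{2}. Then interpolate linearly in the  \large y direction between  \large R_{1} and  \large R_{2} to obtain the final  \large P. Here  \large Q_{11} = (x_{1}, y_{1}),  \large Q_{21} = (x_{2}, y_{1}),  \large Q_{12} = (x_{1}, y_{2}),  \large Q_{22} = (x_{2}, y_{2}),  \large R_{1} = (x, y_{1}) and  \large R_{2} = (x, y_{2}). Once this is understood, the meaning of bilinear interpolation is clear.

Expressed as formulas: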

                \large f(R_{1}) \approx \frac{x_{2} - x}{x_{2} - x_{1}} \cdot f(Q_{11}) + \frac{x - x_{1}}{x_{2} - x_{1}} \cdot f(Q_{21})

                \large f(R_{2}) \approx \frac{x_{2} - x}{x_{2} - x_{1}} \cdot f(Q_{12}) + \frac{x - x_{1}}{x_{2} - x_{1}} \cdot f(Q_{22})

Linear interpolation in the  \large y direction then gives:

                 \large f(P) \approx \frac{y_{2} - y}{y_{2} - y_{1}} \cdot f(R_{1}) + \frac{y - y_{1}}{y_{2} - y_{1}} \cdot f(R_{2})

Substituting the two expressions for  \large f(R_{1}) and  \large f(R_{2}) and simplifying gives the final result for  \large P(x, y).
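Written out in full, substituting  \large f(R_{1}) and  \large f(R_{2}) into the formula for  \large f(P) gives:

                \large f(P) \approx \frac{f(Q_{11})(x_{2} - x)(y_{2} - y) + f(Q_{21})(x - x_{1})(y_{2} - y) + f(Q_{12})(x_{2} - x)(y - y_{1}) + f(Q_{22})(x - x_{1})(y - y_{1})}{(x_{2} - x_{1})(y_{2} - y_{1})}

The same two-step computation can be sketched in Python. The function name `bilinear_interpolate` and its argument layout are assumptions made for this illustration. The arguments `q11`, `q21`, `q12`, `q22` stand for the known values  \large f(Q_{11}),  \large f(Q_{21}),  \large f(Q_{12}),  \large f(Q_{22}).

```python
def bilinear_interpolate(x, y, x1, y1, x2, y2, q11, q21, q12, q22):
    """Interpolate f at (x, y) from its values at the four corners.

    q11 = f(x1, y1), q21 = f(x2, y1), q12 = f(x1, y2), q22 = f(x2, y2).
    """
    # Step 1: linear interpolation in the x direction, along both rows.
    wx1 = (x2 - x) / (x2 - x1)
    wx2 = (x - x1) / (x2 - x1)
    r1 = wx1 * q11 + wx2 * q21    # value at R1 = (x, y1)
    r2 = wx1 * q12 + wx2 * q22    # value at R2 = (x, y2)

    # Step 2: linear interpolation in the y direction between R1 and R2.
    wy1 = (y2 - y) / (y2 - y1)
    wy2 = (y - y1) / (y2 - y1)
    return wy1 * r1 + wy2 * r2

# Center of the unit square with corner values 1, 2, 3, 4.
center = bilinear_interpolate(0.5, 0.5, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0)
```

At the center of the square all four corners receive equal weight, so the result is their average. At a corner the function returns that corner's value unchanged.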

 

 

                

 


Origin blog.csdn.net/qq_38299170/article/details/104203587