Faster R-CNN Learning Notes

Notes on a classic object detection network, and a paper well worth studying closely. The goal of Faster R-CNN is to make the region proposal step nearly cost-free, enabling near real-time object detection.

  1. Current state-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances such as SPPnet and Fast R-CNN have reduced the running time of the detection networks themselves, exposing region proposal computation as the bottleneck. In this work, a Region Proposal Network (RPN) is introduced that shares full-image convolutional features with the detection network, making region proposals nearly cost-free.
  2. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are then used by Fast R-CNN for detection. The RPN and Fast R-CNN are further merged into a single network by sharing their convolutional features; in the terminology of attention mechanisms, the RPN component tells the unified network where to look.
  3. Fast R-CNN achieves near real-time rates using very deep networks, but only when the time spent on region proposals is ignored.
  4. This paper's algorithmic change is to compute proposals with a deep convolutional neural network. Given the detection network's computation, proposal computation becomes nearly free: by sharing convolutions at test time, the marginal cost of computing proposals is small.
  5. On top of the convolutional features of Fast R-CNN, additional convolutional layers are added that simultaneously regress bounding boxes and score objectness at each location on a regular grid.
    The RPN is designed to efficiently predict region proposals with a wide range of scales and aspect ratios.
    [Figure 1: three schemes for addressing multiple scales and aspect ratios]
    In the figure above, (a) shows pyramids of images and feature maps, where the classifier runs at all scales; (b) runs a pyramid of filters on a single feature map; (c) uses a pyramid of reference boxes in the regression functions.
    (a) Image/feature pyramids: the image is rescaled to multiple sizes and a feature map is computed at each scale. Effective but time-consuming.
    (b) A single feature map of fixed size, with multi-scale sliding windows (a pyramid of filters) run over it. This method is usually adopted jointly with (a).
    (c) The anchor pyramid used in this paper, which is more cost-efficient: bounding-box classification and regression are performed with reference to anchor boxes of multiple scales and aspect ratios, while the image, the feature map, and the sliding window each have a single size.
  6. The novel "anchor" boxes are introduced, which serve as references at multiple scales and aspect ratios. The regression scheme can be seen as a pyramid of reference boxes (Figure 1(c)), which avoids enumerating images or filters of multiple scales or aspect ratios. The model performs well when trained and tested on single-scale images, which also improves running speed.
    To unify the RPN and the Fast R-CNN detection network, a training scheme is proposed that alternates between fine-tuning for the region proposal task and fine-tuning for detection while keeping the proposals fixed. This scheme converges quickly and produces a network whose convolutional features are shared between the two tasks.
    [Figure 2: Faster R-CNN as a single, unified network for object detection]
  7. Faster R-CNN consists of two modules: the first is a deep fully convolutional network that proposes regions, and the second is the Fast R-CNN detector that uses those proposals.
  8. Region Proposal Network: takes an image of any size as input and outputs a set of rectangular object proposals, each with an objectness score. Here, "objectness" measures membership to a set of object classes vs. background.
    To generate region proposals, a small network is slid over the feature map output by the last shared convolutional layer. This small network takes an n × n spatial window of the input feature map as input. Each sliding window is mapped to a lower-dimensional feature (256-d for ZF, 512-d for VGG, with ReLU following [33]).
    This feature is fed into two sibling fully connected layers: a box-regression layer (reg) and a box-classification layer (cls). Because the small network operates in a sliding-window fashion, the fully connected layers are shared across all spatial positions.
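Because the sliding fully connected layers are shared across positions, they are equivalent to 1 × 1 convolutions. A minimal NumPy sketch of the two sibling heads (random weights and small illustrative shapes, not the paper's trained model; the 3 × 3 intermediate conv is assumed already applied):

```python
import numpy as np

k = 9                      # anchors per location (paper default: 3 scales x 3 ratios)
C, H, W = 256, 4, 5        # intermediate feature: channels, height, width (illustrative)
rng = np.random.default_rng(0)
feat = rng.standard_normal((C, H, W))

# A 1x1 conv is the same fully connected layer applied at every spatial position,
# so one weight matrix per head suffices (weights here are random placeholders).
W_cls = rng.standard_normal((2 * k, C)) * 0.01   # 2 objectness scores per anchor
W_reg = rng.standard_normal((4 * k, C)) * 0.01   # 4 box deltas per anchor

flat = feat.reshape(C, H * W)                    # treat spatial positions as columns
cls_scores = (W_cls @ flat).reshape(2 * k, H, W)
bbox_deltas = (W_reg @ flat).reshape(4 * k, H, W)

print(cls_scores.shape)    # (18, 4, 5)
print(bbox_deltas.shape)   # (36, 4, 5)
```

The point of the sketch is the weight sharing: every spatial position is scored by the same two matrices, which is what makes the RPN fully convolutional.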
  9. Anchors: at each sliding-window position, up to k proposals are predicted simultaneously, so the reg layer has 4k outputs and the cls layer has 2k scores (two-class classification per anchor). The k proposals are parameterized relative to k reference boxes, called "anchors". Each anchor is centered at the sliding window's center and is associated with a scale and an aspect ratio.
    [Figure 3: anchors of multiple scales and aspect ratios at one sliding-window position]
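A minimal anchor generator for one sliding-window position can make this concrete. The 3 scales × 3 aspect ratios (k = 9) match the paper's default; the base size and exact scale values here are illustrative assumptions:

```python
import numpy as np

def generate_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return k = len(ratios)*len(scales) anchors (x1, y1, x2, y2),
    all centered at the origin of one feature-map cell."""
    anchors = []
    for ratio in ratios:            # ratio is interpreted as h / w
        for scale in scales:
            area = (base_size * scale) ** 2
            w = np.sqrt(area / ratio)   # preserve area while varying shape
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

anchors = generate_anchors()
print(anchors.shape)   # (9, 4): k anchors sharing one center
```

At test time these k boxes are translated to every sliding-window center, which is where the W × H × k total anchor count comes from.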
  10. Translation invariance of anchors: both the anchors and the functions that compute proposals relative to the anchors are translation-invariant. If an object is translated in the image, its proposal should translate with it, and the same function should be able to predict the proposal in either location. Translation invariance also reduces the model size.
  11. Multi-scale anchors as regression references
  12. Positive and negative samples when training RPNs: each anchor is assigned a binary class label. Two kinds of anchors receive a positive label: (i) the anchor(s) with the highest IoU overlap with a ground-truth box, and (ii) anchors with IoU overlap higher than 0.7 with any ground-truth box. A non-positive anchor is assigned a negative label if its IoU is lower than 0.3 for all ground-truth boxes. Anchors that are neither positive nor negative do not participate in training.
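The assignment rule above can be sketched directly; the helper names below are hypothetical, but the 0.7/0.3 thresholds and the "highest-IoU anchor per ground-truth box" rule are from the paper:

```python
import numpy as np

def iou(box, gt):
    """IoU of two boxes, both in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_b - inter)

def assign_labels(anchors, gts, pos_thresh=0.7, neg_thresh=0.3):
    """1 = positive, 0 = negative, -1 = ignored (does not contribute to training)."""
    ious = np.array([[iou(a, g) for g in gts] for a in anchors])
    labels = np.full(len(anchors), -1)
    max_iou = ious.max(axis=1)
    labels[max_iou < neg_thresh] = 0    # negative: IoU < 0.3 with every GT box
    labels[max_iou >= pos_thresh] = 1   # positive rule (ii): IoU >= 0.7 with some GT
    labels[ious.argmax(axis=0)] = 1     # positive rule (i): best anchor per GT box
    return labels
```

Rule (i) matters because a small or unusually shaped object may have no anchor reaching 0.7 IoU; without it, such objects would have no positive anchors at all.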
  13. Multi-task loss function:
    L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)
    Reading this loss function carefully: i is the index of an anchor in the mini-batch, and p_i is the predicted probability that anchor i is an object. The loss has two terms: the first is the classification loss and the second is the regression loss. t_i is the vector of four parameterized coordinates of the predicted bounding box. Since the ground-truth label p_i* is defined as 1 for positive anchors and 0 for negative anchors, the regression term is active only for positive anchors.
    The formulation also shows that the features used for regression all have the same spatial size. To account for varying sizes, a set of k bounding-box regressors is learned: each regressor is responsible for one scale and one aspect ratio, and the k regressors do not share weights.
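The regression targets t_i use the box parameterization from the R-CNN line of work (center offsets normalized by the anchor size, log-space width/height), and L_reg is the smooth L1 loss from Fast R-CNN. A minimal sketch of both:

```python
import numpy as np

def bbox_to_deltas(anchor, gt):
    """Parameterize a ground-truth box relative to its anchor as (t_x, t_y, t_w, t_h)."""
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    return np.array([(gx - ax) / aw,      # center offsets, scale-normalized
                     (gy - ay) / ah,
                     np.log(gw / aw),     # sizes in log space
                     np.log(gh / ah)])

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1 (robust to outliers)."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)
```

When the anchor already coincides with the ground-truth box, all four deltas are zero, so a perfect proposal contributes no regression loss.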
  14. During training, each mini-batch arises from a single image and contains many positive and negative anchors. However, since negatives dominate, optimizing over all anchors would bias the result toward negative samples. Instead, this paper randomly samples 256 anchors from an image to compute the mini-batch loss function.
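A sketch of that sampling step (the 1:1 positive-to-negative target ratio, padded with extra negatives when positives are scarce, follows the paper; the function name is illustrative):

```python
import numpy as np

def sample_minibatch(labels, batch_size=256, pos_fraction=0.5, rng=None):
    """Sample anchor indices for one mini-batch.
    labels: per-anchor array with 1 = positive, 0 = negative, -1 = ignored."""
    rng = rng or np.random.default_rng(0)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    n_pos = min(len(pos), int(batch_size * pos_fraction))
    n_neg = min(len(neg), batch_size - n_pos)   # pad with negatives if few positives
    keep_pos = rng.choice(pos, n_pos, replace=False)
    keep_neg = rng.choice(neg, n_neg, replace=False)
    return np.concatenate([keep_pos, keep_neg])
```

With, say, 10 positive anchors in an image, the sampled batch contains those 10 positives and 246 negatives, instead of the thousands of negatives that would otherwise swamp the loss.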
  15. There are three ways to train networks with shared features:
    1. Alternating training. First train the RPN, then use its proposals to train Fast R-CNN; the network tuned by Fast R-CNN is then used to initialize the RPN, and the process is iterated. This is the solution used in all experiments in the paper.
    2. Approximate joint training, where the two networks are merged into one during training. The forward pass generates region proposals, which are treated as fixed, precomputed proposals when training the Fast R-CNN detector. During backpropagation, the backward signals for the shared layers come from both the RPN loss and the Fast R-CNN loss. This method is easy to implement, but it ignores the gradient with respect to the proposal boxes' coordinates, which are themselves network responses.
    3. Non-approximate joint training, where an RoI pooling layer that is differentiable with respect to box coordinates takes the bbox coordinates as input, so the gradient with respect to box coordinates can be computed during backpropagation.
  16. The training scheme proposed in this paper (for sharing convolutional layers between Fast R-CNN and the RPN): first, train the RPN; second, train a separate Fast R-CNN detection network using the proposals generated by the RPN (at this point the two networks do not yet share convolutional layers); third, use the detection network to initialize RPN training, but fix the shared convolutional layers and fine-tune only the layers unique to the RPN; finally, keeping the shared layers fixed, fine-tune only the layers unique to Fast R-CNN.
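The 4-step schedule can be laid out as a sketch; the two training functions below are hypothetical stubs that only record what each step initializes from and whether the shared convolutional layers are frozen:

```python
# Hypothetical stubs standing in for real training runs; they record the schedule.
log = []

def train_rpn(init, freeze_shared=False):
    log.append(("rpn", init, freeze_shared))
    return f"rpn_from_{init}"

def train_fast_rcnn(init, proposals, freeze_shared=False):
    log.append(("frcnn", init, freeze_shared))
    return f"frcnn_from_{init}"

# Step 1: train the RPN from an ImageNet-pretrained model.
rpn1 = train_rpn("imagenet")
# Step 2: train a separate Fast R-CNN with RPN proposals (no sharing yet).
det1 = train_fast_rcnn("imagenet", proposals=rpn1)
# Step 3: re-initialize the RPN from the detector; freeze the shared conv
# layers and fine-tune only the RPN-specific layers.
rpn2 = train_rpn(det1, freeze_shared=True)
# Step 4: shared layers stay frozen; fine-tune only the Fast R-CNN-specific layers.
det2 = train_fast_rcnn(det1, proposals=rpn2, freeze_shared=True)
```

After steps 3 and 4 both networks sit on top of the same frozen convolutional trunk, which is exactly the feature sharing the scheme is designed to achieve.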
  17. Results:
    [Table: detection results comparing region proposal methods]
    where SS denotes Selective Search and EB denotes EdgeBoxes.
    Using the RPN with the Fast R-CNN network speeds up the system: because fewer proposals are used, the time spent in the region-wise fully connected layers is also reduced.
Origin blog.csdn.net/weixin_37709708/article/details/103938716