Object Detection: RPN — The Backbone of Faster R-CNN


In object detection with Faster R-CNN, the RPN is the real backbone and has proven very effective so far. Its purpose is to propose regions of an image that are likely to contain objects.

This approach was proposed by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun in the very popular paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks". The algorithm has attracted the attention of many data scientists and deep learning and artificial intelligence engineers. It has many applications, such as detecting objects for self-driving cars and assisting people with different abilities.

1. What is a CNN?

CNN stands for Convolutional Neural Network, a very popular family of models for image classification. A CNN usually consists of convolutional layers, activation layers, and pooling layers (mainly max pooling), which reduce dimensionality without losing too many features. For this article [1], the key thing to know is that the last convolutional layer produces a feature map.

For example, if you input a cat image or a dog image, the algorithm can tell you whether it is a dog or a cat.

But it doesn't stop there: massive computing power has led to huge advances.

Many pretrained models have been released so that they can be used directly, sparing users the pain of training a model under computational constraints. Popular models include VGG-16, ResNet-50, DeepNet, and AlexNet, trained on ImageNet.

In this article, I specifically want to talk about what I think is a really clever idea from the above paper. Many people implement Faster R-CNN to recognize objects, but this article looks specifically at the logic and math behind how the algorithm gets boxes around the recognized objects.

The authors of the paper call it the Region Proposal Network, abbreviated as RPN.

To generate these so-called "proposals" for regions where objects are located, a small network slides over the convolutional feature map, which is the output of the last convolutional layer.
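To make the "small network slides over the feature map" idea concrete, here is a minimal NumPy sketch. It assumes the layout described in the paper: a 3×3 convolution followed by two sibling 1×1 layers, one producing 2 scores per proposal and one producing 4 box coordinates per proposal. The function name `rpn_head`, the weight names, the zero padding, the omission of biases, and the tiny sizes are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rpn_head(feat, w3, w_cls, w_reg):
    """Hypothetical RPN head (biases omitted for brevity).

    feat:  (C, H, W) feature map from the last convolutional layer
    w3:    (D, C, 3, 3) weights of the 3x3 sliding window
    w_cls: (2k, D) weights of the 1x1 classification layer
    w_reg: (4k, D) weights of the 1x1 regression layer
    """
    # Zero-pad by 1 so that every feature-map position gets a 3x3 window.
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)))
    windows = sliding_window_view(padded, (3, 3), axis=(1, 2))  # (C, H, W, 3, 3)
    # 3x3 convolution followed by ReLU: one D-dimensional vector per position.
    hidden = np.maximum(np.einsum("chwij,dcij->dhw", windows, w3), 0.0)
    # Two sibling 1x1 layers applied at every position.
    scores = np.einsum("kd,dhw->khw", w_cls, hidden)  # (2k, H, W)
    deltas = np.einsum("kd,dhw->khw", w_reg, hidden)  # (4k, H, W)
    return scores, deltas

rng = np.random.default_rng(0)
C, D, H, W, k = 4, 8, 5, 6, 9   # tiny sizes, for illustration only
feat = rng.normal(size=(C, H, W))
scores, deltas = rpn_head(
    feat,
    rng.normal(size=(D, C, 3, 3)),
    rng.normal(size=(2 * k, D)),
    rng.normal(size=(4 * k, D)),
)
print(scores.shape, deltas.shape)
```

In a real model, C and D would be much larger and the weights would be learned; the point here is only the shape of the computation.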

[Figure: Faster R-CNN architecture]

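The above is the architecture of Faster R-CNN. The RPN generates proposals for objects. The RPN itself has a specialized and unique architecture, which I want to break down further.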

[Figure: RPN architecture]

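The RPN has a classifier and a regressor. The authors introduce the concept of anchors. An anchor is centered at the sliding window in question. The intermediate feature vector is 256-d for the ZF model (an extension of AlexNet) and 512-d for VGG-16. The classifier determines the probability that a proposal contains a target object. The regressor regresses the coordinates of the proposal.

Scale and aspect ratio are the two important parameters of an anchor. For those unfamiliar, aspect ratio = width of the box / height of the box, and scale is the size of the box. The authors chose 3 scales and 3 aspect ratios, so each sliding position can yield up to 9 proposals. This is how the value of k is decided: here k = 9, where k is the number of anchors per position. For a whole feature map of size W × H, the total number of anchors is W × H × k.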
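The anchor arithmetic can be sketched in a few lines of NumPy. The scales (128, 256, 512 pixels) and ratios (1:2, 1:1, 2:1) are the defaults reported in the paper; the stride of 16 is the VGG-16 feature stride and is an assumption here, as are the function name `generate_anchors`, the placement of anchor centers at the middle of each cell, and the 38 × 50 feature-map size.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * k, 4) anchors as (x1, y1, x2, y2) in image pixels."""
    base = []
    for s in scales:
        for r in ratios:
            # r = width / height, with the box area kept at s * s
            w = s * np.sqrt(r)
            h = s / np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)  # (k, 4), centered at the origin

    # Center of every sliding-window position, mapped back to image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)  # each (feat_h, feat_w)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)

    # Broadcast: (H*W, 1, 4) + (1, k, 4) -> (H*W, k, 4)
    return (shifts + base[None, :, :]).reshape(-1, 4)

anchors = generate_anchors(feat_h=38, feat_w=50)
print(anchors.shape)  # (17100, 4), i.e. 38 * 50 * 9 anchors
```

Note that many of these anchors extend past the image border; how those are handled is outside the scope of this sketch.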

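One of the key properties of the algorithm is translation invariance: it is robust to translation, so if an object shifts within the image, the corresponding proposal shifts with it.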

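Using multi-scale anchors yields a "pyramid of anchors" rather than a "pyramid of filters", which makes the approach more time-efficient and cost-effective than previously proposed algorithms such as MultiBox.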

2. How does it work?

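Anchors are assigned a positive label based on two criteria:

  1. The anchor(s) with the highest Intersection-over-Union (IoU) overlap with a ground-truth box.
  2. Anchors whose IoU overlap with a ground-truth box is higher than 0.7.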
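The two criteria above can be sketched as follows. The 0.7 threshold comes from the text; the negative threshold of 0.3 and the "ignored" label of -1 follow the paper and common practice and are assumptions relative to this article. The helper names `iou_matrix` and `label_anchors` and the toy boxes are made up for illustration, and boxes are assumed to have positive area.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between (N, 4) anchors and (M, 4) ground-truth boxes (x1, y1, x2, y2)."""
    a = anchors[:, None, :]    # (N, 1, 4)
    g = gt_boxes[None, :, :]   # (1, M, 4)
    inter_w = np.clip(np.minimum(a[..., 2], g[..., 2]) - np.maximum(a[..., 0], g[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], g[..., 3]) - np.maximum(a[..., 1], g[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    return inter / (area_a + area_g - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return labels per anchor: 1 = object, 0 = background, -1 = ignored."""
    iou = iou_matrix(anchors, gt_boxes)
    labels = np.full(len(anchors), -1, dtype=np.int64)
    max_iou = iou.max(axis=1)
    labels[max_iou < neg_thresh] = 0   # low overlap with every ground-truth box
    labels[max_iou > pos_thresh] = 1   # criterion 2: IoU above 0.7
    labels[iou.argmax(axis=0)] = 1     # criterion 1: best anchor for each ground-truth box
    return labels

toy_anchors = np.array([
    [0, 0, 10, 10],    # identical to the first ground-truth box
    [0, 0, 10, 8],     # IoU 0.8 with the first box
    [20, 20, 30, 30],  # best (but below 0.7) match for the second box
    [0, 0, 4, 4],      # IoU 0.16, background
    [0, 0, 10, 5],     # IoU 0.5, neither positive nor negative
], dtype=float)
toy_gt = np.array([[0, 0, 10, 10], [21, 21, 33, 33]], dtype=float)
print(label_anchors(toy_anchors, toy_gt))  # [ 1  1  1  0 -1]
```

Criterion 1 is applied last so that every ground-truth box ends up with at least one positive anchor, even when no anchor clears the 0.7 threshold.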

Ultimately, the RPN is a model that has to be trained, so it needs a loss function.

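$$
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
$$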

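i → index of an anchor, p → predicted probability that the anchor is an object, t → vector of the 4 parameterized coordinates of the predicted bounding box, * denotes the ground truth. L_cls is the log loss over two classes (object vs. not object).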
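The "4 parameterized coordinates" are offsets relative to an anchor rather than raw pixel positions. The sketch below uses the parameterization given in the paper (center offsets scaled by the anchor size, and log-scale width and height ratios); the function names `encode` and `decode` and the example boxes are illustrative assumptions.

```python
import numpy as np

def encode(box, anchor):
    """Turn a box (cx, cy, w, h) into t = (tx, ty, tw, th) relative to an anchor (cx, cy, w, h)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa, (y - ya) / ha, np.log(w / wa), np.log(h / ha)])

def decode(t, anchor):
    """Invert encode: recover (cx, cy, w, h) from t and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return np.array([xa + tx * wa, ya + ty * ha, wa * np.exp(tw), ha * np.exp(th)])

anchor = np.array([100.0, 100.0, 128.0, 128.0])
gt_box = np.array([110.0, 90.0, 150.0, 100.0])
t_star = encode(gt_box, anchor)   # regression target for this anchor
print(t_star)
print(decode(t_star, anchor))     # recovers gt_box up to floating-point error
```

The same encoding is applied to the predicted box (giving t) and to the ground-truth box (giving t*), so the regression loss compares the two in anchor-relative units.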


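The p* factor on the regression term ensures that regression counts if and only if the anchor is labeled as an object (p* = 1); otherwise p* is zero, so the regression term in the loss function becomes zero.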

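N_cls and N_reg are normalization terms. λ defaults to 10 and scales the classification and regression terms so that they are weighted at roughly the same level.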
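Putting the pieces together, the loss can be sketched in NumPy as below. The log loss for classification and λ = 10 come from the text; the smooth L1 form of the regression loss is taken from the paper and is not stated in this article. The function names, the clipping constant, and the toy numbers are illustrative assumptions, and the normalizers are passed in explicitly rather than fixed.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    """p, p_star: (N,) predicted probability and 0/1 label; t, t_star: (N, 4) coordinates."""
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    l_cls = -(p_star * np.log(p) + (1.0 - p_star) * np.log(1.0 - p))
    l_reg = smooth_l1(t - t_star).sum(axis=1)
    # p_star switches the regression term off for background anchors.
    return l_cls.sum() / n_cls + lam * (p_star * l_reg).sum() / n_reg

p = np.array([0.9, 0.2, 0.6])
p_star = np.array([1.0, 0.0, 1.0])
t = np.array([[0.1, 0.0, 0.2, 0.0], [3.0, 3.0, 3.0, 3.0], [0.0, 0.0, 0.0, 0.0]])
t_star = np.zeros((3, 4))
print(rpn_loss(p, p_star, t, t_star, n_cls=3, n_reg=3))
```

In this toy call the second anchor is background, so its large coordinate error contributes nothing to the total.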

If you want to go into more detail, here is the link to the paper: https://arxiv.org/pdf/1506.01497.pdf.

Reference

[1] Source: https://medium.com/egen/region-proposal-network-rpn-backbone-of-faster-r-cnn-4a744a38d7f9
