Understanding Object Detection in a Few Articles (1): RCNN

RCNN

Computer vision is an interdisciplinary field, and the advent of convolutional neural networks has once again propelled various computer vision tasks forward. Applications such as autonomous vehicles have in turn placed new demands on these tasks.

One of the main tasks in computer vision is object detection. Instance segmentation, pose estimation, and tracking, which all build on object detection, are also mainstream research directions today.

object_detection_001.jpeg

Multi-object detection is much more complicated than image classification. In image classification, the image usually contains only one category of interest. In object detection, we do not know in advance how many objects of interest the image contains. We have to determine how many targets there are, which category each one belongs to, and where each one is, using a bounding box that encloses the object to represent its position.

Although it has been almost 8 years since the RCNN network was proposed in 2014, today's two-stage object detectors can still largely be traced back to the basic idea of RCNN.

The general idea behind object detection

In multi-object detection, a single image may contain several objects of interest. We need to detect every target that appears in the image, stating not only which category it belongs to but also where it is. Since classification already works well, a relatively simple idea is to turn the detection problem into a classification problem:

  • Extract candidate boxes, that is, turn one large image into several small images
  • Then run a classification task on each small image
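The two steps above can be sketched in a few lines. This is only a toy illustration: the helper names `crop_regions` and `detect_by_classification`, the hand-picked boxes, and the brightness-based `toy_classify` are all made up for the example and stand in for a real proposal method and a real classifier.

```python
import numpy as np

def crop_regions(image, boxes):
    """Cut one sub-image per (x1, y1, x2, y2) box out of an H x W x C image."""
    return [image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

def detect_by_classification(image, boxes, classify):
    """Classify every crop and pair each box with its predicted label."""
    crops = crop_regions(image, boxes)
    return [(box, classify(crop)) for box, crop in zip(boxes, crops)]

# Toy data: a black image with a white square playing the role of the object.
image = np.zeros((100, 100, 3), dtype=np.uint8)
image[20:60, 30:70] = 255
boxes = [(30, 20, 70, 60), (0, 0, 20, 20)]  # hand-picked candidate boxes

# Hypothetical classifier: a crop is "bright" if its mean intensity is high.
toy_classify = lambda crop: "bright" if crop.mean() > 127 else "background"

results = detect_by_classification(image, boxes, toy_classify)
```

Each entry of `results` pairs a box with a label, which is exactly what a detector has to output: a location plus a category.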

Basic idea of the RCNN network

rcnn_architecture.png

  • Extract about 2k (2,000) candidate regions from the input image
  • Warp each candidate region, that is, resize it to one fixed size (the warped region)
  • Compute features for each warped region with a CNN

rcnn_artichecture_002.png

OverFeat was covered in an earlier post; you can also watch the video recorded at that time.

Generation of candidate boxes

The selective search algorithm first obtains a set of initial regions through image segmentation, then merges these regions according to a certain strategy to produce a hierarchical region structure. The regions in this structure include those that may contain the objects we are looking for.

proposal_generation_001.png

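Conceptually, selective search belongs to image segmentation. It uses a hierarchical clustering algorithm to generate object candidate regions. The usual assumption is that the parts of one object have similar texture or color, so candidate regions are chosen by their similarity in color, texture, size, and shape.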
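As a rough illustration of what "similarity" can mean here, the sketch below computes two of these cues for a pair of regions: a color similarity based on histogram intersection and a size similarity that favors merging small regions first. It is a simplified sketch, not the full algorithm; the function names and the bin count are choices made for this example, and the texture and shape cues are left out.

```python
import numpy as np

def color_histogram(pixels, bins=8):
    """L1-normalized per-channel histogram of an N x 3 array of 0-255 values."""
    hists = [np.histogram(pixels[:, c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()

def color_similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return float(np.minimum(hist_a, hist_b).sum())

def size_similarity(size_a, size_b, image_size):
    """Close to 1.0 when both regions are small relative to the image."""
    return 1.0 - (size_a + size_b) / image_size

# Toy regions: 50 reddish pixels and 50 bluish pixels.
red = np.tile([200, 10, 10], (50, 1))
blue = np.tile([10, 10, 200], (50, 1))

sim_same = color_similarity(color_histogram(red), color_histogram(red))
sim_diff = color_similarity(color_histogram(red), color_histogram(blue))
sim_size = size_similarity(100, 200, 1000)
```

In a hierarchical grouping loop, the pair of neighboring regions with the highest combined similarity would be merged first, and every merged region along the way becomes a candidate box.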

selective_search_001.png

Extracting features from candidate boxes

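Selective search typically gives us about 2k candidate boxes. These boxes are then resized to one uniform size (the warped region), which is 227x227 in the paper. Each candidate region is fed into a pretrained neural network to obtain a 4096-dimensional feature, giving a 2000x4096 feature matrix. In effect we are doing 2k classification tasks.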
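The warping step can be sketched as follows. This is a minimal illustration that uses nearest-neighbor sampling in plain NumPy; the function name `warp_region`, the random test image, and the two boxes are made up for the example, and a real pipeline would normally use an image library's resize instead.

```python
import numpy as np

def warp_region(image, box, size=227):
    """Crop (x1, y1, x2, y2) from an H x W x C image and resize the crop to
    size x size with nearest-neighbor sampling, ignoring the aspect ratio."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    rows = np.arange(size) * crop.shape[0] // size  # source row per output row
    cols = np.arange(size) * crop.shape[1] // size  # source column per output column
    return crop[rows[:, None], cols]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
boxes = [(10, 20, 110, 220), (50, 50, 350, 150)]  # a tall box and a wide box

batch = np.stack([warp_region(image, b) for b in boxes])
print(batch.shape)  # (2, 227, 227, 3)
```

Because every crop ends up with the same shape regardless of its original aspect ratio, the crops can be stacked into one batch and passed through the network together.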

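The features are taken from the network with its final classification layer left out: the feature map produced by the convolutional layers is flattened and passed through the fully connected layers up to the 4096-dimensional output, so the 2000 regions give a 2000 x 4096 feature matrix.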

Classifier

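An SVM is a binary classifier. I will only cover it briefly here. A lot of effort used to go into SVMs, but they look dated today, so a short overview is enough. Times change and so does what people focus on learning, and computers now let us spend less effort on computational tricks. The 2000x4096 feature matrix is multiplied by the 4096x20 weight matrix formed by the 20 SVMs, giving a 2000x20 score matrix. Non-maximum suppression is then applied to each column of this matrix, that is, to each class, to remove overlapping candidate boxes.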
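The shapes involved in this scoring step are easy to check with a small sketch. The features and weights below are random stand-ins, and the bias vector `svm_bias` is a hypothetical addition for illustration; only the matrix shapes come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((2000, 4096))   # one row per candidate box (random stand-in)
svm_weights = rng.standard_normal((4096, 20))  # one column per class-specific SVM (random stand-in)
svm_bias = np.zeros(20)                        # hypothetical per-class bias terms

scores = features @ svm_weights + svm_bias     # one score per (box, class) pair
best_class = scores.argmax(axis=1)             # highest-scoring class for each box

print(scores.shape, best_class.shape)          # (2000, 20) (2000,)
```

Column `j` of `scores` holds the score of every candidate box for class `j`, which is the list that the per-class suppression step works on.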

Non-maximum suppression (NMS)

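The model may find several bounding boxes for the same object. Non-maximum suppression helps avoid duplicate detections of the same instance. Once we have the set of bounding boxes predicted for one object class, we sort them all by confidence score.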

Non-maximum suppression removes overlapping proposals based on IoU (Intersection over Union): for two sets A and B, the ratio of their intersection to their union.

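  • Find the box with the highest score
  • Compute the IoU between every other box and that box
  • Delete all boxes whose IoU is greater than a given threshold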
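The steps above, repeated on the boxes that survive each round, can be sketched as follows. This is a minimal NumPy version for a single class; the function names, the `(x1, y1, x2, y2)` box format, and the default threshold of 0.5 are assumptions made for this example.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an N x 4 array of boxes."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, threshold=0.5):
    """Return the indices of the boxes that are kept, highest score first."""
    order = np.argsort(scores)[::-1]          # box indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]                       # highest-scoring remaining box
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= threshold]   # drop boxes that overlap it too much
    return keep

boxes = np.array([[10, 10, 110, 110],
                  [20, 20, 120, 120],
                  [200, 200, 300, 300]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.68 and is removed, while the third box does not overlap anything and is kept.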

IoU_001.png

Refining candidate boxes with regression

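The proposals that remain after NMS are filtered further. Then 20 regressors, one per class, are applied to the remaining proposals of the 20 classes, which finally yields the corrected, highest-scoring bounding box for each class. The regressor predicts the bounding box's offset in the x direction, its offset in the y direction, and the scaling factors for its width and height: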

$$
\begin{aligned}
\hat{g}_x &= p_w d_x(P) + p_x \\
\hat{g}_y &= p_h d_y(P) + p_y \\
\hat{g}_w &= p_w \exp(d_w(P)) \\
\hat{g}_h &= p_h \exp(d_h(P))
\end{aligned}
$$
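The transform above is straightforward to apply in code. In this sketch the proposal is given as center coordinates plus width and height, and the four deltas are made-up numbers that stand in for the regressor's output; the function name `apply_deltas` is hypothetical.

```python
import math

def apply_deltas(proposal, deltas):
    """proposal = (p_x, p_y, p_w, p_h), with (p_x, p_y) the box center;
    deltas = (d_x, d_y, d_w, d_h), as predicted by the regressor."""
    p_x, p_y, p_w, p_h = proposal
    d_x, d_y, d_w, d_h = deltas
    g_x = p_w * d_x + p_x        # shift the center, scaled by the box width
    g_y = p_h * d_y + p_y        # shift the center, scaled by the box height
    g_w = p_w * math.exp(d_w)    # rescale the width
    g_h = p_h * math.exp(d_h)    # rescale the height
    return g_x, g_y, g_w, g_h

# A 50 x 40 proposal centered at (100, 80), with made-up deltas.
refined = apply_deltas((100.0, 80.0, 50.0, 40.0), (0.1, -0.2, 0.0, math.log(1.5)))
```

With these deltas the center moves by 10% of the width to the right and 20% of the height upward, the width stays the same, and the height grows by a factor of 1.5. Predicting offsets relative to the box size, and scales through `exp`, keeps the targets independent of how large the proposal is.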

RCNN_bbox_regression.png

Regions that contain no target are treated as negative samples, but not all negative samples are equally difficult to identify. A negative sample that contains no part of any target is easy to recognize as negative, whereas one that contains some noise or part of a target may not be.

Such hard negative samples make recognition more difficult and easily lead to misidentification.
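One common way to act on this observation, usually called hard negative mining, is to pick out the negative samples that the current classifier scores highest and give them extra attention in training. The text above does not spell out a procedure, so the sketch below is only an illustration; the function name and the toy scores are made up.

```python
import numpy as np

def hardest_negatives(scores, is_negative, k):
    """Indices of the k negative samples with the highest classifier scores,
    that is, the background regions most likely to be mistaken for a target."""
    negative_idx = np.flatnonzero(is_negative)
    ranked = negative_idx[np.argsort(scores[negative_idx])[::-1]]
    return ranked[:k]

scores = np.array([0.9, -1.2, 0.4, -0.1, 0.7])          # toy classifier scores
is_negative = np.array([False, True, True, True, True])  # sample 0 is a positive
hard = hardest_negatives(scores, is_negative, k=2)
```

In this toy case samples 4 and 2 are selected: both are negatives, yet the classifier gives them the highest scores among the negatives.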

Origin juejin.im/post/7087499626932076557