ORB: an efficient alternative to SIFT or SURF


Translator: Michael Beechan (陈兵), Chongqing University of Technology

Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary Bradski

Willow Garage, Menlo Park, California

Citation: Rublee E, Rabaud V, Konolige K, et al. ORB: An efficient alternative to SIFT or SURF[C]// International Conference on Computer Vision. IEEE Computer Society, 2011:2564-2571.

Paper download link: http://xueshu.baidu.com/s?wd=paperuri:(c7af48f642e947bc7aac7dcdaa29e8b0)&filter=sc_long_sign&sc_ks_para=q%3DORB%3A+An+efficient+alternative+to+SIFT+or+SURF&tn=SE_baiduxueshu_c1gjeupa&ie=utf-8&sc_us=12079940052080624209

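Abstract:

Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion (SfM). Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments that ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch tracking on a smartphone.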

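1. Introduction

The SIFT keypoint detector and descriptor [17], although over a decade old, have proven remarkably successful in many applications that use visual features, including object recognition [17], image stitching [28], and visual mapping [25]. However, SIFT imposes a large computational burden, especially for real-time systems such as visual odometry, or for low-power devices such as cellphones. This has led to an intensive search for replacements with lower computational cost; arguably the best of these is SURF [2]. There has also been research aimed at speeding up the computation of SIFT, most notably with GPU devices [26].

In this paper, we propose a computationally efficient replacement for SIFT that has similar matching performance, is less affected by image noise, and is capable of real-time performance. Our main motivation is to enhance many common image-matching applications, for example to enable low-power devices without GPU acceleration to perform panorama stitching and patch tracking, and to reduce the time for feature-based object detection on standard PCs. Our descriptor performs as well as SIFT on these tasks (and better than SURF), while being almost two orders of magnitude faster.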

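BRIEF grew out of research that uses binary tests to train a set of classification trees [4]. Once trained on a set of 500 or so typical keypoints, the trees can be used to return a signature for any arbitrary keypoint [5]. In a similar manner, we look for the tests least sensitive to orientation. The classic method for finding uncorrelated tests is principal component analysis (PCA); for example, it has been shown that PCA for SIFT can help remove a large amount of redundant information [12]. However, the space of possible binary tests is too big to perform PCA, and an exhaustive search is used instead.

Visual vocabulary methods [21, 27] use offline clustering to find exemplars that are uncorrelated and can be used in matching. These techniques might also be useful in finding uncorrelated binary tests.

The closest system to ORB is [3], which proposes a multi-scale Harris keypoint and oriented patch descriptor. This descriptor is used for image stitching, and shows good rotational and scale invariance. It is not as efficient to compute as our method, however.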

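3. oFAST: FAST Keypoint Orientation

FAST features are widely used because of their computational properties. However, FAST features do not have an orientation component. In this section we add an efficiently computed orientation.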

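3.1 FAST Detector

We start by detecting FAST points in the image. FAST takes one parameter, the intensity threshold between the center pixel and those in a circular ring about the center. We use FAST-9 (circular radius of 9), which has good performance.

FAST does not produce a measure of cornerness, and we have found that it has large responses along edges. We employ a Harris corner measure [11] to order the FAST keypoints. For a target number N of keypoints, we first set the threshold low enough to get more than N keypoints, then order them according to the Harris measure, and pick the top N points.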
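The ordering step can be sketched in plain Python. This is only an illustration under stated assumptions: the window half-width `half=3`, the constant `k=0.04`, the central-difference gradients, and the function names `harris_response` and `keep_top_n` are choices made here, not values or names given in the paper. The synthetic image has one true corner, and the sketch shows that points lying on a straight edge receive a lower Harris score than the corner.

```python
def harris_response(img, x, y, half=3, k=0.04):
    """Harris measure det(M) - k * trace(M)^2 over a (2*half+1)^2 window.

    img[row][col] holds intensities; gradients are central differences.
    half and k are illustrative defaults, not values from the paper.
    """
    sxx = sxy = syy = 0.0
    for v in range(y - half, y + half + 1):
        for u in range(x - half, x + half + 1):
            ix = (img[v][u + 1] - img[v][u - 1]) / 2.0
            iy = (img[v + 1][u] - img[v - 1][u]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    return sxx * syy - sxy * sxy - k * (sxx + syy) ** 2


def keep_top_n(img, candidates, n):
    """Order candidate points (x, y) by Harris response and keep the best n."""
    ranked = sorted(candidates,
                    key=lambda c: harris_response(img, c[0], c[1]),
                    reverse=True)
    return ranked[:n]


# Synthetic 20x20 image: one bright quadrant gives a true corner at (10, 10)
# with straight edges running away from it.
img = [[255 if (u >= 10 and v >= 10) else 0 for u in range(20)]
       for v in range(20)]
# Candidates: the corner, two points on the edges, one point in a flat area.
candidates = [(10, 10), (10, 14), (14, 10), (4, 4)]
best = keep_top_n(img, candidates, 1)
```

In a real detector the candidates would come from FAST run with a deliberately low threshold, so that more than N points are available before ranking.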

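FAST does not produce multi-scale features. We employ a scale pyramid of the image, and produce FAST features (filtered by Harris) at each level of the pyramid.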

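3.2 Orientation by Intensity Centroid

Our approach uses a simple but effective measure of corner orientation, the intensity centroid [22]. The intensity centroid assumes that a corner's intensity is offset from its center, and that this offset vector can be used to impute an orientation. Rosin defines the moments of a patch as:

$$m_{pq} = \sum_{x,y} x^p y^q I(x, y)$$

With these moments, the centroid is:

$$C = \left( \frac{m_{10}}{m_{00}}, \frac{m_{01}}{m_{00}} \right)$$

We can construct a vector $\vec{OC}$ from the corner's center, O, to the centroid, C. The orientation of the patch then simply is:

$$\theta = \operatorname{atan2}(m_{01}, m_{10})$$

where atan2 is the quadrant-aware version of arctan. Rosin mentions taking into account whether the corner is dark or light; for our purposes, however, we can ignore this, because the angle measure is consistent regardless of the corner type.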

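To improve the rotation invariance of this measure, we make sure that the moments are computed with x and y remaining within a circular region of radius r. We empirically choose r to be the patch size, so that x and y run over [-r, r]. As |C| approaches 0, the measure becomes unstable; with FAST corners, we have found that this is rarely the case.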
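A minimal sketch of the intensity-centroid orientation restricted to a circular region follows. It assumes the patch is a square list of lists of side 2r + 1 centred on the keypoint; the function name `intensity_centroid_angle` and the test patches are illustrative, not from the paper. Because m00 cancels in the angle, only m10 and m01 are accumulated.

```python
import math


def intensity_centroid_angle(patch, r):
    """Orientation (radians) of a square patch of side 2*r + 1.

    patch[v][u] is the intensity at offset (u - r, v - r) from the corner
    centre. Only pixels inside the circle of radius r contribute.
    """
    m10 = m01 = 0.0
    for v in range(2 * r + 1):
        for u in range(2 * r + 1):
            x, y = u - r, v - r
            if x * x + y * y <= r * r:
                m10 += x * patch[v][u]
                m01 += y * patch[v][u]
    return math.atan2(m01, m10)


r = 3
side = 2 * r + 1
# Bright half on the +x side: the centroid lies along the x axis.
right_bright = [[255 if u > r else 0 for u in range(side)] for v in range(side)]
# Bright half on the +y side (rows below the centre in image coordinates).
down_bright = [[255 if v > r else 0 for u in range(side)] for v in range(side)]

angle_right = intensity_centroid_angle(right_bright, r)
angle_down = intensity_centroid_angle(down_bright, r)
```

The first patch yields an angle of 0 and the second an angle of pi/2, which matches the direction from the centre toward the bright side in each case.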

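We compared the centroid method with two gradient-based measures, BIN and MAX. In both cases, X and Y gradients are calculated on a smoothed image. MAX chooses the largest gradient in the keypoint patch; BIN forms a histogram of gradient directions at 10-degree intervals and picks the maximum bin. BIN is similar to the SIFT algorithm, although it picks only a single orientation. The variance of the orientation on a simulated dataset (in-plane rotation plus added noise) is shown in Figure 2. Neither of the gradient measures performs very well, while the centroid gives a uniformly good orientation, even under large image noise.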

Figure 2. Rotation measure. The intensity centroid (IC) performs best on recovering the orientation of artificially rotated noisy patches, compared to a histogram (BIN) and MAX method.

4. rBRIEF: Rotation-Aware Brief

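In this section, we first introduce a steered BRIEF descriptor, show how to compute it efficiently, and demonstrate why it actually performs poorly under rotation. We then introduce a learning step that finds less correlated binary tests, leading to the better descriptor rBRIEF, for which we offer comparisons to SIFT and SURF.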

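4.1 Efficient Rotation of the BRIEF Operator

Brief overview of BRIEF

The BRIEF descriptor [6] is a bit-string description of an image patch constructed from a set of binary intensity tests. Consider a smoothed image patch, p. A binary test τ is defined by:

$$\tau(p; x, y) := \begin{cases} 1 & \text{if } p(x) < p(y) \\ 0 & \text{otherwise} \end{cases}$$

where p(x) is the intensity of p at a point x. The feature is defined as a vector of n binary tests:

$$f_n(p) := \sum_{1 \le i \le n} 2^{i-1} \, \tau(p; x_i, y_i)$$

Many different types of test distributions were considered in [6]. Here we use one of the best performers, a Gaussian distribution around the center of the patch. We also choose a vector length n = 256.

It is important to smooth the image before performing the tests. In our implementation, smoothing is achieved using an integral image, where each test point is a 5×5 sub-window of a 31×31 pixel patch. These values were chosen from our own experiments and the results in [6].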
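The descriptor definition can be sketched as follows. Several details are assumptions made for illustration only: the box mean is computed directly rather than through an integral image, the Gaussian spread `sigma = patch / 5`, the clamping of sample offsets, and the helper names `box_mean`, `gaussian_pattern`, and `brief` are all hypothetical. Bit i of the returned integer (counting from 0) holds the result of the (i+1)-th test, which corresponds to the weight 2^(i-1) for 1-based i.

```python
import random


def box_mean(img, cx, cy, half=2):
    """Mean intensity of the (2*half+1)^2 window centred on (cx, cy)."""
    total = 0
    for v in range(cy - half, cy + half + 1):
        for u in range(cx - half, cx + half + 1):
            total += img[v][u]
    return total / float((2 * half + 1) ** 2)


def gaussian_pattern(n=256, patch=31, half=2, seed=0):
    """Draw n test pairs ((x1, y1), (x2, y2)) as offsets from the patch centre."""
    rng = random.Random(seed)
    limit = patch // 2 - half      # keep every 5x5 window inside the patch
    sigma = patch / 5.0            # spread of the Gaussian: an assumption

    def draw():
        return max(-limit, min(limit, int(round(rng.gauss(0.0, sigma)))))

    return [((draw(), draw()), (draw(), draw())) for _ in range(n)]


def brief(img, kx, ky, pattern):
    """Pack the binary tests into one integer: bit i is the i-th test result."""
    bits = 0
    for i, ((x1, y1), (x2, y2)) in enumerate(pattern):
        if box_mean(img, kx + x1, ky + y1) < box_mean(img, kx + x2, ky + y2):
            bits |= 1 << i
    return bits


pattern = gaussian_pattern()
# Horizontal ramp: intensity equals the column index, so a test fires
# exactly when its first point lies to the left of its second point.
ramp = [[u for u in range(40)] for v in range(40)]
descriptor = brief(ramp, 20, 20, pattern)
```

On a constant image every test compares equal means, so the descriptor is 0; an integral image would give the same box means at a cost independent of the window size.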

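Steered BRIEF

We would like BRIEF to be invariant to in-plane rotation. The matching performance of BRIEF falls off sharply for in-plane rotations of more than a few degrees (see Figure 7). Calonder [6] suggests computing a BRIEF descriptor for a set of rotations and perspective warps of each patch, but this solution is obviously expensive. A more efficient method is to steer BRIEF according to the orientation of the keypoints. For any feature set of n binary tests at locations (x_i, y_i), define the 2 × n matrix:

$$S = \begin{pmatrix} x_1 & \cdots & x_n \\ y_1 & \cdots & y_n \end{pmatrix}$$

Using the patch orientation θ and the corresponding rotation matrix R_θ, we construct a steered version S_θ of S:

$$S_\theta = R_\theta S$$

Now the steered BRIEF operator becomes:

$$g_n(p, \theta) := f_n(p) \mid (x_i, y_i) \in S_\theta$$

We discretize the angle in increments of 2π/30 (12 degrees) and construct a lookup table of precomputed BRIEF patterns. As long as the keypoint orientation θ is consistent across views, the correct set of points S_θ will be used to compute its descriptor.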
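The lookup table of steered patterns can be sketched like this. It assumes a pattern is a list of test pairs `((x1, y1), (x2, y2))` given as offsets from the keypoint; rounding rotated coordinates to the nearest pixel and rounding θ to the nearest 12-degree bin are choices made here, since the text does not specify either. The names `rotate_pattern`, `build_lookup`, and `steered_pattern` are hypothetical.

```python
import math

N_BINS = 30                        # 2*pi / 30 = 12 degrees per bin
STEP = 2.0 * math.pi / N_BINS


def rotate_pattern(pattern, theta):
    """Apply the rotation R_theta to every test location of the pattern."""
    c, s = math.cos(theta), math.sin(theta)

    def rot(pt):
        x, y = pt
        return (int(round(c * x - s * y)), int(round(s * x + c * y)))

    return [(rot(a), rot(b)) for a, b in pattern]


def build_lookup(pattern):
    """Precompute one steered pattern per discretised angle."""
    return [rotate_pattern(pattern, k * STEP) for k in range(N_BINS)]


def steered_pattern(lookup, theta):
    """Pick the precomputed pattern whose angle is nearest to theta."""
    k = int(round(theta / STEP)) % N_BINS
    return lookup[k]


pattern = [((10, 0), (0, 5))]      # a single toy test pair
lookup = build_lookup(pattern)
unrotated = steered_pattern(lookup, 0.0)
half_turn = steered_pattern(lookup, math.pi)
```

Building the table once means that describing a keypoint costs only one table lookup on top of plain BRIEF, at the price of up to 6 degrees of angular quantization error under the nearest-bin choice used here.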

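4.2 Variance and Correlation

One of the pleasing properties of BRIEF is that each bit feature has a large variance and a mean near 0.5. Figure 3 shows the spread of means for a typical Gaussian BRIEF pattern of 256 bits over 100k sample keypoints. A mean of 0.5 gives the maximum sample variance of 0.25 for a bit feature. On the other hand, once BRIEF is oriented along the keypoint direction to give steered BRIEF, the means are shifted to a more distributed pattern (again, Figure 3). One way to understand this is that oriented corner keypoints present a more uniform appearance to the binary tests.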
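The 0.25 figure follows from treating each bit as a Bernoulli variable. A short derivation, writing b for the bit and μ for its mean (notation introduced here for illustration, not taken from the paper):

```latex
\operatorname{Var}(b) = \mu(1-\mu), \qquad
\frac{d}{d\mu}\,\mu(1-\mu) = 1 - 2\mu = 0 \;\Rightarrow\; \mu = \tfrac{1}{2}, \qquad
\operatorname{Var}_{\max} = \tfrac{1}{2}\cdot\tfrac{1}{2} = 0.25
```

For comparison, a bit whose mean drifts to 0.9 has variance 0.9 × 0.1 = 0.09, which is why means far from 0.5 make a test less discriminative.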

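High variance makes a feature more discriminative, since it responds differentially to inputs. Another desirable property is to have the tests uncorrelated, since then each test contributes to the result. To analyze the correlation and variance of the tests in the BRIEF vector, we looked at the responses of BRIEF and steered BRIEF to 100k keypoints. The results are shown in Figure 4. Using PCA on the data, we plot the highest 40 eigenvalues (after which the two descriptors converge). Both BRIEF and steered BRIEF exhibit high initial eigenvalues, indicating correlation among the binary tests; essentially all the information is contained in the first 10 or 15 components. Steered BRIEF has significantly lower variance, however, since its eigenvalues are lower, and it is therefore not as discriminative. Apparently BRIEF depends on the random orientation of keypoints for good performance. Another view of the effect of steered BRIEF is given by the distance distributions of inliers and outliers (Figure 5). Notice that for steered BRIEF the mean for outliers is pushed to the left, and there is more overlap with the inliers.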

Figure 3. Distribution of means for feature vectors: BRIEF, steered BRIEF (Section 4.1), and rBRIEF (Section 4.3). The X axis is the distance to a mean of 0.5.

Figure 4. Distribution of eigenvalues in the PCA decomposition over 100k keypoints of three feature vectors: BRIEF, steered BRIEF (Section 4.1), and rBRIEF (Section 4.3).

Figure 5. The dotted lines show the distances of a keypoint to outliers, while the solid lines denote the distances only between inlier matches for three feature vectors: BRIEF, steered BRIEF (Section 4.1), and rBRIEF (Section 4.3).

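4.3 Learning Good Binary Features

To recover from the loss of variance in steered BRIEF, and to reduce correlation among the binary tests, we develop a learning method for choosing a good subset of binary tests. One possible strategy is to use PCA or some other dimensionality-reduction method and, starting from a large set of binary tests, identify 256 new features that have high variance and are uncorrelated over a large training set. However, since the new features are composed from a larger number of binary tests, they would be less efficient to compute than steered BRIEF. Instead, we search among all possible binary tests to find ones that both have high variance (and means close to 0.5) and are uncorrelated.

The method is as follows. We first set up a training set of some 300k keypoints, drawn from images in the PASCAL 2006 set [8]. We also enumerate all possible binary tests drawn from a 31×31 pixel patch. Each test is a pair of 5×5 sub-windows of the patch. If we denote the width of our patch as w_p = 31 and the width of the test sub-window as w_t = 5, then we have N = (w_p − w_t)² possible sub-windows. We would like to select pairs of two from these, so we have $\binom{N}{2}$ binary tests. We eliminate tests that overlap, so we end up with M = 205590 possible tests. The algorithm is: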

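1. Run each test against all training patches.

2. Order the tests by the distance of their mean from 0.5, forming the vector T.

3. Greedy search:

(a) Put the first test into the result vector R and remove it from T.

(b) Take the next test from T and compare it against all tests in R. If its absolute correlation is greater than a threshold, discard it; otherwise add it to R.

(c) Repeat the previous step until there are 256 tests in R. If there are fewer than 256, raise the threshold and try again.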
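The steps above can be sketched in plain Python. The sketch assumes step 1 has already produced `responses`, where `responses[t]` is the list of 0/1 outputs of test t over all training patches. The starting threshold, the increment `step`, the use of Pearson correlation, and the function names are assumptions made here; the text does not give these details.

```python
import math


def mean(bits):
    return sum(bits) / float(len(bits))


def correlation(a, b):
    """Pearson correlation of two equal-length 0/1 response lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 1.0     # a constant test adds nothing; treat it as redundant
    return cov / math.sqrt(va * vb)


def greedy_select(responses, want, threshold, step=0.05):
    """Return indices of up to `want` tests with low mutual correlation."""
    # Step 2: order tests by the distance of their mean from 0.5.
    order = sorted(range(len(responses)),
                   key=lambda t: abs(mean(responses[t]) - 0.5))
    while True:
        chosen = [order[0]]                              # step 3(a)
        for t in order[1:]:                              # step 3(b)
            if len(chosen) == want:
                break
            if all(abs(correlation(responses[t], responses[r])) <= threshold
                   for r in chosen):
                chosen.append(t)
        if len(chosen) == want or threshold >= 1.0:      # step 3(c)
            return chosen
        threshold = min(1.0, threshold + step)


responses = [
    [0, 1, 0, 1, 0, 1, 0, 1],   # test 0: mean 0.5
    [0, 1, 0, 1, 0, 1, 0, 1],   # test 1: duplicate of test 0
    [0, 0, 1, 1, 0, 0, 1, 1],   # test 2: mean 0.5, uncorrelated with test 0
    [1, 1, 1, 1, 1, 1, 1, 0],   # test 3: mean far from 0.5
]
selected = greedy_select(responses, want=2, threshold=0.2)
```

With this toy input the duplicate test is rejected for being perfectly correlated with the first, and the selection is tests 0 and 2.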

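This algorithm is a greedy search for a set of uncorrelated tests with means near 0.5. The result is called rBRIEF. rBRIEF shows a significant improvement in variance and correlation over steered BRIEF (see Figure 4). The PCA eigenvalues are higher, and they fall off much less quickly. It is interesting to see the high-variance binary tests produced by the algorithm (Figure 6). There is a very pronounced vertical trend in the unlearned tests (left image), which are highly correlated; the learned tests show better diversity and lower correlation.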
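As a sanity check on the size of the candidate pool quoted in the method description, the count M = 205590 can be reproduced by direct enumeration. The sketch follows the text's convention of w_p − w_t = 26 window positions per axis and assumes that two windows overlap when they share at least one pixel; the variable and function names are illustrative.

```python
from itertools import combinations

W_PATCH, W_TEST = 31, 5
side = W_PATCH - W_TEST                  # 26 positions per axis, as in the text
windows = [(x, y) for x in range(side) for y in range(side)]
n_windows = len(windows)                 # N = (w_p - w_t)^2
n_pairs = n_windows * (n_windows - 1) // 2   # N choose 2


def overlaps(a, b):
    """Two w_t x w_t windows share a pixel when both offsets are below w_t."""
    return abs(a[0] - b[0]) < W_TEST and abs(a[1] - b[1]) < W_TEST


n_tests = sum(1 for a, b in combinations(windows, 2) if not overlaps(a, b))
print(n_windows, n_pairs, n_tests)       # 676 228150 205590
```

So 676 sub-windows give 228150 unordered pairs, of which 22560 overlap, leaving the 205590 tests stated in the text.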

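4.4 Evaluation

We evaluate the combination of oFAST and rBRIEF, which we call ORB, using two datasets: images with synthetic in-plane rotation and added Gaussian noise, and a real-world dataset of textured planar images captured from different viewpoints. For each reference image, we compute the oFAST keypoints and rBRIEF features, targeting 500 keypoints per image. For each test image (synthetic rotation or real-world viewpoint change), we do the same, then perform brute-force matching to find the best correspondence. The results are given as the percentage of correct matches against the angle of rotation.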
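Brute-force matching of binary descriptors can be sketched as below. The passage only says that brute-force matching is used; the Hamming distance is assumed here because it is the usual metric for bit-string descriptors such as BRIEF. Descriptors are stored as Python integers, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def brute_force_match(query, train):
    """For each query descriptor return (index of nearest train, distance)."""
    matches = []
    for q in query:
        best = min(range(len(train)), key=lambda j: hamming(q, train[j]))
        matches.append((best, hamming(q, train[best])))
    return matches


# Toy 4-bit descriptors; real rBRIEF descriptors have 256 bits.
query = [0b1011, 0b0000]
train = [0b0001, 0b1111, 0b1010]
matches = brute_force_match(query, train)
```

Ties are broken in favour of the lowest train index, since `min` returns the first minimal element.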

Figure 6. A subset of the binary tests generated by considering high-variance under orientation (left) and by running the learning algorithm to reduce correlation (right). Note the distribution of the tests around the axis of the keypoint orientation, which is pointing up. The color coding shows the maximum pairwise correlation of each test, with black and purple being the lowest. The learned tests clearly have a better distribution and lower correlation.

Figure 7. Matching performance of SIFT, SURF, BRIEF with FAST, and ORB (oFAST + rBRIEF) under synthetic rotations with Gaussian noise of 10.

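Figure 7 shows the results for the synthetic test set with added Gaussian noise of 10. Note that the standard BRIEF operator falls off dramatically after about 10 degrees. SIFT outperforms SURF, which shows quantization effects at 45-degree angles due to its Haar-wavelet composition. ORB has the best performance, with over 70% inliers.

Unlike SIFT, ORB is relatively immune to Gaussian image noise. If we plot inlier performance against noise, SIFT exhibits a steady drop of 10% with each additional noise increment of 5. ORB also drops, but at a much lower rate (Figure 8).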

Figure 8. Matching behavior under noise for SIFT and rBRIEF. The noise levels are 0, 5, 10, 15, 20, and 25. SIFT performance degrades rapidly, while rBRIEF is relatively unaffected.

Figure 9. Real world data of a table full of magazines and an out-door scene. The images in the first column are matched to those in the second. The last column is the resulting warp of the first onto the second.

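To test ORB on real-world images, we took two sets of images: one is our own indoor set of highly textured magazines on a table (Figure 9), the other an outdoor scene. The datasets have scale, viewpoint, and lighting changes. Running a simple inlier/outlier test on this set of images, we measure the performance of ORB relative to SIFT and SURF. The test is performed in the following manner:

1. Pick a reference view V0.

2. For all Vi, find a homographic warp Hi0 that maps Vi → V0.

3. Now use Hi0 as ground truth for the descriptor matches from SIFT, SURF, and ORB.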
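Using a homography as ground truth amounts to warping each matched point from Vi into V0 and checking how far it lands from its matched partner. A minimal sketch follows, assuming a match is a pair of points `(point in Vi, point in V0)` and a pixel tolerance `tol=3.0`; the tolerance, the data layout, and the function names are assumptions for illustration, not values from the paper.

```python
import math


def warp(h, pt):
    """Apply a 3x3 homography (row-major nested lists) to a 2D point."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def inlier_ratio(h_i0, matches, tol=3.0):
    """Fraction of matches (pt in V_i, pt in V_0) consistent with H_i0."""
    if not matches:
        return 0.0
    good = 0
    for p_i, p_0 in matches:
        u, v = warp(h_i0, p_i)
        if math.hypot(u - p_0[0], v - p_0[1]) <= tol:
            good += 1
    return good / float(len(matches))


# Toy ground truth: a pure translation by (5, -2).
h_i0 = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
matches = [((10, 10), (15, 8)),    # exact
           ((0, 0), (5, -2)),      # exact
           ((3, 4), (40, 40)),     # wrong match
           ((1, 1), (7, 0))]       # off by about 1.4 pixels
ratio = inlier_ratio(h_i0, matches)
```

Here three of the four matches fall within the tolerance, so the ratio is 0.75.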

References:

[1] M. Aly, P. Welinder, M. Munich, and P. Perona. Scaling object recognition: Benchmark of current state of the art techniques. In First IEEE Workshop on Emergent Issues in Large Amounts of Visual Data (WS-LAVD), IEEE International Conference on Computer Vision (ICCV), September 2009.

[2] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In European Conference on Computer Vision, May 2006.

[3] M. Brown, S. Winder, and R. Szeliski. Multi-image matching using multi-scale oriented patches. In Computer Vision and Pattern Recognition, pages 510–517, 2005.

[4] M. Calonder, V. Lepetit, and P. Fua. Keypoint signatures for fast learning and recognition. In European Conference on Computer Vision, 2008.

[5] M. Calonder, V. Lepetit, K. Konolige, P. Mihelich, and P. Fua. High-speed keypoint description and matching using dense signatures. Under review, 2009.

[6] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary robust independent elementary features. In European Conference on Computer Vision, 2010.

[7] O. Chum and J. Matas. Matching with PROSAC - progressive sample consensus. In C. Schmid, S. Soatto, and C. Tomasi, editors, Proc. of Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 220–226, Los Alamitos, USA, June 2005. IEEE Computer Society.

[8] M. Everingham. The PASCAL Visual Object Classes Challenge 2006 (VOC2006) Results. http://pascallin.ecs.soton.ac.uk/challenges/VOC/databases.html

[9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results. http://www.pascalnetwork.org/challenges/VOC/voc2009/workshop/index.html

[10] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, editors, VLDB’99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 518–529. Morgan Kaufmann, 1999.

[11] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, pages 147–151, 1988.

[12] Y. Ke and R. Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. In Computer Vision and Pattern Recognition, pages 506–513, 2004.

[13] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In Proc. Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR’07), Nara, Japan, November 2007.

[14] G. Klein and D. Murray. Improving the agility of keyframe-based SLAM. In European Conference on Computer Vision, 2008.

[15] G. Klein and D. Murray. Parallel tracking and mapping on a camera phone. In Proc. Eighth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR’09), Orlando, October 2009.

[16] V. Lepetit, F. Moreno-Noguer, and P. Fua. EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vision, 81:155–166, February 2009.

[17] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[18] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, VLDB ’07, pages 950–961. VLDB Endowment, 2007.

[19] M. Martinez, A. Collet, and S. S. Srinivasa. MOPED: A Scalable and low Latency Object Recognition and Pose Estimation System. In IEEE International Conference on Robotics and Automation. IEEE, 2010.

[20] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP, 2009.

[21] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In CVPR, 2006.

[22] P. L. Rosin. Measuring corner properties. Computer Vision and Image Understanding, 73(2):291–307, 1999.

[23] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In European Conference on Computer Vision, volume 1, 2006.

[24] E. Rosten, R. Porter, and T. Drummond. Faster and better: A machine learning approach to corner detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 32:105–119, 2010.

[25] S. Se, D. Lowe, and J. Little. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. International Journal of Robotic Research, 21:735–758, August 2002.

[26] S. N. Sinha, J.-M. Frahm, M. Pollefeys, and Y. Genc. GPU-based video feature tracking and matching. Technical report, In Workshop on Edge Computing Using New Commodity Architectures, 2006.

[27] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. International Conference on Computer Vision, page 1470, 2003.

[28] N. Snavely, S. M. Seitz, and R. Szeliski. Skeletal sets for efficient structure from motion. In Proc. Computer Vision and Pattern Recognition, 2008.

[29] G. Wang, Y. Zhang, and L. Fei-Fei. Using dependent regions for object categorization in a generative framework, 2006.

[30] A. Weimert, X. Tan, and X. Yang. Natural feature detection on mobile phones with 3D FAST. Int. J. of Virtual Reality, 9:29–34, 2010.
