Dex-Net 1.0 Paper Translation

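I. Overview

Dex-Net 1.0 (Dex-Net) is a new dataset with associated algorithms for studying how big data and cloud computing scale robust grasp planning. The algorithm uses a multi-armed bandit model with correlated rewards to leverage prior grasps and 3D object models from a growing dataset that currently contains more than 10,000 unique 3D object models and 2.5 million parallel-jaw grasps. Each grasp includes an estimate of the probability of force closure under uncertainty in object pose, gripper pose, and friction. Dex-Net 1.0 uses Multi-View Convolutional Neural Networks (MV-CNNs), a recent deep learning method for 3D object classification, as a similarity metric between objects. It also uses the Google Cloud Platform to run up to 1,500 virtual machines simultaneously, reducing experiment runtime by three orders of magnitude. The experiments indicate that: 1) prior data can speed up robust grasp planning by a factor of up to 2 on average; 2) the quality of planned grasps increases with the number of similar objects in the dataset. 3) The paper also studies the sensitivity of the system to different similarity metrics and to different levels of pose and friction uncertainty.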

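II. Definitions and Problem Statement

1. Grasp and object parameterization
[Figure 1: grasp parameterization (left) and contact model (right)]
As shown in Figure 1 (left), a grasp is written g = (x, v), where x is the centroid of the gripper jaws in 3D space and v is the approach direction.
Object parameterization: a signed distance function (SDF) f serves as the canonical model of the object. Distances are in meters; f is negative inside the object, zero on the surface, and positive outside.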
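The sign convention of the SDF can be illustrated with a minimal sketch. The sphere, its 5 cm radius, and the function name `sphere_sdf` are assumptions chosen for illustration; Dex-Net itself computes SDFs from meshes, not from analytic shapes.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=0.05):
    """Signed distance in meters from `point` to the surface of a sphere.

    Negative inside the object, zero on the surface, positive outside,
    which matches the convention described in the text.
    """
    return math.dist(point, center) - radius

inside = sphere_sdf((0.0, 0.0, 0.0))    # center of the sphere: negative
surface = sphere_sdf((0.05, 0.0, 0.0))  # on the surface: zero
outside = sphere_sdf((0.10, 0.0, 0.0))  # 5 cm beyond the surface: positive
```

The object surface S is then the zero level set of f, which is what the grasp sampling step draws contact points from.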
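2. System uncertainty model
The system models the uncertainty in object pose, gripper pose, and friction coefficient with Gaussian distributions.
The object-pose and gripper-pose perturbations are zero-mean Gaussians; the friction coefficient is Gaussian with mean μ.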
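3. Contact model
As shown in Figure 1 (right), each grasp has two contact points, c1 and c2, and w denotes the gripper jaw width.
The paper gives an expression for each contact point ci; the formula images are not preserved here. The quantities defined at each contact are:
the surface normal vector,
the tangent vectors,
the force Fi applied to the object at the contact.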
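4. Quality metric
The paper uses the probability of force closure (PF) as the quality metric, where force closure is the ability to resist external forces and torques in arbitrary directions.
5. Objective
Given Ng candidate grasps on an object and a budget of at most T samples, use a multi-armed bandit algorithm to find the grasp g* that maximizes PF.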

III. DEX-NET

1. 3D object models
Dex-Net 1.0 contains 13,252 3D mesh models:
8,987 from the SHREC 2014 challenge dataset,
2,539 from ModelNet40,
1,371 from 3DNet,
129 from the KIT object database,
120 from BigBIRD,
80 from the Yale-CMU-Berkeley dataset,
26 from the Amazon Picking Challenge scans.
Each model from these sources is rescaled, expressed in a reference coordinate frame, and converted to an SDF representation before it is added to the Dex-Net 1.0 dataset.
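As a quick arithmetic check, the per-source counts listed above do add up to the stated total. The dictionary name `SOURCE_COUNTS` is only a label for this check.

```python
# Per-source model counts exactly as listed in the text.
SOURCE_COUNTS = {
    "SHREC 2014": 8987,
    "ModelNet40": 2539,
    "3DNet": 1371,
    "KIT object database": 129,
    "BigBIRD": 120,
    "Yale-CMU-Berkeley": 80,
    "Amazon Picking Challenge scans": 26,
}

total = sum(SOURCE_COUNTS.values())
print(total)  # 13252
```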
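2. Grasp sampling
Each 3D object in Dex-Net is labeled with 250 grasps and their PF values.
Ng grasps are first generated for each object using the 2D algorithm of Smith et al. [40] to concentrate the samples on antipodal grasps.
1) Sample a contact point c1 uniformly from S, then sample a direction v at random, and finally compute the second contact point c2; this yields one grasp. (S is the set of points on the object surface.)
2) If the contact points satisfy the acceptance condition given in the paper (the formula is not preserved here), add the grasp to the candidate set.
3) Evaluate PF(g) by Monte-Carlo integration, following [20].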
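The Monte-Carlo evaluation of PF can be sketched on a toy problem. Everything below is an assumption made for illustration: the object is a 2D disc, the force-closure check is the classical two-contact antipodal test (the closing line must lie inside both friction cones), and the function names and noise levels are hypothetical. It is not the check or the sampler used in the paper.

```python
import math
import random

def force_closure_on_disc(offset, radius, mu):
    """Antipodal test for a parallel-jaw grasp on a disc (illustrative stand-in).

    The jaws close along a line at distance `offset` from the disc center.
    At each contact the angle between that line and the inward normal is
    asin(|offset| / radius); the test passes when this angle fits inside
    the friction cone of half-angle atan(mu).
    """
    if mu <= 0.0 or abs(offset) >= radius:
        return False  # no friction, or the jaws miss the disc entirely
    return math.asin(abs(offset) / radius) <= math.atan(mu)

def estimate_pf(offset, radius, mu_mean, sigma_offset, sigma_mu, n_samples, rng):
    """Fraction of noisy samples in force closure (Monte-Carlo estimate of PF)."""
    successes = 0
    for _ in range(n_samples):
        noisy_offset = rng.gauss(offset, sigma_offset)  # zero-mean pose error
        noisy_mu = rng.gauss(mu_mean, sigma_mu)         # friction with mean mu_mean
        successes += force_closure_on_disc(noisy_offset, radius, noisy_mu)
    return successes / n_samples

rng = random.Random(0)
pf_center = estimate_pf(0.0, 0.05, 0.5, 0.005, 0.1, 2000, rng)
pf_edge = estimate_pf(0.04, 0.05, 0.5, 0.005, 0.1, 2000, rng)
```

A grasp through the center of the disc stays in force closure under most perturbations, while a grasp near the edge almost never does, so `pf_center` comes out much larger than `pf_edge`. This gap is what PF is meant to capture.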
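3. Differential heightmap features for grasps
To measure the similarity between grasps on the objects in the Dex-Net dataset, each grasp is embedded in a feature space based on a 2D projection of the local surface orientation at its contact points.
1) Heightmap of the local surface
The paper defines a heightmap for each contact as a function of the pixel location (u, v) at the contact point; the formula is not preserved here.
2) Feature vector at each contact point
The feature vector of each contact is built from the gradients of its heightmap in the x and y directions.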
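A gradient-based contact feature of this kind can be sketched as follows. The central-difference scheme, the list-of-rows heightmap layout, and the names `heightmap_gradients` and `contact_feature` are assumptions for illustration; the exact feature definition of the paper is not recoverable from the text above.

```python
def heightmap_gradients(h):
    """Finite-difference gradients of a heightmap given as a list of rows.

    Uses central differences in the interior and one-sided differences at
    the border. Requires at least 2 rows and 2 columns.
    """
    rows, cols = len(h), len(h[0])
    gx = [[0.0] * cols for _ in range(rows)]
    gy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            gx[r][c] = (h[r][c1] - h[r][c0]) / (c1 - c0)  # slope along x (columns)
            gy[r][c] = (h[r1][c] - h[r0][c]) / (r1 - r0)  # slope along y (rows)
    return gx, gy

def contact_feature(h):
    """Flatten both gradient maps into a single feature vector."""
    gx, gy = heightmap_gradients(h)
    return [v for row in gx for v in row] + [v for row in gy for v in row]

# A tilted plane h = 2x + 3y has constant gradients (2, 3).
plane = [[2.0 * c + 3.0 * r for c in range(3)] for r in range(3)]
feature = contact_feature(plane)
```

Using gradients instead of raw heights makes the feature insensitive to a constant height offset of the local patch.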

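IV. Deep Learning and Object Similarity

Objects are embedded in a vector space in which distance reflects object similarity. A Multi-View Convolutional Neural Network (MV-CNN) provides this global similarity measure, which is used to index the prior 3D objects and grasps in Dex-Net efficiently.
1. Render each object model from Nc = 50 virtual camera viewpoints, placed at fixed angular increments on a sphere of radius R and oriented toward the object center.
2. Train a CNN with the AlexNet architecture to predict the 3D object class label of the rendered images on a training set.
3. Pass the Nc views through the CNN and pool (downsample) the fc7-layer responses.
4. Use PCA to reduce the pooled output from 4096 to 100 dimensions, which gives a 100-dimensional representation of each object.
5. Measure the similarity between objects Oi and Oj by the Euclidean distance between their MV-CNN representations.
For efficient lookup of similar objects, Dex-Net keeps a KD-Tree nearest-neighbor structure that contains the feature vectors of all prior objects.
In the experiments, the MV-CNN was trained on rendered images of 6,000 3D models from SHREC 2014 and tested on the SHREC 2014 challenge dataset, reaching 86.7% accuracy.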
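The pooling and lookup steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: element-wise max is used as the pooling operation, the PCA step is skipped, a brute-force sort stands in for the KD-Tree, and the tiny feature vectors and object names are made up.

```python
import math

def max_pool_views(view_features):
    """Element-wise max over per-view feature vectors (one list per view)."""
    return [max(column) for column in zip(*view_features)]

def nearest_objects(query, database, k):
    """Return the k object names closest to `query` in Euclidean distance.

    Brute-force stand-in for a KD-Tree query; fine for a handful of objects.
    """
    ranked = sorted(database, key=lambda name: math.dist(query, database[name]))
    return ranked[:k]

# Two hypothetical views of one object, three features each.
pooled = max_pool_views([[1.0, 5.0, 2.0], [3.0, 0.0, 4.0]])

# Hypothetical 2D embeddings of prior objects.
prior_objects = {"mug": [0.1, 0.0], "bottle": [1.0, 1.0], "cup": [0.2, 0.1]}
neighbors = nearest_objects([0.0, 0.0], prior_objects, k=2)
```

With real 100-dimensional features and thousands of objects, a spatial index such as a KD-Tree replaces the sort, which is the design choice the text describes.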

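V. Correlated Multi-Armed Bandit Algorithm

Dex-Net 1.0 uses a multi-armed bandit algorithm with correlated rewards, together with the prior computation stored in Dex-Net 1.0, to optimize PF over a set of candidate grasps on a new test object.
1) Generate a set of candidate grasps Γ with the method of Section III.2, and use the Dex-Net dataset to predict a belief distribution for each grasp.
2) Run the bandit model: select a candidate grasp g by Thompson sampling, sample the Gaussian-distributed variables (object pose, friction coefficient, and so on), compute the corresponding force-closure outcome F, and update the belief-distribution parameters of that grasp from F. Repeat step 2 for T iterations.
3) Finally, rank the grasps in Γ by a conservative estimate of the PF of each grasp.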
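The three steps above can be sketched as an independent-arm Thompson sampling loop. This is a minimal sketch under several assumptions: the correlation between grasps is ignored, each force-closure evaluation is simulated by a coin flip with a hypothetical true probability, the prior is Beta(1, 1), and "conservative estimate" is taken to be the posterior mean minus one standard deviation. None of these choices is claimed to match the paper.

```python
import random

def thompson_sampling(true_pf, budget, rng, alpha0=1.0, beta0=1.0):
    """Run Beta-Bernoulli Thompson sampling for `budget` iterations."""
    n = len(true_pf)
    alpha = [alpha0] * n
    beta = [beta0] * n
    for _ in range(budget):
        # Draw one plausible PF per grasp from its current belief.
        draws = [rng.betavariate(alpha[j], beta[j]) for j in range(n)]
        j = max(range(n), key=draws.__getitem__)
        # Simulated force-closure outcome F for the selected grasp.
        outcome = 1 if rng.random() < true_pf[j] else 0
        alpha[j] += outcome
        beta[j] += 1 - outcome
    return alpha, beta

def conservative_score(a, b):
    """Posterior mean minus one standard deviation of Beta(a, b)."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean - variance ** 0.5

rng = random.Random(0)
alpha, beta = thompson_sampling([0.2, 0.5, 0.9], budget=500, rng=rng)
ranking = sorted(range(3), key=lambda j: conservative_score(alpha[j], beta[j]), reverse=True)
```

Most of the budget ends up on the grasp with the highest true PF, because weak grasps stop being selected once their posterior mass moves low. Ranking by a conservative score penalizes grasps that look good only because they were sampled a few times.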
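A. Belief distribution model
Let O denote the test object to be labeled by the Dex-Net algorithm, and let Γ denote the set of Ng candidate grasps generated on O.
Define Fj as the force-closure evaluation of the j-th candidate grasp in Γ.
In this model Fj is a Bernoulli random variable whose success probability θj is unknown.
Because θj is unknown, the algorithm maintains a posterior Beta belief distribution over the Bernoulli parameter; with each new observation of Fj, the belief assigns increasingly high probability to the true PF.
The Beta distribution is specified by shape parameters α > 0 and β > 0, with density Beta(α, β) = θ^(α−1) (1−θ)^(β−1) / Z(α, β), where Z(α, β) is the normalization constant.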
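The reason a Beta belief is convenient here is conjugacy with the Bernoulli outcome. Writing θ_j for the unknown success probability of F_j and α, β for the shape parameters, the standard update reads as below. This is textbook Beta-Bernoulli material stated in the notation of this section, not a formula copied from the paper.

```latex
\begin{aligned}
p(\theta_j) &= \mathrm{Beta}(\alpha, \beta) \propto \theta_j^{\alpha-1}\,(1-\theta_j)^{\beta-1},\\
p(\theta_j \mid F_j) &= \mathrm{Beta}\bigl(\alpha + F_j,\ \beta + 1 - F_j\bigr), \qquad F_j \in \{0, 1\},\\
\mathbb{E}[\theta_j] &= \frac{\alpha}{\alpha + \beta}.
\end{aligned}
```

Each observed success increments α and each failure increments β, so the posterior concentrates as samples accumulate.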
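B. Forming the belief distribution
Continuous Correlated Beta Processes (CCBPs) are used to model the correlations between grasps on different objects, which makes it possible to exploit the prior grasp and object data in Dex-Net.
1) The CCBP algorithm first estimates the Beta shape parameters for each grasp-object pair.
2) A normalized kernel function measures the similarity between grasp-object pairs (1 means similar, 0 means dissimilar).
The kernel is built from three feature maps (the formula images are not preserved here):
a grasp-parameter feature, which captures similarity in the grasp parameters;
the contact-point feature vectors from Section III.3, which capture similarity at the contact points;
the global object representation from Section IV, which captures similarity between objects.
Cm is the inverse bandwidth; the bandwidths are set by maximizing the log-likelihood of the true PF under the CCBP on a set of training data.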
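One kernel that fits this description is a squared-exponential over the three feature groups, each weighted by its inverse bandwidth. The sketch below assumes that form, uses one scalar weight per group, and uses made-up group names and values; the exact kernel of the paper is not recoverable from the text above.

```python
import math

def grasp_object_kernel(features_p, features_q, inverse_bandwidths):
    """Similarity in (0, 1] between two grasp-object pairs (assumed form).

    features_*: dict mapping a feature-group name to a list of floats.
    inverse_bandwidths: dict mapping the same names to a scalar C_m >= 0.
    """
    exponent = 0.0
    for name, c_m in inverse_bandwidths.items():
        sq_dist = sum((a - b) ** 2 for a, b in zip(features_p[name], features_q[name]))
        exponent += c_m * sq_dist
    return math.exp(-0.5 * exponent)

# Hypothetical feature groups: grasp parameters, contact heightmaps, object embedding.
weights = {"grasp": 1.0, "contact": 2.0, "object": 0.5}
pair_a = {"grasp": [0.0, 0.1], "contact": [0.3], "object": [1.0, 0.0]}
pair_b = {"grasp": [0.0, 0.2], "contact": [0.3], "object": [0.8, 0.1]}
pair_c = {"grasp": [0.5, 0.9], "contact": [0.9], "object": [0.0, 1.0]}

k_same = grasp_object_kernel(pair_a, pair_a, weights)  # identical pairs give 1.0
k_near = grasp_object_kernel(pair_a, pair_b, weights)
k_far = grasp_object_kernel(pair_a, pair_c, weights)
```

A larger inverse bandwidth makes the kernel fall off faster along that feature group, which is why the values are tuned by likelihood on training data.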

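The kernel function of [11] is used to measure the similarity between each grasp in the candidate set and all grasps and objects in the Dex-Net 1.0 dataset D, which yields a belief distribution for each candidate grasp (the formula image is not preserved here).
In that formula, α0 and β0 denote the prior parameters of the Beta distribution, and Ns is the number of samples used to evaluate PF for each grasp in the Dex-Net 1.0 database.

In practice, the Nn nearest neighbors of O in the object-similarity KD-Tree are used to estimate these shape parameters. Then, at iteration t, the observed force-closure outcome F of one grasp is used to update the beliefs of the other grasps on O.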
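The two uses of the kernel, forming a prior from the dataset and sharing a new observation across candidates, can be sketched as kernel-weighted pseudo-counts. This follows the general CCBP idea of [11] as described in this section; the function names, the Beta(1, 1) base prior, and the exact weighting are assumptions, since the formula itself is not preserved.

```python
def ccbp_prior(kernel_to_prior, prior_successes, n_samples, alpha0=1.0, beta0=1.0):
    """Kernel-weighted Beta prior for one candidate grasp.

    kernel_to_prior[i]: similarity between the candidate and prior grasp i.
    prior_successes[i]: force-closure successes of prior grasp i out of n_samples.
    """
    alpha = alpha0 + sum(k * s for k, s in zip(kernel_to_prior, prior_successes))
    beta = beta0 + sum(k * (n_samples - s) for k, s in zip(kernel_to_prior, prior_successes))
    return alpha, beta

def ccbp_update(alphas, betas, kernel_row, outcome):
    """Spread one observed outcome F (0 or 1) of the sampled grasp to all candidates.

    kernel_row[j]: similarity between the sampled grasp and candidate j
    (1.0 for the sampled grasp itself).
    """
    new_alphas = [a + k * outcome for a, k in zip(alphas, kernel_row)]
    new_betas = [b + k * (1 - outcome) for b, k in zip(betas, kernel_row)]
    return new_alphas, new_betas

# Two prior grasps with 8/10 and 2/10 successes, similarities 1.0 and 0.5.
prior = ccbp_prior([1.0, 0.5], [8, 2], n_samples=10)

# One success observed on candidate 0; candidate 1 is 25% similar to it.
alphas, betas = ccbp_update([1.0, 1.0], [1.0, 1.0], [1.0, 0.25], outcome=1)
```

A fully similar prior grasp contributes its counts at full weight, while a dissimilar one contributes almost nothing, so a single evaluation on the test object informs every correlated candidate at once.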

References:

[1] M. Aubry and B. Russell, “Understanding deep features with computer-
generated imagery,” arXiv preprint arXiv:1506.01151, 2015.
[2] T. D. Barfoot and P. T. Furgale, “Associating uncertainty with three-
dimensional poses for use in estimation problems,” Robotics, IEEE
Transactions on, vol. 30, no. 3, pp. 679–693, 2014.
[3] C. Batty, “Sdfgen,” https://github.com/christopherbatty/SDFGen.
[4] J. Bohg, A. Morales, T. Asfour, and D. Kragic, “Data-driven grasp
synthesis: A survey,” Robotics, IEEE Transactions on, vol. 30, no. 2, pp.
289–309, 2014.
[5] A. M. Bronstein, M. M. Bronstein, L. J. Guibas, and M. Ovsjanikov,
“Shape google: Geometric words and expressions for invariant shape
retrieval,” ACM Transactions on Graphics (TOG), vol. 30, no. 1, p. 1,
2011.
[6] P. Brook, M. Ciocarlie, and K. Hsiao, “Collaborative grasp planning
with multiple object representations,” in Proc. IEEE Int. Conf. Robotics
and Automation (ICRA). IEEE, 2011, pp. 2851–2858.
[7] B. Calli, A. Walsman, A. Singh, S. Srinivasa, P. Abbeel, and
A. M. Dollar, “Benchmarking in manipulation research: The ycb
object and model set and benchmarking protocols,” arXiv preprint
arXiv:1502.03143, 2015.
[8] D.-Y. Chen, X.-P. Tian, Y.-T. Shen, and M. Ouhyoung, “On visual
similarity based 3d model retrieval,” in Computer graphics forum,
vol. 22, no. 3. Wiley Online Library, 2003, pp. 223–232.
[9] R. Detry, C. H. Ek, M. Madry, and D. Kragic, “Learning a dictionary
of prototypical grasp-predicting parts from grasping experience,” in
Robotics and Automation (ICRA), 2013 IEEE International Conference
on. IEEE, 2013, pp. 601–608.
[10] R. Detry, D. Kraft, O. Kroemer, L. Bodenhagen, J. Peters, N. Krüger,
and J. Piater, “Learning grasp affordance densities,” Paladyn, Journal
of Behavioral Robotics, vol. 2, no. 1, pp. 1–17, 2011.
[11] R. Goetschalckx, P. Poupart, and J. Hoey, “Continuous correlated beta
processes,” in IJCAI Proceedings-International Joint Conference on
Artificial Intelligence, vol. 22, no. 1. Citeseer, 2011, p. 1269.
[12] C. Goldfeder and P. K. Allen, “Data-driven grasping,” Autonomous
Robots, vol. 31, no. 1, pp. 1–20, 2011.
[13] C. Goldfeder, M. Ciocarlie, H. Dang, and P. K. Allen, “The columbia
grasp database,” in Robotics and Automation, 2009. ICRA’09. IEEE
International Conference on. IEEE, 2009, pp. 1710–1716.
[14] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen,
R. Prenger, S. Satheesh, S. Sengupta, A. Coates, et al., “Deep-
speech: Scaling up end-to-end speech recognition,” arXiv preprint
arXiv:1412.5567, 2014.
[15] A. Herzog, P. Pastor, M. Kalakrishnan, L. Righetti, J. Bohg, T. Asfour,
and S. Schaal, “Learning of grasp selection based on shape-templates,”
Autonomous Robots, vol. 36, no. 1-2, pp. 51–65, 2014.
[16] M. W. Hoffman, B. Shahriari, and N. de Freitas, “Exploiting correlation
and budget constraints in bayesian multi-armed bandit optimization,”
arXiv preprint arXiv:1303.6746, 2013.
[17] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick,
S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for
fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
[18] D. Kappler, J. Bohg, and S. Schaal, “Leveraging big data for grasp
planning,” in Proc. IEEE Int. Conf. Robotics and Automation (ICRA),
2015.
[19] A. Kasper, Z. Xue, and R. Dillmann, “The kit object models database:
An object model database for object recognition, localization and
manipulation in service robotics,” The International Journal of Robotics
Research, vol. 31, no. 8, pp. 927–934, 2012.
[20] B. Kehoe, D. Berenson, and K. Goldberg, “Toward cloud-based grasping
with uncertainty in shape: Estimating lower bounds on achieving force
closure with zero-slip push grasps,” in Proc. IEEE Int. Conf. Robotics
and Automation (ICRA). IEEE, 2012, pp. 576–583.
[21] B. Kehoe, A. Matsukawa, S. Candido, J. Kuffner, and K. Goldberg,
“Cloud-based robot grasping with the google object recognition
engine,” in Robotics and Automation (ICRA), 2013 IEEE International
Conference on. IEEE, 2013, pp. 4263–4270.
[22] B. Kehoe, S. Patil, P. Abbeel, and K. Goldberg, “A survey of research on
cloud robotics and automation,” Automation Science and Engineering,
IEEE Transactions on, vol. 12, no. 2, pp. 398–409, 2015.
[23] J. Kim, K. Iwamoto, J. J. Kuffner, Y. Ota, and N. S. Pollard, “Physically-
based grasp quality evaluation under uncertainty,” in Proc. IEEE Int.
Conf. Robotics and Automation (ICRA). IEEE, 2012, pp. 3258–3263.
[24] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” in Advances in neural
information processing systems, 2012, pp. 1097–1105.
[25] O. Kroemer, R. Detry, J. Piater, and J. Peters, “Combining active
learning and reactive control for robot grasping,” Robotics and
Autonomous Systems, vol. 58, no. 9, pp. 1105–1116, 2010.
[26] M. Laskey, J. Mahler, Z. McCarthy, F. Pokorny, S. Patil, J. van den
Berg, D. Kragic, P. Abbeel, and K. Goldberg, “Multi-arm bandit models
for 2d sample based grasp planning with uncertainty.” in Proc. IEEE
Conf. on Automation Science and Engineering (CASE). IEEE, 2015.
[27] I. Lenz, H. Lee, and A. Saxena, “Deep learning for detecting robotic
grasps,” The International Journal of Robotics Research, vol. 34, no.
4-5, pp. 705–724, 2015.
[28] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of
deep visuomotor policies,” arXiv preprint arXiv:1504.00702, 2015.
[29] B. Li, Y. Lu, C. Li, A. Godil, T. Schreck, M. Aono, M. Burtscher,
Q. Chen, N. K. Chowdhury, B. Fang, et al., “A comparison of 3d
shape retrieval methods based on a large-scale benchmark supporting
multimodal queries,” Computer Vision and Image Understanding, vol.
131, pp. 1–27, 2015.
[30] J. Mahler, S. Patil, B. Kehoe, J. van den Berg, M. Ciocarlie,
P. Abbeel, and K. Goldberg, “Gp-gpis-opt: Grasp planning under shape
uncertainty using gaussian process implicit surfaces and sequential
convex programming,” 2015.
[31] D. Maturana and S. Scherer, “Voxnet: A 3d convolutional neural
network for real-time object recognition,” in Proc. IEEE/RSJ Int. Conf.
on Intelligent Robots and Systems (IROS), 2015.
[32] L. Montesano and M. Lopes, “Active learning of visual descriptors for
grasping using non-parametric smoothed beta distributions,” Robotics
and Autonomous Systems, vol. 60, no. 3, pp. 452–462, 2012.
[33] J. Oberlin and S. Tellex, “Autonomously acquiring instance-based
object models from experience,” 2013.
[34] S. Pandey, D. Chakrabarti, and D. Agarwal, “Multi-armed bandit
problems with dependent arms,” in Proceedings of the 24th international
conference on Machine learning. ACM, 2007, pp. 721–728.
[35] F. T. Pokorny, K. Hang, and D. Kragic, “Grasp moduli spaces.” in
Robotics: Science and Systems, 2013.
[36] F. T. Pokorny and D. Kragic, “Classical grasp quality evaluation: New
theory and algorithms,” in IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), 2013.
[37] M. Salganicoff, L. H. Ungar, and R. Bajcsy, “Active learning for
vision-based robot grasping,” Machine Learning, vol. 23, no. 2-3, pp.
251–278, 1996.
[38] S. Salti, F. Tombari, and L. Di Stefano, “Shot: Unique signatures of
histograms for surface and texture description,” Computer Vision and
Image Understanding, vol. 125, pp. 251–264, 2014.
[39] A. Singh, J. Sha, K. S. Narayan, T. Achim, and P. Abbeel, “Bigbird:
A large-scale 3d database of object instances,” in Proc. IEEE Int. Conf.
Robotics and Automation (ICRA), 2014.
[40] G. Smith, E. Lee, K. Goldberg, K. Bohringer, and J. Craig, “Computing
parallel-jaw grips,” in Proc. IEEE Int. Conf. Robotics and Automation
(ICRA), 1999.
[41] N. Srinivas, A. Krause, S. Kakade, and M. Seeger, “Gaussian process
optimization in the bandit setting: No regret and experimental design,”
in Proc. International Conference on Machine Learning (ICML), 2010.
[42] T. Stouraitis, U. Hillenbrand, and M. A. Roa, “Functional power
grasps transferred through warping and replanning,” in Robotics and
Automation (ICRA), 2015 IEEE International Conference on. IEEE,
2015, pp. 4933–4940.
[43] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-view
convolutional neural networks for 3d shape recognition,” arXiv preprint
arXiv:1505.00880, 2015.
[44] J. Weisz and P. K. Allen, “Pose error robust grasping from contact
wrench space metrics,” in Robotics and Automation (ICRA), 2012 IEEE
International Conference on. IEEE, 2012, pp. 557–562.
[45] W. Wohlkinger, A. Aldoma, R. B. Rusu, and M. Vincze, “3dnet: Large-
scale object class recognition from cad models,” in Proc. IEEE Int.
Conf. Robotics and Automation (ICRA), 2012.
[46] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao,
“3d shapenets: A deep representation for volumetric shape modeling,”
in CVPR, vol. 1, no. 2, 2015, p. 3.
[47] L. E. Zhang, M. Ciocarlie, and K. Hsiao, “Grasp evaluation with
graspable feature matching,” in RSS Workshop on Mobile Manipulation:
Learning to Manipulate, 2011.
[48] Y. Zheng and W.-H. Qian, “Coping with the grasping uncertainties
in force-closure analysis,” Int. J. Robotics Research (IJRR), vol. 24,
no. 4, pp. 311–327, 2005.