Paper notes on robotic-arm grasping: Dex-Net 2.0

Grasp Planning

Finding a gripper configuration that maximizes a success (or quality) metric.

Methods fall into two categories based on the success criterion:

  1. analytic methods
  2. empirical (or data-driven) methods

Computer Vision Techniques in Robot Grasping

Analytic grasp planning methods register images of rigid objects to a known database of 3D models, which typically involves segmentation, classification, and geometric pose estimation from 3D point cloud data in order to index precomputed grasps.
Deep learning approaches:

  1. estimate 3D object shape and pose directly from color and depth images
  2. detect graspable regions directly in images without explicitly representing object shape and pose

Problem Statement

Planning a robust planar parallel-jaw grasp for a singulated rigid object resting on a table, based on point clouds from a depth camera.
Input: a candidate grasp and a depth image
Output: an estimate of robustness or probability of success under uncertainty in sensing and control

A. Assumptions

  1. a parallel-jaw gripper
  2. rigid objects singulated on a planar worksurface
  3. single-view (2.5D) point clouds taken with a depth camera

B. Definitions

States

\(x = (O,T_o,T_c,\gamma)\)
\(O\): the geometry and mass properties of the object
\(T_o\): the 3D pose of the object
\(T_c\): the 3D pose of the camera
\(\gamma\): the coefficient of friction between the object and the gripper.

Grasps

\(u = (p,\phi) \in R^3 \times S^1\)
denotes a parallel-jaw grasp in 3D space, specified by a center \(p = (x, y, z) \in R^3\) and an angle \(\phi \in S^1\) of the grasp axis in the table plane.

Point Clouds

\(y \in R_+^{H \times W}\): a 2.5D point cloud represented as a depth image with height \(H\) and width \(W\).

Robust Analytic Grasp Metrics

\(S(u,x) \in \{0,1\}\): a binary-valued grasp success metric, such as force closure or physical lifting.

\(p(S,u,x,y)\): a joint distribution on grasp success, grasps, states, and point clouds modeling imprecision in sensing and control.
\(Q(u,y) = E[S \mid u,y]\): the robustness of a grasp, i.e. the expected success given a grasp and a point cloud.
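As a concrete illustration of these definitions, the state, grasp, and observation could be represented as simple containers; the field names below are hypothetical and not from the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class State:
    """Latent state x = (O, T_o, T_c, gamma); never observed directly."""
    object_mesh: object        # O: geometry and mass properties (e.g., a triangle mesh)
    T_object: np.ndarray       # T_o: 4x4 pose of the object in the world frame
    T_camera: np.ndarray       # T_c: 4x4 pose of the depth camera in the world frame
    friction_coef: float       # gamma: friction between object and gripper jaws

@dataclass
class Grasp:
    """Planar parallel-jaw grasp u = (p, phi)."""
    center: np.ndarray         # p = (x, y, z) in R^3
    angle: float               # phi in S^1, grasp axis angle in the table plane

# Observation y: a 2.5D point cloud stored as an H x W depth image.
DepthImage = np.ndarray        # shape (H, W), non-negative depths
```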

C.Objective

Learn a robustness function \(Q_{\theta^*}(u,y) \in [0,1]\) over many possible grasps, objects, and images that classifies grasps according to the binary success metric:

\(\theta^* = \arg\min_{\theta \in \Theta} E_{p(S,u,x,y)}[L(S, Q_{\theta}(u,y))]\)

\(L\): the cross-entropy loss function
\(\Theta\): the parameter space of the Grasp Quality Convolutional Neural Network (GQ-CNN)
Learning Q rather than directly learning the policy allows us to enforce task-specific constraints without having to update the learned model.
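A minimal sketch of this objective, assuming a generic predictor `q_theta(u, y)` that outputs a success probability; this is illustrative, not the paper's implementation.

```python
import numpy as np

def cross_entropy(s, q):
    """Binary cross-entropy L(S, Q_theta(u, y)) for labels s in {0,1} and predictions q in (0,1)."""
    q = np.clip(q, 1e-7, 1 - 1e-7)          # avoid log(0)
    return -(s * np.log(q) + (1 - s) * np.log(1 - q))

def empirical_risk(labels, predictions):
    """Monte-Carlo estimate of E_{p(S,u,x,y)}[L(S, Q_theta(u, y))] over a sampled dataset."""
    return float(np.mean(cross_entropy(np.asarray(labels), np.asarray(predictions))))

# theta* is then obtained by minimizing empirical_risk over the GQ-CNN parameters with SGD.
```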

Learning a Grasp Robustness Function

A. Dataset Generation

Graphical Model:

\(p(S,u,x,y)\) is the product of a state distribution \(p(x)\), an observation model \(p(y|x)\), a grasp candidate model \(p(u|x)\), and an analytic model of grasp success \(p(S|u,x)\).
Model the state distribution as:
\(p(x) = p(\gamma)\,p(O)\,p(T_o \mid O)\,p(T_c)\)
The grasp candidate model \(p(u \mid x)\) is a uniform distribution over pairs of antipodal contact points on the object surface that form a grasp axis parallel to the table plane.
The observation model is \(y = \alpha\hat{y} + \epsilon\), where:
\(\hat{y}\): a rendered depth image for a given object in a given pose
\(\alpha\): a Gamma random variable modeling depth-proportional (multiplicative) noise
\(\epsilon\): zero-mean Gaussian Process noise over pixel coordinates with bandwidth \(l\) and measurement noise \(\sigma\), modeling additive noise
Model grasp success as:
\(S(u,x) = \begin{cases} 1 & E_Q > \delta \ \text{and collfree}(u,x) \\ 0 & \text{otherwise} \end{cases}\)
\(E_Q\): the robust epsilon quality
\(\delta\): a threshold on the robust epsilon quality
collfree(u,x): the gripper does not collide with the object or the table
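A rough sketch of sampling a synthetic noisy depth image under this observation model. The Gaussian-Process noise is approximated here by smoothing i.i.d. Gaussian noise with a Gaussian kernel of bandwidth \(l\); all parameter values are placeholders, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sample_noisy_depth(rendered_depth, gamma_shape=1000.0, gp_bandwidth=2.0,
                       gp_sigma=0.005, rng=None):
    """Sample y = alpha * y_hat + epsilon from a rendered depth image y_hat."""
    rng = rng if rng is not None else np.random.default_rng()
    # alpha: Gamma random variable with mean 1, modeling depth-proportional (multiplicative) noise.
    alpha = rng.gamma(shape=gamma_shape, scale=1.0 / gamma_shape)
    # epsilon: approximately GP noise over pixel coordinates, built by smoothing white Gaussian noise.
    white = rng.normal(0.0, gp_sigma, size=rendered_depth.shape)
    epsilon = gaussian_filter(white, sigma=gp_bandwidth)
    return alpha * rendered_depth + epsilon
```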

Database

Parallel-Jaw Grasps

For each grasp, evaluate the expected epsilon quality \(E_Q\) under object pose, gripper pose, and friction coefficient uncertainty using Monte-Carlo sampling.
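A hedged sketch of that Monte-Carlo estimate; `epsilon_quality` and `perturb` stand in for the analytic quality metric and the pose/friction perturbation sampler, which are not specified in these notes.

```python
import numpy as np

def expected_epsilon_quality(grasp, nominal_state, epsilon_quality, perturb,
                             n_samples=100, rng=None):
    """Monte-Carlo estimate of E_Q: average epsilon quality over sampled perturbations
    of object pose, gripper pose, and friction coefficient."""
    rng = rng if rng is not None else np.random.default_rng()
    qualities = []
    for _ in range(n_samples):
        state, perturbed_grasp = perturb(nominal_state, grasp, rng)  # sample perturbed x and u
        qualities.append(epsilon_quality(perturbed_grasp, state))    # analytic quality for this sample
    return float(np.mean(qualities))
```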

Rendered Point Clouds

B. Grasp Quality Convolutional Neural Network

Architecture: GQ-CNN

Input: the gripper depth from the camera \(z\), and a depth image centered on the grasp center pixel \(v = (i, j)\) and aligned with the grasp axis orientation \(\phi\).
Normalize the input data by subtracting the mean and dividing by the standard deviation of the training data.
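A sketch of this input preprocessing: translate and rotate the depth image so the grasp center sits at the image center with the grasp axis horizontal, crop a window around it, and standardize with training-set statistics. The crop size and helper choices are assumptions, not the paper's exact pipeline.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def gqcnn_input(depth_image, v, phi, z, train_mean, train_std, crop=32):
    """Build the (aligned depth crop, gripper depth) input pair for the GQ-CNN."""
    i, j = v
    h, w = depth_image.shape
    # Translate so the grasp center pixel lies at the image center, then rotate so the
    # grasp axis (angle phi) is aligned with the image x-axis.
    centered = shift(depth_image, (h / 2.0 - i, w / 2.0 - j), mode='nearest')
    aligned = rotate(centered, np.degrees(phi), reshape=False, mode='nearest')
    # Crop a fixed window around the center.
    ci, cj = h // 2, w // 2
    patch = aligned[ci - crop // 2: ci + crop // 2, cj - crop // 2: cj + crop // 2]
    # Standardize with training-set statistics.
    return (patch - train_mean) / train_std, z
```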

Training Dataset:

Grasps are associated with a pixel \(v\), orientation \(\phi\), and depth \(z\) relative to the rendered depth images.

Compute these parameters by transforming grasps into the camera frame of reference using the camera pose \(T_c\) and projecting the 3D grasp position and orientation onto the imaging plane of the camera.
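A sketch of that projection with a standard pinhole camera model; the intrinsics matrix `K` and the world-to-camera convention for `T_camera` are assumptions for illustration.

```python
import numpy as np

def grasp_to_image(p_world, axis_world, T_camera, K):
    """Project a 3D grasp center and axis into pixel coordinates v = (i, j),
    image-plane angle phi, and depth z, given a world-to-camera transform T_camera (4x4)
    and pinhole intrinsics K (3x3)."""
    p_cam = (T_camera @ np.append(p_world, 1.0))[:3]    # grasp center in the camera frame
    axis_cam = T_camera[:3, :3] @ axis_world            # grasp axis in the camera frame
    uvw = K @ p_cam
    j, i = uvw[0] / uvw[2], uvw[1] / uvw[2]             # column, row of the grasp center
    phi = np.arctan2(axis_cam[1], axis_cam[0])          # axis angle on the imaging plane
    z = p_cam[2]                                        # gripper depth from the camera
    return (i, j), phi, z
```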

Optimization:

Optimize with SGD.
Initialize the model weights by sampling from a zero-mean Gaussian with variance \(\frac{2}{n_i}\), where \(n_i\) is the number of inputs to the \(i\)-th network layer.
Augment the dataset by reflecting about the vertical and horizontal image axes, rotating, and adaptively sampling image noise.
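A small sketch of the weight initialization and flip/rotation augmentation; the layer shapes, the 180° rotation choice, and the probabilities are illustrative assumptions.

```python
import numpy as np

def init_weights(n_inputs, n_outputs, rng=None):
    """Zero-mean Gaussian initialization with variance 2 / n_i for a layer with n_i inputs."""
    rng = rng if rng is not None else np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / n_inputs), size=(n_inputs, n_outputs))

def augment(depth_patch, rng=None):
    """Randomly reflect about the vertical/horizontal axes and rotate by 180 degrees;
    these operations keep the grasp axis aligned with the image x-axis."""
    rng = rng if rng is not None else np.random.default_rng()
    out = depth_patch
    if rng.random() < 0.5:
        out = np.flipud(out)      # reflect about the horizontal axis
    if rng.random() < 0.5:
        out = np.fliplr(out)      # reflect about the vertical axis
    if rng.random() < 0.5:
        out = np.rot90(out, 2)    # rotate by 180 degrees
    return out
```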

Grasp Planning

The grasping policy is \(\pi_{\theta}(y) = \arg\max_{u \in C} Q_{\theta}(u,y)\).
\(C\): a discrete set of antipodal candidate grasps sampled uniformly at random in image space, with surface normals defined by the depth image gradients.
A grasp candidate is executed only if it is (see the sketch after this list):

  1. kinematically reachable
  2. not in collision with the table
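A sketch of this policy; `sample_antipodal_candidates`, `is_reachable`, and `collides_with_table` are assumed helper functions not defined in these notes.

```python
import numpy as np

def plan_grasp(depth_image, q_theta, sample_antipodal_candidates,
               is_reachable, collides_with_table, n_candidates=100):
    """Greedy grasping policy pi_theta(y) = argmax_{u in C} Q_theta(u, y)."""
    candidates = sample_antipodal_candidates(depth_image, n_candidates)   # C, sampled in image space
    feasible = [u for u in candidates
                if is_reachable(u) and not collides_with_table(u, depth_image)]
    if not feasible:
        return None                                                       # no executable grasp found
    scores = [q_theta(u, depth_image) for u in feasible]                  # GQ-CNN robustness estimates
    return feasible[int(np.argmax(scores))]                               # most robust feasible grasp
```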
