Robot Technology

Two-Stage Grasping: A New Bin Picking Framework for Small Objects

Abstract: This paper proposes a new bin picking framework, two-stage grasping, aimed at precisely grasping small, cluttered objects.

  • Object density estimation and coarse grasping are performed in the first stage.
  • In the second stage, fine segmentation, detection, grasping, and pushing are performed.

A small object grasping system is implemented to demonstrate the concept of two-stage grasping, and experiments confirm the effectiveness of the framework. Unlike traditional vision-based grasp planning methods built on the classical single-stage framework, the framework proposed in this paper can solve the challenge of picking cluttered small objects with only simple vision detection and planning.

Bin picking is a classic problem in robotics.

Its goal is to pick identical objects one by one from a cluttered bin [1].

In recent years, it has attracted much attention due to its wide application in modern logistics [2,3] and the service industry [4,5]. Despite its importance and ubiquity, bin picking remains challenging.

Recent bin picking solutions use learning-based image processing techniques, such as image segmentation and pose estimation, to detect grasps that can pick an item from a bin.

  • For example, [2] uses a semantic segmentation system and a heuristic grasp generation algorithm to pick specific items from cluttered objects. Its grasping performance depends on semantic segmentation quality, while suction grasp quality is affected by point cloud resolution and accuracy.
  • [6,7] introduce bin picking methods and datasets for reflective objects; these methods are based on scene reconstruction and 6D object pose estimation.
  • [8,9] introduce vision-based object recognition and pose regression solutions for bin picking scenarios. These methods have achieved remarkable results on normal-sized industrial parts.

Challenges of small object grasping

  1. First, sensing cluttered small objects is challenging. Current state-of-the-art image segmentation methods still struggle to distinguish image pixels belonging to different small instances.
  2. Object pose estimation methods require accurate perception of complete point clouds or depth.
  3. The small size and complex shapes of the objects mean that limited sensor resolution and occlusion strongly affect pose estimation.

Solution
To this end, this paper proposes a new bin picking framework, namely two-stage grasping. As the name suggests, grasping proceeds in two distinct stages, from coarse to fine, and perception is performed twice.

  • During the coarse stage, one or more items are grasped from the bin and placed on a tray.
  • In the fine stage, a single item is picked from the tray.

Two-stage grasping builds on soft robotic grippers [18-21] and robotic manipulation [22-27], and is compatible with different soft grippers [18-21], existing segmentation methods [2,11,12], pose estimation methods [8,9,13], and grasp detection methods [16],[28-30].

Furthermore, by dividing bin picking into two stages, vision detection and object grasping are also divided into a coarse stage and a fine stage. In either stage, vision detection and object grasping are more reliable than in a traditional single-stage workflow. The proposed two-stage bin picking framework thus provides a promising solution for reliable, robust, accurate, and compatible small object bin picking.

Three steps in both the coarse and fine stages:

  1. sensing
  2. planning
  3. execution

Coarse stage: the gripper roughly grasps one or more items from the bin at the location predicted by the small object density estimate.
Fine stage: the grasp detector searches for feasible grasps on the handling tray via dedicated image segmentation and collision checking.
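To make the workflow concrete, here is a minimal control-flow sketch of how the coarse and fine stages might be sequenced. All helper callables and robot methods (`estimate_density`, `detect_fine_grasps`, `robot.coarse_grasp`, etc.) are hypothetical placeholders for illustration, not the paper's actual interfaces.

```python
# Hypothetical control-flow sketch of the two-stage framework; helper callables and
# robot methods are placeholders, not the paper's actual API.
import numpy as np

def pick_one_item(capture_bin, capture_tray, estimate_density, detect_fine_grasps, robot):
    # ---- Coarse stage: sense, plan, execute on the bin ----
    density = estimate_density(capture_bin())                  # pixel-wise small-object density map
    row, col = np.unravel_index(np.argmax(density), density.shape)
    robot.coarse_grasp(pixel=(row, col))                       # soft gripper grabs one or more items
    robot.place_on_tray()

    # ---- Fine stage: sense, plan, execute on the tray ----
    for _ in range(5):                                         # arbitrary retry budget
        grasps = detect_fine_grasps(capture_tray())            # segmentation + PCA + collision check
        if grasps:                                             # an isolated, collision-free item exists
            robot.fine_grasp(grasps[0])
            return True
        robot.push_to_singulate()                              # soft pushing separates the clutter
    return False
```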

This paper also presents two singulation policies for separating small objects on the tray.

Density Estimation of Cluttered Small Objects and Rough Grasping

Given an RGB image, this module estimates the density of identical small objects cluttered in the image. We use a fully convolutional network (FCN) to predict the object density map at the pixel level.

In our implementation, we use the U-Net backbone to predict the density value of each pixel
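The paper uses a U-Net backbone; the snippet below is only a minimal PyTorch stand-in that shows the idea of regressing a per-pixel density map from an RGB image. The layer sizes, input size, and MSE loss are assumptions for illustration, not the authors' architecture.

```python
# Minimal per-pixel density regressor, standing in for the U-Net backbone used in the paper.
# Layer sizes and the loss are illustrative assumptions.
import torch
import torch.nn as nn

class TinyDensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),                       # one density value per pixel
        )

    def forward(self, rgb):                            # rgb: (B, 3, H, W)
        return self.decoder(self.encoder(rgb))         # density: (B, 1, H, W)

# Training-step sketch: regress against a ground-truth density map
# (e.g. annotated object centers smoothed with a Gaussian kernel).
model = TinyDensityNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
rgb = torch.rand(2, 3, 128, 128)                       # dummy batch
gt_density = torch.rand(2, 1, 128, 128)                # dummy target
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(rgb), gt_density)
loss.backward()
optimizer.step()
```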

Additional influencing factors:

i) ambient light
ii) camera position
iii) object pose


Grasp Detection

  1. The grasp detector first performs contour finding and PCA on the segmented image of the scene.
  2. Secondly, contacts are generated based on the orientation and contour of each item.
  3. Thirdly, after generating contacts and grasps, collision checking is conducted by placing virtual fingertips at every contact point and performing pixel-wise clearance checking.
  4. Only grasps corresponding to isolated objects survive. Finally, all collision-free and motion-feasible grasps and their categories are passed to the task planner.
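A rough sketch of these steps with OpenCV and NumPy is shown below: PCA is run on each contour, grasp contacts are placed along the minor axis, and a square "virtual fingertip" is checked for clearance. The fingertip size and clearance margin are illustrative assumptions, not the paper's parameters.

```python
# Sketch of the fine-stage grasp detector: contour finding, PCA for item orientation,
# and a pixel-wise clearance check around the two contact points.
import cv2
import numpy as np

def _contact_clear(occupied, contact, finger_px):
    """Place a square virtual fingertip at the contact and require it to be free of object pixels."""
    h, w = occupied.shape
    x, y = int(contact[0]), int(contact[1])
    x0, x1 = max(x - finger_px, 0), min(x + finger_px, w)
    y0, y1 = max(y - finger_px, 0), min(y + finger_px, h)
    return not occupied[y0:y1, x0:x1].any()

def detect_grasps(mask, finger_px=8):
    """mask: binary segmentation of the tray image (uint8, 0 or 255)."""
    grasps = []
    occupied = mask > 0
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours:
        pts = cnt.reshape(-1, 2).astype(np.float64)
        center = pts.mean(axis=0)
        _, eigvecs = np.linalg.eigh(np.cov((pts - center).T))   # PCA of the contour
        minor = eigvecs[:, 0]                                   # grasp across the minor axis
        half_width = np.ptp(pts @ minor) / 2
        c1 = center + minor * (half_width + finger_px + 2)      # contacts just outside the item
        c2 = center - minor * (half_width + finger_px + 2)
        if _contact_clear(occupied, c1, finger_px) and _contact_clear(occupied, c2, finger_px):
            grasps.append((center, minor))                      # only isolated objects survive
    return grasps
```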

Small Object Singulation by Soft Manipulation


Conclusion

  1. A soft manipulation algorithm for singulating cluttered small objects
  2. A small object density estimation method
  3. Two small object singulation policies

Accurate instance segmentation of surgical instruments in robotic surgery: model refinement and cross-dataset evaluation

Purpose: Automatic segmentation of surgical instruments in robot-assisted minimally invasive surgery plays a fundamental role in improving context awareness. In this paper, we propose a modified Mask R-CNN based instance segmentation model that can accurately segment surgical instruments and identify their types.

Methods: We reformulate the instrument segmentation task as an instance segmentation task. We then optimize Mask R-CNN for instrument segmentation using anchor optimization and an improved region proposal network. Furthermore, we use different sampling strategies for cross-dataset evaluation.

Background:
Robot-assisted minimally invasive surgery is a new type of surgery that combines robotics with minimally invasive techniques; it can reduce surgical wounds, reduce bleeding, shorten hospital stays, and relieve patients' pain.
Research in this field mainly covers the development and optimization of robotic surgery systems, the standardization and automation of surgical operations, and image processing and analysis during surgery. In recent years, new technologies such as deep learning and artificial intelligence have also been applied in this field and achieved promising results. In the future, robot-assisted minimally invasive surgery is expected to become the mainstream method of surgery.

Challenges:

In challenging RMIS (Robot-Assisted Minimally Invasive Surgery) environments, with a narrow field of view and a high workload for the surgeon [4], the automatic context awareness of the augmented system plays an important role in improving the surgeon's performance and the patient's safety.

  • Automatic segmentation of different surgical instruments is an essential element for realizing intelligent context awareness in RMIS [10] and a prerequisite for many downstream problems such as tracking, pose estimation, and action recognition [11].
    However, automatic surgical instrument segmentation is challenging due to complex surgical conditions, including motion blur, reflection, tissue occlusion, and gas, as shown in Figure 1. In addition, the limited endoscopic field of view often shows only parts of an instrument rather than the whole instrument, which further increases the difficulty of distinguishing them.

Model refinement based on Mask R-CNN

Mask R-CNN has been proposed as a simple, flexible and general framework for object instance segmentation [9].

  • It introduces an efficient RPN to generate Regions of Interest (RoIs) based on proposed candidate object bounding boxes.
  • RoIs are predicted from multi-scale anchors defined over the feature maps. We optimize Mask R-CNN for surgical instrument segmentation using an improved RPN (I-RPN) and a refined anchor configuration.
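As one concrete (and hypothetical) way to adjust anchors for long, thin instruments, the sketch below builds a Mask R-CNN with a custom anchor configuration using torchvision's standard building blocks. The anchor sizes, aspect ratios, backbone, and class count are assumptions for illustration, not the authors' exact I-RPN settings.

```python
# Sketch of anchor tuning for instrument-like (elongated) objects with torchvision.
# Sizes, aspect ratios, and num_classes are illustrative assumptions.
import torchvision
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
backbone.out_channels = 1280                      # MaskRCNN requires this attribute

# Elongated instruments motivate a wider spread of aspect ratios than the default (0.5, 1.0, 2.0).
anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.25, 0.5, 1.0, 2.0, 4.0),),
)

model = MaskRCNN(
    backbone,
    num_classes=8,                                # background + 7 instrument types (assumed)
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2),
    mask_roi_pool=MultiScaleRoIAlign(featmap_names=["0"], output_size=14, sampling_ratio=2),
)
```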

Knowledge Supplement:

  • RPN stands for Region Proposal Network, a neural network used in object detection to generate candidate regions (region proposals) in an image, i.e., regions that may contain objects. RPN is usually used as part of Faster R-CNN to generate candidate regions for detection. In Mask R-CNN, the RPN likewise generates regions of interest (RoIs) of candidate objects for instance segmentation.
  • FPN stands for Feature Pyramid Network, a neural network architecture that extracts image features at multiple scales, helping detectors and segmenters cope with scale changes and multi-scale objects. FPN builds a feature pyramid with a top-down pathway that upsamples coarse, semantically strong features and fuses them with fine-grained features from lower levels through lateral connections, yielding richer and more accurate feature representations. FPN is widely used in object detection and image segmentation; it improves model performance while keeping the extra computation and parameters modest.


Robot & AR

  • Recent advances in artificial intelligence have greatly improved many tasks, such as situational awareness of surgical procedures and automation of surgical robots.

  • At the same time, AR aims to augment the surgical environment to facilitate surgeons' operations and decision-making, based on visualization and integrated additional information computed offline or in real time.

  • AR equipped with an immersive view in the surgical robot console is effective for the education of novice surgeons and is expected to be very helpful if it can be adopted intraoperatively.

Unfortunately, so far, the advantages of artificial intelligence and AR have not been combined in a principled way for robotic surgery. The combination of artificial intelligence and augmented reality is emerging as a general topic and has been exemplified in many applications.

Background:

Examples include games [17], driver training [18], [19], and virtual patients [20], [21]. Empowering intelligence in augmented reality not only enhances the virtual experience, but also leverages the powerful capabilities of learning-based algorithms in demanding tasks such as surgical education. However, few attempts have been made to apply artificial intelligence and augmented reality to surgical robotics. Some authors [22]–[24] propose to use computer vision models to localize anatomical regions of interest and then overlay the results in the camera view. However, these solutions only consider existing cues in perception and do not shed light on human-like decision-making behavior.
Meanwhile, reinforcement learning (RL) is widely recognized as an effective skill learning method [25]–[28], but its potential has not been fully exploited in the field of surgical robotics.
An interesting scenario for exploring these issues is surgical education, where intelligent RL-based agents should reason about surgical tasks and generate constructive guidelines for novices. This embodied intelligence promises to significantly increase the accessibility and reduce costs of surgical training. Implementing intelligent guidance on surgical robotic platforms, in the form of augmented reality visualizations, could further improve usability and user experience, but how this can be achieved remains unclear.

The dVRK platform refers to the "da Vinci Research Kit", a research platform for robotic surgery developed primarily at Johns Hopkins University together with Worcester Polytechnic Institute, with support from Intuitive Surgical. The platform is built from first-generation da Vinci surgical robot hardware combined with open-source controllers; it can perform operations similar to the da Vinci surgical robot and provides new functions and interfaces so that researchers can better explore the potential of robotic surgical technology. The dVRK is an open-source, customizable platform that can be controlled and programmed through ROS (Robot Operating System).

This paper aims to seamlessly combine artificial intelligence and augmented reality, enhancing RL-based instrument motion trajectories with real-time AR visualization, and uses the da Vinci Research Kit (dVRK) to realize the workflow of a surgical education scenario, as shown in Figure 1.

For AI-enabled analytics, reinforcement learning is employed to learn policies from expert demonstrations and interactions with the environment.

The system can then reason about future actions based on current observations. This behavior can be embedded in the educational process, where automatically predicted actions can serve as useful information to guide the learner's movements step by step.

Subsequently, how to effectively visualize information becomes very important, especially when combined with robotic platforms in real time. We will show how it is possible to overlay 3D guide trajectories in stereoscopic video via the dVRK console.

By projecting the 3D positions of trajectories generated from the RL policy into stereoscopic video frames in the dVRK console, we can vividly observe surgical scenes with superimposed trajectories for educational purposes. We have implemented and evaluated our method in a typical surgical education task, pin delivery.
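A minimal sketch of this projection step with OpenCV is shown below. The intrinsics, distortion coefficients, stereo baseline, and waypoint values are illustrative placeholders; in practice they come from the endoscope's stereo calibration and the RL policy.

```python
# Sketch of projecting a 3D guidance trajectory (in the left-camera frame) into the
# left and right endoscope images. Calibration values here are placeholders.
import cv2
import numpy as np

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])                 # assumed pinhole intrinsics (per eye)
dist = np.zeros(5)                              # assume rectified, undistorted images
baseline = 0.005                                # assumed 5 mm stereo baseline (metres)

trajectory_cam = np.array([[0.00, 0.00, 0.10],  # 3D waypoints from the RL policy,
                           [0.01, 0.00, 0.10],  # expressed in the left-camera frame
                           [0.01, 0.01, 0.09]])

rvec = np.zeros(3)
left_px, _ = cv2.projectPoints(trajectory_cam, rvec, np.zeros(3), K, dist)
right_px, _ = cv2.projectPoints(trajectory_cam, rvec,
                                np.array([-baseline, 0.0, 0.0]), K, dist)
left_px = left_px.reshape(-1, 2)                # pixel coordinates for the left view
right_px = right_px.reshape(-1, 2)              # pixel coordinates for the right view
```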

  • To the best of our knowledge, this is the first time the synergy of AI and AR has been explored, utilizing RL-based prediction and dVRK-based visualization for surgical education scenarios.

Research on the application of artificial intelligence in robotic surgery has been actively conducted for a decade since the clinical introduction of the da Vinci surgical system in 2000 [29].
Several important research topics have been identified, such as surgical instrument segmentation [31], gesture recognition [6], workflow recognition [1] and surgical scene reconstruction [32]. They can support intraoperative decision-making [33] and provide valuable databases for surgical training and evaluation [34]–[36].
Although promising, these works only provide supplementary information without conceiving surgical plans, such as predicting the trajectory of surgical instruments. More recently, the advent of reinforcement learning has opened up a family of policy-based learning strategies [37].
The effectiveness of reinforcement learning has been revealed in surgical gesture classification [38], surgical scene understanding [40], robot learning [27], [42], etc.
By learning from expert demonstrations, reinforcement learning agents can automatically generate meaningful solutions to the task at hand. For example, in [27] and [28], the authors proposed to use deep deterministic policy gradient (DDPG) with behavioral cloning (BC) for the surgical tasks of two-handed needle regrasping and autonomous blood sucking. Both have demonstrated encouraging results, suggesting that reinforcement learning based frameworks could potentially alleviate the requirement for expert guidance [43]–[45].

Apart from this, there are also some works using artificial intelligence for surgical education, such as providing metrics and performance feedback based on training records [46]-[48], and taking style characteristics into account to distinguish expertise levels [49]-[51]. However, little consideration has been given to their use in AR surgical education, where integrating AR and artificial intelligence could enable vivid, automated instruction of trainees.

  • Given the strengths of RL presented above, it is very tempting to integrate RL into AR as an important AI module, e.g., a decision maker [52], [53], which enables AR systems to objectively generate content.

Augmented reality has been applied in various paradigms of robotic surgery [12]. In intraoperative applications, augmented reality is superimposed in real time to help:
(i) enhance depth perception [54]–[56];
(ii) compensate for tactile perception [57];
(iii) expand the field of view [58];
(iv) provide a more intuitive HMI [54]; and
(v) annotate useful hints [59], [60].

Other applications leverage augmented reality for robotic surgery training [61]. Commonly used display media for augmented reality in robotic surgery include da Vinci consoles, computer monitors, and head-mounted displays [12]. Head-mounted display-based augmented reality is an advantageous medium for surgical education because it can provide 3D display and interaction for multiple users and environments. The work in [62] applied augmented reality in a clinical-like training scenario by superimposing a 3D translucent tool, controlled by an instructor, on the trainee console as a guide. A user study involving seven mentor-mentee pairings was conducted, demonstrating that the augmented tool is an effective mentoring method.

  • However, most current augmented reality systems do not yet incorporate artificial intelligence as a core component for generating and creating context-aware information.

Framework

  1. AI+AR computing server, as shown in Figure 2a): a workstation equipped with a high-end GPU for deploying the AR and AI algorithms. It obtains stereoscopic video and kinematic information from the patient-side manipulators (PSMs) and the endoscopic camera manipulator (ECM) on the dVRK, receives input from other user interfaces, trains and deploys RL policies or other AI frameworks, and outputs the augmented video stream to monitors or other display devices.
  2. Endoscopic video acquisition and kinematic control: as shown in Figure 2b) and d). The dVRK carries a stereoscopic endoscope on top of the ECM, which captures stereoscopic video streams for 3D perception. We capture the video signal from the dVRK stereo endoscope with a video capture card and convert it into a USB video stream that can be read by the computer. Using the application programming interface (API) provided by the dVRK Robot Operating System (ROS) package, users can obtain kinematic information (including tool-tip position, velocity, and rotation) from the two PSMs on the dVRK and send kinematic commands to control the motion of the PSMs (a minimal subscriber sketch follows this list). In addition, the PSMs can also be controlled directly by the user's hands at the dVRK console.
  3. AR video display using the dVRK console: as shown in Figure 2c) and e). With the stereo viewer in the dVRK console, users can view the surgical scene with the AR and AI information overlaid by the computer. The stereo viewer provides left and right views, so humans perceive the surgical video and the superimposed information with a 3D sensation. This capability was later developed into TilePro™ [67] in the next generation of surgical robots, demonstrating the potential for real robotic surgery.
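The listener below is a minimal example of reading PSM tool-tip poses over the dVRK ROS interface. The topic name and message type follow the pre-CRTK dVRK 1.x convention (`/dvrk/PSM1/position_cartesian_current`, `geometry_msgs/PoseStamped`); newer CRTK-based releases use different names (e.g. `/PSM1/measured_cp`), so treat the names as assumptions and verify them with `rostopic list` on your installation.

```python
# Minimal rospy listener for dVRK PSM tool-tip poses. Topic names/types vary across
# dVRK releases (pre-CRTK names shown here); check `rostopic list` and adjust.
import rospy
from geometry_msgs.msg import PoseStamped

latest_pose = {}

def on_pose(msg, arm_name):
    # Cache the most recent measured Cartesian pose of each arm's tool tip.
    latest_pose[arm_name] = msg.pose

if __name__ == "__main__":
    rospy.init_node("dvrk_kinematics_listener")
    for arm in ("PSM1", "PSM2"):
        rospy.Subscriber("/dvrk/%s/position_cartesian_current" % arm,
                         PoseStamped, on_pose, callback_args=arm)
    rospy.spin()
```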


Generating AI Guidance with Reinforcement Learning

Rather than having experienced experts guide novices, our work automatically generates guidance through reinforcement learning.
Given the advantages of SurRoL for demonstration and experimentation with the dVRK, we use it as our core AI module to learn and generate guidance trajectories with task-specific RL algorithms.

  • SurRoL incorporates reinforcement learning to teach an intelligent agent to take interactive actions so that the cumulative reward is maximized.

SurRoL is an open-source, reinforcement-learning-centered simulation platform for surgical robot learning that is compatible with the dVRK. It provides simulated surgical manipulation tasks built around the dVRK PSM and ECM arms, so that policies can be trained and evaluated in simulation before being transferred to the real robot, which reduces dependence on real hardware and improves learning efficiency and safety. The platform ships with standardized task environments that researchers can use to develop and evaluate RL algorithms for robotic surgery.

  • For robotic surgery, we define six degrees of freedom, including position movement in Cartesian space (dx, dy, dz), orientation in a top-down/vertical space (dyaw/dpitch), and jaw opening state (j≥0) or closed state (j<0).

6D pose: the position and orientation of an object in three-dimensional space, also known as the "six degrees of freedom" pose. It includes three positional degrees of freedom (translation along the x, y, z axes) and three orientational degrees of freedom (rotation about the x, y, z axes). In robotic surgery, knowing the 6D pose of a tool helps the robot position and manipulate it, enabling precise surgery.
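One possible flat encoding of the action described above, with the orientation slot holding either dyaw or dpitch depending on the task, is sketched below. The index names and the normalized [-1, 1] bounds are assumptions for illustration, not SurRoL's exact interface.

```python
# Illustrative decoding of the action vector (dx, dy, dz, dyaw-or-dpitch, jaw).
# Index names and the normalized range are assumptions, not SurRoL's exact API.
import numpy as np

DX, DY, DZ, DROT, JAW = range(5)

def interpret_action(a):
    a = np.clip(np.asarray(a, dtype=float), -1.0, 1.0)   # normalized policy output
    delta_pos = a[[DX, DY, DZ]]                          # Cartesian displacement command
    delta_rot = a[DROT]                                  # yaw (top-down) or pitch (vertical) change
    jaw_open = a[JAW] >= 0                               # j >= 0 open, j < 0 closed
    return delta_pos, delta_rot, jaw_open
```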

The goal of reinforcement learning is to find a policy $\pi$ that generates actions $a_t = \pi(s_t)$.
To facilitate the learning process, the reward $r_t$ is goal-based: a success function $f(s_t, g, a_t)$ determines $r_t$ by checking whether the action brings the system to the goal state $g$ (e.g., a target 3D position and 6D pose).

This situation is similar to the trial-feedback process in surgical education.

Finally, the target policy $\pi$ is learned by maximizing the expected return $\mathbb{E}_\pi\left[\sum_{t=0}^{T} \gamma^t r_t\right]$, where $\gamma \in [0, 1)$ is a discount factor that balances the agent's long-term and immediate rewards. Specifically, we choose a sample-efficient learning algorithm called Hindsight Experience Replay (HER) [68] and combine it with Q-filtered behavioral cloning. In practice, we utilize a small amount of demonstration data generated from a scripted policy for imitation learning. This approach has great potential and can be broadly applied to learn from large amounts of surgical data and then provide AI-based guidance for surgical education.
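The sketch below illustrates the goal-based sparse reward, the discounted return being maximized, and the hindsight-relabeling idea behind HER. The distance threshold and the "final-state" relabeling strategy are illustrative assumptions, not the exact implementation used in the paper.

```python
# Sketch of a goal-based sparse reward and HER-style hindsight relabeling.
# Threshold, discount, and relabeling strategy are illustrative assumptions.
import numpy as np

def goal_reward(achieved_goal, desired_goal, threshold=0.005):
    """Sparse reward: 0 when the achieved state is within `threshold` of the goal, else -1."""
    return 0.0 if np.linalg.norm(achieved_goal - desired_goal) < threshold else -1.0

def discounted_return(rewards, gamma=0.98):
    """The quantity the policy maximizes in expectation: sum_t gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def hindsight_relabel(episode):
    """Re-use a failed episode by pretending its final achieved state was the goal."""
    new_goal = episode[-1]["achieved_goal"]
    return [
        {**step,
         "desired_goal": new_goal,
         "reward": goal_reward(step["achieved_goal"], new_goal)}
        for step in episode
    ]
```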


Real-time 3D Visualization by Augmented Reality

  1. Different Coordinate System
  2. Human-involved Flexible Calibration
  3. Augmented 3D Overlaying
  4. Integrating AI and AR for Trajectory Guidance
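For the overlaying step, a minimal sketch of drawing the projected 2D waypoints onto a video frame before it is sent to the stereo viewer is shown below; the same function would be applied to both the left and right views, and the color and thickness are arbitrary choices.

```python
# Sketch of overlaying projected trajectory waypoints on a video frame with OpenCV,
# applied to both the left and right views before display in the stereo viewer.
import cv2
import numpy as np

def overlay_trajectory(frame, pixel_points, color=(0, 255, 0)):
    """Draw the projected 2D waypoints as a polyline and mark the final target waypoint."""
    pts = np.round(pixel_points).astype(np.int32).reshape(-1, 1, 2)
    cv2.polylines(frame, [pts], isClosed=False, color=color, thickness=2)
    target = (int(pts[-1, 0, 0]), int(pts[-1, 0, 1]))
    cv2.circle(frame, target, 5, color, -1)
    return frame

# Example: left_aug = overlay_trajectory(left_frame, left_px), using the pixel
# coordinates from the projection sketch shown earlier.
```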


Origin blog.csdn.net/RandyHan/article/details/130668829