RoboTAP: a robot manipulation system developed by Google DeepMind

RoboTAP is a robot manipulation system developed by Google DeepMind. It enables robots to learn new visuo-motor tasks from just a few minutes of demonstration: show it how to do something, such as picking up an apple and placing it on Jell-O, a few times, and it learns the action.

Working principle

The system solves a variety of visuomotor tasks with a visual servo controller. At the heart of RoboTAP is a general-purpose controller that aligns points in the scene, and the system handles multi-stage manipulation by densely tracking what should move, where it should go, and how to get it there. RoboTAP learns these behaviors in minutes from a small number of demonstrations. It uses cameras or other visual sensors to perceive the environment and drives the robot (or other automated equipment) based on that information.
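As a rough illustration of this closed loop, the sketch below shows a single step of point-based visual servoing. It is a minimal sketch, not DeepMind's implementation: the tracker object, the robot interface, the pixel threshold, and the proportional gain are all placeholder assumptions.

    import numpy as np

    def servo_step(frame, tracker, goal_points, gain=0.5):
        """One iteration of a point-based visual servo loop (illustrative only).

        frame       -- current camera image
        tracker     -- hypothetical point tracker with a .track(frame) method
                       returning an (N, 2) array of current point locations
        goal_points -- (N, 2) array of where those points should end up,
                       taken from the demonstration
        gain        -- proportional gain on the pixel-space error
        """
        current_points = tracker.track(frame)      # where the points are now
        error = goal_points - current_points       # where each point should move
        # Average the per-point errors into one image-space command; a real
        # system would map this into a Cartesian velocity for the robot arm.
        command = gain * error.mean(axis=0)
        worst_error = np.linalg.norm(error, axis=1).max()
        return command, worst_error

    # Pseudocode for how such a step would be used:
    # while True:
    #     command, worst_error = servo_step(camera.read(), tracker, goal_points)
    #     robot.move_in_image_plane(command)       # placeholder robot interface
    #     if worst_error < 5.0:                    # pixels; assumed threshold
    #         break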

The controller not only recognizes the target object but also identifies specific points or features on it and acts on those points directly. This capability enables RoboTAP to perform a variety of complex visuo-motor tasks such as pick-and-place, insertion, and stacking. This precise, point-level control also means that RoboTAP can work in changing environments, including ones where the pose and position of objects keep changing.
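One simple heuristic for picking such task-relevant points is sketched below: keep the candidate points that move the most over a demonstration, on the assumption that they lie on the object being handled. This is only an illustrative stand-in, not RoboTAP's own point-selection procedure, and the layout of the tracks array is an assumption.

    import numpy as np

    def select_object_points(tracks, top_k=8):
        """Pick candidate points that likely lie on the manipulated object.

        tracks -- (T, N, 2) array: N candidate points tracked over the T frames
                  of a demonstration (e.g. produced by a point tracker)
        Returns the indices of the top_k points with the largest total motion,
        a simple stand-in for RoboTAP's notion of task-relevant points.
        """
        step_motion = np.linalg.norm(np.diff(tracks, axis=0), axis=-1)  # (T-1, N)
        total_motion = step_motion.sum(axis=0)                          # (N,)
        return np.argsort(total_motion)[-top_k:]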

Main components
Universal Controller: the heart of the system, responsible for executing every task.
Visual Servo Controller: tracks and aligns specific points in the scene.
Dense tracking: used to decompose and solve multi-stage manipulation tasks.

Functions and applications
  • Fast learning: RoboTAP can learn new visuomotor tasks from just a few minutes of demonstration.
  • Multi-task manipulation: solves tasks such as pick-and-place, insertion, and stacking.
  • Environmental adaptability: adapts to different environments and object poses.
  • Limitations: may not be suitable for tasks that require extremely high precision or multimodal (visual + force) input.
Projects and demos: robotap.github.io
Paper: arxiv.org/abs/2308.15975

Video demonstration
RoboTAP uses TAPIR (Tracking Any Point with per-frame Initialization and temporal Refinement), an advanced point-tracking algorithm developed by DeepMind, to solve template insertion and a variety of other tasks.
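The sketch below shows only the shape of this data flow. The tracker_fn callable stands in for a TAPIR-style model; the function names and signatures are assumptions for illustration, not the actual API of DeepMind's open-source tracking code.

    import numpy as np

    def track_demo_points(video, query_points, tracker_fn):
        """Track demonstration-chosen points through a video (illustrative only).

        video        -- (T, H, W, 3) uint8 array of frames
        query_points -- (N, 3) array of (frame_index, y, x) query locations
        tracker_fn   -- callable implementing TAPIR-style tracking; assumed to
                        return per-frame (x, y) positions and a visibility mask
        """
        positions, visible = tracker_fn(video, query_points)  # (T, N, 2), (T, N)
        # Mask out occluded points so downstream control ignores them.
        positions = np.where(visible[..., None], positions, np.nan)
        return positions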

This system requires no CAD models and no prior experience with the target objects. At each moment it detects the points on the object that matter most for the current action (marked in red), infers where those points should move (marked in cyan), and computes an action that moves them there (marked with an orange arrow).
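One standard way to turn the red (current) and cyan (target) point sets into such a motion, shown below purely as an illustration rather than as the method from the paper, is a least-squares rigid alignment (the Kabsch algorithm) in the image plane.

    import numpy as np

    def alignment_action(current, target):
        """Rigid 2-D transform that carries `current` points onto `target` points.

        current, target -- (N, 2) arrays of corresponding point locations
        Returns (R, t) such that current @ R.T + t ~= target, i.e. the rotation
        and translation the end effector should apply (roughly the "orange
        arrow" in the demo videos).
        """
        c_mean, t_mean = current.mean(axis=0), target.mean(axis=0)
        H = (current - c_mean).T @ (target - t_mean)   # 2x2 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                       # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = t_mean - R @ c_mean
        return R, t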

The advantage of this method is that it can learn and solve a task from six or fewer demonstrations, which greatly reduces training time and complexity.

