Cressy, reporting from Aofeisi
Qubit | WeChat official account QbitAI
Notice how the robot easily cuts a piece of wire with its pliers.
Then, in no time at all, it flipped open the lid of a metal box.
In addition, tasks such as object grabbing can be completed easily.
Behind this robot is the latest embodied-intelligence work from New York University and Meta AI.
The researchers propose a new training method called TAVI, which combines vision and touch to more than double the robot's task success rate.
At present, the research team's paper has been published publicly and the relevant code has been open sourced.
Seeing the robot's performance, Meta chief scientist Yann LeCun remarked that this is amazing progress.
So what else can a robot trained in this way do?
Picking things up and putting them away is a piece of cake
It can separate two bowls that are stacked together and take the top one.
Watch closely and you can see that during the separation, the robot's hand makes a small nudging motion, letting the yellow bowl slide along the inner wall of the green bowl.
This robot can not only "divide" but also "combine".
After picking up the red object, the robot accurately placed it into the purple lid.
Or flip an eraser over.
Watch as it picks up a large eraser, then uses the box beneath it to adjust the angle.
It is unclear why it didn't use more of its fingers, but it has at least learned to use a tool.
In short, the movements of embodied intelligent robots trained using the TAVI method are somewhat similar to humans.
According to the data, the TAVI method is significantly better than the method using only tactile or visual feedback in 6 typical tasks.
Compared with the AVI method, which uses no tactile information, TAVI's average success rate is 135% higher; compared with an image-plus-tactile reward-model method, it is roughly double.
Meanwhile, the T-DEX training method, which also uses a mixed visual-tactile model, achieves less than a quarter of TAVI's success rate.
The robot trained by TAVI also has strong generalization ability - the robot can also complete tasks for objects it has never seen before.
In the two tasks of "taking the bowl" and "packing the box", the robot's success rate on unseen objects exceeded 50%.
In addition, the robot trained by the TAVI method can not only complete various tasks well, but also perform multiple sub-tasks in sequence.
In terms of robustness, the research team ran tests with adjusted camera angles, and the robot still maintained a high success rate.
So, how does the TAVI method achieve such an effect?
Evaluating robot performance using visual information
The core of TAVI is using visual feedback to train the robot. The process has three main steps.
The first step is to collect human demonstrations across two modalities: vision and touch.
The visual information collected is used to build a reward function for use in the subsequent learning process.
In this process, the system uses contrastive learning to extract visual features useful for completing the task, and uses them to evaluate how well the robot's actions accomplish it.
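The paper's exact architecture is not reproduced here, but a reward built from visual feature similarity can be sketched roughly as follows. The embedding dimensions, the cosine-similarity measure, and the max-over-demo-frames scoring are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def visual_reward(obs_features: np.ndarray, demo_features: list) -> float:
    """Toy visual reward: how closely the robot's current visual features
    match the nearest expert-demonstration frame (illustrative only)."""
    return max(cosine_similarity(obs_features, d) for d in demo_features)
```

An observation matching a demonstration frame scores close to 1, an unrelated one lower; in the real system the features themselves would come from an encoder trained contrastively on the demonstration data.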
Then, tactile information and visual feedback are combined to train through reinforcement learning, allowing the robot to try again and again until it obtains a higher completion score.
The learning of TAVI is a step-by-step process. As the learning steps increase, the reward function becomes more and more perfect, and the robot's movements become more and more precise.
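The trial-and-error loop described above can be sketched as a toy example. Here a simple hill-climbing update stands in for the real reinforcement-learning algorithm, and the environment, policy shape, and reward are all simplified assumptions for illustration:

```python
import numpy as np

class DummyTactileEnv:
    """Toy stand-in for a robot environment with visual feedback.
    Reward is closeness of the state to a goal configuration."""
    def reset(self) -> np.ndarray:
        self.state = np.zeros(4)
        return self.state

    def step(self, action: np.ndarray):
        self.state = self.state + action
        goal = np.ones(4)
        reward = -float(np.linalg.norm(self.state - goal))
        return self.state, reward

def train(env, init_params: np.ndarray, episodes=5, steps=10, lr=0.1):
    """Hill-climbing sketch: perturb the policy parameters each episode
    and keep whichever version earns a higher total reward."""
    best_params = init_params.copy()
    best_return = -np.inf
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        params = best_params + lr * rng.standard_normal(best_params.shape)
        env.reset()
        total = 0.0
        for _ in range(steps):
            _, reward = env.step(params)  # toy policy: constant action
            total += reward
        if total > best_return:
            best_return, best_params = total, params
    return best_params, best_return
```

Each episode is one "try", and keeping only the higher-scoring parameters mirrors how repeated attempts gradually sharpen the robot's movements.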
In order to improve the flexibility of TAVI, the research team also introduced a residual strategy.
When the required behavior differs from the base policy, the robot only needs to learn the differing part rather than starting from scratch.
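The residual idea can be sketched minimally: the base policy's action is kept fixed, and only a small correction on top of it is learned. The function shapes below are illustrative placeholders, not the paper's interface:

```python
import numpy as np

def base_policy(obs: np.ndarray) -> np.ndarray:
    """Frozen base policy (placeholder behavior for illustration)."""
    return obs * 0.5

def make_residual_policy(correction: np.ndarray):
    """A learned residual encoding only the *difference* from the base
    behavior (here just a constant offset for illustration)."""
    def residual_policy(obs: np.ndarray) -> np.ndarray:
        return correction
    return residual_policy

def act(obs: np.ndarray, residual_policy) -> np.ndarray:
    """Final action = base action + learned residual correction."""
    return base_policy(obs) + residual_policy(obs)
```

Because the base policy is never retrained, only the small residual term has to be learned for each new situation, which is what saves the robot from learning from scratch.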
Ablation results show that without the residual strategy, i.e. if the robot learns from scratch every time, its task success rate drops.
If you are interested in embodied intelligence, you can read the research team’s paper for more details.
Paper address:
https://arxiv.org/abs/2309.12300
GitHub project page:
https://github.com/irmakguzey/see-to-touch