DeepMind proposes a vision-based reinforcement learning model that lets robots handle a wide variety of tools and manipulators with ease.

Humans are good at imitation. We and other animals imitate by observing behavior, understanding its perceived effect on the environment, and working out what actions our own bodies can take to achieve similar results.

Imitation learning is a powerful tool for robot learning. But for such environment-aware tasks, it is very difficult to specify a reward function for reinforcement learning.

The latest DeepMind paper explores imitating manipulation trajectories from third-person vision alone, without relying on action or state information. The work is motivated by the goal of having a robot manipulator imitate complex, visually demonstrated behaviors.

The method proposed by DeepMind is mainly divided into two stages:

1. Learn Manipulation-Independent Representations (MIR): whether the demonstrator is a robot manipulator, a human hand, or some other device, the representation should remain usable for learning the downstream task

2. Use reinforcement learning to learn an action policy

Manipulation-independent representations

Domain adaptation is the most critical problem in sim-to-real robotics: bridging the visual gap between simulation and reality. The paper addresses it in three ways:

1. Randomize across manipulator types and simulation environments so that simulation better covers the real world

2. Add observations in which the manipulator arm has been removed from the scene (an "invisible arm" view)

3. Temporally-Smooth Contrastive Networks (TSCN). Compared with TCN (Time-Contrastive Networks), a distribution coefficient ρ is added to the softmax cross-entropy objective, which makes the learned representation vary more smoothly over time, especially in the cross-domain setting.
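To make the third point concrete, here is a minimal sketch of a temporally-smooth contrastive loss. It is an illustration, not the paper's exact objective: the function name, the geometric weighting by `rho`, and the `window` parameter are assumptions. The idea shown is that, instead of the one-hot target of a standard TCN softmax cross-entropy, the target mass is spread over temporally neighboring frames, controlled by a coefficient ρ.

```python
import numpy as np

def tscn_loss(sim, anchor_t, rho=0.5, window=2):
    """Sketch of a temporally-smoothed contrastive objective.

    sim      : (T,) similarity scores between an anchor embedding and
               T candidate frames (e.g. from another view or domain).
    anchor_t : index of the time-aligned "positive" frame.
    rho      : smoothing coefficient; weight rho**|t - anchor_t| is
               placed on frames within `window` steps of the anchor.
    """
    T = sim.shape[0]
    # Smoothed target distribution instead of a one-hot at anchor_t.
    target = np.zeros(T)
    for t in range(T):
        d = abs(t - anchor_t)
        if d <= window:
            target[t] = rho ** d
    target /= target.sum()
    # Softmax cross-entropy between similarities and the smooth target.
    logits = sim - sim.max()                      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -(target * log_probs).sum()
```

With `rho` close to 0 this collapses back to the usual one-hot TCN target; larger values tolerate small temporal misalignments, which is what helps when anchor and candidates come from different domains.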

Learning policies with reinforcement learning

MIR requires the representation space to be actionable: a reinforcement learning agent must be able to map it to concrete actions.

One solution is to train a goal-conditioned policy whose inputs are the current state o and the goal state g. This paper proposes an extension, cross-domain goal-conditioned policies: the inputs are the current state o and a cross-domain goal state o′, and the policy is trained to minimize the number of actions needed to reach the goal.
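The setup above can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's architecture: the embedding size, action dimension, random-projection "encoder", and linear policy head are all placeholders. The point being shown is only the interface: a shared, manipulation-independent encoder embeds both the current observation and a goal observation from a possibly different domain, and the policy acts on the two embeddings concatenated.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB = 32   # assumed MIR embedding size
ACT = 7    # assumed action dimension (e.g. joint velocities)

# Stand-in for a frozen MIR encoder: maps a flattened observation to a
# manipulation-independent embedding (here just a fixed random projection).
W_enc = rng.standard_normal((EMB, 64))

def mir_encode(obs_flat):
    return np.tanh(W_enc @ obs_flat)

# Cross-domain goal-conditioned policy head: consumes the embeddings of
# the current observation o and a goal observation o_prime, where
# o_prime may come from a different domain (e.g. a human-hand demo).
W_pi = rng.standard_normal((ACT, 2 * EMB))

def policy(obs_flat, goal_flat):
    z = np.concatenate([mir_encode(obs_flat), mir_encode(goal_flat)])
    return np.tanh(W_pi @ z)   # bounded action in [-1, 1]^ACT

a = policy(rng.standard_normal(64), rng.standard_normal(64))
```

Because both inputs pass through the same encoder, the policy never sees raw pixels from either domain, only the shared representation, which is what lets a robot-arm policy follow a human-hand demonstration.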

Data and experiments

The research team conducted experiments in eight environments and scenarios (canonical simulation, invisible arm, random arm, random domain, Jaco Hand, real robot, cane, and human hand) to evaluate imitation of unconstrained manipulation trajectories demonstrated by unseen manipulators.

They also compared against baseline methods such as naive goal-conditioned policies (GCP) and temporal distance.

MIR achieved the best performance across all test domains. It significantly improves the stacking success rate, and it imitates the simulated Jaco Hand and Invisible Arm demonstrations with a 100% score.

This research demonstrates the importance of representation in visual imitation and verifies that manipulation-independent representations can be applied to it successfully.

The factory robots of the future will have more powerful learning capabilities, no longer limited to a specific tool or a specific task.

Origin: blog.csdn.net/weixin_42137700/article/details/115238488