【Learning】RL

Sparse reward

Most of the time the reward is r = 0, so we don't know whether an action was good or bad. How can that be fixed?

For example, when a robot arm must screw a bolt into place, the reward stays 0 until the task succeeds, so developers should define additional rewards to guide the agent (reward shaping).

Reward shaping requires some domain knowledge.
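
A minimal sketch of what reward shaping could look like for the robot-arm example; the distance-based bonus, the `gripper_pos`/`bolt_pos` arguments, and the 0.1 scale are illustrative assumptions, not part of the original notes:

```python
import numpy as np

def shaped_reward(env_reward, gripper_pos, bolt_pos):
    """Sparse environment reward plus a hand-designed shaping term.

    Hypothetical example: penalize the gripper's distance to the bolt,
    so there is a learning signal even while env_reward is still 0.
    """
    distance = np.linalg.norm(np.asarray(gripper_pos) - np.asarray(bolt_pos))
    shaping = -0.1 * distance  # closer to the bolt -> less negative bonus
    return env_reward + shaping
```

The shaping term encodes domain knowledge (here: "getting closer to the bolt is progress"), which is exactly why reward shaping requires it.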

Curiosity: give the agent an additional reward when it sees something new (but meaningful, not just unpredictable noise).
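
One common way to implement "extra reward for something new" is curiosity via a forward model whose prediction error serves as an intrinsic reward. The PyTorch sketch below is an assumed implementation of that idea (the network sizes and squared-error reward are illustrative, not from the original notes):

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next state from (state, action). A large prediction
    error means the transition is 'new' to the agent."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def intrinsic_reward(model, state, action, next_state):
    # Prediction error of the forward model is the extra reward; in
    # practice it is computed in a learned feature space so that
    # meaningless noise (which stays unpredictable) is not rewarded.
    with torch.no_grad():
        pred = model(state, action)
    return ((pred - next_state) ** 2).mean(dim=-1)
```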

No reward: learning from demonstration

Motivation

In some tasks, even defining the reward is challenging, and handcrafted rewards can lead to uncontrolled behavior.

Imitation learning can be used when no reward is available.

The actor can interact with the environment, but the reward function is not available.

In extreme cases that the expert has never encountered, what should the machine do?

The agent replicates every behavior of the expert, even irrelevant actions.
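
The simplest form of learning from demonstration is behavior cloning, plain supervised learning on the expert's (state, action) pairs, and it suffers from exactly the two weaknesses above. A minimal PyTorch sketch, assuming a discrete action space (the policy network, tensor shapes, and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

def behavior_cloning(policy, expert_states, expert_actions,
                     epochs=10, lr=1e-3):
    """Fit the policy to predict the expert's action in every recorded
    state -- ordinary supervised classification on (state, action) pairs."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()        # assumes discrete actions
    for _ in range(epochs):
        logits = policy(expert_states)     # (N, n_actions)
        loss = loss_fn(logits, expert_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```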

Inverse Reinforcement Learning

Inverse reinforcement learning works in reverse: it infers the reward function from the expert's demonstrations.

A simple reward function does not necessarily yield a simple actor.

Assume that the teacher's trajectories achieve the highest reward; this does not mean completely imitating the teacher.

Principle: The teacher is always the best.

Basic idea: initialize an actor; in each iteration, the actor interacts with the environment to obtain some trajectories.

Define a reward function such that the teacher's trajectories score higher than the actor's trajectories. The actor then learns to maximize reward under this new reward function. Finally, output the reward function and the actor learned from it.

The actor is very similar to the generator in a GAN, and the reward function is very similar to the discriminator.
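
A sketch of one IRL iteration written in that GAN style; the network sizes, the simple score-difference loss, and the surrounding training loop are assumptions for illustration:

```python
import torch
import torch.nn as nn

# The reward function acts as the discriminator, the actor as the generator.
state_dim, action_dim = 4, 2
reward_fn = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1)
)
opt = torch.optim.Adam(reward_fn.parameters(), lr=1e-3)

def update_reward(teacher_sa, actor_sa):
    """Push teacher (state, action) pairs to score higher than the
    actor's, like a discriminator separating real from generated data."""
    loss = -(reward_fn(teacher_sa).mean() - reward_fn(actor_sa).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each iteration: roll out the actor, call update_reward(teacher_sa,
# actor_sa), then train the actor (e.g., with a policy-gradient method)
# to maximize the scores reward_fn assigns to its own trajectories.
```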

Learning from the screen: the machine can also learn by watching demonstrations shown on screen.

Origin blog.csdn.net/Raphael9900/article/details/128547118