Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping: Paper Summary

--------------------Paper:0
1.Title: Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping
2.Authors: Yao Zhang, Tiancheng Lou, Hao Wu, Dong Yan, Cheng Wu, Shihao Zhang, Yiming Zhang
3.Affiliation: 1 University of Cambridge; 2 Tencent RoboticsX; 3 Hong Kong University of Science and Technology; 4 Tsinghua University; 5 IDEA; 6 University of California, Los Angeles
4.Keywords: Reward Shifting, Exploration, Exploitation, Deep Reinforcement Learning (DRL)
5.Url: http://arxiv.org/abs/2209.07288v2

6.Summary:
(1) This paper studies the simplest form of reward shaping in value-based deep reinforcement learning: a linear transformation of the reward. Its goal is to understand how this transformation affects both exploration and exploitation.
(2) Prior work tackles the exploration-exploitation trade-off with, for example, count-based and curiosity-based exploration bonuses, each of which has its own limitations. The proposed method instead balances exploration and exploitation through a simple linear transformation of the reward function. Because the transformation does not change the optimal policy, it can encourage exploration during training without introducing learning bias (a minimal sketch of such a reward transformation is given after this list).
(3) The linear reward transformation is applied to three types of deep reinforcement learning settings: (S1) offline reinforcement learning, (S2) online continuous control, and (S3) single-step offline curiosity exploration, and its effect is evaluated empirically in each setting.
(4) The paper demonstrates the method on continuous-control and discrete-control tasks, referring to the resulting behaviors as "conservative exploitation" and "curiosity-driven exploration", respectively, and reports better learning results than the usual baselines.
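
As an illustration of the idea summarized above, here is a minimal sketch of a linear reward shaping wrapper in a Gymnasium-style setup; the class name, the `scale`/`bias` parameters, and the choice of environment are illustrative assumptions, not taken from the paper.

```python
import gymnasium as gym


class LinearRewardShift(gym.RewardWrapper):
    """Apply an affine transformation r' = scale * r + bias to every reward.

    A constant shift leaves the (discounted, infinite-horizon) optimal policy
    unchanged, so it can be used purely to steer the default value estimates
    toward optimism or conservatism rather than to redefine the task.
    """

    def __init__(self, env, scale: float = 1.0, bias: float = 0.0):
        super().__init__(env)
        self.scale = scale
        self.bias = bias

    def reward(self, reward):
        # Only the reward signal is transformed; observations, actions, and
        # the rest of the value-based training pipeline stay unchanged.
        return self.scale * reward + self.bias


# Illustrative usage: the sign and magnitude of `bias` control whether
# zero-initialized value estimates sit above or below the shifted returns
# (the concrete sign convention here is an assumption of this sketch).
env = LinearRewardShift(gym.make("CartPole-v1"), scale=1.0, bias=-1.0)
```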

7.Methods:
(1) The paper first reviews the exploration-exploitation dilemma and earlier exploration ideas such as count-based and curiosity-driven methods, each of which has its own limitations. It then proposes a simple linear reward transformation that balances exploration and exploitation, encouraging the agent to visit more states and actions.

(2) Through the linear reward transformation, the method encourages the agent to explore more states and actions during training without changing the optimal policy, thereby avoiding learning bias. It can be applied to three types of deep reinforcement learning tasks: offline reinforcement learning, online continuous control, and single-step offline curiosity exploration.

(3) The method is evaluated experimentally on offline reinforcement learning, online continuous control, and single-step offline curiosity exploration tasks, verifying its effectiveness across these different deep reinforcement learning settings.

(4) Concretely, the implementation modifies only the reward function; the resulting behavior is referred to as "conservative exploitation" in continuous-control tasks and "curiosity-driven exploration" in discrete-control tasks. Experiments show this approach to be more effective than the conventional methods (see the sketch of a shifted TD target below).
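
To make the "does not change the optimal policy" point concrete, below is a small, hypothetical sketch of how a constant reward shift enters a one-step TD target and how it moves the value scale; the function and constant names are assumptions for illustration, not the paper's code.

```python
GAMMA = 0.99  # illustrative discount factor


def td_target(reward: float, next_q: float, done: bool, bias: float = 0.0) -> float:
    """One-step TD target computed on the shifted reward r' = r + bias."""
    shifted_reward = reward + bias
    return shifted_reward + (0.0 if done else GAMMA * next_q)


# In an infinite-horizon discounted MDP, adding `bias` to every reward moves
# every Q-value by bias / (1 - GAMMA) while leaving the greedy (optimal)
# policy unchanged, so the shift mainly acts like changing the critic's
# effective initialization (episode-termination effects are ignored here).
print("Q-value offset per unit of bias:", 1.0 / (1.0 - GAMMA))
```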

Source: blog.csdn.net/hehedadaq/article/details/129386815