In-depth Understanding of Reinforcement Learning - Markov Decision Process: Dynamic Programming Method


Dynamic programming (DP) is suited to problems that have two properties: optimal substructure and overlapping subproblems. Optimal substructure means that the problem can be decomposed into smaller subproblems whose optimal solutions can be combined into the optimal solution of the original problem. Overlapping subproblems means that the same subproblems recur many times, so their solutions can be reused: we store the result the first time a subproblem is solved and look it up whenever it is needed again.
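To make the two properties concrete, here is a minimal sketch in Python. The grid, its costs, and the function name min_path_cost are invented for illustration and do not come from the original article. The task is to find the cheapest path from the top-left to the bottom-right cell, moving only right or down. The best cost from a cell is its own cost plus the best cost from one of its successors (optimal substructure), and interior cells are reached along several routes, so their results are cached and reused (overlapping subproblems).

```python
from functools import lru_cache

# Hypothetical 3x3 grid of step costs, invented for this illustration.
GRID = (
    (1, 3, 1),
    (1, 5, 1),
    (4, 2, 1),
)


@lru_cache(maxsize=None)  # stores each subproblem's answer after the first call
def min_path_cost(row: int, col: int) -> int:
    """Cheapest cost to reach the bottom-right cell from (row, col),
    moving only down or right."""
    last_row, last_col = len(GRID) - 1, len(GRID[0]) - 1
    if row == last_row and col == last_col:
        return GRID[row][col]
    candidates = []
    if row < last_row:
        candidates.append(min_path_cost(row + 1, col))
    if col < last_col:
        candidates.append(min_path_cost(row, col + 1))
    # Optimal substructure: own cost plus the best successor's optimal cost.
    return GRID[row][col] + min(candidates)


best = min_path_cost(0, 0)
info = min_path_cost.cache_info()
print(best)         # cheapest total cost from the top-left cell
print(info.misses)  # number of distinct subproblems actually computed
print(info.hits)    # number of times a stored result was reused
```

Without the cache, the number of recursive calls grows exponentially with the grid size. With it, each cell is solved once.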

The Markov decision process satisfies both requirements. The Bellman equation decomposes the value of a state recursively into the immediate reward plus the discounted value of its successor states, so once the values of the successor states (the subproblems) are known, the value of the current state follows directly from them. The value function stores these solutions so that they can be reused. Dynamic programming applies to the planning problem of a Markov decision process, not to the learning problem: it requires complete knowledge of the environment, that is, the state transition probabilities and the corresponding rewards. Given such a model, dynamic programming is an effective way to solve both the prediction problem and the control problem in a Markov decision process. For the application of dynamic programming methods in reinforcement learning, see the follow-up article "In-depth Understanding of Reinforcement Learning - Dynamic Programming Algorithm".
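As a rough illustration of planning with a fully known model, the sketch below runs value iteration, a standard dynamic programming method for the control problem, on a tiny Markov decision process. The article itself does not specify an algorithm here. The two states, the actions "stay" and "move", all probabilities and rewards, the discount factor, and every function name are assumptions made up for this example. Each sweep applies the Bellman optimality backup to every state, reusing the values stored in the previous sweep.

```python
GAMMA = 0.9  # discount factor (assumed for this example)

# Hypothetical, fully known model:
# MODEL[state][action] -> list of (probability, next_state, reward)
MODEL = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}


def q_value(values, state, action):
    """Expected reward plus discounted value of the successor states."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in MODEL[state][action])


def value_iteration(tol=1e-10):
    """Repeat Bellman optimality backups until the values stop changing."""
    values = {s: 0.0 for s in MODEL}
    while True:
        new_values = {
            s: max(q_value(values, s, a) for a in MODEL[s]) for s in MODEL
        }
        delta = max(abs(new_values[s] - values[s]) for s in MODEL)
        values = new_values
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each state, the action with the highest backed-up value."""
    return {s: max(MODEL[s], key=lambda a: q_value(values, s, a)) for s in MODEL}


values = value_iteration()
policy = greedy_policy(values)
print(values)
print(policy)
```

Note that the code never interacts with an environment. It only reads the transition probabilities and rewards from MODEL, which is exactly why this counts as planning rather than learning.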

Origin blog.csdn.net/hy592070616/article/details/134792935