Reinforcement Learning & Dynamic Programming 3 | Policy Iteration

Combining iterative policy evaluation with policy improvement gives the policy iteration algorithm, whose pseudocode is as follows.

[Figure: pseudocode of the policy iteration algorithm]

The algorithm starts from the equiprobable random policy, in which every action available in a state is selected with the same probability. Iterative policy evaluation then computes the value function of the current policy, and policy improvement uses that value function to produce a policy that is at least as good. The two steps alternate until the policy converges.
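The loop described above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the pseudocode from the original figure: the toy chain MDP built by `build_chain`, the discount `gamma = 0.9`, the threshold `theta = 1e-8`, and all function names are hypothetical choices made for the example.

```python
def build_chain(n=4):
    """Hypothetical toy MDP: states 0..n-1 in a row, state n-1 is terminal.
    Action 0 moves left, action 1 moves right; every move costs -1.
    P[s][a] is a list of (probability, next_state, reward) triples."""
    P = {}
    for s in range(n):
        if s == n - 1:
            # Terminal state: absorbing, no further reward.
            P[s] = {a: [(1.0, s, 0.0)] for a in (0, 1)}
        else:
            P[s] = {0: [(1.0, max(s - 1, 0), -1.0)],
                    1: [(1.0, s + 1, -1.0)]}
    return P


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P, policy, gamma=0.9, theta=1e-8):
    """Iterative policy evaluation: sweep until the largest update is below theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = sum(policy[s][a] * q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v  # in-place update
        if delta < theta:
            return V


def policy_improvement(P, V, gamma=0.9):
    """Greedy policy with respect to V (ties go to the first action)."""
    new_policy = {}
    for s in P:
        best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
        new_policy[s] = {a: 1.0 if a == best else 0.0 for a in P[s]}
    return new_policy


def policy_iteration(P, gamma=0.9, theta=1e-8):
    # Start from the equiprobable random policy.
    policy = {s: {a: 1.0 / len(P[s]) for a in P[s]} for s in P}
    while True:
        V = policy_evaluation(P, policy, gamma, theta)
        new_policy = policy_improvement(P, V, gamma)
        if new_policy == policy:  # policy is stable: stop
            return policy, V
        policy = new_policy


policy, V = policy_iteration(build_chain())
```

On this chain the stable policy moves right in every non-terminal state, since that is the shortest route to the terminal state.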

Of course, during policy evaluation we can use a fixed number of iterations as the termination condition instead of the threshold θ. This variant is called truncated policy iteration.
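A minimal sketch of the truncated variant follows, again under assumptions that are not in the original: the same hypothetical chain MDP, `gamma = 0.9`, a budget of `sweeps = 3` evaluation sweeps per round, and a value function carried over between rounds. Stopping when the greedy policy no longer changes is a simple heuristic for this toy problem; with truncated evaluation it does not in general guarantee that the values have converged.

```python
def build_chain(n=4):
    """Hypothetical toy MDP: states 0..n-1, state n-1 terminal.
    Action 0 = left, action 1 = right, every move costs -1."""
    P = {}
    for s in range(n):
        if s == n - 1:
            P[s] = {a: [(1.0, s, 0.0)] for a in (0, 1)}
        else:
            P[s] = {0: [(1.0, max(s - 1, 0), -1.0)],
                    1: [(1.0, s + 1, -1.0)]}
    return P


def q_value(P, V, s, a, gamma):
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def truncated_evaluation(P, policy, V, gamma=0.9, sweeps=3):
    """Run a fixed number of in-place sweeps instead of testing against theta."""
    for _ in range(sweeps):
        for s in P:
            V[s] = sum(policy[s][a] * q_value(P, V, s, a, gamma) for a in P[s])
    return V


def greedy_policy(P, V, gamma=0.9):
    new_policy = {}
    for s in P:
        best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
        new_policy[s] = {a: 1.0 if a == best else 0.0 for a in P[s]}
    return new_policy


def truncated_policy_iteration(P, gamma=0.9, sweeps=3, max_rounds=100):
    policy = {s: {a: 1.0 / len(P[s]) for a in P[s]} for s in P}
    V = {s: 0.0 for s in P}  # assumption: V is warm-started across rounds
    for _ in range(max_rounds):
        V = truncated_evaluation(P, policy, V, gamma, sweeps)
        new_policy = greedy_policy(P, V, gamma)
        if new_policy == policy:  # heuristic stop: greedy policy unchanged
            break
        policy = new_policy
    return policy, V


policy, V = truncated_policy_iteration(build_chain())
```

The only change from full policy iteration is the evaluation step: the `while delta >= theta` loop is replaced by a `for` loop with a fixed sweep count, which trades evaluation accuracy for cheaper rounds.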


Origin blog.csdn.net/weixin_43236007/article/details/107857137