Combining iterative policy evaluation and policy improvement yields the policy iteration algorithm, which works as follows. It starts from the equiprobable random policy, which selects every action in each state with the same probability. It then alternates between two steps until convergence: iterative policy evaluation, which computes the value function of the current policy, and policy improvement, which acts greedily with respect to that value function to obtain a better or equally good policy. The algorithm has converged when the improvement step no longer changes the policy.
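The loop described above (policy iteration) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the original author's pseudo-code: the model format `P[s][a] -> [(prob, next_state, reward, done), ...]`, the toy 4-state chain built by the hypothetical helper `make_chain`, the discount `gamma = 0.9`, and the tie-breaking rule in the improvement step are all choices made here for illustration.

```python
from typing import Dict, List, Tuple

# Assumed (hypothetical) model format: P[s][a] -> list of (prob, next_state, reward, done).
Transition = Tuple[float, int, float, bool]
MDP = Dict[int, Dict[int, List[Transition]]]
Policy = Dict[int, Dict[int, float]]  # policy[s][a] = probability of taking action a in state s


def make_chain(n_states: int = 4) -> MDP:
    """Hypothetical toy chain: action 0 = left, 1 = right; last state is terminal; each move costs -1."""
    terminal = n_states - 1
    P: MDP = {}
    for s in range(n_states):
        P[s] = {}
        for a, move in ((0, -1), (1, +1)):
            if s == terminal:
                P[s][a] = [(1.0, s, 0.0, True)]  # absorbing state, no further reward
            else:
                s2 = min(max(s + move, 0), terminal)  # moving into the left wall keeps the agent in place
                P[s][a] = [(1.0, s2, -1.0, s2 == terminal)]
    return P


def q_value(P: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """One-step lookahead: expected reward plus discounted successor value (0 after termination)."""
    return sum(p * (r + (0.0 if done else gamma * V[s2])) for p, s2, r, done in P[s][a])


def policy_evaluation(P: MDP, policy: Policy, V: Dict[int, float], gamma: float, theta: float):
    """Iterative policy evaluation with in-place sweeps; stops when the largest change is below theta."""
    while True:
        delta = 0.0
        for s in P:
            v_new = sum(prob * q_value(P, V, s, a, gamma) for a, prob in policy[s].items())
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V


def policy_improvement(P: MDP, V: Dict[int, float], gamma: float) -> Policy:
    """Greedy (deterministic) policy w.r.t. V; ties go to the first action listed in P[s]."""
    new_policy: Policy = {}
    for s in P:
        best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
        new_policy[s] = {a: (1.0 if a == best else 0.0) for a in P[s]}
    return new_policy


def policy_iteration(P: MDP, gamma: float = 0.9, theta: float = 1e-8):
    # Start from the equiprobable random policy: every action equally likely in every state.
    policy: Policy = {s: {a: 1.0 / len(P[s]) for a in P[s]} for s in P}
    V = {s: 0.0 for s in P}  # reused across iterations as a warm start for evaluation
    while True:
        V = policy_evaluation(P, policy, V, gamma, theta)
        new_policy = policy_improvement(P, V, gamma)
        if new_policy == policy:  # improvement changed nothing: converged
            return policy, V
        policy = new_policy


P = make_chain()
policy, V = policy_iteration(P)
print({s: max(policy[s], key=policy[s].get) for s in range(3)})  # {0: 1, 1: 1, 2: 1}
```

On this chain the first improvement step already switches every non-terminal state to "right", and the second evaluation confirms it, giving values of about -2.71, -1.9 and -1 for states 0, 1 and 2. Breaking ties deterministically matters for the stopping test: if tied actions were chosen arbitrarily, the policy could keep flipping between equally good actions and never compare equal.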
Of course, during policy evaluation we can also stop after a fixed number of sweeps instead of waiting until the largest change in the value function falls below the threshold θ. This variant is called truncated policy iteration.
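A minimal sketch of the truncated variant follows, again on a hypothetical 4-state chain (action 0 = left, 1 = right, state 3 terminal, each move costs -1, `GAMMA = 0.9`). Evaluation runs exactly `k` in-place sweeps per iteration instead of looping until the change drops below θ. Because `k` sweeps leave the value function only approximate, the stopping rule used here is an assumption of this sketch: stop when the greedy policy is unchanged and the last evaluation sweep moved the values by less than `theta`.

```python
from typing import List, Tuple

GAMMA = 0.9
N = 4              # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = (0, 1)   # 0 = left, 1 = right


def step(s: int, a: int) -> Tuple[int, float, bool]:
    """Hypothetical deterministic dynamics: each move costs -1; reaching state N-1 ends the episode."""
    if s == N - 1:
        return s, 0.0, True
    s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)
    return s2, -1.0, s2 == N - 1


def q(V: List[float], s: int, a: int) -> float:
    s2, r, done = step(s, a)
    return r + (0.0 if done else GAMMA * V[s2])


def truncated_policy_iteration(k: int = 3, theta: float = 1e-8, max_iters: int = 1000):
    # Equiprobable random policy: pi[s][a] = 1/|A| in every state.
    pi = [[1.0 / len(ACTIONS)] * len(ACTIONS) for _ in range(N)]
    V = [0.0] * N
    for _ in range(max_iters):
        # Truncated evaluation: exactly k in-place sweeps, no convergence test on theta.
        for _ in range(k):
            delta = 0.0  # largest change in the most recent sweep
            for s in range(N):
                v_new = sum(pi[s][a] * q(V, s, a) for a in ACTIONS)
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
        # Greedy improvement; ties go to the lowest action index.
        new_pi = []
        for s in range(N):
            best = max(ACTIONS, key=lambda a: q(V, s, a))
            new_pi.append([1.0 if a == best else 0.0 for a in ACTIONS])
        # Assumed stopping rule: policy stable and values (nearly) settled.
        if new_pi == pi and delta < theta:
            return pi, V
        pi = new_pi
    return pi, V  # iteration budget exhausted without meeting the stopping rule


pi, V = truncated_policy_iteration(k=3)
```

With `k = 3` the first, heavily truncated evaluation is inaccurate enough that the first greedy policy still moves left in state 0; the next iteration corrects this, and the third confirms the all-"right" policy with the same values as full policy iteration. Smaller `k` makes each iteration cheaper at the cost of more outer iterations.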