In a longer episode, the same state (or the same state-action pair) may be visited more than once, and each visit is followed by a different return. There are two ways to deal with this.
The first, Every-visit MC, averages the returns that follow every visit. The second, First-visit MC, uses only the return that follows the first visit in each episode and ignores later visits in that episode.
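The difference can be sketched in a few lines of Python. This is a minimal illustration, not the original author's code: the function name `mc_estimates`, the `(key, reward)` episode format, and the discount `gamma` are assumptions made for the example. A key can be a state or a (state, action) pair.

```python
from collections import defaultdict


def mc_estimates(episodes, gamma=1.0, first_visit=True):
    """Average sampled returns per key (hypothetical helper for illustration).

    episodes: list of episodes; each episode is a list of (key, reward)
    pairs, where reward is the reward received after visiting that key.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Compute the return G_t for every time step, working backwards.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g

        seen = set()
        for t, (key, _) in enumerate(episode):
            # First-visit MC skips repeat visits within the same episode.
            if first_visit and key in seen:
                continue
            seen.add(key)
            returns_sum[key] += returns[t]
            returns_count[key] += 1

    return {k: returns_sum[k] / returns_count[k] for k in returns_sum}


# One episode in which "A" is visited twice.
episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
first = mc_estimates([episode], first_visit=True)
every = mc_estimates([episode], first_visit=False)
```

With `gamma=1.0`, the returns after the two visits to "A" are 3.0 and 2.0, so First-visit MC estimates 3.0 for "A" while Every-visit MC estimates their average, 2.5. Both give 2.0 for "B", which is visited only once.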
Reinforcement Learning & Monte Carlo 4 | Every-visit and First-visit MC
Origin blog.csdn.net/weixin_43236007/article/details/114437190