Reinforcement Learning & Monte Carlo 4 | Every-visit and First-visit MC

Within a single long episode, the same state (or state-action pair) can be visited more than once, and each visit is generally followed by a different return. There are two ways to deal with this.
The first, Every-visit MC, averages the returns that follow every visit. The second, First-visit MC, keeps only the return that follows the first visit in each episode.
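
To make the difference concrete, here is a minimal sketch in Python. The episode format (a list of `(state, reward)` pairs, where the reward is the one received after leaving that state), the function names `mc_returns` and `mc_estimate`, and the toy episode are all assumptions made for illustration; they do not come from the original post. The sketch keys on states, but the same logic should apply if each key is a `(state, action)` pair instead.

```python
from collections import defaultdict


def mc_returns(episode, gamma=1.0, first_visit=True):
    """Collect sampled returns per state from one episode.

    episode: list of (state, reward) pairs (hypothetical format), where
    reward is received on the transition out of that state.
    """
    # Backward pass: G_t = r_t + gamma * G_{t+1}
    returns_at = [0.0] * len(episode)
    g = 0.0
    for t in range(len(episode) - 1, -1, -1):
        _, reward = episode[t]
        g = reward + gamma * g
        returns_at[t] = g

    samples = defaultdict(list)
    seen = set()
    for t, (state, _) in enumerate(episode):
        # First-visit MC skips every visit after the first one.
        if first_visit and state in seen:
            continue
        seen.add(state)
        samples[state].append(returns_at[t])
    return dict(samples)


def mc_estimate(episodes, gamma=1.0, first_visit=True):
    """Average the sampled returns per state across all episodes."""
    all_samples = defaultdict(list)
    for ep in episodes:
        for state, gs in mc_returns(ep, gamma, first_visit).items():
            all_samples[state].extend(gs)
    return {s: sum(gs) / len(gs) for s, gs in all_samples.items()}


# Toy episode: states A and B are each visited twice.
episode = [("A", 1.0), ("B", 0.0), ("A", 2.0), ("B", 3.0)]

first = mc_estimate([episode], first_visit=True)
every = mc_estimate([episode], first_visit=False)
print(first)  # {'A': 6.0, 'B': 5.0}
print(every)  # {'A': 5.5, 'B': 4.0}
```

With `gamma = 1`, the returns at the four time steps are 6, 5, 5 and 3. First-visit MC keeps only the first return for each state (6 for A, 5 for B), while every-visit MC averages both visits (5.5 for A, 4 for B).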


Origin blog.csdn.net/weixin_43236007/article/details/114437190