A sequence of states, actions, and rewards that runs until a terminal state is reached is called an episode. The agent's task is to find the policy (strategy) that maximizes the expected cumulative reward.
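As a rough illustration of the definition above, an episode can be stored as a list of (state, action, reward) steps. Its cumulative reward is the sum of the rewards, and the expected value can be approximated by averaging over several sampled episodes. This is only a sketch: the names `Step`, `episode_return`, and `monte_carlo_value`, the discount factor `gamma`, and the sample episodes are all assumptions made for illustration and do not come from the original post.

```python
from typing import List, Tuple

# Hypothetical representation: one step of an episode is (state, action, reward).
Step = Tuple[str, str, float]


def episode_return(episode: List[Step], gamma: float = 1.0) -> float:
    """Cumulative reward of one episode; gamma=1.0 means no discounting."""
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for _state, _action, reward in reversed(episode):
        total = reward + gamma * total
    return total


def monte_carlo_value(episodes: List[List[Step]], gamma: float = 1.0) -> float:
    """Approximate the expected cumulative reward by averaging episode returns."""
    returns = [episode_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Made-up sample episodes, each ending when a terminal state is reached.
episode_a = [("s0", "right", 0.0), ("s1", "right", 0.0), ("s2", "right", 1.0)]
episode_b = [("s0", "left", 0.0), ("s1", "right", 1.0)]

print(episode_return(episode_a))
print(monte_carlo_value([episode_a, episode_b], gamma=0.5))
```

With `gamma` below 1, rewards that arrive later in the episode count for less, so a policy that reaches the reward in fewer steps gets a higher return.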
Reinforcement learning & Monte Carlo 1 | Action collection episode
Origin blog.csdn.net/weixin_43236007/article/details/114377789