Reinforcement learning & Monte Carlo 1 | Action collection episode

An episode is the sequence of states, actions, and rewards from the starting state until a terminal state is reached. The agent's task is to find the policy (strategy) that maximizes the expected cumulative reward.
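To make the definition concrete, here is a minimal sketch of collecting one episode and estimating the expected cumulative reward by averaging returns over many episodes (the Monte Carlo idea). Everything in it is an assumption made for illustration and is not from the original post: the corridor environment, the reward values, the two policies, and the function names `run_episode`, `episode_return`, and `monte_carlo_value` are all hypothetical.

```python
import random

def run_episode(policy, step, start_state, rng, max_steps=100):
    """Collect one episode as a list of (state, action, reward) tuples."""
    episode = []
    state = start_state
    for _ in range(max_steps):
        action = policy(state, rng)
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        state = next_state
        if done:  # terminal state reached: the episode ends here
            break
    return episode

def episode_return(episode, gamma=1.0):
    """Cumulative (optionally discounted) reward of one episode."""
    g = 0.0
    for _, _, reward in reversed(episode):
        g = reward + gamma * g
    return g

# Hypothetical toy environment: a corridor with states 0..3, state 3 is terminal.
def corridor_step(state, action):
    next_state = max(0, min(3, state + action))
    done = next_state == 3
    reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
    return next_state, reward, done

def always_right(state, rng):
    return 1

def random_policy(state, rng):
    return rng.choice([-1, 1])

def monte_carlo_value(policy, n_episodes=1000, seed=0):
    """Average return over sampled episodes, all starting in state 0."""
    rng = random.Random(seed)
    returns = [
        episode_return(run_episode(policy, corridor_step, 0, rng))
        for _ in range(n_episodes)
    ]
    return sum(returns) / len(returns)
```

Comparing `monte_carlo_value(always_right)` with `monte_carlo_value(random_policy)` shows what "finding the best policy" means in this toy setting: the policy with the higher average return is the better one.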



Origin blog.csdn.net/weixin_43236007/article/details/114377789