Random Thoughts on Deep Reinforcement Learning

  • About model-based and model-free methods

    Model-free methods cannot be the future of reinforcement learning, even though these algorithms currently perform better than model-based methods. Their fatal flaw is the lack of interpretability: we cannot trust a policy without knowing why it takes a specific action, especially since it sometimes takes actions that look stupid and obviously wrong to us. Model-based methods relieve this concern to some extent, because the model gives us some knowledge about future states and outcomes. However, the model usually has to be learned, and a learned model can never be as accurate as the real environment. One way to cope with this is planning, especially tree search methods such as Monte Carlo Tree Search (MCTS). Tree search can reduce the variance introduced by the learned model by bootstrapping at each node, much like TD methods do, and it also gives us better interpretability, which is critical. A sketch of this bootstrapping idea follows below.
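
    Below is a minimal, self-contained sketch of MCTS over a learned model, just to make the bootstrapping point concrete. The `model(state, action) -> (next_state, reward)` interface, the learned `value_fn`, and all names here are hypothetical stand-ins, not any particular library's API. The key line is the leaf evaluation: instead of rolling out to termination, the learned value function stands in for the tail of the return, which is the TD-style variance reduction mentioned above.

    import math
    import random

    class Node:
        """One search-tree node: a visit count plus a running mean of backed-up returns."""
        def __init__(self, state):
            self.state = state
            self.children = {}   # action -> child Node
            self.visits = 0
            self.value = 0.0

    def ucb(parent, child, c):
        # UCB1: exploit the child's value estimate, explore rarely visited children.
        return child.value + c * math.sqrt(math.log(parent.visits + 1) / (child.visits + 1))

    def mcts(root_state, model, value_fn, actions, n_sims=200, c=1.4, gamma=0.99):
        """Plan with a hypothetical learned model and learned value function."""
        root = Node(root_state)
        for _ in range(n_sims):
            node, visited = root, [root]
            # Selection: descend while every action at this node has been tried.
            while len(node.children) == len(actions):
                node = max(node.children.values(), key=lambda ch: ucb(node, ch, c))
                visited.append(node)
            # Expansion: simulate one untried action with the learned model.
            action = random.choice([a for a in actions if a not in node.children])
            next_state, reward = model(node.state, action)
            child = Node(next_state)
            node.children[action] = child
            visited.append(child)
            # Bootstrapped evaluation: no rollout to termination; the learned
            # value function estimates the tail of the return, TD-style.
            ret = reward + gamma * value_fn(next_state)
            # Backup: fold the bootstrapped return into every node on the path.
            for n in reversed(visited):
                n.visits += 1
                n.value += (ret - n.value) / n.visits
        # Recommend the most-visited root action.
        return max(root.children, key=lambda a: root.children[a].visits)

    Acting from the most-visited root child also means the per-action visit counts and value estimates can be inspected after the search, which is the interpretability benefit argued for above.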

Reposted from www.cnblogs.com/initial-h/p/12208038.html