DQN: discrete, low-dimensional action spaces
DDPG: the Deep Deterministic Policy Gradient algorithm, which can be used for deep reinforcement learning problems with continuous action spaces
Q-learning: discrete, low-dimensional action spaces
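One way to see why Q-learning is tied to discrete, low-dimensional action spaces: the update takes a maximum over every action in the next state, which is only practical when the actions can be enumerated. The sketch below is a minimal illustration, not a reference implementation. The 2-state, 2-action table, the function name q_update, and the values of alpha and gamma are all assumptions made for the example.

```python
# Tabular Q-learning update (illustrative sketch; names and sizes are assumptions).
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning step in place and return the new Q(state, action)."""
    # Enumerating all actions is only feasible for a small, discrete action set.
    best_next = max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical table: 2 states x 2 actions.
q_table = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
```

With a continuous action space the max over actions has no finite enumeration, which is the gap that methods such as DDPG are meant to address.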
1, Fundamentals of reinforcement learning
- Markov Decision Process
- Policy iteration
- Value iteration
- Generalized policy iteration
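As a rough illustration of value iteration on a Markov decision process, the sketch below repeatedly applies the Bellman optimality backup until the values stop changing, then reads off a greedy policy. The two-state MDP, its rewards, the discount factor, and the function names are made up for the example and are not taken from any particular source.

```python
# Value iteration on a tiny, made-up MDP (all numbers are illustrative assumptions).
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 0.0)], 1: [(1.0, 0, 0.0)]},
}


def backup(outcomes, values, gamma):
    """Expected return of one action: sum of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality backup: best expected return over all actions.
            best = max(backup(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each state, the action with the highest one-step backup."""
    return {
        s: max(actions, key=lambda a: backup(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
```

Policy iteration differs in that it alternates a full policy evaluation with a greedy improvement step; generalized policy iteration covers any interleaving of the two.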
2, Value-function-based reinforcement learning methods
- Monte Carlo reinforcement learning methods
- Temporal-difference (TD) reinforcement learning methods
- Reinforcement learning with value function approximation (DQN, Q-learning, Double Q-learning)
3, Reinforcement learning methods based on direct policy search
- Policy gradient methods (Actor-Critic, A3C)
- Trust Region Policy Optimization (TRPO)
- Deterministic policy methods
- Guided policy search (ADMM)
4, Research frontiers in reinforcement learning
- Inverse reinforcement learning
- Combining policy gradient methods with value functions
- Value function network
- Model-based reinforcement learning: PILCO and its extensions