AlphaGo principle
Three deep policy networks (Policy Networks) and one deep valuation network (Value Network)
Supervised Learning Policy Network
Optimization analysis:
chessboard characteristics:
Stone Color
Turn Since:
Liberty:
Beat Quantity:
Ladder:
Sensibility:
Deep Policy Network Reinforcement Learning Policy Network
Reinforcement Learning Training Strategy:
Training Details and Results:
Deep Valuation Network Rollout Policy Network
How to play chess - Monte Carlo Tree Search
Simulate a future chess game multiple times and choose the move that was selected the most often in the simulation
uIncrease choice diversity
Improvements to AlphaGo Zero
(1) There is no need for human chess records at all, and you can learn by playing chess by yourself.
(2) Merging the chess moving network and the valuation network into one network:
self-learning process and neural network training process