[ZJU-Machine Learning] Introduction to AlphaGo

AlphaGo principle

AlphaGo combines three deep policy networks (Policy Networks): a supervised-learning policy network, a reinforcement-learning policy network, and a fast rollout policy network, together with one deep value network (Value Network).
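What these two kinds of networks compute can be illustrated with placeholder outputs: a policy network maps a position to a probability distribution over the 19×19 board points, and a value network maps it to a single win estimate. A minimal sketch, where random logits stand in for a real network's output:

```python
import numpy as np

BOARD_POINTS = 19 * 19

def softmax(logits):
    """Numerically stable softmax over move logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=BOARD_POINTS)   # stand-in for a policy network's raw output

policy = softmax(logits)                 # P(move | position), sums to 1
value = np.tanh(rng.normal())            # stand-in value estimate in (-1, 1)

assert abs(policy.sum() - 1.0) < 1e-9
assert -1.0 < value < 1.0
```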

Supervised Learning Policy Network

Optimization analysis: the SL policy network is trained on human expert game records by maximizing the log-likelihood of the expert's move at each position (equivalently, minimizing the cross-entropy).
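A minimal sketch of this supervised objective (maximum likelihood on the human move); the random logits and the expert-move index are illustrative stand-ins for a real network and dataset:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
logits = rng.normal(size=361)         # stand-in for the network's output
expert_move = 42                      # hypothetical index of the human move

p = softmax(logits)
loss = -np.log(p[expert_move])        # cross-entropy for this position

# Gradient w.r.t. the logits has the classic softmax form: p - onehot(a*)
grad = p.copy()
grad[expert_move] -= 1.0

assert loss > 0
assert abs(grad.sum()) < 1e-9
```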
Board features (inputs to the network):

Stone Color: the color at each point (own stone / opponent stone / empty).

Turns Since: the number of turns since each move was played.

Liberties: the number of liberties (adjacent empty points) of each group.

Capture Size: the number of opponent stones a move would capture.

Ladder: whether a move succeeds as a ladder capture or a ladder escape.

Sensibleness: whether a move is legal and does not fill the player's own eye.
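A common way to feed such features to a convolutional network is as stacked binary 19×19 planes. This sketch encodes only the stone-color feature; the three-plane layout is illustrative, not AlphaGo's exact 48-plane input:

```python
import numpy as np

N = 19
EMPTY, BLACK, WHITE = 0, 1, 2

def stone_color_planes(board, to_play):
    """Return 3 binary planes: own stones, opponent stones, empty points."""
    opponent = WHITE if to_play == BLACK else BLACK
    own = (board == to_play).astype(np.float32)
    opp = (board == opponent).astype(np.float32)
    empty = (board == EMPTY).astype(np.float32)
    return np.stack([own, opp, empty])

board = np.zeros((N, N), dtype=np.int8)
board[3, 3] = BLACK       # toy position: one stone of each color
board[15, 15] = WHITE

planes = stone_color_planes(board, to_play=BLACK)
assert planes.shape == (3, N, N)
assert planes[0, 3, 3] == 1 and planes[1, 15, 15] == 1
assert planes.sum() == N * N   # every point appears in exactly one plane
```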

Reinforcement Learning Policy Network
Reinforcement Learning Training Strategy:
Training Details and Results:
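The self-play policy-gradient update used to refine the policy network can be sketched REINFORCE-style: after a game with outcome z (+1 win, -1 loss), each chosen move's log-probability is pushed up or down in proportion to z. The toy 3-move "game" and the learning rate below are illustrative:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(2)
theta = np.zeros(3)                 # logits standing in for network parameters
lr = 0.5

for _ in range(200):
    p = softmax(theta)
    a = rng.choice(3, p=p)          # sample a move from the current policy
    z = 1.0 if a == 0 else -1.0     # toy game: move 0 always wins
    grad_logp = -p                  # d log p(a) / d theta = onehot(a) - p
    grad_logp[a] += 1.0
    theta += lr * z * grad_logp     # ascend the expected outcome

assert softmax(theta)[0] > 0.9      # the policy concentrates on the winning move
```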

Value Network and Rollout Policy Network

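The value network is fit by regression to the final game outcome. A one-position sketch of the squared-error loss and its gradient, with illustrative numbers:

```python
# Value-network target: minimise (v(s) - z)^2, where z is the final
# outcome of the game that passed through position s.

v = 0.3          # network's current prediction for position s (illustrative)
z = 1.0          # the game was eventually won from s

loss = (v - z) ** 2
grad_v = 2 * (v - z)     # gradient backpropagated into the network

assert abs(loss - 0.49) < 1e-9
assert abs(grad_v + 1.4) < 1e-9
```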

How a move is chosen - Monte Carlo Tree Search

Simulate many continuations of the game from the current position, and play the move that was visited most often across those simulations.
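The two rules above, explore during simulation and then play the most-visited move, can be sketched with a PUCT-style selection formula; the constant c_puct and the toy statistics are illustrative, not AlphaGo's exact numbers:

```python
import numpy as np

def select_child(Q, P, N, c_puct=1.0):
    """PUCT-style selection: argmax Q + c * P * sqrt(sum N) / (1 + N)."""
    U = c_puct * P * np.sqrt(N.sum()) / (1.0 + N)
    return int(np.argmax(Q + U))

Q = np.array([0.1, 0.5, 0.2])     # mean simulation outcomes per child
P = np.array([0.6, 0.2, 0.2])     # policy-network priors
N = np.array([10., 40., 5.])      # visit counts so far

inner = select_child(Q, P, N)     # used inside the tree during simulations
played = int(np.argmax(N))        # the final move is simply the most visited

assert 0 <= inner < 3
assert played == 1
```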

Increase the diversity of move choices during the simulations.

Improvements in AlphaGo Zero

(1) No human game records are needed at all; it learns entirely by playing games against itself.

(2) The move-selection (policy) network and the value network are merged into a single network:
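A minimal numpy sketch of such a merged network: a shared trunk feeding a policy head (a distribution over moves) and a value head (a scalar in (-1, 1)). The linear trunk, sizes, and initialization are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
D_IN, D_HID, MOVES = 8, 16, 361 + 1   # +1 for the pass move; toy input size

W_trunk = rng.normal(size=(D_IN, D_HID)) * 0.1
W_pol = rng.normal(size=(D_HID, MOVES)) * 0.1
W_val = rng.normal(size=(D_HID, 1)) * 0.1

def forward(x):
    h = np.maximum(x @ W_trunk, 0.0)          # shared representation
    logits = h @ W_pol
    e = np.exp(logits - logits.max())
    policy = e / e.sum()                      # policy head: move distribution
    value = np.tanh((h @ W_val).item())       # value head: win estimate
    return policy, value

policy, value = forward(rng.normal(size=D_IN))
assert policy.shape == (MOVES,) and abs(policy.sum() - 1.0) < 1e-9
assert -1.0 < value < 1.0
```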
The self-play process and the neural network training process:
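How self-play feeds training can be sketched as follows: each position yields two targets, pi (the MCTS visit counts normalised into a distribution) and z (the final game result), and the loss combines value regression with policy cross-entropy. All numbers here are illustrative:

```python
import numpy as np

visits = np.array([10., 80., 10.])     # toy MCTS visit counts for one position
pi = visits / visits.sum()             # search-improved policy target

p = np.array([0.3, 0.5, 0.2])          # network's current policy output
v, z = 0.1, 1.0                        # network value vs. actual game outcome

# Combined training loss: value MSE + policy cross-entropy.
loss = (z - v) ** 2 - (pi * np.log(p)).sum()

assert abs(pi.sum() - 1.0) < 1e-9
assert loss > 0
```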



Source: blog.csdn.net/qq_45654306/article/details/113508427