Playing Go using Deep Reinforcement Learning without Hu

Author: Zen and the Art of Computer Programming

1. Introduction

Go is an ancient, classic board game that is especially well known in China. Unlike many other board games played in the region, it does not hinge on the decisive effect of a single move: as long as both players follow the rules, the game is won through accumulated strategic play. Two kinds of stones, black and white, are used, and each intersection of the board holds at most one stone. On each turn, a player chooses where to place a stone of their own color while keeping the overall position balanced. Go's long history and enduring popularity set it apart from more recent games such as Chinese chess.

Go, long among the most widely played strategy games in the world, has also played an important role in AI research. Reinforcement learning methods, represented by Deep Reinforcement Learning (DRL), have been applied successfully to Go. By modeling moves (actions) and board (environment) states, a Go model is trained with machine learning so that the computer improves by learning from its own play: it learns to judge the opponent's position and where to place its own stones so as to maximize its win rate. Many Go models have appeared in recent years, each with its own strengths, but training a good DRL model remains difficult.
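To make the idea of modeling board states and moves more concrete, here is a minimal sketch in Python. It is not the method of any particular Go program. The 9x9 board size, the helper names (`empty_board`, `legal_actions`, `softmax_policy`, `sample_move`), and the `center_bias` scoring function are all assumptions made for illustration. `center_bias` is a hand-written stand-in for what a trained network would provide, and the rules for captures, ko, and suicide are ignored.

```python
import math
import random

BOARD_SIZE = 9  # assumption: small board for illustration (standard Go uses 19x19)
EMPTY, BLACK, WHITE = 0, 1, -1


def empty_board(size=BOARD_SIZE):
    """State: a size x size grid with one cell per intersection."""
    return [[EMPTY] * size for _ in range(size)]


def legal_actions(board):
    """Actions: every empty intersection (captures, ko, and suicide are ignored)."""
    return [(r, c)
            for r, row in enumerate(board)
            for c, value in enumerate(row)
            if value == EMPTY]


def softmax_policy(scores):
    """Turn arbitrary per-move scores into a probability distribution."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_move(board, score_fn, rng):
    """Sample one move from the policy that score_fn induces over legal actions."""
    actions = legal_actions(board)
    probs = softmax_policy([score_fn(board, a) for a in actions])
    return rng.choices(actions, weights=probs, k=1)[0]


def center_bias(board, action):
    """Hypothetical stand-in for a learned model: prefers moves near the center."""
    mid = (len(board) - 1) / 2
    r, c = action
    return -(abs(r - mid) + abs(c - mid))


rng = random.Random(0)
board = empty_board()
move = sample_move(board, center_bias, rng)
board[move[0]][move[1]] = BLACK  # Black places a stone on the sampled intersection
```

In a DRL setting, the hand-written scoring function would be replaced by a trained model, and the sampled moves would come from games the program plays against itself.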

This article attempts a systematic discussion of the application and development of DRL in Go, covering its mechanisms, limitations, advantages and disadvantages, scope of application, and future directions. We hope it serves as a useful reference.

About the author: a former graduate student at Tsinghua University and former assistant professor at the National University of Defense Technology, now a senior algorithm engineer at Baidu, specializing in intelligent search, recommendation systems, image understanding, natural language processing, bioinformatics, machine learning, and autonomous driving.

2. Explanation of basic concepts and terms

2.1 Policy Network

The Policy Network is the output layer of the DRL model. The network accepts historical states (S

Origin blog.csdn.net/universsky2015/article/details/132364063