Deep learning practice | Develop a Go agent


01. Data model

The neural network's training process needs to parse the sample data out of the HDF5 file. The board positions in the data set can be extracted and fed directly into the convolutional network for feature extraction, while the sample labels are taken from the attributes for the loss calculation and backpropagation. As shown in Figure 1, the player information extracted from the attributes does not take part in the feature extraction of the board position; it is added directly to the subsequent logical judgment stage.


■ Figure 1 Basic data flow structure framework

To use a neural network to learn Go, the first step is to represent the Go board with mathematical symbols. Figure 2 shows the numeric notation of a 5×5 Go board: the two-dimensional board is converted into a matrix, where 1 represents a black stone, -1 represents a white stone, and 0 represents an empty point.


■ Figure 2 Numeric notation of a Go board
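As a minimal illustration of this encoding (the NumPy layout here is an assumption for illustration, not MyGo's actual data format), a 5×5 position can be held in a small matrix:

import numpy as np

BLACK, WHITE, EMPTY = 1, -1, 0

# A hypothetical 5x5 position: rows and columns are board coordinates.
board = np.zeros((5, 5), dtype=np.int8)
board[2, 2] = BLACK   # black stone on the center point
board[1, 2] = WHITE   # white stone directly above it

print(board)
# [[ 0  0  0  0  0]
#  [ 0  0 -1  0  0]
#  [ 0  0  1  0  0]
#  [ 0  0  0  0  0]
#  [ 0  0  0  0  0]]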

Besides the board position itself, for the neural network to give advice on where to move, it also needs to be told whose turn it is. As shown in Figure 3, given the same position, different players will make different choices.


■ Figure 3 Different choices for the same board position

As shown in Figure 4, if the whole strategy network is built entirely from fully connected layers, then in addition to flattening the two-dimensional board into one-dimensional data, the simplest way to keep the network structure uncomplicated is to give the logical judgment network a single extra input node indicating which side is to move, using 1 and -1 to denote whether the current move should be black or white.


■ Figure 4 Data preprocessing when using a fully connected neural network

Figure 5 shows two ways of handling the player-to-move information when a convolutional network processes the Go board. One option is to keep the extraction of the board's two-dimensional structural features separate from the information about whose turn it is, feeding the latter in as a single extra input node of the logical judgment network. Alternatively, since the board position is two-dimensional data, one more channel can be added to the input of the convolutional network when it gathers the graphical features, and this channel is used to indicate whether the current move should be black or white. The advantage of the second method is that it is simpler to implement than the first; the disadvantage is that it must be used together with a convolutional network. However, since a convolutional network extracts better graphical features from the Go board and is well suited to processing Go positions, this is not really a drawback.


■ Figure 5 Two ways for a convolutional neural network to handle the Go board

In this article, the features of the board position are first extracted with the convolutional network, and then the board features are combined with the current player to move and fed into the logical judgment network for the final move selection.
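The following sketch shows how the two inputs can be combined with the Keras functional API; it is a minimal illustration under assumed layer sizes and a 9×9 board, not MyGo's actual model definition. Convolutional features are extracted from the board plane, flattened, concatenated with the player-to-move scalar, and passed to fully connected layers for the move decision.

from tensorflow.keras import layers, models

board_size = 9  # assumed board size for this sketch

# Input 1: the board position as a single-channel 2D plane (1 black, -1 white, 0 empty).
board_in = layers.Input(shape=(board_size, board_size, 1), name="board")
# Input 2: a single scalar telling the network whose turn it is (1 black, -1 white).
player_in = layers.Input(shape=(1,), name="player_to_move")

# Convolutional feature extraction of the board position.
x = layers.Conv2D(32, 3, padding="same", activation="relu")(board_in)
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = layers.Flatten()(x)

# Combine the board features with the player-to-move information and let the
# fully connected "logical judgment" layers pick one of the board points.
x = layers.Concatenate()([x, player_in])
x = layers.Dense(128, activation="relu")(x)
move_out = layers.Dense(board_size * board_size, activation="softmax", name="move")(x)

model = models.Model(inputs=[board_in, player_in], outputs=move_out)
model.summary()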

 

02. Obtain training samples

u-go.net is a website built by Go enthusiasts where anyone can download, free of charge, the game records of players on the KGS Go Server (KGS). These game records are all saved as SGF files. The site provides the games of players ranked 7 dan or above, as well as those of players ranked 4 dan or above, and offers downloads in three formats: ".zip", ".tar.gz" and ".tar.bz2". Normally, to ensure the playing strength obtained from machine learning, you can use the 7 dan or above game records; if that sample set is too small, consider adding the 4 dan or above records.

For the convenience of processing these data on Windows, it is recommended to download the files in ".zip" format. Readers can of course click and download them one by one by hand, but to save time a small Python program is provided in the SGF_Parser directory of MyGo that extracts all of the ".zip" links at once. The steps are as follows: in the browser, right-click the page and save it into the MyGo\SGF_Parser folder, keeping the default file name "u-go.net.html". Then run "python fetchLinks.py > zip.link" in a cmd window to execute the Python file shown in code snippet 1. Open the newly generated "zip.link" file, copy and paste all of its contents into Xunlei to download, and save the files under "MyGo\SGF_Parser\sgf_data\". Finally, select all of the downloaded ZIP files, right-click, choose 7-Zip and "Extract to current directory"; the "MyGo\SGF_Parser\sgf_data\" directory will then contain all of the SGF files to be parsed.

[Code Snippet 1] Crawl the web links of the training samples.

MyGo\SGF_Parser\fetchLinks.py

from bs4 import BeautifulSoup

f = open('u-go.net.html', 'r')
html = f.read()
f.close()
soup = BeautifulSoup(html, "html.parser")
for link in soup.find_all('a'):        # every <a> tag on the saved page
    href = link.get('href')
    if href and 'zip' in href:         # keep only the ".zip" download links
        print(href)

The Go board itself has no intrinsic orientation. For example, at the start of a game it makes no practical difference to the opponent which corner star point the first stone is played on. A computer program, however, lacks this human adaptability; in particular, when feature values are extracted through a convolutional network, the network is very sensitive to the position and orientation of object features. In image recognition there is a technique called data augmentation: the training set is enlarged by rotating or flipping the original samples, which lets the neural network recognize targets that are upside down, mirrored, or seen from different angles. When training a Go agent, a similar technique is used to improve the efficiency of training. Since the Go board is always square, each training sample can be rotated by 90°, 180°, and 270°; the sample can also be mirrored horizontally and then rotated in the same way. As shown in Figure 6, one board position becomes 8 samples after this processing.


■ Figure 6 One board position processed into 8 samples
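As a minimal sketch of this augmentation (using a plain NumPy matrix as the board representation, an assumption of this illustration rather than MyGo's actual data structure), the 8 symmetric variants can be generated from rotations and a horizontal flip:

import numpy as np

def augment(board):
    """Return the 8 symmetric variants of a square board matrix:
    4 rotations of the original plus 4 rotations of its mirror image."""
    variants = []
    for view in (board, np.fliplr(board)):   # original and horizontally mirrored board
        for k in range(4):                   # 0°, 90°, 180°, 270° rotations
            variants.append(np.rot90(view, k))
    return variants

# Example: a 5x5 position with one black stone off-center.
board = np.zeros((5, 5), dtype=np.int8)
board[1, 3] = 1
samples = augment(board)
print(len(samples))   # 8

Note that when a board is rotated or mirrored, the coordinates of the recorded move (the sample's label) must be transformed in exactly the same way; otherwise the augmented samples would teach the network the wrong move.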

Since the number of human game records is small compared with what machine learning requires, the technique above alleviates the problem, but to solve it fundamentally the generation of samples must be automated. The most convenient way to generate game records is to let existing Go programs play against each other, which yields a steady stream of records. This method has a fatal weakness, however: it is difficult for the program trained in this way to surpass the original program in playing strength. This is a common limitation of traditional artificial intelligence built around supervised learning. In later chapters we will see other, more effective methods for improving the training of a Go agent, but for now this traditional approach is already enough to make the Go program clearly stronger than a random player.

03. Code demo

A traditional neural network updates its parameters through supervised learning: essentially, it fits the data in the training set to build a prediction function and relies on this function to infer results for new data. The training data consist of input samples and the expected output labels, and the function's output can be a continuous value or a predicted category. Simply put, the game of Go can be abstracted as a classification problem in artificial intelligence research: the 361 points of a 19×19 board correspond to exactly 361 classes. This section uses the earlier material to implement, with a neural network, an intelligent program that decides which of the 361 classes the current position should be assigned to and thereby suggests a move.
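For instance (a small illustration using a row-major index as an assumed convention), each board point can be mapped to a class index and back:

BOARD_SIZE = 19

def point_to_class(row, col):
    """Map a 0-based (row, col) board point to a class index in [0, 360]."""
    return row * BOARD_SIZE + col

def class_to_point(idx):
    """Map a class index back to the 0-based (row, col) board point."""
    return divmod(idx, BOARD_SIZE)

print(point_to_class(3, 15))   # 72  -> one of the 361 move classes
print(class_to_point(72))      # (3, 15)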

Structurally, we can borrow from the well-known Inception architecture to build the Go program's network. Before Inception appeared, most popular convolutional neural networks simply stacked more and more convolutional layers, making the network deeper and deeper in the hope of better performance.

The main feature of the Inception structure is that it uses convolution kernels of different sizes to extract features from the same object and finally concatenates and fuses the features of different scales. At this initial stage there is no need for a network as deep as Inception. Figure 7 imitates and simplifies the convolutional structure in the spirit of Inception: it uses only one Inception-style module to recognize the board, followed by a fully connected layer for the logical judgment. To transition smoothly from the convolutional network to the fully connected network, the output of the last convolutional layer can deliberately be shaped as 1×1×c, and the Flatten function then unrolls this layer into the fully connected network.


■ Figure 7 Imitated and simplified convolutional network architecture in the style of Inception
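A minimal Keras sketch of a single Inception-style block of this kind is shown below; the layer sizes, the 9×9 board, and the single-channel input are assumptions of this illustration, not the actual MyGo network. The player-to-move information could be added either as an extra input channel or concatenated after the Flatten, as discussed earlier.

from tensorflow.keras import layers, models

board_size = 9
board_in = layers.Input(shape=(board_size, board_size, 1))  # board plane only, for brevity

# One Inception-style module: parallel convolutions with different kernel
# sizes look at the same board, and their feature maps are concatenated.
b1 = layers.Conv2D(16, 1, padding="same", activation="relu")(board_in)
b3 = layers.Conv2D(16, 3, padding="same", activation="relu")(board_in)
b5 = layers.Conv2D(16, 5, padding="same", activation="relu")(board_in)
x = layers.Concatenate()([b1, b3, b5])

# Deliberately shrink the feature map to 1x1xc so the transition to the
# fully connected layers is just a Flatten.
x = layers.Conv2D(64, board_size, padding="valid", activation="relu")(x)  # 9x9 -> 1x1x64
x = layers.Flatten()(x)

# Fully connected "logical judgment" layers: one output class per board point.
x = layers.Dense(128, activation="relu")(x)
move_out = layers.Dense(board_size * board_size, activation="softmax")(x)

model = models.Model(board_in, move_out)
model.summary()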

In machine learning, you often cannot obtain the complete sample set at once; it accumulates little by little, and Go training samples are no exception. When you get some game records, use them as training samples; after a new game finishes, use its record as a new sample. Each time a new sample set is obtained, an HDF5 file can be generated for it independently, without regenerating one complete HDF5 file from scratch every time. On the other hand, randomly pulling many small HDF5 files from the file system during training increases disk I/O overhead. Since the HDF5 file structure is very simple, as long as group names are not repeated it is entirely feasible to merge a newly added HDF5 file into the original one. Technically, the official HDF5 suite provides a command-line tool called h5copy that can be used to merge HDF5 files, but before using it you need to install the complete HDF5 distribution, which Windows users can download directly from the official website.
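Alternatively, the merge can be done from Python with h5py. The sketch below (the file names and the assumption that samples live in top-level groups are illustrative, not MyGo's actual layout) copies every top-level group of a new file into the accumulated file, skipping names that already exist:

import h5py

def merge_hdf5(src_path, dst_path):
    """Copy every top-level group of src_path into dst_path,
    skipping any group name that is already present."""
    with h5py.File(src_path, 'r') as src, h5py.File(dst_path, 'a') as dst:
        for name in src:
            if name in dst:
                print('skipping duplicate group:', name)
                continue
            src.copy(name, dst)   # recursively copies the group and its datasets

merge_hdf5('new_games.h5', 'game_recorders.h5')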

Code snippet 2 defines the location of the learning sample file and the network model to use.

[Code snippet 2] Initial definition.

filePath = "./game_recorders/game_recorders.h5"   #1
games = HDF5(filePath, mode = 'r')                 #2
type = 'pd_dense'                                  #3
model = DenseModel(dataGenerator = games.yeilds_data, boardSize = 9, dataSize = 1024 * 100, model = type)   #4

 Description /

(1) The learning sample data are stored in an HDF5 file. Training samples can be obtained from historical games or generated automatically by a program; how to generate game records automatically will be introduced with the generalized Go agent program.

(2) Use games to obtain training samples and corresponding labels from the HDF5 files that store samples.

(3) The network model is predefined in DenseModel(). The pd_dense type carries an optional parameter that specifies whether to use a fully connected network or a convolutional network; the default is the convolutional network. type can also be set directly to cnn to state explicitly that a convolutional network is used.

(4) Call the predefined DenseModel() neural model and use the data generator in games to produce a steady stream of training data. A data generator is a very useful technique, especially when the sample set is huge: since memory is limited and all the data cannot be loaded at once, the generator lets the training process fetch samples one by one, on demand.
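As a rough sketch of how such a generator might look (the dataset names "boards" and "labels" are assumptions of this illustration, not the actual layout of MyGo's HDF5 files), batches can be yielded from an HDF5 file without loading everything into memory:

import h5py
import numpy as np

def yield_batches(h5_path, batch_size=32):
    """Endlessly yield (boards, labels) batches read from an HDF5 file."""
    with h5py.File(h5_path, 'r') as f:
        boards = f['boards']    # assumed dataset of board positions
        labels = f['labels']    # assumed dataset of move labels
        n = boards.shape[0]
        while True:                               # Keras-style infinite generator
            for start in range(0, n, batch_size):
                stop = min(start + batch_size, n)
                yield (np.asarray(boards[start:stop]),
                       np.asarray(labels[start:stop]))

# Example: pull one batch on demand.
gen = yield_batches('./game_recorders/game_recorders.h5', batch_size=16)
# x, y = next(gen)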

Code snippet 3 defines the methods of compiling, learning and saving the model.

[Code Snippet 3] Compile, learn and save the model.

MyGo\sample_loader.py
model.model_compile()        #1
model.model_fit(batch_size = 16 * 2, epochs = 10000, earlystop = 10, checkpoint = True)  #2
model.model_save(type + '.h5')  #3

Description /

(1) Compile the model with the gradient optimization algorithm and loss function predefined in DenseModel().

(2) Start training. Early stopping and checkpointing of the network parameters are used here: because the number of epochs is large and the outcome of training is hard to foresee, early stopping and parameter checkpoints prevent training time from being wasted on an unreasonable network design.

(3) Save the model after training is complete.
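For reference, in plain Keras the same early stopping and checkpointing behavior is typically obtained with callbacks; the sketch below is a generic illustration, not the internals of MyGo's model_fit wrapper, and the model and generator names are assumed:

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop when the monitored metric has not improved for 10 epochs.
    EarlyStopping(monitor='val_loss', patience=10),
    # Keep the best network weights seen so far on disk.
    ModelCheckpoint('checkpoint.h5', monitor='val_loss', save_best_only=True),
]

# keras_model.fit(train_generator, epochs=10000,
#                 validation_data=val_generator, callbacks=callbacks)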

Using Keras for traditional neural network training is very convenient, and the code follows essentially the same fixed pattern every time; the main work is tuning the parameters. You can run MyGo\test_fast_play.py to see the playing strength obtained with this training method.

Code snippet 4 demonstrates how to load and use the learned agent.

[Code Snippet 4] Load the agent and start playing chess.

MyGo\test_fast_play.py
from board_fast import *   #1
board = Board(size = 9)
bot1 = None                #2
bot2 = Robot(ai = 'SD', boardSize = 9, model = 'pd_dense')   #2
game = Game(board)
print(game.run(play_b = bot1, play_w = bot2, isprint = True)) #3

Description /

(1) Import all classes under the board_fast module for subsequent use.

(2) bot1 is set to manual (human) input, and bot2 uses the model just trained. The Robot class loads the lj.h5 neural network weight file in the MyGo directory by default, so remember to adjust the file name when you want to load your own training result.

(3) Run the chess game and print out the outcome.


Origin blog.csdn.net/qq_41640218/article/details/132034310