Understanding Algorithms, Part 2 | The Alpha-Beta Pruning Algorithm in Go

 

01. Alpha-Beta pruning algorithm

The minimax algorithm traverses every possibility, but experience tells us that not every option deserves deep consideration: some options are obviously unfavorable, and when they appear they can simply be discarded. The Alpha-Beta pruning algorithm exists precisely to reduce the number of nodes in the minimax algorithm's search tree. IBM's Deep Blue, which defeated Garry Kasparov on May 11, 1997, used this algorithm.

Taking tic-tac-toe as an example, let us first see whether there is room for optimization during play. Referring to Figure 1, it is now ○'s turn: if ○ does not play on the dotted circle, × will play there on the next move and the game will be over. Once this is noticed, evaluating the payoff of the five other positions marked with △ is pointless and a waste of computing resources.

 

■ Figure 1 It is ○'s turn to move

Let us look at a Chinese chess example as well. As shown in Figure 2, it is the red ("handsome") side's turn to move. Moving the cannon to the central file is a very threatening move that might combine with the horse for the decisive "cannon behind the horse" mate. But if red plays it, the horse will immediately be captured by the opponent's rook. That loss is too great, so facing this situation in actual play, red would basically only consider how to move the horse out of the rook's reach; other moves are not examined further.

■ Figure 2 The choice of the red side

During play, when one side sees that a move would lead to a great loss or a great gain, the basic idea of pruning is to consider only these situations with significant consequences and ignore the other options. The Alpha-Beta pruning algorithm is a search algorithm designed specifically to reduce the number of nodes in the minimax search tree. Its basic idea is to use the best result already obtained at the previous layer to decide whether the current search should continue: when the algorithm judges that the continuation of some strategy is worse than a strategy already examined, it stops computing that strategy's subsequent development. The Alpha-Beta pruning algorithm spends its search time on the "more promising" sub-branches and can therefore search deeper; on average, in the same amount of time, it can reach more than twice the depth of the minimax algorithm.

As this description suggests, to use the Alpha-Beta pruning algorithm you need an additional position-evaluation system to determine which search branches are promising and which are hopeless. The so-called position value refers to the probability of winning or losing from the current position: the better the winning chances, the greater the value; the worse, the smaller the value, possibly even negative. The strength gap between artificial-intelligence programs that all use Alpha-Beta pruning actually comes from differences in their evaluation systems. Position evaluation is highly subjective; judging the value of a position is a bit like Shakespeare's line, "There are a thousand Hamlets in the eyes of a thousand spectators." The following continues to use tic-tac-toe to demonstrate the Alpha-Beta pruning algorithm. To avoid designing a full value function for tic-tac-toe, Code Snippet 1 simply treats every board other than a won or lost one (including draws) as worth zero, a winning board as worth 1, and a losing board as worth -1. Readers who want to try the algorithm on Go can use one of the simplest evaluation functions: the difference between the stones each side currently has on the board. In real play, however, few players bother to capture the opponent's clearly dead stones, so a position this function rates highly may actually bring more harm than good.
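The stone-difference evaluation for Go mentioned above can be written in a few lines. The board encoding ('b'/'w'/None in a 2-D list) is an assumption made for illustration, not the book's MyGo data structure:

```python
def stone_difference(board, player):
    """Score a Go position as (own stones) - (opponent stones).

    `board` is a 2-D list holding 'b', 'w', or None; `player` is 'b' or 'w'.
    """
    opponent = 'w' if player == 'b' else 'b'
    mine = sum(row.count(player) for row in board)
    theirs = sum(row.count(opponent) for row in board)
    return mine - theirs

board = [
    ['b', 'b', None],
    ['w', None, None],
    [None, None, 'w'],
]
print(stone_difference(board, 'b'))  # prints 0: two black stones, two white
```

As the text warns, dead-but-uncaptured stones still count here, so a position this function likes can in fact be poor.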

[Code Snippet 1] Get the evaluation result of the game.

MyGo\tic-tac-toe\ttt.py

def evl_game(game):
    if getResult(game.board.board)[1] != None:
        if game.player == getResult(game.board.board)[1]:
            return 1
        else:
            return -1
    else:
        return 0

Description

(1) Judge the result of the board. By convention, for tic-tac-toe the board is given a non-zero value only when the game has been decided; otherwise its value is zero.

(2) Judge whether the player for whom the value is being evaluated is the winner of the game.

(3) If the winner is the current player, the board value is 1; if the winner is the current player's opponent, the value is -1; in all other cases the value defaults to 0.
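For readers without the book's surrounding project, here is a self-contained sketch of the same evaluation rule. The 9-cell board list and the winner() helper are stand-ins invented for illustration; they replace the getResult() and game objects that Code Snippet 1 assumes.

```python
# All eight winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'o' or 'x' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def evaluate(board, player):
    """1 for a won board, -1 for a lost one, 0 for anything else."""
    w = winner(board)
    if w is None:
        return 0          # draw or game still in progress
    return 1 if w == player else -1

board = ['o', 'o', 'o',
         'x', 'x', None,
         None, None, None]
print(evaluate(board, 'o'))  # prints 1  ('o' owns the top row)
print(evaluate(board, 'x'))  # prints -1
```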

The introduction of board-value evaluation means that the Alpha-Beta pruning algorithm need not insist on searching to the end of the game; when simulating thinking, it does not have to stop only at a finished position. In practice a search-depth parameter is set to control how many rounds ahead the algorithm simulates. In essence, the Alpha-Beta pruning algorithm uses the value-evaluation function to control the breadth of the search and a parameter setting to control its depth.

Compared with the minimax algorithm, the Alpha-Beta pruning algorithm does not wait until the end of the game to evaluate a position; each option examined is given a numerical result by the value-evaluation function. Because the evaluation can take many different values (the minimax algorithm has only three outcomes: win, loss, and draw), the Alpha-Beta pruning algorithm must record, during the search, the best value each player has been able to obtain; these recorded best values play the role that the winning result plays in the minimax algorithm. Traditionally the best value one side has found from the current position is called Alpha and the other side's is called Beta, which is the origin of the algorithm's name. For tic-tac-toe these are written best_o and best_x. Code Snippet 2 demonstrates how the Alpha-Beta pruning algorithm is implemented.

[Code Snippet 2] The code framework of the Alpha-Beta pruning algorithm.

MyGo\tic-tac-toe\ttt.py

if self.mode == 'ab':
    moves = self.game.getLegalMoves()
    best_moves = []
    best_score = None
    best_o = minValue
    best_x = minValue
    for move in moves:
        new_game = self.game.simuApplyMove(move)
        op_best_outcome = alpha_beta_prune(new_game, max_depth, best_o, best_x, evl_game)
        my_best_outcome = -1 * op_best_outcome
        if (not best_moves) or my_best_outcome > best_score:
            best_moves = [move]
            best_score = my_best_outcome
            if self.game.player == player_x:
                best_x = best_score
            elif self.game.player == player_o:
                best_o = best_score
        elif my_best_outcome == best_score:
            best_moves.append(move)
    return random.choice(best_moves)

Description

(1) Mode ab represents the Alpha-Beta pruning algorithm.

(2) Obtain the options on the current board that comply with the game rules.

(3) Store the best move option.

(4) best_score stores the highest move value found so far in the search from the current position, and is replaced whenever a higher value is found. The best values of the ○ side and the × side are initialized to the lowest value and later updated with the searched best_score.

(5) Search options one by one.

(6) Simulate the move of the current option.

(7) Simulate the best value your opponent can get at the current move.

(8) The best value one can get is the opposite of the best value the other side can get.

(9) Only moves whose value is higher than the recorded best are processed further.

(10) A higher value is found, so the best move needs to be updated.

(11) Update the recorded best move value.

(12) Update the best value to the actual player on the current board surface.

(13) If a searched value equals the highest recorded value, the move is simply added to the pool of best moves. Randomly choosing among equally high-value moves makes the play more varied and closer to human behavior.
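The sign flip in step (8) is the "negamax" trick: in a zero-sum game, a position's value for one player is the negative of its value for the other, so a single recursion with negation replaces separate max and min branches. A hypothetical sketch on a static tree (not the book's code; leaves are scored from the viewpoint of whoever is to move there):

```python
def negamax(node):
    """Value of `node` for the player to move.

    Leaves are ints; inner nodes are lists of child positions in which
    the opponent moves next.
    """
    if isinstance(node, int):
        return node
    # whatever the opponent gains in a child, we lose: negate child values
    return max(-negamax(child) for child in node)

# Depth-2 tree: the root player's best line is worth 1.
print(negamax([[3, -2], [1, 4]]))  # prints 1
```

This is exactly why Code Snippet 2 can write my_best_outcome = -1 * op_best_outcome instead of maintaining separate maximizing and minimizing code paths.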

Code Snippet 3-11 and the minimax algorithm of Code Snippet 3-8 are very similar in framework. Readers who think it through will find that although the qualitative description of the algorithm sounds a bit mysterious, in implementation there is no essential difference between the Alpha-Beta pruning algorithm and the minimax algorithm except that the win/loss judgment is replaced by a value-judgment function. Since the Alpha-Beta pruning algorithm is an optimization of the minimax algorithm, it too is implemented recursively. The alpha_beta_prune() function is the core of the whole recursion; readers can compare it with bestResultForOP() in the minimax algorithm. Code Snippet 3 demonstrates how this core recursive method is implemented.

[Code Snippet 3] Find the opponent's best move with the pruning algorithm.

MyGo\tic-tac-toe\ttt.py
max_depth = 4

def alpha_beta_prune(game, max_depth, best_o, best_x, evl_fn):
    if game.state == GameState.over:
        if game.winner == game.player:
            return maxValue
        elif game.winner == None:
            return 0
        else:
            return minValue
    elif max_depth == 0:
        return evl_fn(game)
    best_so_far = minValue
    for move in game.getLegalMoves():
        next_game = game.simuApplyMove(move)
        op_best_result = alpha_beta_prune(
            next_game, max_depth - 1,
            best_o, best_x,
            evl_fn)
        my_result = -1 * op_best_result
        if my_result > best_so_far:
            best_so_far = my_result
        if game.player == player_o:
            if best_so_far > best_o:
                best_o = best_so_far
            outcome_for_x = -1 * best_so_far
            if outcome_for_x < best_x:
                break
        elif game.player == player_x:
            if best_so_far > best_x:
                best_x = best_so_far
            outcome_for_o = -1 * best_so_far
            if outcome_for_o < best_o:
                break
    return best_so_far

Description

(1) Control the search depth. Since draws and games still in progress are artificially assigned a value of 0, and tic-tac-toe has at most 9 moves in total, when the search depth is set fairly shallow the algorithm's first few moves are no different from random play. If a random player happens to set up a horizontal or vertical connection within its first 3 moves, it can beat the pruning algorithm. This also illustrates, indirectly, how important a good value-evaluation function is to the pruning algorithm.

(2) Since the value-evaluation function estimates the likelihood of winning or losing, a very large or a very small number is used here to represent a certain win or loss.

(3) Control the search depth. If the game is not over after reaching a certain depth, use the value of the value evaluation function to replace the judgment of victory or defeat.

(4) As in the minimax algorithm, initialize the best value obtainable from the current position.

(5) These steps are almost the same as those in bestResultForOP().

(6) Update the best value if the result is better than the one previously recorded. In the minimax algorithm the best result is simply winning, so there is no update step; in Alpha-Beta pruning, because outcomes are estimated by the value-evaluation function, the result can take many different values, so the maximum must be updated continually.

(7) Depending on which player is to move on the current board, record the best value obtained in the previous step for the corresponding side.

(8) If the current player is the ○ side and the searched value is greater than the best value recorded for ○, update that record. The same logic applies to the × side below and is not repeated here.

(9) One side's best counterplay is the other side's worst.

(10) If one side's current best play reduces the other side's best, it can be considered that a good-enough move has been found, and the search exits. One could continue without exiting, but searching further after the answer is found adds little and wastes computing resources. bestResultForOP() has a similar corresponding step.

(11) Returns the best result possible for the current player.

When searching options, the algorithm examines the board's available points in order. If it happens to find the best option early, the remaining options are pruned quickly because they are less profitable. If it is unlucky and the best option sits at the end of the search order, the Alpha-Beta pruning algorithm is no faster than the minimax algorithm. In expectation, however, the Alpha-Beta pruning algorithm takes about half the time of the minimax algorithm. Introducing a heuristic that starts the search from the most promising moves can reduce the algorithm's dependence on move order and improve it considerably.
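The dependence on move order is easy to measure on a toy tree. This sketch (illustrative only, not MyGo code) adds a node counter to the same style of alpha-beta search and runs it once with the strongest branch first and once with it last; the value is identical, but the well-ordered search visits fewer nodes:

```python
def alpha_beta_count(node, alpha, beta, maximizing, counter):
    """Alpha-beta over nested lists, counting visited nodes in counter[0]."""
    counter[0] += 1
    if isinstance(node, int):                  # leaf value
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alpha_beta_count(child, alpha, beta, False, counter))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                          # cutoff
        return value
    value = float('inf')
    for child in node:
        value = min(value, alpha_beta_count(child, alpha, beta, True, counter))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

good_order = [[9, 8, 7], [2, 1, 1], [3, 2, 1]]   # strongest branch searched first
bad_order = [[3, 2, 1], [2, 1, 1], [9, 8, 7]]    # strongest branch searched last
for tree in (good_order, bad_order):
    visited = [0]
    value = alpha_beta_count(tree, float('-inf'), float('inf'), True, visited)
    print(value, visited[0])   # same value each time, fewer nodes when well ordered
```

Heuristic move ordering (for example, trying captures or central points first) is exactly an attempt to make every search look like the good_order case.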

02. Book delivery activity

The book delivery activity of this issue is exclusively sponsored by Machinery Industry Press. The book delivery list of this issue is as follows:

"Design Patterns: Fundamentals of Reusable Object-Oriented Software (Collector's Edition)"

 

 

"Object-Oriented Thinking Process (5th Edition)"
"Engineering Thinking (5th Edition)"

 

 

The Soul of Creativity: Innovative Thinking for Engineers

 

 

"User Experience Elements: User-Centered Product Design (2nd Edition)"

 

The Original Book of Web and Mobile Usability Design Secrets, 3rd Edition

 

"Data Structure and Algorithm Analysis C Language Description (2nd Edition of the Original Book) Collector's Edition"

 

"Data Structure and Algorithm Analysis: Java Language Description (The 3rd Edition of the Original Book)"

How to participate:

Like, favorite, and comment on anything related to the article to enter the draw. After 48 hours, a program will automatically draw winners and give away 6 technical books (listed above)! We hope everyone joins in and keeps learning!

Origin blog.csdn.net/qq_41640218/article/details/131575402