What is the α-β pruning algorithm?

IBM's Deep Blue owed its victory over world chess champion Kasparov in large part to the α-β pruning algorithm [2]. So what is the α-β pruning algorithm? We start with the minimax process.

1. The minimax process

Let's first look at how people play chess. A player first considers the possible moves in the current position, then considers how the opponent would reply to each of them, then how he himself would answer each reply, and so on. A strong player looks many steps ahead in this way and judges which move leads to the best final position. In other words, the master chooses the move that keeps the greatest advantage even if the opponent replies correctly, rather than pinning hopes on the opponent's mistakes. This way of thinking can be described as a "minimax process": "mini" means that the opponent's correct reply minimizes our gain, while "max" means that, given the opponent minimizes our gain, we choose our own move to maximize it.

Figure 1: Minimax process


As shown in Figure 1, a box indicates that it is our turn to move and a circle indicates that it is the opponent's turn. The topmost box is the current game state. We search 4 plies down from the current state; the numbers at the bottom give the score of each position reached after 4 plies. The larger the number, the better for us; the smaller, the better for the opponent (in the Deep Blue system, these scores come from evaluation knowledge distilled from chess grandmasters). With this search tree, the score of every node can be deduced from the bottom up. How? Assuming neither side makes a mistake, we will always choose the move with the higher score, and the opponent will always choose the move with the lowest score. By this principle, the score of a circle node is the minimum of its children's scores, and the score of a box node is the maximum of its children's scores. In this way the scores of all nodes can be computed from the bottom up, as shown in Figure 1.

From the node scores in Figure 1, we can see that if neither side makes a mistake, moving left gets us a score of 0 and moving right gets us a score of 1. We should clearly go right, as indicated by the red arrow; this guarantees an advantage of at least 1 no matter how the opponent responds. This is the minimax process. In essence, it chooses the move that is best for us in the worst case (a perfectly playing opponent), which is why it produces stable playing strength. The minimax process was proposed by Shannon in 1949 [3] and is the basis of many computer game-playing systems, including Deep Blue.
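
To make the bottom-up scoring concrete, here is a minimal Python sketch (not from the original article) that scores a game tree given as nested lists, where a numeric leaf plays the role of the evaluation score at the search horizon. The example tree at the end is made up for illustration and is not the exact tree of Figure 1, though it mirrors the same outcome (left branch worth 0, right branch worth 1).

```python
def minimax(node, is_max_turn):
    """Score a game tree given as nested lists; numeric leaves are the
    evaluation scores at the search horizon (larger = better for us)."""
    if not isinstance(node, list):
        return node                                  # leaf: evaluated score
    scores = [minimax(child, not is_max_turn) for child in node]
    return max(scores) if is_max_turn else min(scores)


# A made-up illustrative tree: the root is our move (max), the next level is
# the opponent's reply (min), and the numbers are leaf scores.
tree = [[0, 7], [1, 4]]
print(minimax(tree, True))   # -> 1: the left branch guarantees 0, the right 1
```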

2. α-β pruning

As the search depth increases, the minimax process generates an enormous search tree and runs into the problem of "combinatorial explosion". According to estimates by Deep Blue's developers, without further improvement, even looking only about ten steps ahead would require roughly 17 years of "thinking" per move.
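
To get a feel for this explosion, here is a back-of-envelope sketch. The branching factor of roughly 35 legal moves per position is a commonly cited estimate for chess; the resulting figures are illustrative only and are not Deep Blue's own numbers.

```python
# Back-of-envelope growth of a full minimax search tree, assuming a branching
# factor of roughly 35 legal moves per position (a common estimate for chess).
branching = 35
for plies in (2, 4, 6, 8, 10):
    print(f"looking {plies:2d} plies ahead -> ~{branching ** plies:.1e} positions")
# Every two additional plies multiply the work by more than a thousand.
```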

How can this problem be solved? Let's again look at how human players handle it. An experienced player does not weigh every possible move equally; based on experience, he picks only a few promising moves to examine. A computer can borrow this idea and cut away unnecessary branches during the search to improve efficiency; this is called pruning. By exploiting the structure of the minimax process (we choose the child with the largest score, the opponent chooses the child with the smallest), a pruning algorithm can be designed that removes a large number of unnecessary search paths while guaranteeing that the final decision is unchanged. The α-β pruning algorithm is exactly such an algorithm.

Assume nodes are generated during a depth-first search, as shown in Figure 2. Starting from the root node s, the nodes a, b, and c are generated in turn. When the search depth is reached, the scores of c's two children are computed. Since c is a min node, its score is the minimum of its two children, which is 0. Since b is a max node, b's score is therefore at least 0. Expanding b further generates nodes d and e; e scores -3, and since d is a min node, d's score is at most -3. At this point we see that b is at least 0 while d is at most -3, so the score of node f cannot matter; f can be pruned and never needs to be generated. Thus b's score is 0, and since a is a min node, a's score is at most 0. Similarly, we expand a's other child g and continue down to k, which scores 3. Since h has no other children, h's score is 3, so the max node g is at least 3. We already know that the min node a is at most 0, while its child g, a max node, is at least 3, so the scores of g's other children cannot matter and they need not be generated.

Figure 2: Schematic diagram of α-β pruning


The above example can be summarized into the following pruning rules:

1. When the value of a descendant min node is ≤ the value of an ancestor max node, pruning occurs; this is called α pruning.

2. When the value of a descendant max node is ≥ the value of an ancestor min node, pruning occurs; this is called β pruning.

Note that the pruning condition compares a descendant with an ancestor, not merely with its parent: pruning occurs as soon as any ancestor satisfies the condition.
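
These two rules map directly onto code. Below is a minimal Python sketch of α-β pruning (an illustration of the standard algorithm, not Deep Blue's implementation) over the same nested-list trees used in the minimax sketch above: alpha tracks the best score an ancestor max node is already guaranteed, and beta the lowest score an ancestor min node is already guaranteed.

```python
def alphabeta(node, is_max_turn, alpha=float("-inf"), beta=float("inf")):
    """Minimax with α-β pruning over nested-list game trees.
    alpha: best score an ancestor max node is already guaranteed.
    beta:  lowest score an ancestor min node is already guaranteed."""
    if not isinstance(node, list):
        return node                     # leaf: evaluated score
    if is_max_turn:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if value >= beta:           # rule 2 (β pruning): this max node already
                break                   # reaches an ancestor min node's bound
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if value <= alpha:          # rule 1 (α pruning): this min node already
                break                   # falls to an ancestor max node's bound
        return value
```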

The pruning method above is called the α-β pruning algorithm. It was proposed by John McCarthy, one of the founders of artificial intelligence and a Turing Award winner (several researchers worked on the problem and independently proposed the algorithm around the same time) [1]. α-β pruning greatly improves search efficiency, allowing deeper searches within a limited time and therefore stronger play. It should be emphasized that α-β pruning is "lossless": it only improves search efficiency and never changes the move that is ultimately chosen.
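
The lossless property can be checked empirically. The small test below reuses the minimax and alphabeta sketches defined earlier and verifies on random trees that pruning never changes the root score; the tree generator is a made-up helper for this check, not part of the original article.

```python
import random

def random_tree(depth, branching=3):
    """Build a random nested-list game tree with integer leaf scores."""
    if depth == 0:
        return random.randint(-10, 10)
    return [random_tree(depth - 1, branching) for _ in range(branching)]

for _ in range(1000):
    t = random_tree(3)
    assert minimax(t, True) == alphabeta(t, True)
print("α-β pruning returned the same root score as plain minimax every time.")
```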
