Gobang AI: ideas and implementation of minimax search and α-β pruning (Qt and C++) (3) minimax search and the α-β pruning algorithm

Now we come to the core of the Gobang AI: minimax search and the α-β pruning algorithm. These two names sound impressive, but once I actually implemented them I found the ideas are really not that mysterious.

1. Minimax search

What is minimax search? First, the concept of a game tree. The game tree is the tree structure formed as we and the opponent take turns making decisions. The branches of a node represent the moves available in that position, and each leaf node represents a resulting position. For example, starting from an empty board (the root node), I have 15×15 = 225 possible moves; after my move, a game tree with 225 leaf nodes has formed, each leaf being one position. If the opponent then moves, with 224 choices in each of those positions, the new game tree has 225×224 leaf nodes.

It can be seen that the game tree grows exponentially: if the average branching factor is b and the depth of the tree is d (the root node is level 0), the total number of leaf nodes is about b^d.
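This exponential growth is easy to check numerically. Here is a small helper of my own (not from the original project) that computes the b^d estimate:

```cpp
#include <cstdint>

// Approximate number of leaf nodes in a game tree with average
// branching factor b and depth d (root at depth 0): b^d.
// This ignores that the real branching factor shrinks as the board fills.
std::uint64_t leafCount(std::uint64_t b, int d) {
    std::uint64_t n = 1;
    for (int i = 0; i < d; ++i) n *= b;
    return n;
}
```

For the full board, leafCount(225, 1) gives the 225 first-ply positions described above; with the 10-move candidate list used later, leafCount(10, 4) gives the depth-4 upper bound.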

With the concept of the game tree, we can build a "smart" AI that looks a few moves ahead. How does it "look a few moves ahead"? Simply put, it traverses the leaf nodes of the game tree, finds the line of play most favorable to the AI, and plays accordingly. When it is the AI's turn in the tree, the AI chooses the child most favorable to itself; when it is the player's turn, the AI assumes the player will choose the child most favorable to the player. Since the evaluation function F scores a position from the AI's point of view, the AI picks the child with the largest F on its own turns and the child with the smallest F when simulating the player (the smaller F is, the worse for the AI and the better for the player; the AI assumes the player plays "smart"). This is why it is called minimax search.
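The selection rule above can be sketched as plain minimax on an explicit tree. This is an illustrative sketch of the idea, not the article's board-search code; the Node struct and the minimax function are my own names:

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// A node is either a leaf with a score (the evaluation F),
// or an internal node with children.
struct Node {
    int score = 0;               // used only if children is empty
    std::vector<Node> children;
};

// maximizing == true corresponds to the AI's turn (max layer).
int minimax(const Node& n, bool maximizing) {
    if (n.children.empty()) return n.score;   // leaf: return evaluation F
    int best = maximizing ? INT_MIN : INT_MAX;
    for (const Node& c : n.children) {
        int v = minimax(c, !maximizing);      // layers alternate max/min
        best = maximizing ? std::max(best, v) : std::min(best, v);
    }
    return best;
}
```

A max node keeps the largest child value, a min node the smallest, exactly as described above.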

2. Minimax search optimization

1. The α-β pruning algorithm

The name of this algorithm sounds grand, but the idea behind it is not hard to understand.

Take a very simple example.
(Figure: a small game tree with root a on the max layer, its children b and c on the min layer, and leaf nodes d, f, g, h.)
A node on the max layer takes the maximum of its children's values; a node on the min layer takes the minimum of its children's values.

For this game tree, given the values of the leaf nodes d, f, g, and h, how do we find the value of a? First, from d and f we know the value of b is -1. Having finished b, the search moves to the other side, along a→c→g. At this point we know the value of c must be ≤ -2, because c is a min node and g is -2. Since a picks the maximum of its children and b's value is already greater than anything c can be, there is no need to search the rest of c: the branch a→c→h is "cut off".

In the α-β pruning algorithm, every node carries an α and a β: α is the best lower bound found so far, β the best upper bound. Initially α is negative infinity and β is positive infinity. Then the search runs: each time a max-layer node finishes one of its children it updates its α (lower bound), and each time a min-layer node finishes one of its children it updates its β (upper bound). If after an update α ≥ β, the remaining children no longer need to be searched: break out and prune them directly. That is the whole of the α-β pruning algorithm.
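The update-and-prune rule can be demonstrated on the small example tree above. This is an illustrative sketch with my own names, not the article's code; the visited counter makes the effect of pruning visible:

```cpp
#include <algorithm>
#include <climits>
#include <vector>

struct ABNode {
    int score = 0;                 // used only if children is empty
    std::vector<ABNode> children;
};

// maximizing == true: max layer updates alpha; otherwise min layer
// updates beta. 'visited' counts leaf evaluations.
int alphabeta(const ABNode& n, bool maximizing,
              int alpha, int beta, int& visited) {
    if (n.children.empty()) { ++visited; return n.score; }
    for (const ABNode& c : n.children) {
        int v = alphabeta(c, !maximizing, alpha, beta, visited);
        if (maximizing) alpha = std::max(alpha, v);
        else            beta  = std::min(beta, v);
        if (alpha >= beta) break;  // prune the remaining siblings
    }
    return maximizing ? alpha : beta;
}
```

On the tree from the example (b over leaves d, f; c over leaves g, h), the search evaluates d, f, and g, then prunes h: only 3 of the 4 leaves are visited.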

My code:

struct POINTS{
    QPoint pos[10];   // best candidate positions; [0] scores highest, [9] lowest
    int score[10];    // evaluation of the position after playing here
};
struct DECISION{
    QPoint pos;       // chosen position
    int eval;         // evaluation of that move
};

DECISION decision;    // best move found by analyse()
    
int chessAi::analyse(int (*board)[15], int depth, int alpha, int beta){
    gameResult RESULT = evaluate(board).result;
    if(depth == 0 || RESULT != R_DRAW){
        // If the simulated move already decides the game, return the
        // result directly; no further search is needed.
        if(depth == 0){
            POINTS P = seekPoints(board);  // generate the best candidate moves
            return P.score[0];             // highest score among them
        }else return evaluate(board).score;
    }else if(depth % 2 == 0){
        // max layer: our side (white) decides
        int sameBoard[15][15];
        copyBoard(board, sameBoard);
        POINTS P = seekPoints(sameBoard);

        for(int i = 0; i < 10; ++i){
            // Simulate our move on the copy; board itself must not be
            // used, or its contents would be corrupted.
            sameBoard[P.pos[i].x()][P.pos[i].y()] = C_WHITE;
            int a = analyse(sameBoard, depth - 1, alpha, beta);
            sameBoard[P.pos[i].x()][P.pos[i].y()] = C_NONE;  // undo the move
            if(a > alpha){
                alpha = a;
                if(depth == 4){
                    // 4 is the chosen search depth (6 or 8 also work, but it
                    // must be even); record the best move at the root.
                    decision.pos.setX(P.pos[i].x());
                    decision.pos.setY(P.pos[i].y());
                    decision.eval = a;
                }
            }
            if(beta <= alpha) break;  // prune
        }
        return alpha;
    }else{
        // min layer: the opponent (black) decides
        int rBoard[15][15];
        reverseBoard(board, rBoard);
        // seekPoints() finds the best positions for white, so the colors
        // are reversed first to get the best positions for black.
        POINTS P = seekPoints(rBoard);

        int sameBoard[15][15];
        copyBoard(board, sameBoard);

        for(int i = 0; i < 10; ++i){
            sameBoard[P.pos[i].x()][P.pos[i].y()] = C_BLACK;  // simulate the opponent's move
            int a = analyse(sameBoard, depth - 1, alpha, beta);
            sameBoard[P.pos[i].x()][P.pos[i].y()] = C_NONE;   // undo the move
            if(a < beta) beta = a;
            if(beta <= alpha) break;  // prune
        }
        return beta;
    }
}

seekPoints() finds the best candidate positions in the current situation, together with the score after playing each of them. It uses local search and a static evaluation heuristic to improve efficiency; both are discussed below.

A few points are very easy to get wrong (fixing these bugs cost me half a week), so pay attention:
1. The search depth of the game tree must be even, because the evaluation function F scores the position after white has just moved; an odd depth makes the F estimate at the leaf nodes wrong.
2. When calling analyse recursively, never simulate the move directly on the original board array: analyse modifies the board as it simulates, so later recursive calls would keep mutating it and the results would be wrong. The right approach is to copy the board into a new array, simulate the move there, and pass the copy to the recursive analyse.
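The article calls copyBoard() but does not list it; a minimal sketch consistent with its call sites (an assumed implementation, for plain 15×15 int arrays) could be:

```cpp
#include <cstring>

// Assumed implementation of the copyBoard helper used by analyse():
// duplicate the 15x15 board so simulated moves never touch the original.
void copyBoard(int (*src)[15], int (*dst)[15]) {
    std::memcpy(dst, src, 15 * 15 * sizeof(int));
}
```

Because dst is an independent array, placing and undoing stones on it leaves the caller's board untouched, which is exactly the property point 2 requires.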

2. Local search and the static evaluation heuristic

Since the game tree is exponential, finding a way to reduce the average branching factor b greatly reduces the number of leaf nodes.

Local search means considering only the empty positions that can interact with the existing pieces, rather than every empty position, which greatly reduces b. My local search marks, for each occupied point on the board, the cells within 3 steps along the 8 surrounding directions; only these cells can be related to the pieces already placed.

The static evaluation heuristic serves the α-β pruning algorithm: the earlier a good move is searched, the earlier pruning happens. Simply sorting the candidate moves by their evaluation score speeds up pruning. I sort the scores of all candidate moves found by the local search and store them in a POINTS object, with [0] the highest score and [9] the lowest.
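The same ordering idea can be sketched with std::sort over (score, position) pairs. This is an illustrative alternative to the selection loop inside seekPoints() below; the Move type is my own, not from the article:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Static-evaluation move ordering: sort candidate moves by score,
// best first, so that α-β pruning triggers as early as possible.
using Move = std::pair<int, std::pair<int, int>>;  // {score, {row, col}}

void orderMoves(std::vector<Move>& moves) {
    std::sort(moves.begin(), moves.end(),
              [](const Move& a, const Move& b) { return a.first > b.first; });
}
```

After sorting, truncating the vector to the first 10 entries reproduces the role of the POINTS arrays.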

seekPoints() combines these two optimizations to generate the best candidate moves. In practice, keeping the 10 best moves is enough, and it greatly improves the search speed; if the number is too small, favorable branches may be cut off. With a search depth of 4, this means at most 10^4 = 10,000 leaf nodes, and the static evaluation heuristic plus α-β pruning reduce it further: in actual runs the program searches only about 5,000 leaf nodes. A huge reduction!


POINTS chessAi::seekPoints(int board[15][15]){
    bool B[15][15];     // local-search marker array
    int worth[15][15];
    POINTS best_points;

    memset(B, 0, sizeof(B));
    // For each occupied point, extend 3 steps in the 8 surrounding
    // directions and mark the in-bounds cells as playable.
    for(int i = 0; i < 15; ++i){
        for(int j = 0; j < 15; ++j){
            if(board[i][j] != C_NONE){
                for(int k = -3; k <= 3; ++k){
                    if(i + k >= 0 && i + k < 15){
                        B[i + k][j] = true;
                        if(j + k >= 0 && j + k < 15) B[i + k][j + k] = true;
                        if(j - k >= 0 && j - k < 15) B[i + k][j - k] = true;
                    }
                    if(j + k >= 0 && j + k < 15) B[i][j + k] = true;
                }
            }
        }
    }

    for(int i = 0; i < 15; ++i){
        for(int j = 0; j < 15; ++j){
            worth[i][j] = -INT_MAX;
            if(board[i][j] == C_NONE && B[i][j] == true){
                board[i][j] = C_BLACK;
                worth[i][j] = evaluate(board).score;
                board[i][j] = C_NONE;
            }
        }
    }

    // Select the 10 highest-scoring cells, best first.
    int w;
    for(int k = 0; k < 10; ++k){
        w = -INT_MAX;
        for(int i = 0; i < 15; ++i){
            for(int j = 0; j < 15; ++j){
                if(worth[i][j] > w){
                    w = worth[i][j];
                    QPoint tmp(i, j);
                    best_points.pos[k] = tmp;
                }
            }
        }
        best_points.score[k] = w;
        // Clear the point just taken so the next pass finds the next-best one.
        worth[best_points.pos[k].x()][best_points.pos[k].y()] = -INT_MAX;
    }
    return best_points;
}

Concluding remarks

With that, the implementation of the Gobang AI is basically complete. At depth 4 the performance is very good (under 0.5 seconds per move); at depth 6 it is slower (around 8 seconds early in the game, dropping to 2 or 3 seconds later). Looking at other people's projects, they go further with techniques such as iterative deepening, zobrist hashing caches, and multi-threading. But online classes have just started, and I have almost forgotten the unity project I set aside earlier; in short there is a big pile of things to do, so further optimization will have to wait until I have time.

I have been working on this project for more than a week. Looking back, I took many detours and learned a lot from the experience. I am writing this blog to share the evaluation function I worked hard to figure out, so that others can avoid a few of the same pitfalls; after all, digging up this information on your own is quite hard.

Origin blog.csdn.net/livingsu/article/details/104544562