Artificial Intelligence: A Modern Approach, Chapter 5: Adversarial Search


5.1 Games

  • Competitive environment: the goals of the agents are in conflict.
  • In the narrow sense, a "game" in artificial intelligence usually means a game between two agents in a fully observable environment.
  • A zero-sum game in which players take turns making deterministic moves.

In game search, exhaustive methods such as A* are inefficient because the search graph (tree) is enormous. To improve efficiency, pruning and heuristic evaluation functions are introduced. Pruning skips parts of the tree that cannot affect the final decision, shrinking the search space; a heuristic evaluation function guides the search quickly by estimating the true utility value of a state. Applied together, pruning and heuristic evaluation speed up game search and lead to better play.

Two-player games
Two players: MAX moves first, MIN moves second.

The 6 elements of a game problem:

  • s: a state
  • Player(s): the player to move in state s
  • Actions(s): the set of legal actions in state s
  • Result(s, a): the transition model; the state that results from taking action a in s
  • Terminal-Test(s): true when state s ends the game
  • Utility(s, p): the utility value for player p of the terminal state s

Once again, the two key parts of any search problem are how states are defined and how states transition.
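As a sketch, these six elements map naturally onto an abstract interface (the class and method names below are illustrative, not from the text):

```python
class Game:
    """Abstract two-player game, mirroring the six elements above
    (illustrative interface; the method names are this sketch's, not the book's)."""

    def player(self, s):
        """The player to move in state s (e.g., 'MAX' or 'MIN')."""
        raise NotImplementedError

    def actions(self, s):
        """The set of legal actions available in state s."""
        raise NotImplementedError

    def result(self, s, a):
        """Transition model: the state reached by taking action a in s."""
        raise NotImplementedError

    def terminal_test(self, s):
        """True if state s ends the game."""
        raise NotImplementedError

    def utility(self, s, p):
        """The numeric payoff for player p in terminal state s."""
        raise NotImplementedError
```

A concrete game such as tic-tac-toe would subclass this and fill in each method.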

Tic Tac Toe

Game search tree for the tic-tac-toe game


5.2 Optimal decisions in games

Given a game tree, the optimal strategy can be determined by computing the minimax value of each node, written MINIMAX(n). Assuming both players always play optimally, the minimax value of a node is the utility (for MAX) of the corresponding state. The minimax value of a terminal state is simply its utility. Given a choice, MAX prefers to move to a state of maximum value, while MIN prefers a state of minimum value. This gives the following formula:

MinMax(s) =

  • Utility(s), if s is a terminal state
  • max_{a ∈ Actions(s)} MinMax(Result(s, a)), if s is a MAX node
  • min_{a ∈ Actions(s)} MinMax(Result(s, a)), if s is a MIN node
5.2.1 Minimax search
def minimax(node, maximizing_player):
    if node.is_terminal_node():
        return node.evaluate()  # score of the current node

    if maximizing_player:
        best_value = float('-inf')
        for child_node in node.get_children():
            value = minimax(child_node, False)
            best_value = max(best_value, value)
        return best_value
    else:
        best_value = float('inf')
        for child_node in node.get_children():
            value = minimax(child_node, True)
            best_value = min(best_value, value)
        return best_value
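To see the back-up of values concretely, here is minimax exercised on a tiny hand-built tree (GameNode is an illustrative helper, and minimax is restated in compact form so the snippet runs on its own):

```python
class GameNode:
    """Illustrative helper matching the interface the minimax code expects."""

    def __init__(self, score=None, children=()):
        self.score = score
        self.children = list(children)

    def is_terminal_node(self):
        return not self.children

    def evaluate(self):
        return self.score

    def get_children(self):
        return self.children


def minimax(node, maximizing_player):
    if node.is_terminal_node():
        return node.evaluate()
    values = [minimax(c, not maximizing_player) for c in node.get_children()]
    return max(values) if maximizing_player else min(values)

# MAX chooses between two MIN nodes:
#   left  MIN node over leaves 3, 12, 8 -> MIN picks 3
#   right MIN node over leaves 2, 4, 6  -> MIN picks 2
# so MAX prefers the left branch, giving a minimax value of 3.
left = GameNode(children=[GameNode(3), GameNode(12), GameNode(8)])
right = GameNode(children=[GameNode(2), GameNode(4), GameNode(6)])
root = GameNode(children=[left, right])
print(minimax(root, True))  # -> 3
```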

Performance analysis

m is the maximum depth of the search tree; b is the average branching factor.

  • Time complexity: O(b^m)
  • Space complexity:
    O(bm) if all successors are generated at once;
    O(m) if successors are generated one at a time.
5.2.2 Multiplayer games


The simplest implementation: let the UTILITY function return a vector of utilities, one entry per player.
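A minimal sketch of the vector idea, often called max^n: the player to move picks the child whose backed-up utility vector is best in that player's own component (VecNode and maxn are illustrative names):

```python
class VecNode:
    """Illustrative node whose terminal utility is a vector, one entry per player."""

    def __init__(self, vec=None, children=()):
        self.vec = vec
        self.children = list(children)

    def is_terminal_node(self):
        return not self.children

    def evaluate(self):
        return self.vec

    def get_children(self):
        return self.children


def maxn(node, player_count, to_move):
    """Each player maximizes their own component of the backed-up utility vector."""
    if node.is_terminal_node():
        return node.evaluate()
    nxt = (to_move + 1) % player_count
    child_values = [maxn(c, player_count, nxt) for c in node.get_children()]
    return max(child_values, key=lambda v: v[to_move])

# Three players; player 0 moves at the root, player 1 at the next level.
a = VecNode(children=[VecNode((1, 2, 6)), VecNode((4, 3, 3))])  # player 1 picks (4, 3, 3)
b = VecNode(children=[VecNode((6, 1, 2)), VecNode((7, 4, 1))])  # player 1 picks (7, 4, 1)
root = VecNode(children=[a, b])
print(maxn(root, 3, 0))  # -> (7, 4, 1)
```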

5.3 Alpha-beta pruning

alpha = the best (i.e., highest-value) choice for MAX found so far along the path
beta = the best (i.e., lowest-value) choice for MIN found so far along the path
Alpha-beta search continually updates the values of alpha and beta, and prunes the remaining branches of a node (i.e., terminates the recursive call) as soon as the node's value is known to be worse than the current alpha (for MAX) or beta (for MIN). The technique is applied on top of minimax search.

def alpha_beta(node, alpha, beta, maximizing_player):
    if node.is_terminal_node():
        return node.evaluate()  # score of the current node

    if maximizing_player:
        value = float('-inf')
        for child_node in node.get_children():
            value = max(value, alpha_beta(child_node, alpha, beta, False))
            alpha = max(alpha, value)  # update alpha: best maximizing choice so far
            if alpha >= beta:
                break  # prune: stop searching this node's remaining children
        return value
    else:
        value = float('inf')
        for child_node in node.get_children():
            value = min(value, alpha_beta(child_node, alpha, beta, True))
            beta = min(beta, value)  # update beta: best minimizing choice so far
            if alpha >= beta:
                break  # prune: stop searching this node's remaining children
        return value
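The effect of pruning can be checked on the classic three-branch example by counting leaf evaluations: alpha-beta visits only 7 of the 9 leaves (ABNode and the counter are illustrative scaffolding):

```python
class ABNode:
    evaluations = 0  # counts leaf evaluations across a search

    def __init__(self, score=None, children=()):
        self.score = score
        self.children = list(children)

    def is_terminal_node(self):
        return not self.children

    def evaluate(self):
        ABNode.evaluations += 1
        return self.score

    def get_children(self):
        return self.children


def alpha_beta(node, alpha, beta, maximizing_player):
    if node.is_terminal_node():
        return node.evaluate()
    if maximizing_player:
        value = float('-inf')
        for child in node.get_children():
            value = max(value, alpha_beta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # prune remaining siblings
        return value
    value = float('inf')
    for child in node.get_children():
        value = min(value, alpha_beta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break  # prune remaining siblings
    return value


def branch(*scores):
    return ABNode(children=[ABNode(s) for s in scores])

# The classic three-branch example: a MAX root over three MIN nodes.
root = ABNode(children=[branch(3, 12, 8), branch(2, 4, 6), branch(14, 5, 2)])
value = alpha_beta(root, float('-inf'), float('inf'), True)
print(value, ABNode.evaluations)  # value 3, with 7 of the 9 leaves evaluated
```

The middle branch is cut off after its first leaf: once MIN can force 2 there, and MAX already has 3 from the first branch, the remaining leaves cannot matter.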

Plain minimax: O(b^m)
Alpha-beta pruning, best case (perfect move ordering): O(b^(m/2))

5.4 Real-time decision-making

Game-playing programs usually have to move within a reasonable amount of time, so a depth limit must be imposed on the game search tree, with non-terminal states at the cutoff scored by the evaluation function.
Taking alpha_beta as an example:

def alpha_beta(node, depth, alpha, beta, maximizing_player):
    if depth == 0 or node.is_terminal_node():
        return node.evaluate()  # score of the current node

    if maximizing_player:
        value = float('-inf')
        for child_node in node.get_children():
            value = max(value, alpha_beta(child_node, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)  # update alpha: best maximizing choice so far
            if alpha >= beta:
                break  # prune: stop searching this node's remaining children
        return value
    else:
        value = float('inf')
        for child_node in node.get_children():
            value = min(value, alpha_beta(child_node, depth - 1, alpha, beta, True))
            beta = min(beta, value)  # update beta: best minimizing choice so far
            if alpha >= beta:
                break  # prune: stop searching this node's remaining children
        return value
        

A good evaluation function

  • It ranks terminal states correctly: a win is evaluated higher than a draw, and a draw higher than a loss.
  • Computing the evaluation function must not itself be too time-consuming.
  • For non-terminal states, its value should correlate strongly with the actual probability of winning.
  • In practice, a linear weighted sum of features is commonly used: Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
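The linear weighted evaluation can be sketched as follows; the features (material difference, mobility) and weights are made up for illustration:

```python
def linear_eval(state, weights, features):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Hypothetical chess-like features: material difference and mobility.
features = [
    lambda s: s["my_material"] - s["opp_material"],
    lambda s: s["my_moves"] - s["opp_moves"],
]
weights = [9.0, 0.1]  # made-up weights: material matters far more than mobility

state = {"my_material": 39, "opp_material": 36, "my_moves": 30, "opp_moves": 25}
print(linear_eval(state, weights, features))  # 9*3 + 0.1*5 = 27.5
```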

5.5 Stochastic games

Backgammon

A game tree for backgammon must include chance nodes in addition to the MAX and MIN nodes; chance nodes are drawn as circles in Figure 5.11. The children of each chance node are the possible dice results, and each branch is labeled with the roll and its probability of occurring.

ExpectMinMax(s) =

  • Utility(s), if s is a terminal state
  • max_a ExpectMinMax(Result(s, a)), if s is a MAX node
  • min_a ExpectMinMax(Result(s, a)), if s is a MIN node
  • ∑_r P(r) · ExpectMinMax(Result(s, r)), if s is a chance node, where r ranges over the possible dice rolls
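The four cases above can be sketched directly (ENode and its kind labels are illustrative):

```python
class ENode:
    """Illustrative node: kind is 'max', 'min', 'chance', or 'leaf'."""

    def __init__(self, kind, value=None, children=(), probs=()):
        self.kind = kind
        self.value = value
        self.children = list(children)
        self.probs = list(probs)  # used only by chance nodes


def expectiminimax(node):
    if node.kind == "leaf":
        return node.value
    child_values = [expectiminimax(c) for c in node.children]
    if node.kind == "max":
        return max(child_values)
    if node.kind == "min":
        return min(child_values)
    # chance node: probability-weighted average over the random outcomes
    return sum(p * v for p, v in zip(node.probs, child_values))


def leaf(v):
    return ENode("leaf", value=v)

# MAX chooses between two chance nodes (e.g., two possible dice outcomes each):
# left:  0.5 * 3 + 0.5 * 5  = 4.0
# right: 0.9 * 1 + 0.1 * 20 = 2.9
left = ENode("chance", children=[leaf(3), leaf(5)], probs=[0.5, 0.5])
right = ENode("chance", children=[leaf(1), leaf(20)], probs=[0.9, 0.1])
root = ENode("max", children=[left, right])
print(expectiminimax(root))  # -> 4.0
```

Note that MAX prefers the left branch even though the right branch contains the single best leaf (20), because expected value, not best case, is what gets backed up through chance nodes.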

Chapter summary

Adversarial search is a family of search algorithms for decision-making in competitive environments. Classical algorithms such as minimax and alpha-beta pruning find optimal decisions by building a game tree, backing values up from the leaves, and pruning branches that cannot affect the result. Stochastic adversarial search introduces random factors (chance nodes such as dice rolls), modeling uncertainty in how the game unfolds. Real-time adversarial search addresses time-limited decision-making by searching only part of the tree, trading off search time against decision quality. These extensions make adversarial search flexible enough to adapt to different competitive environments and requirements, providing better decision strategies and performance.

Origin: blog.csdn.net/weixin_61197809/article/details/134266201