Artificial Intelligence: A Modern Approach Study Notes (Chapter 5): Adversarial Search

Game concept

(Figures omitted.)

Minimax algorithm

The minimax algorithm is often used in two-player games; the goal is to find the move that maximizes one's own payoff. The basic idea is to assume that the player (A) is smart enough to always choose the plan most favorable to itself, while the opponent (B) is equally smart and always chooses the plan least favorable to A.

Let's illustrate with an example.
Suppose the squares represent the player (A), the circles represent the opponent (B), and each child of a node represents a candidate plan.
(Figure omitted: the full game tree with all candidate plans.)
The figure shows all the candidate plans. Let's analyze it as follows. (Note: every number in the tree is A's payoff value; the larger the value, the better for A.)
Suppose A chooses the first plan. B then has two candidate plans, and in order to minimize A's payoff, B picks 3 out of {7, 3}, so A can only get 3.

Suppose A chooses the second plan. B has only one option, and A ends up with 15.
Suppose A chooses the third plan. B has four candidate plans; in order to minimize A's payoff, B chooses the first one, and A only gets 1.
Therefore, to maximize its own payoff, A will choose the second plan and obtain the payoff 15.

As this shows, B always chooses the minimum value among its candidate plans, and A always chooses the maximum value among its candidate plans; the name "minimax" is derived from this.
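
The same computation can be written compactly. Here is a minimal sketch that encodes the example tree as nested lists (leaves are A's payoffs; the leaf values under the third plan other than 1 are assumed for illustration, since only the minimum matters):

def minimax(node, is_max):
    # Leaves are payoff values for A.
    if isinstance(node, (int, float)):
        return node
    # Levels alternate between A (max) and B (min).
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# A's three plans, each followed by B's replies.
tree = [[7, 3], [15], [1, 5, 9, 4]]
print(minimax(tree, is_max=True))  # -> 15: A should choose the second plan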

The algorithm uses depth-first search to traverse the decision tree, filling in the payoff values of the tree's intermediate nodes; the payoff values of the leaf nodes are usually computed by an evaluation function.

The branches of a decision tree usually grow exponentially, so traversing the entire tree is essentially impossible. In practice, the depth of the search is therefore limited to reduce the amount of computation. Because the complete tree is not traversed, the algorithm can be misled: the selected plan may be locally optimal rather than globally optimal.

Sometimes, to get better results, the depth of the search tree has to be increased, which adds a great deal of computation. To speed up the calculation, alpha-beta pruning can be used to prune the search tree, because many branches of the tree do not need to be traversed at all.
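
As a minimal sketch of the depth cutoff, assume evaluate stands in for whatever payoff-estimation function the game provides; the search falls back to it when the depth budget runs out:

def evaluate(node):
    # Placeholder heuristic: a real game would estimate A's payoff here.
    return node if isinstance(node, (int, float)) else 0

def minimax_limited(node, depth, is_max):
    # Stop at a leaf or at the depth limit and use the evaluation function.
    if isinstance(node, (int, float)) or depth == 0:
        return evaluate(node)
    values = [minimax_limited(child, depth - 1, not is_max) for child in node]
    return max(values) if is_max else min(values)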

Max wants to maximize its own payoff, and Min wants to minimize its opponent's payoff.
Zero-sum game: under strict competition, one side's gain is exactly the other side's loss, so the gains and losses of the two sides always sum to zero.
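
In symbols (a formalization added for clarity, where U_A and U_B denote the two players' utility functions):

U_A(s) + U_B(s) = 0 for every terminal state s, i.e. U_A(s) = -U_B(s)

This is why a single number per state suffices in the two-player case: maximizing one's own utility and minimizing the opponent's are the same thing.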
5.2.1 The minimax algorithm
The recursive algorithm proceeds from the top down to the leaf nodes of the tree, and as the recursion unwinds it backs the minimum and maximum values up through the search tree.
Each step minimizes the maximum payoff the opponent can obtain.
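
This backing-up is often written as two mutually recursive functions, one per player. A sketch, assuming a generic game object with is_terminal, utility, and successors operations (names chosen here for illustration):

def max_value(game, state):
    # Max's turn: back up the largest value among the successors.
    if game.is_terminal(state):
        return game.utility(state)
    return max(min_value(game, s) for s in game.successors(state))

def min_value(game, state):
    # Min's turn: back up the smallest value among the successors.
    if game.is_terminal(state):
        return game.utility(state)
    return min(max_value(game, s) for s in game.successors(state))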

5.2.2 Optimal decisions in multiplayer games
With more than two players, the minimax idea still applies, with one change. In the two-player games studied above, one player's score determines the other's, so a single value is enough to score a state. In a multiplayer game, the single utility value is replaced by a vector of utility values, one per player. The value backed up at each node is the utility vector of the successor chosen by the player who moves at that node. Multiplayer games usually also involve formal or informal alliances between the players.
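
As an illustration of the vector-valued backup, here is a sketch in which each leaf carries a tuple of utilities, one component per player (the tuple values below are assumed), and the player to move picks the successor that maximizes its own component:

def maxn(node, player, num_players):
    # A leaf is a utility tuple with one component per player.
    if isinstance(node, tuple):
        return node
    next_player = (player + 1) % num_players
    vectors = [maxn(child, next_player, num_players) for child in node]
    # The player to move chooses the vector that maximizes its own component.
    return max(vectors, key=lambda v: v[player])

# Three players; leaves are (u0, u1, u2) utility tuples.
tree = [[(1, 2, 6), (4, 2, 3)], [(6, 1, 2), (7, 4, 1)]]
print(maxn(tree, player=0, num_players=3))  # -> (7, 4, 1)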

Game search experiment

import random

class MinimaxAgent(MultiAgentSearchAgent):
    """
      Your minimax agent (question 2)
    """
    def getAction(self, gameState):
        "*** YOUR CODE HERE ***"
        # Agent 0 (Pacman) moves first; depth counting starts at 1.
        return self.MinMaxSearch(gameState, 0, 1)

    def MinMaxSearch(self, gameState, agentIndex, currentDepth):
        # Cut off the search at the depth limit or at a terminal state.
        if currentDepth > self.depth or gameState.isWin() or gameState.isLose():
            return self.evaluationFunction(gameState)

        # Discard 'Stop' so the agent always makes a real move.
        legalMoves = [action for action in gameState.getLegalActions(agentIndex)
                      if action != 'Stop']

        # The next agent to move; the depth increases once every agent has moved.
        nextIndex = agentIndex + 1
        nextDepth = currentDepth
        if nextIndex >= gameState.getNumAgents():
            nextIndex = 0
            nextDepth += 1

        results = [self.MinMaxSearch(gameState.generateSuccessor(agentIndex, action),
                                     nextIndex, nextDepth)
                   for action in legalMoves]

        # At the root, return an action (breaking ties randomly) instead of a value.
        if agentIndex == 0 and currentDepth == 1:
            bestMove = max(results)
            bestIndices = [index for index in range(len(results))
                           if results[index] == bestMove]
            return legalMoves[random.choice(bestIndices)]

        # Agent 0 (Pacman) is the maximizer; the ghosts are minimizers.
        if agentIndex == 0:
            return max(results)
        else:
            return min(results)
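
Assuming this code belongs to the Berkeley CS188 Pacman multi-agent project (which the MultiAgentSearchAgent base class and the question numbering suggest), the agent would typically be run from the command line; the layout and depth below are only examples:

python pacman.py -p MinimaxAgent -a depth=2 -l smallClassic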

α-β pruning algorithm


Alpha-beta pruning cuts the branches of the search tree that do not need to be searched, in order to improve the calculation speed.

Suppose α is the lower bound and β is the upper bound for a node N, so that α ≤ N ≤ β:

If α ≤ β, then N can still yield a solution.

If α > β, then N cannot yield a solution, and the rest of its branches can be pruned.
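
Here is this rule as a minimal sketch on the nested-list trees used earlier; following the note's α > β convention, a node abandons its remaining children as soon as the interval becomes empty:

def alphabeta(node, alpha, beta, is_max):
    # Leaves are payoff values for A.
    if isinstance(node, (int, float)):
        return node
    if is_max:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha > beta:  # the interval [alpha, beta] is empty: prune
                break
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha > beta:
            break
    return value

# The same example tree as in the minimax section; pruning skips the
# leaves 5, 9, 4 but still returns 15.
print(alphabeta([[7, 3], [15], [1, 5, 9, 4]], float('-inf'), float('inf'), True))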

Let's use an example to illustrate the alpha-beta pruning algorithm.
(Figure omitted: the full search tree.)
The figure shows the entire search tree. Here we run the minimax algorithm with alpha-beta pruning; as before, the squares are the player (A) and the circles are the opponent (B).
Initially, α is negative infinity and β is positive infinity.

B (fourth layer) tries to make A's payoff as small as possible, so whenever it meets an outcome that gives A less, it lowers β. Here 3 is less than positive infinity, so β becomes 3.

(Fourth layer) Here 17 is greater than 3, so β does not change.

For A (third layer), the larger the payoff the better, so whenever a payoff value exceeds α, α is raised. Here 3 is greater than negative infinity, so α becomes 3.

B (fourth layer) has a plan that gives A a payoff of only 2, so α = 3, β = 2, and α > β. This means that if A (third layer) chose its second plan, B could certainly make A's payoff smaller than under A's (third layer) first plan. There is therefore no need to consider the other candidate plans of B (fourth layer): A (third layer) will not choose the second plan at all, so evaluating them would be wasted effort.

If B (second layer) wants to minimize A's payoff, then the second plan of B (second layer) must not give A a payoff greater than β, which is 3. But if B (second layer) chooses its second plan, A (third layer) can choose its first plan and make A's payoff 15, so α = 15, β = 3, and α > β. There is thus no need to consider the remaining plans of A (third layer), because B (second layer) will not choose its second plan.

A (first layer) maximizes its own payoff, so the second plan of A (first layer) must not be worse than the first. But one plan of A (third layer) leads to a payoff of 2, which is less than 3, so A (third layer) will not choose that plan, and the second plan of B (fourth layer) does not need to be considered.

When A (third layer) considers its second plan, the payoff turns out to be 3, the same as what the first plan of A (first layer) yields. If A (first layer) simply follows the analysis above and selects its first plan, then B no longer needs to consider its second plan. If A (first layer) wants to weigh the two plans further, B (second layer) does need to consider its second plan: if that plan gives A less than 3, A (first layer) can only choose its first plan; if it gives A more than 3, then A (first layer) must decide between the two plans based on other factors.

α value: the maximum return value the Max node can currently guarantee for itself.
β value: the minimum return value the Min node can currently give to the opponent.
α and β are initialized to -∞ and +∞, respectively.

import random

class AlphaBetaAgent(MultiAgentSearchAgent):
    """
      Your minimax agent with alpha-beta pruning (question 3)
    """

    def getAction(self, gameState):
        """
          Returns the minimax action using self.depth and self.evaluationFunction
        """
        "*** YOUR CODE HERE ***"
        # float('inf') replaces Python 2's sys.maxint, so this also runs on Python 3.
        return self.AlphaBeta(gameState, 0, 1, float('-inf'), float('inf'))

    def AlphaBeta(self, gameState, currentAgent, currentDepth, alpha, beta):
        # Cut off the search at the depth limit or at a terminal state.
        if currentDepth > self.depth or gameState.isWin() or gameState.isLose():
            return self.evaluationFunction(gameState)

        # Discard 'Stop' so the agent always makes a real move.
        legalMoves = [action for action in gameState.getLegalActions(currentAgent)
                      if action != 'Stop']

        # The next agent to move; the depth increases once every agent has moved.
        nextAgent = currentAgent + 1
        nextDepth = currentDepth
        if nextAgent >= gameState.getNumAgents():
            nextAgent = 0
            nextDepth += 1

        # At the root, return an action (breaking ties randomly) instead of a value.
        if currentAgent == 0 and currentDepth == 1:
            results = [self.AlphaBeta(gameState.generateSuccessor(currentAgent, action),
                                      nextAgent, nextDepth, alpha, beta)
                       for action in legalMoves]
            bestMove = max(results)
            bestIndices = [index for index in range(len(results))
                           if results[index] == bestMove]
            return legalMoves[random.choice(bestIndices)]

        if currentAgent == 0:
            # Max node (Pacman): prune as soon as the value reaches beta.
            bestMove = float('-inf')
            for action in legalMoves:
                bestMove = max(bestMove,
                               self.AlphaBeta(gameState.generateSuccessor(currentAgent, action),
                                              nextAgent, nextDepth, alpha, beta))
                if bestMove >= beta:
                    return bestMove
                alpha = max(alpha, bestMove)
            return bestMove
        else:
            # Min node (a ghost): prune as soon as the value drops to alpha.
            bestMove = float('inf')
            for action in legalMoves:
                bestMove = min(bestMove,
                               self.AlphaBeta(gameState.generateSuccessor(currentAgent, action),
                                              nextAgent, nextDepth, alpha, beta))
                if bestMove <= alpha:
                    return bestMove
                beta = min(beta, bestMove)
            return bestMove
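
Because pruning skips branches that cannot change the decision, the alpha-beta agent can typically search deeper than plain minimax in the same amount of time. Assuming the same Berkeley Pacman framework as above, an example invocation (layout and depth are illustrative):

python pacman.py -p AlphaBetaAgent -a depth=3 -l smallClassic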

Origin: blog.csdn.net/weixin_44972129/article/details/108861517