Artificial Intelligence: A Modern Approach, Chapter 5: Adversarial Search

Games

Adversarial search : in a competitive environment the agents' goals conflict, which gives rise to adversarial search problems, usually called games.
Game : here, a deterministic, turn-taking, two-player, zero-sum game of perfect information.
Pruning : ignoring the parts of the search tree that cannot affect the final decision.
Heuristic evaluation function : estimates the true utility of a state without doing a full search.
Utility function (also called the objective function or payoff function) UTILITY(s, p) : defines the numeric value for player p of a game that ends in terminal state s.

Optimal decisions in games

Consider a game between two players, MAX and MIN:
Optimal solution (in ordinary search): a sequence of actions leading to a goal state.
Terminal state: a state in which one side has won.
Minimax value : MAX prefers to move to a state of maximum value and MIN prefers to move to a state of minimum value. Terminal states are evaluated by the utility function. MAX moves to the successor with the highest minimax value, on the assumption that the opponent always plays optimally.

In the figure above, the numbers at the terminal nodes are utility values for MAX, and the numbers next to the other nodes are their minimax values. MAX's best move at the root is a1, because it leads to the successor with the highest minimax value. MIN's best reply is then b1, because it leads to the successor with the lowest minimax value.

Given a game tree, the optimal strategy can be determined from the minimax value of each node, written MINIMAX(s). MAX prefers to move to a state of maximum value and MIN to a state of minimum value, so the value of every node is defined recursively from the utilities of the terminal states below it.
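
The recursive definition of MINIMAX(s) can be written out as below. This is a reconstruction in the usual textbook form. The helper names TERMINAL-TEST, ACTIONS, RESULT, and PLAYER do not appear in these notes and are assumed here as the standard functions of a game formulation.

```latex
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s, \mathrm{MAX}) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s) \\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
```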

The minimax algorithm

The minimax algorithm computes the minimax decision from the current state. It uses a simple recursive computation of the minimax value of each successor state. The recursion proceeds all the way down to the leaves of the tree, and the minimax values are then backed up through the tree as the recursion unwinds.
Intuition: at every step, minimize the maximum payoff the opponent can obtain.
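
The recursion can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the nested-list encoding of the tree and the function names are assumptions, and the leaf values are a hypothetical two-ply example.

```python
from typing import List, Union

# Assumed encoding: a number is a terminal utility for MAX,
# a list holds the successors of an internal node.
Tree = Union[float, List["Tree"]]


def minimax(node: Tree, max_to_move: bool) -> float:
    """Return the minimax value of `node`, assuming both sides play optimally."""
    if not isinstance(node, list):  # terminal state: return its utility
        return node
    # Recurse down to the leaves, then back the values up the tree.
    values = [minimax(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)


def minimax_decision(root: List[Tree]) -> int:
    """Index of the move MAX should make at the root."""
    values = [minimax(child, max_to_move=False) for child in root]
    return values.index(max(values))


# Hypothetical tree: MAX moves at the root, MIN at the next level.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, max_to_move=True))  # 3
print(minimax_decision(tree))           # 0
```

The three MIN nodes back up 3, 2, and 2, so MAX picks the first move.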

Optimal decisions in multiplayer games

Minimax can be extended to games with more than two players. In the two-player zero-sum case one player's score determines the other's, so a single number is enough to describe the value of a state. In a multiplayer game that single utility value must be replaced by a vector of values, one per player. The value backed up to a node is the utility vector of the successor chosen by the player who moves at that node.
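
A minimal sketch of the vector backup, assuming players move in a fixed rotation and each picks the successor whose vector is best in their own component. The tree encoding, the function name, and the leaf vectors are all hypothetical.

```python
from typing import List, Tuple, Union

# Assumed encoding: a leaf is a tuple of utilities, one entry per player;
# an internal node is a list of successors.
Utilities = Tuple[float, ...]
Tree = Union[Utilities, List["Tree"]]


def backed_up_vector(node: Tree, player: int, n_players: int) -> Utilities:
    """Back up utility vectors; `player` is the index of the player to move."""
    if isinstance(node, tuple):  # terminal state: its utility vector
        return node
    next_player = (player + 1) % n_players
    vectors = [backed_up_vector(child, next_player, n_players) for child in node]
    # The player to move picks the successor whose vector is best for them.
    return max(vectors, key=lambda v: v[player])


# Hypothetical three-player tree: player 0 moves at the root, then player 1.
tree = [
    [(1, 2, 6), (4, 3, 3)],
    [(6, 1, 2), (7, 4, 1)],
]
print(backed_up_vector(tree, player=0, n_players=3))  # (7, 4, 1)
```

Player 1 backs up (4, 3, 3) and (7, 4, 1) from the two subtrees, and player 0 then prefers the vector whose first component is 7.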

Multiplayer games usually involve formal or informal alliances among the players.

α-β pruning (key topic)

The time complexity of minimax search is exponential in the depth of the tree. α-β pruning cannot remove the exponent, but with good move ordering it can effectively halve it, and in many cases it prunes entire subtrees.

If the player has a better choice at the parent of a node, or at any choice point further up the tree, that node will never be reached in play and need not be explored.

  • α: the value of the best choice (highest value) found so far for MAX along the path
  • β: the value of the best choice (lowest value) found so far for MIN along the path

Applied to a standard minimax tree, the technique prunes away branches that cannot influence the final decision. Exploration below a MIN node stops once its value is known to be at most α, and exploration below a MAX node stops once its value is known to be at least β.

An example of α-β pruning
Goal: decide which of the successors B, C, and D the root node A in the figure should move to.

The steps are as follows.

B is a MIN node, so it takes the smallest value among its leaves, which is 3. Returned to A, this makes α at A equal to 3, i.e. the root value is at least 3.

The first leaf under C is 2, so C, a MIN node, is worth at most 2. B is already known to be worth 3, so MAX will never choose C. The remaining successors of C need not be considered and can be pruned.

The first two leaves under D are 14 and 5, both larger than α = 3, so the search must continue. The third leaf is 2, so MIN node D is worth 2. MAX at the root therefore moves to B, with value 3.

The process above can be seen as a simplification of the minimax formula. Let x and y be the values of the two successors of C that were never evaluated. The root value is then max(3, min(2, x, y), 2) = 3, whatever x and y are, because min(2, x, y) is at most 2.
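
The walk-through can be replayed in code. The sketch below is a generic α-β search over a nested-list tree (an assumed encoding). The notes only state that the smallest leaf under B is 3 and that two leaves under C are skipped, so the other leaf values (12, 8, 4, 6) are placeholders chosen for illustration.

```python
import math
from typing import List, Optional, Union

Tree = Union[float, List["Tree"]]  # number = leaf utility, list = successors


def alphabeta(node: Tree, max_to_move: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[float]] = None) -> float:
    """Minimax value with α-β pruning; evaluated leaves are appended to `visited`."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if max_to_move:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:   # MIN above will never allow this node
                return value
            alpha = max(alpha, value)
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta, visited))
            if value <= alpha:  # MAX above already has a better choice
                return value
            beta = min(beta, value)
    return value


# Root A is a MAX node; B, C, D are MIN nodes (placeholder leaves, see above).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited: List[float] = []
print(alphabeta(tree, max_to_move=True, visited=visited))  # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]
```

Only seven of the nine leaves are evaluated: the two remaining leaves under C are skipped as soon as the leaf 2 shows that C cannot beat B.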

α-β pruning gains the most in the following cases:

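  • Pruning works better if the MAX successors of a MIN node are examined in ascending order of backed-up value. For example, if the successors of MIN node D are sorted from smallest to largest as 2, 5, 14, then nodes 5 and 14 need not be evaluated.
  • Pruning works better if the MIN successors of a MAX node are examined in descending order.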
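
The effect of ordering on node D can be checked with a small sketch. The helper below is hypothetical: it evaluates a single MIN node whose successors are leaves, given the bound α = 3 that MAX has already secured at B, and counts how many leaves it has to look at.

```python
from typing import List, Tuple


def min_node_with_cutoff(leaves: List[float], alpha: float) -> Tuple[float, int]:
    """Evaluate a MIN node over leaf successors, given MAX's bound `alpha`.

    Returns the bound found on the node's value and the number of leaves examined.
    """
    value = float("inf")
    examined = 0
    for leaf in leaves:
        examined += 1
        value = min(value, leaf)
        if value <= alpha:  # MAX already has a choice worth alpha: stop here
            break
    return value, examined


alpha = 3  # value MAX has secured at B in the example
print(min_node_with_cutoff([14, 5, 2], alpha))          # (2, 3)
print(min_node_with_cutoff(sorted([14, 5, 2]), alpha))  # (2, 1)
```

In the original order all three leaves are examined. In ascending order the cutoff fires after the first leaf.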

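In this case α-β can cut away a large part of the tree and needs to examine only O(b^(m/2)) nodes to reach a decision, where b is the branching factor and m is the search depth.
The minimax algorithm searches the whole game tree and has to examine O(b^m) nodes.
If successors are examined in random order rather than best-first, α-β examines roughly O(b^(3m/4)) nodes in total. With best-first ordering it examines only O(b^(m/2)) nodes.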
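
To get a feel for these growth rates, the sketch below plugs numbers into the three expressions. It ignores constant factors, so the results are orders of magnitude rather than exact node counts. The function name is hypothetical, and b = 35 is only a commonly quoted figure for chess.

```python
from typing import Dict


def nodes_examined(b: int, m: int) -> Dict[str, int]:
    """Order-of-magnitude node counts for branching factor b and depth m.

    m is assumed to be a multiple of 4 so that every exponent is a whole number.
    """
    return {
        "minimax": b ** m,                        # O(b^m)
        "alpha-beta, random order": b ** (3 * m // 4),  # O(b^(3m/4))
        "alpha-beta, best order": b ** (m // 2),        # O(b^(m/2))
    }


for name, count in nodes_examined(b=35, m=8).items():
    print(f"{name}: {count:,}")
```

Because the best-order count for depth m equals the plain minimax count for depth m/2, well-ordered α-β can search roughly twice as deep in the same time.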

Imperfect real-time decisions

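Real-time play leaves very little time for search, so the search has to be cut off early and a heuristic evaluation function applied to the states where it stops.
A heuristic evaluation function EVAL, which can evaluate non-terminal states, replaces the utility function.
A cutoff test, which decides when to apply EVAL, replaces the terminal test.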
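
The two replacements can be sketched on a toy game. Everything here is an assumption made for illustration: the game (a pile of stones, each player takes 1 to 3, whoever takes the last stone wins), the fixed depth limit, and the heuristic inside `eval_fn`.

```python
from typing import List, Tuple

State = Tuple[int, bool]  # (stones left, True if MAX is to move)
DEPTH_LIMIT = 2           # assumed fixed search depth


def successors(state: State) -> List[State]:
    stones, max_to_move = state
    return [(stones - take, not max_to_move) for take in (1, 2, 3) if take <= stones]


def cutoff_test(state: State, depth: int) -> bool:
    """Replaces the terminal test: stop at terminal states or at the depth limit."""
    return state[0] == 0 or depth >= DEPTH_LIMIT


def eval_fn(state: State) -> float:
    """Replaces UTILITY: exact on terminal states, a heuristic guess elsewhere."""
    stones, max_to_move = state
    if stones == 0:
        # The player who just moved took the last stone and wins.
        return -1.0 if max_to_move else 1.0
    # Heuristic: the player to move is in trouble when stones % 4 == 0.
    mover_score = -0.5 if stones % 4 == 0 else 0.5
    return mover_score if max_to_move else -mover_score


def h_minimax(state: State, depth: int = 0) -> float:
    """Minimax that applies eval_fn wherever cutoff_test says to stop."""
    if cutoff_test(state, depth):
        return eval_fn(state)
    values = [h_minimax(s, depth + 1) for s in successors(state)]
    return max(values) if state[1] else min(values)


def best_move(state: State) -> int:
    """Number of stones MAX should take, looking only DEPTH_LIMIT plies ahead."""
    stones, _ = state
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: h_minimax((stones - take, False), depth=1))


print(best_move((10, True)))  # 2
```

With ten stones the search never reaches a terminal state, yet the evaluation at depth 2 is enough to prefer taking two stones.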

Evaluation functions

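① It should order terminal states in the same way as the true utility function.
② It must not take too long to compute.
③ For non-terminal states, it should be strongly correlated with the actual chances of winning.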
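
One cheap way to aim at requirements ② and ③ is a weighted sum of simple features. The sketch below is an illustration and not something these notes specify: it counts chess material with the customary piece values, using a hypothetical string encoding of the pieces on the board. It does not address requirement ①, since a full evaluation function would also have to return the true utility at terminal states.

```python
# Customary material values in pawns (an assumption for illustration).
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material_eval(pieces: str) -> int:
    """Material balance from MAX's point of view.

    `pieces` lists the pieces on the board (hypothetical encoding):
    uppercase letters belong to MAX, lowercase letters to MIN.
    """
    score = 0
    for piece in pieces:
        value = PIECE_VALUES.get(piece.upper(), 0)  # kings and unknown symbols count 0
        score += value if piece.isupper() else -value
    return score


# MAX: king, queen, rook, three pawns. MIN: king, two rooks, two pawns.
print(material_eval("KQRPPP" + "krrpp"))  # 5
```

The function runs in time linear in the number of pieces, which keeps the cost per evaluated state small.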

Cutting off search

Simplest cutoff: set a fixed depth limit.
Horizon effect: when a good move lies beyond the fixed search depth, the search cannot find it.

Forward pruning

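Forward pruning: some moves at a node are pruned immediately, without further consideration.
Beam search: only the best n moves are considered, with no guarantee that the best move is not pruned away.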
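
A minimal sketch of the beam idea, assuming moves are ranked by a shallow evaluation before any deeper search. The function name, the move labels, and both sets of values are hypothetical. They are chosen so that the move a deeper search would prefer is the one that gets pruned.

```python
from typing import Callable, List, TypeVar

Move = TypeVar("Move")


def beam_moves(moves: List[Move], evaluate: Callable[[Move], float], n: int) -> List[Move]:
    """Keep only the n moves that the evaluation function ranks highest."""
    return sorted(moves, key=evaluate, reverse=True)[:n]


# Hypothetical moves: a shallow estimate, and the value a deeper search would find.
shallow = {"a": 5.0, "b": 4.0, "c": 1.0}
true_value = {"a": 2.0, "b": 3.0, "c": 9.0}

kept = beam_moves(list(shallow), shallow.get, n=2)
print(kept)                                 # ['a', 'b']
print(max(true_value, key=true_value.get))  # c
```

Move c looks worst to the shallow evaluation and is dropped, even though it is the best move under the deeper values. This is the risk that forward pruning accepts in exchange for speed.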

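Resources

Lab code download:
https://github.com/yyl424525/AI_Homework
Artificial Intelligence: A Modern Approach (Chinese third edition) PDF, slides, homework and solutions, answers to the end-of-chapter exercises, lab code and reports, and past PhD entrance exam questions: https://download.csdn.net/download/yyl424525/11310392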

Origin blog.csdn.net/yyl424525/article/details/95308101