Pac Man: Understanding and Applying AI Search Algorithms (1)

Author: encender fuego

Email: [email protected]

The CS188: Introduction to AI course offered by UC Berkeley has a clear structure and detailed content, which makes it the best choice for getting started with AI. Pac Man, the assignment for its search-algorithms chapter, is unique and cleverly designed, and it rewards repeated study and reflection. Since there are few related articles on Juejin, the author plans to publish a series of articles that analyze each Pac Man task in detail, to help readers learn and understand artificial intelligence and search algorithms.

After obtaining the source code, the first thing to understand is the overall structure of the project:

The assets folder stores static resources. Initially, it contains only one image, demo.png.

The layouts folder stores map resources and lets us create our own maps for testing. A given .lay file mainly contains the following elements:
- %: a wall
- .: a food pellet (bean) that earns points
- o: a capsule that makes the monsters panic
- P: the starting position of the player (Pac-Man)
- G: the starting position of a monster (multiple monsters can be placed)
By arranging the elements above, we can build a map ourselves. For small games, storing maps in plain text files like this is quite common; a .lay file is essentially no different from an ordinary .txt file.
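As a rough illustration of how such a text map can be turned into data, the sketch below parses a layout string into walls, food, capsules and start positions. parse_layout is a hypothetical helper, not the project's own layout loader, and putting y = 0 on the bottom row is an assumption made so that "North" means increasing y.

```python
def parse_layout(text):
    """Hypothetical parser for a Pac Man-style text map (not the project's loader).

    Returns a dict of walls, food, capsules, Pac-Man's start and the monsters' starts.
    Coordinates are (x, y); y = 0 is assumed to be the bottom row.
    """
    rows = text.strip("\n").split("\n")
    height = len(rows)
    walls, food, capsules, ghosts = set(), set(), set(), []
    pacman = None
    for r, row in enumerate(rows):
        y = height - 1 - r  # flip row index so that the bottom row has y = 0
        for x, ch in enumerate(row):
            if ch == "%":
                walls.add((x, y))
            elif ch == ".":
                food.add((x, y))
            elif ch == "o":
                capsules.add((x, y))
            elif ch == "P":
                pacman = (x, y)
            elif ch == "G":
                ghosts.append((x, y))  # several monsters may be placed
    return {"walls": walls, "food": food, "capsules": capsules,
            "pacman": pacman, "ghosts": ghosts}


example = """
%%%%%%%
%P...G%
%.%%%o%
%%%%%%%
"""
info = parse_layout(example)
print(info["pacman"], info["ghosts"], sorted(info["food"]))
```

Using sets for walls and food keeps membership tests O(1), which matters once a search algorithm queries them millions of times.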
Code files we need to read:
- util.py: implements data structures such as Stack, Queue, PriorityQueue, PriorityQueueWithFunction, and Counter. PriorityQueue is built on a min-heap whose internal entries are triples, but we only need to care about the item and its priority. PriorityQueueWithFunction inherits from PriorityQueue and lets users pass in a custom evaluation function.
- pacman.py: defines a GameState class with a set of interfaces through which you can not only obtain the positions and number of Pac-Man and the monsters, but also get the successor state produced after an agent takes a specific action (this is essential for the game problems). Food, capsules, and scores can be accessed as well. Overall, the interface of the GameState class gives you the complete state of the game.
- game.py: defines some basic classes of the Pac Man game; the parts that need to be read are marked in the source code. Pay special attention to the Grid class, which we will use later when adding heuristics and rewriting the evaluation function.
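The relationship between PriorityQueue and PriorityQueueWithFunction can be sketched in a few lines. This is a minimal re-creation written from the description above, not a copy of util.py, so details may differ from the project's version; the (state, path, cost) entries in the example are hypothetical.

```python
import heapq


class PriorityQueue:
    """Min-heap priority queue: pop() returns the item with the smallest priority value."""

    def __init__(self):
        self.heap = []
        self.count = 0  # insertion counter, breaks ties between equal priorities

    def push(self, item, priority):
        heapq.heappush(self.heap, (priority, self.count, item))
        self.count += 1

    def pop(self):
        _, _, item = heapq.heappop(self.heap)
        return item

    def isEmpty(self):
        return len(self.heap) == 0


class PriorityQueueWithFunction(PriorityQueue):
    """Sketch: the priority of each item is computed by a user-supplied function."""

    def __init__(self, priorityFunction):
        super().__init__()
        self.priorityFunction = priorityFunction

    def push(self, item):
        # The caller passes only the item; its priority comes from the evaluation function.
        PriorityQueue.push(self, item, self.priorityFunction(item))


# Example: hypothetical (state, path, cost) entries ordered by accumulated cost.
pq = PriorityQueueWithFunction(lambda entry: entry[2])
pq.push(("B", ["East"], 5))
pq.push(("C", ["North"], 2))
pq.push(("D", ["South"], 7))
order = [pq.pop()[0] for _ in range(3)]
print(order)  # ['C', 'B', 'D']
```

Passing the evaluation function once, at construction time, means a search algorithm can swap between cost-only and cost-plus-heuristic ordering without changing its push calls.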
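Code files we need to write:
- search.py: here we implement DFS, BFS, UCS, and A* and apply them to path-finding problems.
- searchAgents.py: here we define custom heuristic functions and, for two specific mazes, modify the cost function so that Pac-Man scores as high as possible.
- multiAgents.py: here we implement the Minimax algorithm and Alpha-Beta pruning and modify the evaluation function, finally completing an intelligent Pac-Man game.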

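If some of the terms above are unfamiliar, don't worry: later sections explain the algorithms and the code structure in more detail, starting from the basics.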

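Ignoring the monsters for now, implement the four search algorithms DFS, BFS, UCS, and A* so that Pac Man eats the single food item in the maze.

This task requires writing code in search.py. The file already declares the four functions above; each should return a sequence of actions, which Pac-Man then follows.

All four functions take a problem argument, which is actually an instance of the PositionSearchProblem class in searchAgents.py. Through it, we can obtain Pac-Man's current state and check whether the goal has been reached.

As the source comments suggest, printing problem.getStartState() shows that a state is simply Pac-Man's current position (x, y). Because Pac-Man can move back and forth freely, the same position can be reached by many routes, so we should use graph search and keep an explored set to avoid expanding the same node more than once.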
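To experiment with the search functions outside the project, it can help to have a tiny stand-in for PositionSearchProblem. The class below is a hypothetical sketch (ToyGridProblem is not part of the CS188 code): it only mimics the three methods this article relies on, getStartState, isGoalState and expand, with states being (x, y) positions and expand returning (child, action, stepCost) triples.

```python
class ToyGridProblem:
    """Hypothetical stand-in for PositionSearchProblem, for experiments only.

    A state is a position (x, y); expand() returns (child, action, stepCost) triples.
    """

    MOVES = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}

    def __init__(self, width, height, walls, start, goal):
        self.width, self.height = width, height
        self.walls = set(walls)
        self.start, self.goal = start, goal

    def getStartState(self):
        return self.start

    def isGoalState(self, state):
        return state == self.goal

    def expand(self, state):
        x, y = state
        children = []
        for action, (dx, dy) in self.MOVES.items():
            nx, ny = x + dx, y + dy
            inside = 0 <= nx < self.width and 0 <= ny < self.height
            if inside and (nx, ny) not in self.walls:
                children.append(((nx, ny), action, 1))  # every step costs 1
        return children


problem = ToyGridProblem(3, 3, walls={(1, 1)}, start=(0, 0), goal=(2, 2))
print(problem.getStartState())  # (0, 0)
print(problem.expand((0, 0)))   # [((0, 1), 'North', 1), ((1, 0), 'East', 1)]
```

Note that (0, 1) lists (0, 0) as a child again, which is exactly the kind of cycle the explored set guards against.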

Depth-First Search

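```python
import util  # project module (util.py) providing Stack, Queue, PriorityQueue


def depthFirstSearch(problem):
    explored = set()          # states that have already been expanded
    result = util.Stack()     # action lists, pushed/popped in lockstep with frontier
    frontier = util.Stack()   # states waiting to be expanded (LIFO)

    result.push([])
    frontier.push(problem.getStartState())

    while True:
        if frontier.isEmpty():
            return []         # no path to the goal exists

        node = frontier.pop()
        action = result.pop()  # actions leading from the start to node

        if problem.isGoalState(node):
            return action

        explored.add(node)
        children = problem.expand(node)  # list of (child, action, stepCost) triples

        for child in children:
            # util.Stack keeps its items in .list; skip states already explored or queued
            if child[0] not in explored and child[0] not in frontier.list:
                frontier.push(child[0])
                result.push(action + [child[1]])
```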

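All four search algorithms can be implemented with the pattern above; only the data structure differs. DFS is usually written recursively, which essentially relies on the program's call stack. If we implement DFS iteratively, we have to maintain a Stack explicitly to simulate the call stack.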
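The recursive form mentioned above can be sketched as follows. This is an illustration that assumes problem exposes getStartState, isGoalState and expand as described in this article; recursive_dfs and DictProblem are hypothetical names, not project code.

```python
def recursive_dfs(problem):
    """Sketch of DFS written recursively: Python's call stack replaces util.Stack."""
    explored = set()

    def visit(state):
        if problem.isGoalState(state):
            return []                    # at the goal: no further actions needed
        explored.add(state)
        for child, action, _ in problem.expand(state):
            if child not in explored:
                rest = visit(child)      # recursive call = push onto the call stack
                if rest is not None:
                    return [action] + rest
        return None                      # dead end: return = pop, i.e. backtrack

    path = visit(problem.getStartState())
    return path if path is not None else []


class DictProblem:
    """Hypothetical tiny problem backed by an adjacency dict, for illustration only."""

    def __init__(self, graph, start, goal):
        self.graph, self.start, self.goal = graph, start, goal

    def getStartState(self):
        return self.start

    def isGoalState(self, state):
        return state == self.goal

    def expand(self, state):
        return self.graph.get(state, [])


graph = {
    "A": [("B", "A->B", 1), ("C", "A->C", 1)],
    "B": [("D", "B->D", 1)],
    "C": [("G", "C->G", 1)],
    "D": [],
}
print(recursive_dfs(DictProblem(graph, "A", "G")))  # ['A->C', 'C->G']
```

Each recursive call corresponds to a push and each return to a pop. One practical caveat: Python's default recursion limit is about 1000 frames (see sys.getrecursionlimit()), which a long maze path can exceed, so the iterative version is the safer choice here.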
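The difficulty of this task is recording the search path efficiently, because what we ultimately return is a sequence of actions that guides Pac-Man from the start to the goal. Reading the source of problem.expand shows that it returns a list in which each element is a (child, action, stepCost) triple, where action is the step needed to move from the parent to child; this is what we need to record. A straightforward approach, therefore, is to push this triple onto the frontier and maintain the action list level by level, so that it always represents the steps needed to reach that position from the start. This approach is general, and we use it in UCS.
Here, however, we use an extra result stack that mirrors the pushes and pops of frontier. The key to recording a path is to carry the parent's information over to the child and then append how the parent reaches the child. This is what result.push(action + [child[1]]) means: action is the parent's content, i.e., how to get from the start to the parent, and [child[1]] is how to get from the parent to the child; concatenating the two gives the action sequence from the start to the current child.
By making result a Stack as well, we keep it synchronized with the pushes and pops of frontier, which guarantees that when the goal state is found, the action list popped from result is exactly the path from the start to the goal.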
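A common alternative to carrying an action list with every frontier entry, and not the method used in this article, is to store a parent pointer for each discovered state and rebuild the path once the goal is reached. The sketch below applies it to BFS; bfs_with_parents and the small graph are hypothetical, and expand(state) is assumed to return (child, action, stepCost) triples.

```python
from collections import deque


def bfs_with_parents(start, goal, expand):
    """Sketch: BFS recording (parent, action) per state instead of full action lists."""
    parent = {start: None}            # state -> (parent_state, action); start has none
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            actions = []
            while parent[state] is not None:  # walk back from the goal to the start
                state, action = parent[state]
                actions.append(action)
            return actions[::-1]       # reverse into start-to-goal order
        for child, action, _ in expand(state):
            if child not in parent:    # `parent` doubles as explored-or-queued set
                parent[child] = (state, action)
                frontier.append(child)
    return []


graph = {
    "A": [("B", "A->B", 1), ("C", "A->C", 1)],
    "B": [("D", "B->D", 1)],
    "C": [("G", "C->G", 1)],
}
print(bfs_with_parents("A", "G", lambda s: graph.get(s, [])))  # ['A->C', 'C->G']
```

Each state then stores a single (parent, action) pair, whereas action + [child[1]] copies the whole path list at every push; the trade-off is an extra backward walk when the goal is found.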

Breadth-First Search

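```python
def breadthFirstSearch(problem):
    explored = set()          # states that have already been expanded
    result = util.Queue()     # action lists, enqueued/dequeued in lockstep with frontier
    frontier = util.Queue()   # states waiting to be expanded (FIFO)

    result.push([])
    frontier.push(problem.getStartState())

    while True:
        if frontier.isEmpty():
            return []         # no path to the goal exists

        node = frontier.pop()
        action = result.pop()  # actions leading from the start to node

        if problem.isGoalState(node):
            return action

        explored.add(node)
        children = problem.expand(node)  # list of (child, action, stepCost) triples

        for child in children:
            # util.Queue keeps its items in .list; skip states already explored or queued
            if child[0] not in explored and child[0] not in frontier.list:
                frontier.push(child[0])
                result.push(action + [child[1]])
```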

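As mentioned above, BFS is implemented exactly like DFS, except that the LIFO Stack is replaced with a FIFO Queue. Since BFS is usually implemented iteratively anyway, the code above looks more natural.
Compared with DFS, BFS searches level by level, so it is guaranteed to find a path with the fewest steps; because every step here costs 1, that path is also optimal. As a result, BFS scores somewhat higher than DFS in practice. On the other hand, BFS tends to take longer and use more memory, because the frontier has to hold an entire search layer.
We use a Queue to track the enqueues and dequeues of frontier in sync. For scenarios with similar needs, the code above can serve as a template.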
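To see the difference between DFS and BFS on path length, here is a sketch of the same template in which a single deque acts as either a stack or a queue. Unlike the article's code, it stores (state, actions) pairs in one container and skips already-explored states on pop instead of checking frontier.list; graph_search and open_grid_expand are hypothetical helpers on an assumed obstacle-free 3x3 grid.

```python
from collections import deque


def graph_search(start, goal, expand, use_queue):
    """Sketch: use_queue=True gives BFS (FIFO), use_queue=False gives DFS (LIFO)."""
    frontier = deque([(start, [])])    # each entry pairs a state with its action list
    explored = set()
    while frontier:
        state, actions = frontier.popleft() if use_queue else frontier.pop()
        if state == goal:
            return actions
        if state in explored:
            continue                   # duplicate entry for an expanded state
        explored.add(state)
        for child, action, _ in expand(state):
            if child not in explored:
                frontier.append((child, actions + [action]))
    return []


def open_grid_expand(state, size=3):
    """Hypothetical obstacle-free size x size grid with unit step costs."""
    x, y = state
    moves = [("North", (0, 1)), ("South", (0, -1)), ("East", (1, 0)), ("West", (-1, 0))]
    return [((x + dx, y + dy), a, 1) for a, (dx, dy) in moves
            if 0 <= x + dx < size and 0 <= y + dy < size]


bfs_path = graph_search((0, 0), (0, 2), open_grid_expand, use_queue=True)
dfs_path = graph_search((0, 0), (0, 2), open_grid_expand, use_queue=False)
print(len(bfs_path), len(dfs_path))  # 2 6
```

BFS returns the direct two-step path North, North, while DFS follows its most recently pushed child and wanders East, East, North, West, West, North before reaching the goal.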

Uniform Cost Search

```python
def uniformCostSearch(problem):
    explored = set()
    frontier = util.PriorityQueue()             # min-heap ordered by path cost
    initial = (problem.getStartState(), [], 0)  # (state, actions from start, path cost)

    frontier.push(initial, 0)

    while True:
        if frontier.isEmpty():
            return []

        (node, result, value) = frontier.pop()

        if node in explored:
            continue                            # stale entry: a cheaper copy was expanded

        if problem.isGoalState(node):
            return result

        explored.add(node)
        children = problem.expand(node)

        for child, action, cost in children:
            if child not in explored:
                temp = value + cost             # accumulated cost from the start to child
                frontier.push((child, result + [action], temp), temp)
```

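Uniform cost search (UCS), which is essentially Dijkstra's algorithm applied to search, can be understood as BFS over cost contours: it expands nodes in order of the cost from the root to the current node. This cost is real and exact, and it should be distinguished from the estimates produced by the heuristic functions discussed later.
Since nodes are dequeued and expanded according to cost, an ordinary Queue no longer suffices, so we use the PriorityQueue built on a min-heap and, at each step, expand the lowest-cost node in the frontier. The priority queue is already implemented in the source code, so we can call it directly. Here is the source of PriorityQueue, which I think is implemented very elegantly: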

```python
class PriorityQueue:
    """
      Implements a priority queue data structure. Each inserted item
      has a priority associated with it and the client is usually interested
      in quick retrieval of the lowest-priority item in the queue. This
      data structure allows O(1) access to the lowest-priority item.
    """
    def __init__(self):
        self.heap = []
        self.count = 0

    def push(self, item, priority):
        entry = (priority, self.count, item)
        heapq.heappush(self.heap, entry)
        self.count += 1

    def pop(self):
        (_, _, item) = heapq.heappop(self.heap)
        return item

    def isEmpty(self):
        return len(self.heap) == 0

    def update(self, item, priority):
        # If item already in priority queue with higher priority, update its priority and rebuild the heap.
        # If item already in priority queue with equal or lower priority, do nothing.
        # If item not in priority queue, do the same thing as self.push.
        for index, (p, c, i) in enumerate(self.heap):
            if i == item:
                if p <= priority:
                    break
                del self.heap[index]
                self.heap.append((priority, c, item))
                heapq.heapify(self.heap)
                break
        else:
            self.push(item, priority)
```

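Note that the priority value here is the cost itself: the lower the cost, the smaller the value and the earlier the item is popped, i.e., the higher its priority in the everyday sense. Therefore, in update, if the existing p is less than or equal to the new priority argument, the existing path is already at least as good, so we simply break without updating.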
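The count field in each (priority, count, item) entry deserves a closer look, since it is part of what makes this implementation robust. The sketch below uses only the standard heapq module (no project code) to show what happens with and without it: when two entries share a priority, Python compares the next tuple field, so without a unique counter it would try to compare the items themselves.

```python
import heapq

# Without a tiebreaker, equal priorities force Python to compare the items themselves.
heap = []
heapq.heappush(heap, (1, {"state": (0, 0)}))
try:
    heapq.heappush(heap, (1, {"state": (1, 0)}))
    failed = False
except TypeError:
    failed = True  # dicts do not support "<", so the tuple comparison raises

# With (priority, count, item) entries, the unique count settles every tie,
# so items are never compared and equal-priority items pop in insertion order.
heap2, count = [], 0
for item in ["first", "second", "third"]:
    heapq.heappush(heap2, (1, count, item))
    count += 1
order = [heapq.heappop(heap2)[2] for _ in range(3)]
print(failed, order)  # True ['first', 'second', 'third']
```

The counter also makes ties deterministic (first in, first out among equal priorities), which is why search results stay reproducible from run to run.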
Back to the UCS implementation: this time each element we push onto frontier is a triple, again for the purpose of recording the path:

```python
initial = (problem.getStartState(), [], 0)  # (state, actions so far, cost so far)
frontier.push(initial, 0)
```

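We do not reuse the DFS/BFS bookkeeping here because, in addition to the actions, the path cost must also be accumulated. With this recording scheme, there is no need to call update from the source code: even if two entries share the same state, their paths and costs differ, so we can push both onto the priority queue, and the cheaper entry for a given state is always popped first.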
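The following self-contained sketch uses heapq directly rather than util.PriorityQueue, on a hypothetical weighted graph, to show why ordering by accumulated cost matters: BFS would return the one-edge path S->G of cost 10, while UCS finds the three-edge path of cost 3. Duplicate entries are pushed freely and stale ones are skipped when popped, which is the usual way to do without update.

```python
import heapq


def ucs(start, goal, expand):
    """Sketch of UCS with (cost, tiebreak, state, actions) heap entries.

    expand(state) is assumed to return (child, action, stepCost) triples.
    Returns (actions, total_cost), or ([], inf) if the goal is unreachable.
    """
    frontier = [(0, 0, start, [])]
    counter = 1
    explored = set()
    while frontier:
        cost, _, state, actions = heapq.heappop(frontier)
        if state in explored:
            continue  # a cheaper entry for this state was already expanded
        if state == goal:
            return actions, cost
        explored.add(state)
        for child, action, step in expand(state):
            if child not in explored:
                heapq.heappush(frontier, (cost + step, counter, child, actions + [action]))
                counter += 1
    return [], float("inf")


# Hypothetical weighted graph: the direct edge S->G is expensive, the detour is cheap.
graph = {
    "S": [("G", "S->G", 10), ("A", "S->A", 1)],
    "A": [("B", "A->B", 1)],
    "B": [("G", "B->G", 1)],
}
print(ucs("S", "G", lambda s: graph.get(s, [])))  # (['S->A', 'A->B', 'B->G'], 3)
```

The entry (10, ..., "G", ["S->G"]) is still sitting in the heap when the goal is reached via the detour; it is simply never popped.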

A* Search

```python
def aStarSearch(problem, heuristic=nullHeuristic):
    explored = set()
    frontier = util.PriorityQueue()
    initial = problem.getStartState()
    # Each entry stores g, the true path cost so far; the priority is f = g + h.
    frontier.push((initial, [], 0), heuristic(initial, problem))

    while True:
        if frontier.isEmpty():
            return []

        (node, result, value) = frontier.pop()  # value is g(node), not f(node)

        if node in explored:
            continue                            # stale duplicate entry

        if problem.isGoalState(node):
            return result

        explored.add(node)
        children = problem.expand(node)

        for child, action, cost in children:
            if child not in explored:
                g = value + cost                # accumulate real step costs only
                frontier.push((child, result + [action], g), g + heuristic(child, problem))
```

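A* search combines UCS and Greedy Search (Greedy Search expands nodes purely according to the heuristic and guarantees neither optimality nor completeness). The code structure of A* is essentially the same as that of UCS; it simply adds the heuristic function heuristic. The author first encountered heuristics when studying the 8-puzzle as a first-year student, a relatively basic problem. Heuristics are everywhere, though: even the transportation problem in operations research uses heuristics to quickly obtain an initial basic feasible solution. A good heuristic can drastically reduce the number of expanded nodes and speed up the search. Even so, A* still has exponential worst-case complexity; it is simply much faster than the naive search algorithms above on problems with large state spaces.
If we do not use a heuristic (the default nullHeuristic returns 0 for every state), A* degenerates into UCS. We therefore need to design our own heuristic function. For A*, a heuristic should satisfy two properties: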
- Admissibility: for every node $n$, the heuristic's estimate must not exceed the true cost of the path from $n$ to a goal state, i.e., $h(n) \leq h^*(n)$, where $h^*(n)$ is the actual cost;
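- Consistency: equivalent to a triangle inequality; the standard English definition reads:
for every node $n$ and every successor $n'$ of $n$ generated by any action $a$, the estimated cost of reaching the goal from $n$ is no greater than the step cost of getting to $n'$ plus the estimated cost of reaching the goal from $n'$, i.e., $h(n) \leq c(n, a, n') + h(n')$.
Every consistent heuristic is also admissible. For graph search with an explored set, the standard optimality guarantee of A* requires consistency; admissibility alone suffices for tree search.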
As long as it never exceeds the true path cost, the larger the heuristic value, the better. Since Pac-Man moves only in four directions with unit step cost, the Manhattan distance never overestimates the remaining cost and is never smaller than the Euclidean distance, so for Pac Man the Manhattan distance is the better heuristic.
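The claims above can be checked numerically on a small maze. This sketch uses a hypothetical 5x5 layout with unit step costs and four-way movement (MAZE, neighbors and true_distances are illustrative names); it computes the exact cost-to-goal $h^*(n)$ with BFS from the goal and verifies that the Manhattan distance is admissible, consistent, and never smaller than the Euclidean distance.

```python
import math
from collections import deque

# Hypothetical 5x5 maze indexed as MAZE[y][x]; '%' is a wall, every move costs 1.
MAZE = [
    ".....",
    ".%%%.",
    ".%...",
    ".%.%.",
    ".....",
]


def neighbors(cell):
    x, y = cell
    for dx, dy in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(MAZE) and 0 <= nx < len(MAZE[0]) and MAZE[ny][nx] != "%":
            yield (nx, ny)


def true_distances(goal):
    """Exact shortest-path cost from every reachable open cell to `goal`, via BFS."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cell = queue.popleft()
        for nxt in neighbors(cell):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    return dist


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


goal = (2, 2)
h_star = true_distances(goal)
admissible = all(manhattan(c, goal) <= d for c, d in h_star.items())
consistent = all(manhattan(c, goal) <= 1 + manhattan(n, goal)
                 for c in h_star for n in neighbors(c))
dominates = all(euclidean(c, goal) <= manhattan(c, goal) for c in h_star)
print(admissible, consistent, dominates)  # True True True
```

For cell (0, 0), the Manhattan estimate is 4 while the true cost is 8 because of the walls: the heuristic stays admissible but is far from exact, which is where better heuristics in later tasks come in.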

```python
def yourHeuristic(position, problem, info={}):
    """Manhattan distance from position to the goal of a PositionSearchProblem."""
    goal = problem.goal
    return abs(position[0] - goal[0]) + abs(position[1] - goal[1])
```
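To see how a heuristic speeds up search, the sketch below runs the same A* routine twice on a hypothetical open 10x10 grid, once with a zero heuristic (which is exactly UCS) and once with the Manhattan distance, and counts the expanded states. It stores the true path cost g in each entry separately from the priority f = g + h, so heuristic values never accumulate along a path. The grid, astar and expand_grid are illustrative assumptions, not project code.

```python
import heapq

SIZE = 10  # hypothetical open SIZE x SIZE grid, no walls


def expand_grid(state):
    """Return (child, action, stepCost) triples; every move costs 1."""
    x, y = state
    moves = [("North", (0, 1)), ("South", (0, -1)), ("East", (1, 0)), ("West", (-1, 0))]
    return [((x + dx, y + dy), a, 1) for a, (dx, dy) in moves
            if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE]


def astar(start, goal, expand_fn, heuristic):
    """A* graph search sketch; returns (actions, number of expanded states).

    Heap entries are (f, tiebreak, g, state, actions) with f = g + h.
    A heuristic that always returns 0 turns this into uniform cost search.
    """
    frontier = [(heuristic(start), 0, 0, start, [])]
    counter = 1
    explored = set()
    while frontier:
        _, _, g, state, actions = heapq.heappop(frontier)
        if state in explored:
            continue                  # stale duplicate entry
        if state == goal:
            return actions, len(explored)
        explored.add(state)
        for child, action, step in expand_fn(state):
            if child not in explored:
                g2 = g + step         # g accumulates real step costs only
                f2 = g2 + heuristic(child)
                heapq.heappush(frontier, (f2, counter, g2, child, actions + [action]))
                counter += 1
    return [], len(explored)


start, goal = (0, 5), (9, 5)


def manhattan(state):
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])


ucs_path, ucs_expanded = astar(start, goal, expand_grid, lambda s: 0)
astar_path, astar_expanded = astar(start, goal, expand_grid, manhattan)
print(len(ucs_path), len(astar_path), astar_expanded)  # 9 9 9
print(ucs_expanded > astar_expanded)                   # True
```

Both runs return a 9-step path, but with the Manhattan heuristic only the 9 states on the straight line from the start are expanded, since every off-line state has f ≥ 11 > 9; UCS must expand every state within cost 8 of the start before it reaches the goal.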

This completes the first Pac Man task. So far, we have solved a simple path-finding problem with four search algorithms. In Task 2, we will gain a deeper understanding of the cost function and the project's source code; in Task 3, we will tackle game problems, using the Minimax algorithm, Alpha-Beta pruning, and heuristic evaluation functions to build a truly intelligent Pac-Man mini-game.