"Snake" AI Algorithm Simple Implementation (Mid-Autumn Festival Special Edition)

Preface


The annual Mid-Autumn Festival is almost here, and the platform is running various Mid-Autumn posting events. While browsing I happened to see this article: "Rabbit Cake Battle": eat mooncakes, watch the moon, and bite yourself? | Mid-Autumn Festival special edition - Nuggets (juejin.cn)

Everyone is surely familiar with it. At its core the game is Snake, but the author made many adjustments to the gameplay, making it great fun, and added elements such as rabbit cakes and the moon to heighten the festive atmosphere. It is full of creativity.

So I wondered whether I could innovate on this game in another direction, such as exploring and improving a Snake AI algorithm. Below is the final result of the algorithm after training.

(Demo animation: the trained snake playing the game)

Note: Since the blogger's abilities are limited, this article drops the difficulty scaling, extra elements, and other gameplay additions from the article above, and keeps only the most basic Snake game structure. This may be revisited later.

Deep reinforcement learning

Preface

We can use a deep reinforcement learning (Deep RL) algorithm, and we can additionally apply Bayesian optimization to optimize the deep reinforcement learning algorithm (for example, its hyperparameters).

Reinforcement learning: a branch of machine learning. Compared with the classic supervised and unsupervised learning problems, the defining feature of reinforcement learning is learning from interaction. The agent continuously learns from the rewards or punishments it receives while interacting with the environment, becoming better and better adapted to it. This learning paradigm is very similar to how we humans acquire knowledge, which is why RL is regarded as an important path toward general AI.

Bayesian optimization: a method for optimizing a black-box function that incrementally improves its model by selecting the points in the search space most likely to contain the global optimum. Its core idea is to combine Bayesian statistics with Gaussian process regression.

The AlphaGo we are all familiar with was trained with deep reinforcement learning. Its core process uses Monte Carlo tree search with the help of two deep neural networks: a value network and a policy network. The value network evaluates a large number of candidate positions, and the policy network selects the next move. Other applications of these algorithms include advertising recommendation, dialogue systems, and robotics, which will not be discussed here.

The most common representation for formally defining reinforcement learning problems is the Markov decision process (MDP). This leads to the game AI algorithm we will finally use: Deep Q-Learning (DQN), a concrete implementation of deep reinforcement learning.
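For reference, the rule that DQN approximates with a neural network is the tabular Q-learning update. Below is a minimal sketch of that update (all names are illustrative and terminal states are ignored for brevity; this is background, not code from the reference projects).

# Minimal sketch of the tabular Q-learning update that DQN approximates.
# Q is a dict of dicts: Q[state][action] -> estimated return.
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Bellman target: immediate reward plus discounted best future value
    target = reward + gamma * max(Q[next_state].values())
    # Nudge the current estimate toward the target at learning rate alpha
    Q[state][action] += alpha * (target - Q[state][action])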

Game definition

Snake

The snake is a line that, when it eats food, gains both score and length. When the snake hits a wall or runs into itself, the game ends. The score is the amount of food eaten; because the snake grows with every piece of food, the game becomes harder and harder in the later stages. The player's goal is to have the snake eat as much food as possible without ending the game.

Environment and state space

The environment of the Snake game can be represented by an n×n matrix. Each cell in the matrix can be rendered as an l×l block of pixels, so the dimension of the raw state space is s ∈ (n×n×l×l). To keep the game simple, we ignore the l×l rendering, so the state space reduces to s ∈ n×n (the number of possible states still grows exponentially).
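As a rough illustration (a hypothetical encoding, not code from the reference projects), the n×n environment could be stored as a matrix in which each cell records what occupies it:

import numpy as np

# Hypothetical grid encoding for an n x n Snake board:
# 0 = empty cell, 1 = snake body, 2 = snake head, 3 = food
n = 10
grid = np.zeros((n, n), dtype=np.int8)
grid[5, 5] = 2               # head
grid[5, 4] = grid[5, 3] = 1  # two body segments
grid[2, 7] = 3               # food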

Action space

For the snake, only four actions can be taken: up, down, left, right.

To speed up training and avoid collisions caused by reversing into itself, the action space can be simplified to: straight, clockwise turn, counter-clockwise turn.

Representing actions this way is beneficial because when the agent "explores" and randomly selects an action, it can never reverse its direction by 180 degrees.
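A minimal sketch of this relative-action mapping (illustrative names; the direction list is ordered clockwise so that an index shift of ±1 corresponds to a turn):

# Illustrative mapping of relative actions to absolute directions.
DIRECTIONS = ["up", "right", "down", "left"]  # clockwise order

def next_direction(current, action):
    i = DIRECTIONS.index(current)
    if action == "straight":
        return current
    if action == "clockwise":           # turn right
        return DIRECTIONS[(i + 1) % 4]
    return DIRECTIONS[(i - 1) % 4]      # counter-clockwise: turn left

Note that with this mapping a 180-degree reversal is simply not expressible.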

Positive and negative rewards

The game's main reward comes when the snake eats food and increases its score. Rewards are thus tied directly to the game's final score, similar to how a human player would judge success.

With other positive rewards added (for example, a small reward for merely staying alive), the agent might loop infinitely, or learn to avoid food altogether to keep the snake short.

In addition, negative rewards were added to give the snake more information about its state: collision (with itself or a wall), looping (infinite loops are discouraged), stepping onto an empty cell, and being close/mid/far/very_far from the food (to encourage moving toward the food).
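Put together, the reward function might look roughly like the sketch below (hypothetical values and distance thresholds; the reference projects use their own numbers):

# Hypothetical reward shaping; values and thresholds are made up.
def step_reward(event, dist_to_food=0):
    if event == "collision":   # hit a wall or the snake itself
        return -100
    if event == "loop":        # an infinite loop was detected
        return -50
    if event == "food":        # the main positive reward
        return 10
    # Empty cell: graded penalty, smaller when closer to the food
    if dist_to_food < 4:
        return -1              # close
    if dist_to_food < 8:
        return -2              # mid
    if dist_to_food < 16:
        return -3              # far
    return -5                  # very_far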

DQN network definition

Define a network with an input layer of size 11 describing the snake's current state, a hidden layer of 256 nodes, and an output layer of size 3 that determines which action to take. The figure below is a visual representation of the network.

Since the game runs in discrete time steps (frames), a new state can be computed for each new frame of the game.

The state is defined as 11 Boolean values based on the snake's direction of movement, the danger positions (possible collisions in the next frame), and the position of the food relative to the snake.
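Concretely, the 11 features are commonly laid out as in the sketch below (illustrative names, following the usual DQN-Snake tutorials rather than any one repository):

# Sketch of the 11 Boolean state features (illustrative names).
def get_state(danger, heading, head, food):
    hx, hy = head
    fx, fy = food
    return [
        danger["straight"],                   # collision ahead next frame?
        danger["right"],                      # collision if turning clockwise?
        danger["left"],                       # collision if turning counter-clockwise?
        heading == "left", heading == "right",
        heading == "up", heading == "down",   # current direction (one-hot)
        fx < hx, fx > hx,                     # food left / right of the head
        fy < hy, fy > hy,                     # food above / below the head
    ]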

The 3 actions are directions relative to the way the snake is facing: forward, left, right. Note that the network's output is the Q estimate for each of the 3 actions.
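A minimal PyTorch sketch of such a network, assuming the 11-256-3 layout described above (this follows the general shape of the reference projects, not their exact code):

import torch.nn as nn

class DQN(nn.Module):
    # 11 state features -> 256 hidden units -> 3 Q-values (one per action)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(11, 256),
            nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, x):
        return self.net(x)  # Q estimates for straight / left / right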

The state at each time step is passed to the Q-Learning network, which predicts what it considers the best action. This information is held in both short-term and long-term memory; everything learned from previous states can be drawn from memory and fed back to the network to continue training.

(Figure: the 11 × 256 × 3 network)
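The "short-term and long-term memory" mentioned above is typically implemented as an experience replay buffer. A rough sketch, with illustrative names and sizes:

import random
from collections import deque

# Rough sketch of experience replay ("long-term memory").
memory = deque(maxlen=100_000)  # old transitions fall off the end

def remember(state, action, reward, next_state, done):
    memory.append((state, action, reward, next_state, done))

def sample_batch(batch_size=64):
    # Train on a random mini-batch of past transitions to break the
    # correlation between consecutive frames
    return random.sample(memory, min(batch_size, len(memory)))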

Ablation experiment

Robert Long's definition of an ablation study (or ablation experiment): it is usually used with neural networks, especially relatively complex ones such as R-CNN. You learn about your network by removing parts of it and studying how its performance changes.

The original meaning of "ablation" is the surgical removal of body tissue. The term "ablation study" has its roots in the experimental psychology of the 1960s and 1970s, in which parts of an animal's brain were removed to study the effect on its behavior. In the context of machine learning, and in particular of complex deep neural networks, "ablation study" describes the process of removing certain parts of a network in order to better understand its behavior.

It can be understood simply as the control-variable method.

Reference projects

maurock/snake-ga: AI Agent that learns how to play Snake with Deep Q-Learning (github.com)

sourenaKhanzadeh/snakeAi: Reinforcement Learning with the classic snake game (github.com)

Both projects use the DQN algorithm, but the first repository has some problems with installing its dependencies and targets a different operating system, so the second repository is used here (if you are interested, you can also try the first project).

# Create the environment
conda create -n snk-ai-py3.7 python=3.7
# Pull the code
git clone git@github.com:sourenaKhanzadeh/snakeAi.git
# Install dependencies
pip install -r requirements.txt

Finally, run the code:

python main.py

Note: The blogger's server has no desktop and is accessed directly over SSH. For how to forward GUI windows over an SSH connection, see the blogger's earlier article (VSCode "SSH" connection to a server with "GUI" forwarding).

Genetic algorithm

Reference project: Ackeraa/snake: Snake AI with Genetic algorithm and Neural network (github.com)

Note: Parent reference project: Chrispresso/SnakeAI (github.com)

There is not much difference between the two projects, and you can use the parent project directly; this time the child project is used for the demonstration. The principle is to use a genetic algorithm together with a simple neural network to implement the snake's pathfinding, as the sketch below outlines.
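A very condensed sketch of the idea (illustrative, not the repository's actual code): each individual's genes are the neural network's flattened weights; every generation keeps the best-scoring snakes as parents and produces children through crossover and mutation.

import random

# Condensed genetic-algorithm step over flattened network weights.
# population: list of weight lists; fitness: callable scoring one individual.
def evolve(population, fitness, keep=0.1, mut_rate=0.05, mut_std=0.2):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: max(2, int(len(population) * keep))]
    children = list(parents)                 # elitism: keep the parents
    while len(children) < len(population):
        a, b = random.sample(parents, 2)
        # Uniform crossover: each weight comes from one of the two parents
        child = [wa if random.random() < 0.5 else wb for wa, wb in zip(a, b)]
        # Gaussian mutation on a small fraction of the weights
        child = [w + random.gauss(0, mut_std) if random.random() < mut_rate else w
                 for w in child]
        children.append(child)
    return children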

Quick start

git clone git@github.com:Ackeraa/snake.git
pip install -r requirements.txt
# Headless training (recommended)
python main.py
# Training with display
python main.py -s

You can adjust the parameters in settings.py directly; for example, FPS can be increased during training to make training faster and decreased during display for easier viewing.

# Training
FPS = 1000
# Display
FPS = 8

Here we do not waste time training from scratch; instead we directly load the pre-trained genes for a quick demonstration.

python main.py -i -s

If you want to train from scratch, run the following commands to delete the weight files first, and then train again.

rm -rf genes/best/*
rm -rf seed/*

Effect demonstration

(Demo animation: the trained snake playing the game)

Final addition

The author's abilities are limited and not every algorithm can be explained here, so below are some repositories (personally tested and working) for you to explore on your own.

Hamilton/greedy

Repository address: chuyangliu/snake: Artificial intelligence for the Snake game. (github.com)

Repository introduction: mainly implements the Hamiltonian-cycle algorithm, the greedy algorithm, and a DQN algorithm (experimental), while offering a user-friendly GUI and keeping the algorithm implementations simple.

Multilayer perceptron/convolutional neural network

Repository address: snake-ai/README_CN.md at master · linyiLYi/snake-ai (github.com)

Repository introduction: this project was developed by Mr. Lin Yi, who also posted an introduction video on Bilibili that interested readers can jump to. The project contains the program script for the classic game "Snake" together with an AI agent that plays the game automatically. The agent is trained with deep reinforcement learning and comes in two versions: one based on a multi-layer perceptron and one based on a convolutional neural network, with the latter achieving a higher average game score.

Note: Mr. Lin Yi's project targets Windows and Mac systems; keep this in mind when using it.

Neural Network + Genetic Algorithm

Repository address: greerviau/SnakeAI: Train a Neural Network to play Snake using a Genetic Algorithm (github.com)

Reference links

greerviau/SnakeAI: Train a Neural Network to play Snake using a Genetic Algorithm (github.com)

snake-ai/README_CN.md at master · linyiLYi/snake-ai (github.com)

chuyangliu/snake: Artificial intelligence for the Snake game. (github.com)

maurock/snake-ga: AI Agent that learns how to play Snake with Deep Q-Learning (github.com)

Deep Learning and CV Tutorial (17) | Deep Reinforcement Learning (Markov Decision Process, Q-Learning, DQN) (showmeai.tech)

[Reinforcement Learning] Implementing Snake based on DQN (pytorch version) - Zhihu (zhihu.com)
