Play Against Artificial Intelligence! The myCobot 280 Open-Source Six-Axis Robotic Arm Plays Connect 4


Introduction

Hi, everyone. Today we will look at artificial intelligence playing board games, with a robotic arm as your opponent.

Research on artificial intelligence playing chess can be traced back to the 1950s. At that time, computer scientists began to explore how to program computers to play chess. The most famous example of this is Deep Blue, developed by IBM, which defeated then-world chess champion Garry Kasparov in 1997 by a score of 3.5-2.5.

Making an AI play a game well means giving the computer a way of "thinking" that lets it win, and that thinking comes from good algorithms. Deep Blue's core algorithm was based on brute force: generate all possible moves, search as deeply as possible, and continually evaluate the position to find the best move.

In this article I will introduce how an AI-driven robotic arm plays Connect 4.

Connect 4

Connect4, also known as Connect Four, is the strategy board game introduced today. The goal is to be the first to line up four of your own pieces horizontally, vertically, or diagonally on a vertically standing grid of 6 rows and 7 columns. The two players take turns dropping a piece into the top of a column, and the piece falls to the lowest empty slot in that column; you may choose any column that is not yet full, but each new piece stacks on top of the pieces already there.
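The drop rule described above can be sketched in a few lines of Python. This is a minimal illustration only; `drop_piece` and `new_board` are hypothetical helpers for this example, not part of the project code:

```python
# Minimal sketch of the Connect 4 drop rule: a piece dropped into a column
# falls to the lowest empty slot. 0 = empty, "R"/"Y" = the two piece colors.
ROWS, COLS = 6, 7

def new_board():
    return [[0] * COLS for _ in range(ROWS)]

def drop_piece(board, col, piece):
    """Place `piece` in `col`, falling to the lowest empty row.
    Returns the row index used, or None if the column is full."""
    for row in range(ROWS - 1, -1, -1):  # scan from the bottom row upward
        if board[row][col] == 0:
            board[row][col] = piece
            return row
    return None

board = new_board()
print(drop_piece(board, 3, "R"))  # first piece lands on the bottom row: 5
print(drop_piece(board, 3, "Y"))  # second piece stacks on top of it: 4
```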

The animation above shows a game of Connect4 in progress.

myCobot 280

For the robotic arm I chose the myCobot 280 M5Stack, a capable desktop six-axis robotic arm that uses the M5Stack-Basic as its control core. Its six-axis structure makes it flexible and precise, capable of a variety of complex operations and movements, and it supports development in multiple programming languages, including Python, C++, and Java, so developers can program and control the arm according to their own needs. A simple operation interface and a detailed user manual let users get started quickly, and its compact embedded design makes the arm easy to carry and store.

Below is the scene we built.

The myCobot acts as the artificial-intelligence opponent and plays against us.

Game Algorithm

First of all, we have to solve the most critical problem: which algorithm the robotic arm should use to play the game. In other words, we have to give the arm a "brain" capable of thinking. Here is a brief introduction to several common game-playing algorithms:

Minimax algorithm:

This is a classic game algorithm, suitable for two-player games. It evaluates the score of each possible move by recursively simulating the opponent's and its own moves, and chooses the move with the best score. By searching the tree structure of the game, the minimax algorithm can find the best strategy. It is a zero-sum algorithm: one player chooses the option that maximizes their own advantage, while the other chooses the option that minimizes the opponent's advantage, and the two advantages always sum to zero. A simple tic-tac-toe example illustrates the idea.

Max represents us and Min represents the opponent. We assign a score, called the utility, to each outcome, evaluated from our (Max's) point of view: +1 if we win, -1 if we lose, and 0 for a draw. We want to maximize this score, and the opponent wants to minimize it. (In game theory, this score is called a static value.) Tic-tac-toe is simple enough that every possible outcome can be enumerated, but most games are not; depending on available computing power we may only be able to look 7 or 8 moves ahead, so the score is no longer just -1, 0, or 1, and a dedicated evaluation function assigns different scores to the resulting positions.
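To make the idea concrete, here is a minimal minimax over a hand-built game tree. The tree and its leaf values are illustrative only (leaves hold the static values described above, +1/-1/0 from Max's point of view):

```python
# Minimal minimax sketch: a game tree as nested lists, where each leaf is a
# static value (+1 win, -1 loss, 0 draw) evaluated from Max's point of view.
def minimax(node, maximizing):
    if isinstance(node, int):          # leaf: return its static value
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Max chooses between two branches; Min replies inside each branch.
tree = [[+1, -1],   # if Max goes left, Min will answer with -1
        [0, 0]]     # if Max goes right, the result is a draw
print(minimax(tree, True))  # Max's best guaranteed outcome: 0
```

Even though the left branch contains a winning leaf, Max cannot reach it against a rational Min, so the best guaranteed outcome is the draw.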

Alpha-Beta pruning algorithm:

This is an optimization of the minimax algorithm. It speeds up the search by pruning branches that cannot affect the final decision: the algorithm maintains lower and upper bounds (the Alpha and Beta values) and discards any branch that falls outside them, which reduces the number of positions that must be examined.
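A sketch of the same search with alpha-beta cutoffs, again over an illustrative hand-built tree; a leaf counter shows that pruning visits fewer leaves than plain minimax would:

```python
# Alpha-beta pruning sketch: same result as minimax, but branches that
# cannot affect the outcome are skipped (the beta <= alpha cutoff).
visited = []

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):
        visited.append(node)           # record every evaluated leaf
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if beta <= alpha:          # Min already has a better option: prune
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:
                break
        return value

tree = [[3, 5], [2, 9]]        # the 9 leaf can never matter and is pruned
print(alphabeta(tree, True))   # 3
print(len(visited))            # 3 leaves visited instead of 4
```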

Neural Network + Deep Learning:

Our Connect4 game AI also uses a neural network plus deep learning to implement its game algorithm.

Neural Networks:

Scientists have long hoped to simulate the human brain and create machines that can think. Why can humans think? The answer lies in the body's neural networks. An artificial neural network is a mathematical model that simulates the structure and function of the human nervous system: it processes information and learns by simulating the connections and signal transmission between neurons. Neural networks underpin much of modern artificial intelligence.

The basic idea of a neural network algorithm is to pass input data to the network's input layer, propagate it through a series of intermediate (hidden) layers for computation, and finally read the result from the output layer. Training optimizes the network's performance by adjusting the connection weights to minimize the difference between the actual output and the desired output.

Deep Learning:

Deep learning is a branch of machine learning that focuses on learning and reasoning with deep neural networks, that is, neural networks with many intermediate (hidden) layers, to solve complex learning and decision-making problems. It can be seen as a learning method that uses neural networks as its core tool, covering not only network structures and algorithms but also training methods, optimization algorithms, and large-scale data processing.

Project Build

The project is divided into two main parts, hardware and software:

The most important tasks are collecting information and then analyzing and processing it.

Building on the neural network and deep learning background above, we use a DQN neural network.

DQN Neural Network

The DQN neural network was proposed by DeepMind and combines the ideas of deep learning and reinforcement learning. DQN uses a deep neural network to estimate the state-action value function (the Q function) in order to make optimal decisions in complex environments. Its core idea is to use the network as a function approximator: taking the current state as input, the network outputs a Q value for each action, that is, a prediction of the long-term reward of taking that action in the current state. The action with the best Q value is then selected for execution.
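The training target at the heart of DQN is the Bellman update y = r + γ·max_a' Q(s', a') for non-terminal states, and y = r for terminal ones. A tiny sketch of that target computation (the reward and Q values below are made-up numbers for illustration):

```python
# DQN target sketch: the training target for Q(s, a) is the immediate reward
# plus the discounted best Q value achievable from the next state.
def dqn_target(reward, next_q_values, done, gamma=0.99):
    if done:                  # terminal state: no future reward to add
        return reward
    return reward + gamma * max(next_q_values)

# Illustrative numbers: reward 1.0 now, best next-state Q value 2.0.
print(dqn_target(1.0, [0.5, 2.0, -1.0], done=False))  # 1.0 + 0.99 * 2.0
print(dqn_target(1.0, [0.5, 2.0, -1.0], done=True))   # 1.0
```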

Environment Build

First, we need to define the Connect4 game itself: a two-dimensional array represents the board, and the two piece colors are red (R) and yellow (Y). We also define the end-of-game condition: when four pieces of the same color are connected in a line, the game ends.

# Define the board as 7 columns of 6 rows, indexed [column][row]
self.bgr_data_grid = [[None for j in range(6)] for i in range(7)]

# Print the current state of the board to the console
def debug_display_chess_console(self):
    for y in range(6):
        for x in range(7):
            cell = self.stable_grid[x][y]
            if cell == Board.P_RED:
                print(Board.DISPLAY_R, end="")
            elif cell == Board.P_YELLOW:
                print(Board.DISPLAY_Y, end="")
            else:
                print(Board.DISPLAY_EMPTY, end="")
        print()  # end of row
    print()      # blank line after the board


Here is the code that defines whether the game is over:

def is_game_over(board):
    # Check if there are four consecutive identical pieces in a row.
    for row in board:
        for col in range(len(row) - 3):
            if row[col] != 0 and row[col] == row[col+1] == row[col+2] == row[col+3]:
                return True

    # Check if there are four consecutive identical pieces in a column.
    for col in range(len(board[0])):
        for row in range(len(board) - 3):
            if board[row][col] != 0 and board[row][col] == board[row+1][col] == board[row+2][col] == board[row+3][col]:
                return True

    # Check both diagonal directions for four consecutive identical pieces.
    # Top-left to bottom-right diagonals:
    for row in range(len(board) - 3):
        for col in range(len(board[0]) - 3):
            if board[row][col] != 0 and board[row][col] == board[row+1][col+1] == board[row+2][col+2] == board[row+3][col+3]:
                return True

    # Top-right to bottom-left diagonals:
    for row in range(len(board) - 3):
        for col in range(3, len(board[0])):
            if board[row][col] != 0 and board[row][col] == board[row+1][col-1] == board[row+2][col-2] == board[row+3][col-3]:
                return True

    # If the board is completely filled with no winner, the game is over (a draw).
    for row in board:
        if 0 in row:
            return False

    return True


Build the DQN Neural Network

Define the input and output layers of the neural network: the input layer's dimensions should match the state representation of the game board, and the output layer's dimensions should match the number of legal moves (one per column). In short, the input layer receives the board state, and the output layer produces the corresponding action choice.
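As a sketch, such a Q-network in PyTorch might flatten the 6×7 board into 42 inputs and output 7 Q values, one per column. The layer sizes and structure here are illustrative assumptions, not the project's actual architecture:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Illustrative Q-network: flattened 6*7 board in, one Q value per column out."""
    def __init__(self, rows=6, cols=7, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(rows * cols, hidden),  # flattened board state as input
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, cols),         # one Q value per column (action)
        )

    def forward(self, x):
        return self.net(x)

model = DQN()
state = torch.zeros(1, 6 * 7)  # an empty board, flattened into a batch of 1
q_values = model(state)
print(q_values.shape)          # torch.Size([1, 7])
```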

Experience Replay Buffer

The machine needs to learn, so we build an experience replay buffer to store the agent's experience. This can be a list or a queue that records the state, action, reward, and next state for each step of the game.

The following code sketches the experience buffer:

import random

class ReplayBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []

    def add_experience(self, experience):
        # Drop the oldest experience once the buffer is full
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
        self.buffer.append(experience)

    def sample_batch(self, batch_size):
        # Sample a random batch of (state, action, reward, next_state, done)
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones


Decision Making

We define a strategy class named EpsilonGreedyStrategy, which uses the ε-greedy strategy for action selection and exploration. In the initialization function __init__() we specify the exploration rate ε. The select_action() method then either, with probability ε, picks a random action (exploration), or picks the action with the highest Q value (exploitation).

import random

class EpsilonGreedyStrategy:
    def __init__(self, epsilon):
        # Exploration rate: probability of choosing a random action
        self.epsilon = epsilon

    def select_action(self, q_values):
        if random.random() < self.epsilon:
            # Explore: pick a random action
            action = random.randint(0, len(q_values) - 1)
        else:
            # Exploit: pick the action with the highest Q value
            action = max(enumerate(q_values), key=lambda x: x[1])[0]
        return action


Training Framework

We use Python's PyTorch framework to build the training loop. Periodically, the current DQN network plays evaluation games against a pre-trained model or other opponents to measure the agent's performance, and training continues until the preset requirements are met.
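Putting the pieces together, the training loop might look like the following skeleton. The environment transitions and Q values below are pure-Python stand-ins (a dummy 4-step episode and all-zero Q values), so the control flow is the focus rather than the real network update:

```python
import random

# Skeleton of the DQN training loop: interact, store experience, sample a
# batch, update. The environment and Q values here are stand-ins.
class ReplayBuffer:
    def __init__(self, capacity):
        self.capacity, self.buffer = capacity, []
    def add(self, exp):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)       # drop the oldest experience when full
        self.buffer.append(exp)
    def sample(self, n):
        return random.sample(self.buffer, n)

def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.randint(0, len(q_values) - 1)
    return max(range(len(q_values)), key=lambda a: q_values[a])

buffer = ReplayBuffer(capacity=1000)
epsilon = 0.1
for episode in range(5):             # a real run would use many more episodes
    state, done = 0, False
    while not done:
        q_values = [0.0] * 7         # stand-in for the network's output
        action = epsilon_greedy(q_values, epsilon)
        # Dummy environment: each episode lasts exactly 4 steps
        next_state, reward, done = state + 1, 0.0, state >= 3
        buffer.add((state, action, reward, next_state, done))
        state = next_state
        if len(buffer.buffer) >= 4:
            batch = buffer.sample(4)  # here the real network update would run
print(len(buffer.buffer))             # 5 episodes * 4 steps = 20 experiences
```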

Video: https://twitter.com/i/status/1651528699945291776

Summary

This brings the article to a close. It mainly introduced how the DQN algorithm is applied to Connect4; the next article will cover how the robotic arm executes the moves the algorithm chooses. The algorithms introduced here are just the tip of the iceberg; if you are interested in game-playing algorithms, related books can take you further.

We are living in an era of great change. Artificial intelligence is everywhere, not only in board games but in many other fields; we should seize the moment and keep pace with a 21st century full of technology.

We will publish the next article soon. If you are interested, please follow us; leaving a comment below is the best support for us!


Origin blog.csdn.net/m0_71627844/article/details/131511192