Games104 Modern Game Engine Notes: Advanced AI

Hierarchical Task Network (HTN)

World State is a subjective perception of the world, not a description of the real world
Sensors are responsible for capturing various states from the game environment
HTN Domain stores the hierarchical tree of Tasks and the relationships between them
Planner plans Tasks from the Domain according to the World State
Plan Runner executes the Tasks of the plan produced by the Planner. If problems come up while a Task is running, the Plan Runner monitors all the statuses and tells the Planner to plan a new series of Tasks (Re-plan)
Primitive Task: a single action
Compound Task: a composite task made up of other tasks
Preconditions: before execution, check whether the required conditions in the World State are met; if not, the task returns false. They are also checked during execution to detect failure (a read operation on the world)
Effects: modify the World State after the task finishes (a write operation on the world)
A Compound Task consists of a set of Methods. Each Method has its own Preconditions, and Methods are tried by priority from top to bottom, similar to a Selector in a Behavior Tree (BT).
Each Method executes a series of Tasks that must all complete, similar to a BT Sequence. (A data-structure sketch follows below.)
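Below is a minimal sketch of how these task types might be represented in code. The class and field names (PrimitiveTask, Method, CompoundTask, applicable, ...) are illustrative assumptions, not the course's actual implementation.

```python
# Hypothetical HTN task structures (names are assumptions for illustration).

class PrimitiveTask:
    def __init__(self, name, preconditions, effects, action):
        self.name = name
        self.preconditions = preconditions  # predicates: world_state -> bool (read the world)
        self.effects = effects              # functions: world_state -> None (write the world)
        self.action = action                # the actual in-game operation to run

    def applicable(self, world_state):
        return all(p(world_state) for p in self.preconditions)


class Method:
    """One way to accomplish a compound task: its subtasks run like a BT Sequence."""
    def __init__(self, preconditions, subtasks):
        self.preconditions = preconditions
        self.subtasks = subtasks            # ordered list of tasks, all must succeed

    def applicable(self, world_state):
        return all(p(world_state) for p in self.preconditions)


class CompoundTask:
    """Methods are tried top-to-bottom by priority, like a BT Selector."""
    def __init__(self, name, methods):
        self.name = name
        self.methods = methods
```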
HTN Domain needs to define a Root Task as the root node, which is the core node.
Starting from the Root Task, select the current target task according to the World State and expand it step by step.
If a Precondition is not met, the expansion can only return false and backtrack all the way up to the Root Task.
In the end, planning is equivalent to searching the Domain and outputting a sequence of Primitive Tasks.
The plan is then executed sequentially according to the Planner's result; execution may fail and require a Replan. Replanning handles the case where a Task is unsuccessful.

Replan reasons:
1. There is no plan yet
2. The current plan has finished or failed
3. The perceived World State has changed

Advantages:
1. It is a very good abstraction and generalization of BT
2. It helps designers plan long-term behavior
3. Execution efficiency is higher than BT (planning avoids repeatedly traversing the tree; a BT must run from the root every time its tick wakes up, while an HTN does not re-traverse unless the world changes or the plan completes or becomes invalid)

Disadvantages:
1. When configuring Preconditions and Effects, because there are many of them and they overlap, a Task may never be executable, and that is hard for the designer to notice; some static checking tools are needed to find such logical loopholes
2. The plan chain is very long, and in a highly uncertain environment it easily fails midway, leading to Replan
Plan expansion is very fast. Each Primitive Task has Effects that would modify the World State, yet planning must not actually modify it, even though those Effects influence subsequent tasks. So the approach is to copy the World State, then modify and reason over the copy while planning step by step. It is equivalent to running a preview of the World State under the assumption that every Task will succeed.
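A minimal sketch of that forward decomposition over a copied World State, building on the hypothetical classes above (the World State is assumed to be a plain dict here):

```python
import copy

def plan(root_task, world_state):
    """Expand tasks over a copy of the world state, assuming every task succeeds,
    and return a list of Primitive Tasks (or None if no valid plan exists)."""
    preview = copy.deepcopy(world_state)          # plan on a preview, never on the real world
    final_plan = []
    return final_plan if decompose(root_task, preview, final_plan) else None

def decompose(task, preview, final_plan):
    if isinstance(task, PrimitiveTask):
        if not task.applicable(preview):
            return False                          # precondition not met: backtrack
        for effect in task.effects:
            effect(preview)                       # apply the effect to the preview only
        final_plan.append(task)
        return True
    # CompoundTask: try its Methods from top to bottom (priority order)
    for method in task.methods:
        if not method.applicable(preview):
            continue
        branch_preview = copy.deepcopy(preview)   # snapshot so a failed branch can be discarded
        branch_plan = list(final_plan)
        if all(decompose(sub, branch_preview, branch_plan) for sub in method.subtasks):
            preview.clear()
            preview.update(branch_preview)
            final_plan[:] = branch_plan
            return True
    return False
```

The Plan Runner would then execute the returned Primitive Tasks one by one and request a Replan when one of them fails or the world changes.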

Goal-Oriented Action Planning

Goal Set: all achievable goals. In HTN, goals are not explicitly defined; they are only visible from the task tree (e.g. written in comments). In GOAP, every goal can be expressed quantitatively
Planning: planning problem
Each goal is the World State expected to hold after a series of actions is completed, and it must be expressed quantitatively. A goal is not a single state but a combination of states (a Collection of States, usually boolean values) that must be satisfied once the actions finish.
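For illustration only, such a goal could be written as a set of desired boolean World State values (the keys below are made-up examples):

```python
# Hypothetical goal: a collection of desired world-state values, not a single state.
goal_eliminate_enemy = {
    "enemy_dead": True,
    "weapon_equipped": True,
}

# The current (perceived) world state uses the same keys.
world_state = {
    "enemy_dead": False,
    "weapon_equipped": False,
    "has_weapon_in_inventory": True,
}
```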
Compared with an HTN Primitive Task, each Action adds a Cost. The designer defines the cost, and the planner uses it when searching for the cheapest plan
GOAP plans actions backwards, starting from the goal
Compare the current goal's target world state with the actual world state, and push every unsatisfied state onto a stack of Unsatisfied States.
Take the first unsatisfied state and search the action set for an action whose output effect can satisfy (overwrite) it; remove that state from the stack and push the action onto the plan stack.
If that action's preconditions are not satisfied, the unmet preconditions are in turn pushed onto the Unsatisfied State stack.
The final goal is to empty the Unsatisfied State stack while keeping the total cost of the actions in the plan stack as low as possible (see the sketch below).
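A minimal sketch of this backward search with an Unsatisfied State stack, reusing the dict-style world state above. It greedily picks the cheapest action per state and does not guarantee the globally cheapest plan (the course uses a graph search such as A* for that), so treat it purely as an illustration.

```python
# Hypothetical GOAP action and a greedy backward planner (illustration only).

class Action:
    def __init__(self, name, preconditions, effects, cost):
        self.name = name
        self.preconditions = preconditions   # dict of world-state values required before running
        self.effects = effects               # dict of world-state values produced by running
        self.cost = cost                     # designer-defined cost

def backward_plan(goal, world_state, actions):
    unsatisfied = [(k, v) for k, v in goal.items() if world_state.get(k) != v]
    plan_stack = []
    while unsatisfied:
        key, value = unsatisfied.pop()
        # find actions whose effects can satisfy this unsatisfied state
        candidates = [a for a in actions if a.effects.get(key) == value]
        if not candidates:
            return None                      # nothing can satisfy this state: planning fails
        action = min(candidates, key=lambda a: a.cost)   # greedy: cheapest first
        plan_stack.append(action)
        # unmet preconditions of this action become new unsatisfied states
        for pk, pv in action.preconditions.items():
            if world_state.get(pk) != pv:
                unsatisfied.append((pk, pv))
    return list(reversed(plan_stack))        # execute from first to last
```

With the example goal above, an "equip weapon" action would end up in front of an "attack" action in the final plan, because the attack's weapon_equipped precondition shows up as an unsatisfied state.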


Node: a combination of states
Edge: all possible actions
Distance: the cost consumed by the action
A* heuristic function: prefer paths that stay closer to the current state combination
Advantages:
1. Compared with HTN, GOAP is more dynamic
2. It truly separates goals from behaviors (in FSM, BT and HTN, behaviors and goals are locked together one-to-one; in GOAP the same goal may be reached by multiple behavioral routes, which can exceed the designer's imagination)
3. It can avoid deadlocks and other problems in HTN configuration

Disadvantages:
1. Very complex, and the computation is much heavier than HTN, BT and FSM
2. GOAP needs a quantitative expression of the game world; in complex games it is hard to express the World State with boolean variables

It is usually better suited to traditional single-player games, 1v1, or games with a small number of AI agents.





Monte Carlo Tree Search


Monte Carlo is an algorithm based on random sampling
State and Action are the abstractions used to turn a problem like Go into a mathematical problem
State : The current state of the world, such as the positions of all pieces at this moment
Action: executable actions, that is, moves
Evaluate all possibilities and choose the most advantageous move
Q: number of simulated wins
N: number of simulations
Q / N judges the quality of a State; simulation results must be propagated back up to update the parent nodes

1. Selection: pick the most promising child node among those whose possibilities have not been fully expanded
2. Expansion: add a new exploration node
3. Simulation: simulate the outcome to judge how good this exploration direction is
4. Backpropagation: pass the result back up to the parent nodes

(A skeleton of this loop is sketched below.)
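A minimal skeleton of this select / expand / simulate / backpropagate loop. The Node fields and the assumed game-state interface (legal_actions, apply, is_terminal, result) are illustrative, and the win accounting ignores which player is to move, for brevity.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.untried = list(state.legal_actions())   # actions not yet expanded
        self.Q = 0                                   # number of simulated wins
        self.N = 0                                   # number of simulations

def ucb(node, c=1.41):
    # exploitation term Q/N plus an exploration term weighted by the constant C
    return node.Q / node.N + c * math.sqrt(math.log(node.parent.N) / node.N)

def mcts(root, iterations=1000):
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by max UCB until an expandable node is reached
        while not node.untried and node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: add one new child for an untried action
        if node.untried:
            action = node.untried.pop()
            child = Node(node.state.apply(action), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout until the game ends
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_actions()))
        win = state.result()                         # assumed: 1 for a win, 0 for a loss
        # 4. Backpropagation: accumulate Q and N back up to the root
        while node is not None:
            node.N += 1
            node.Q += win
            node = node.parent
    return max(root.children, key=lambda n: n.N)     # Robust Child: most visited
```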

Expandable Node: a node whose possibilities have not yet been exhausted
Exploitation: prioritize points with a high win rate, i.e. choose nodes with large N and large Q/N values
Exploration: prioritize nodes with a small N value (less explored)
UCB (Upper Confidence Bound): the algorithm used to balance exploitation and exploration
It favors children with a higher Q/N, weighed against how often the parent node has been visited (the parent's N).
C adjusts the balance: the larger C is, the more aggressive the search (it tends to explore); the smaller C is, the more conservative (it tends to exploit).
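The standard UCB1/UCT form of this formula, with $Q_i$ and $N_i$ the child's win and visit counts and $N_p$ the parent's visit count, is:

$$
UCB_i = \frac{Q_i}{N_i} + C \sqrt{\frac{\ln N_p}{N_i}}
$$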
Selection: starting from the root, compare the UCB values of the child nodes at each level, take the node with the maximum value as the next direction, and go down until reaching the first Expandable Node (one that has not been fully expanded); that node becomes the currently selected node.
Similar to BFS, except that each time it starts from the root and goes downward.
Expansion: depending on the performance budget, one or multiple possibilities can be expanded
Simulation and backpropagation: each expanded node simulates the outcome (win or loss) and updates its parent nodes in reverse, accumulating the Q and N of each node along the path
A search count, memory budget, or computation-time limit is set as the stopping condition
After stopping, a tree is obtained. Among the root's direct children, the final move is chosen according to different strategies (a small selection sketch follows below):
Max Child: select the child with the largest Q value, i.e. the most wins
Robust Child: select the most visited child, the one with the largest N value (not Q/N)
Max-Robust Child: the child with both the largest Q and the largest N; if none exists, keep running until one appears
Secure Child: LCB (Lower Confidence Bound), i.e. consider the lower end of the confidence interval; it mainly penalizes choices backed by few samples (or compensates for a badly tuned C)
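A small sketch of these selection strategies over the Node class from the MCTS skeleton above; the exact LCB form and constant are assumptions.

```python
import math

def max_child(root):
    return max(root.children, key=lambda n: n.Q)        # most wins

def robust_child(root):
    return max(root.children, key=lambda n: n.N)        # most visits

def secure_child(root, c=1.0):
    # Lower Confidence Bound: penalizes children backed by few samples
    lcb = lambda n: n.Q / n.N - c * math.sqrt(math.log(root.N) / n.N)
    return max(root.children, key=lcb)
```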
Advantages:
1. It makes the AI more flexible (randomness is involved)
2. The AI's decisions are its own actions rather than passively scripted behavior, which can exceed the designer's imagination
Disadvantages:
1. In complex games it is difficult to define victory and defeat, and to quantify how a decision affects the outcome
2. The computational cost is very high

MCTS is not suitable for all games. It suits turn-based games (you move, then I move) where each action has a clear output result (e.g. turn-based combat: using a skill lets you calculate exactly how much health the opponent loses and which states change). It can also be combined with other methods as a subsystem.

Machine Learning Basics

Supervised learning: essentially a classifier, e.g. image recognition
Unsupervised learning: essentially clustering, e.g. building user profiles
Semi-supervised learning: reduces the amount of labeled data needed by also using unlabeled data; it is mainly the direction of small-sample learning
Reinforcement learning: there is no supervision and no judging mechanism that says right or wrong; through rewards, the AI self-optimizes and iterates to form its own strategy
It is essentially a trial-and-error search. One of the harder problems is that the reward is delayed (Delayed Reward): the mouse only gets its reward when it reaches the end, and the reward/punishment mechanism is not triggered at every step it takes

Markov decision process

When the agent is in state s under the current strategy and takes an action a, which new state it reaches is a random variable (the transition is probabilistic).

Policy: the strategy black box, the core of the AI system. Given a state as input, it outputs the probabilities of all possible actions. It is also the target of most model optimization.

Total reward: γ is used to balance short-term and long-term benefits. The reward obtained at each later step is multiplied by (powers of) γ to discount it.
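In standard MDP notation (to which the course's formula corresponds), the discounted total reward from step $t$ is:

$$
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2\, r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma \le 1
$$

γ close to 0 makes the agent short-sighted; γ close to 1 weights long-term rewards more heavily.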


Build Advanced Game AI

In the past, algorithms were designed by humans and would not exceed human expectations. Machine learning gives unlimited possibilities for game behavior
The focus is on constructing the game Observation, i.e. quantitatively describing the game state, and then repeatedly optimizing the Policy
state: description of the world state
action: what the AI directs the game to do
reward: the rewards set for actions; the simplest is a win/loss judgment
NN design: building the topology of the neural network
Training Strategy: how the network is trained
Looking from the bottom up
1. Various in-game data are fed in: Scalar Features, Entities, Minimap, etc.
2. They pass through different neural network types: MLP, Transformer and ResNet
3. All results are integrated into an LSTM
4. The result is an unreadable, fully encoded representation
5. Decoding translates the encoded result into something humans can understand
Multi-layer perceptron (MLP): processes fixed-length data
Convolutional neural network (CNN): processes images
Transformer: processes large amounts of variable-length data
LSTM: simulates feedback and memory; strategies used many times are memorized, while memory also gradually decays
For complex games, training cannot start directly from scratch, because convergence would be very slow. First use human data to train a basic, reasonably good model; start with supervised learning
KL Divergence (relative entropy): the difference between two distributions forms an entropy-like quantity used to measure the distance between two probability distributions. The smaller it is, the more similar the two distributions are; here it measures how well the AI has learned to imitate human operations
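For reference, the discrete KL divergence between a distribution $P$ (e.g. human actions) and $Q$ (the AI policy) is:

$$
D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
$$

It is zero only when the two distributions are identical.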
MA (Main Agents): each day, 35% of games are self-play, 50% are against LE and ME, and the remaining 15% are against past versions of MA
LE (League Exploiters): specialize in finding the weaknesses of all agents in the league
ME (Main Exploiters): specialize in finding the weaknesses of the main branch (MA)
An AI that only trains against itself does keep getting stronger, but its capabilities become over-specialized (overfitting)
When a large amount of player data is available, it is recommended to start with SL (supervised learning), because it converges quickly; if the data is large and good enough, the AI can reach a fairly good level
The upper limit of reinforcement learning is very high, but training is very complicated and expensive

If the rewards are dense enough (the result of an action can be judged every step or every few actions), reinforcement learning can easily train a good AI. If the game is about exploration and puzzle solving, where an action and its eventual result are only weakly related, reinforcement learning performs disastrously.


Origin blog.csdn.net/Mhypnos/article/details/133612440