Summary of multi-agent reinforcement learning theory and algorithm

Summary of multi-agent reinforcement learning theory and algorithm

First understand on-policy and off-policy [Reinforcement Learning] One article to understand, my understanding of
on-policy and off-policy
: on-policy is to use the latest strategy to perform actions to collect data, while the training data of off-policy is not A collection of the latest strategies. On-policy also uses the same policy network to sample actions and execute them. This policy network is also needed to update the Q value.

1. Basic conceptual understanding, multi-agent status, observation, rewards, etc. need to be redefined:
reinforcement learning - multi-agent reinforcement learning
reinforcement learning notes: Policy-based Approach

2. Summary of the latest multi-agent reinforcement learning methods:
The latest multi-agent reinforcement learning methods [Summary]
Here is a good explanation of Transformer:
Transformer explanation

3. Some basic algorithms (PPO, DQN, etc.) steps
DQN - PPO process summary
reinforcement learning notes: PPO [Proximal Policy Optimization (Proximal Policy Optimization)]
Advanced - PPO code line-by-line analysis

4. Definition of TD, GAE advantage function, etc.:
GAE generalized advantage estimate

5. Summary of the limitations of some classic multi-agent algorithms:
[1] The latest multi-agent reinforcement learning method [Summary]

6. Collaborative (cooperative) multi-agent algorithms that need to be mastered:
1. Method based on value function value decomposition:
(1) COMA algorithm:
[COMA] A multi-agent algorithm that splits team rewards into independent rewards
COMA Algorithm Analysis: Counterfactual Multi-Agent Policy Gradients

(2) Introduction and limitations of VDN/QMIX/QTRAN/Qatten algorithm:
Summary of multi-agent deep reinforcement learning value decomposition method (1) - VDN/QMIX/QTRAN/Qatten

(3) QMIX algorithm: Introduction
to multi-agent reinforcement learning Qmix
code: https://blog.csdn.net/tianjuewudi/article/details/121005721

(4) QTRAN algorithm:
Detailed explanation of QTRAN algorithm (upgraded versions of VDN and QMIX)

(5) Qatten algorithm:
Qatten

(6) MAVEN algorithm:

(7) Weighted QMIX algorithm:
from QMIX to WQMIX - Detailed explanation of Weighted QMIX algorithm

(8) QPLEX algorithm:
Multi-agent reinforcement learning 2021 paper (5) QPLEX

Finally, there is a big boss research:
multi-agent reinforcement learning value function decomposition paper research on
multi-agent reinforcement learning value function decomposition: analysis of the advantages and disadvantages of VDN, QMIX, QTRAN series

2. PPO-based method:
(1) MAPPO
multi-agent reinforcement learning MAPPO theory interpretation

Insert image description here
(2) HAPPO

(3) MAT

3. Good code:
https://github.com/marlbenchmark/on-policy
https://github.com/hijkzzz/pymarl2

4. Summary of some papers
https://www.zhihu.com/people/sanmuyansan-mu-yang/columns

5. Carla, a simulation software for multi-agent reinforcement learning for autonomous driving
(not recommended, not lightweight enough)
metadrvie: https://github.com/metadriverse/metadrive
smart: https://github.com/huawei-noah/SMARTS

Guess you like

Origin blog.csdn.net/weixin_39735688/article/details/131260791