How do we train agents centrally but act decentrally?
An Introduction to Centralized Training for Decentralized Execution in Cooperative Multi-Agent Reinforcement Learning
This paper explores Centralized Training for Decentralized Execution (CTDE) in cooperative Multi-Agent Reinforcement Learning (MARL). CTDE allows agents to leverage shared information during training, leading to better coordination, while still acting independently during execution. This is particularly relevant to multi-agent systems that use LLMs as agents. The paper dives into the two main families of CTDE methods: value function factorization (e.g., VDN, QMIX, and QPLEX), where a joint value function is broken down into individual agent values, and centralized critic methods (e.g., MADDPG, COMA, and MAPPO), which employ a central critic to guide the learning of decentralized actors. The use of state information in critics and the tradeoffs between different types of critics are also analyzed. The paper concludes with a discussion of other forms of CTDE, highlighting areas for future research, such as developing globally optimal model-free MARL methods.
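To make the value-factorization idea concrete, below is a minimal VDN-style sketch in PyTorch (class and function names are illustrative, not from the paper): the joint action value is the sum of per-agent utilities, so a single TD loss on the joint value trains every agent's network during centralized training, while each agent can still act greedily on its own local Q-values at execution time.

```python
import torch
import torch.nn as nn


class AgentQNet(nn.Module):
    """Per-agent utility network: maps a local observation to Q-values per action."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def vdn_joint_q(agent_nets, observations, actions):
    """VDN-style factorization: the joint Q is the sum of per-agent Q-values
    for the chosen actions. The sum is supervised by a joint TD target
    (centralized training), but each agent only needs its own network and
    observation to pick actions (decentralized execution)."""
    per_agent_q = []
    for net, obs, act in zip(agent_nets, observations, actions):
        q_all = net(obs)                                        # (batch, n_actions)
        q_taken = q_all.gather(1, act.unsqueeze(1)).squeeze(1)  # (batch,)
        per_agent_q.append(q_taken)
    return torch.stack(per_agent_q, dim=0).sum(dim=0)           # (batch,)


# Hypothetical usage: 2 agents, a batch of 4 transitions, placeholder TD target.
if __name__ == "__main__":
    n_agents, obs_dim, n_actions, batch = 2, 8, 5, 4
    nets = [AgentQNet(obs_dim, n_actions) for _ in range(n_agents)]
    obs = [torch.randn(batch, obs_dim) for _ in range(n_agents)]
    acts = [torch.randint(0, n_actions, (batch,)) for _ in range(n_agents)]

    q_tot = vdn_joint_q(nets, obs, acts)
    td_target = torch.randn(batch)      # stands in for r + gamma * max_a' Q_tot'
    loss = ((q_tot - td_target) ** 2).mean()
    loss.backward()                     # one joint loss propagates to all agent networks
```

QMIX generalizes this additive mixing to any monotonic mixing network, and QPLEX relaxes the restriction further via a dueling decomposition; the centralized-critic methods discussed in the paper (MADDPG, COMA, MAPPO) instead keep per-agent policies and use a critic with access to global information only during training.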