How to stabilize multi-agent learning in non-stationary environments?
XP-MARL: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity
September 19, 2024
https://arxiv.org/pdf/2409.11852
This paper tackles the problem of non-stationarity in multi-agent reinforcement learning (MARL): from each agent's perspective the environment appears non-stationary, because the other agents are simultaneously updating their policies, which undermines stable learning.
The paper introduces XP-MARL, a framework that prioritizes agents, letting higher-priority agents act first and communicate their actions. This action propagation helps stabilize the learning environment for lower-priority agents. Instead of fixed priorities, XP-MARL learns which agent should be prioritized in a given situation. This is particularly relevant to LLM-based multi-agent systems, offering a potential mechanism for managing complex interactions and improving coordination between LLM agents.
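The prioritization-and-propagation idea can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: the functions `priority_score` and `policy` are hypothetical stand-ins for the learned priority network and the learned policies, and the numeric rules inside them are arbitrary.

```python
def priority_score(agent_id, obs):
    # Stand-in for XP-MARL's learned priority mechanism; here, a toy rule
    # that reads a per-agent scalar straight out of the observation dict.
    return obs[agent_id]

def policy(agent_id, obs, propagated_actions):
    # Stand-in for a learned policy that conditions on the actions already
    # taken by higher-priority agents (action propagation).
    return sum(propagated_actions) + obs[agent_id]

def step(obs):
    # Agents act in descending priority order; each chosen action is
    # appended so that all lower-priority agents can observe it.
    order = sorted(obs, key=lambda a: priority_score(a, obs), reverse=True)
    actions, propagated = {}, []
    for agent in order:
        action = policy(agent, obs, propagated)
        actions[agent] = action
        propagated.append(action)  # visible to lower-priority agents
    return order, actions

order, actions = step({"a": 0.9, "b": 0.2, "c": 0.5})
# "a" acts first with no propagated actions; "c" and "b" each see all
# earlier actions, so their decisions are made in a more stable context.
```

The key design point is the sequential loop: because lower-priority agents condition on concrete actions rather than guesses about concurrent policies, the part of the environment they must learn about changes less between updates.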