How to best order agent actions in MARL?
PMAT: Optimizing Action Generation Order in Multi-Agent Reinforcement Learning
February 25, 2025
https://arxiv.org/pdf/2502.16496
This paper introduces PMAT (Prioritized Multi-Agent Transformer), a new algorithm that improves the coordination of multiple AI agents in tasks requiring collaboration. It addresses the challenge of determining the optimal order for agents to make decisions, which is crucial for efficient teamwork.
Key points for LLM-based multi-agent systems:
- Sequential decision-making: PMAT uses a sequential approach, where agents make decisions one after another, enabling each agent to consider the actions of preceding agents. This is particularly relevant to LLMs, which naturally generate text sequentially.
- Action generation order optimization: PMAT optimizes the order in which agents act by sampling it from a Plackett-Luce model, a probability distribution over rankings. This allows agents with the most relevant information at a given time to act first, improving overall coordination. This is analogous to deciding which LLM agent should respond first in a multi-agent conversation.
- Integration with Transformers: PMAT builds upon the Multi-Agent Transformer (MAT) architecture, showcasing how order optimization can enhance transformer-based multi-agent systems. This is directly relevant to current LLM development trends.
- Improved coordination and performance: Experiments demonstrate that PMAT leads to more efficient teamwork and better performance in various multi-agent tasks, which is a key goal in LLM-based multi-agent system development.
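
To make the Plackett-Luce step mentioned above concrete, here is a minimal sketch of sampling an action order from per-agent priority scores. It is an illustration only, not the paper's implementation: the function names `sample_plackett_luce` and `log_prob_order` and the example scores are hypothetical, and how PMAT actually produces its scores is not covered in this summary. The sketch assumes each agent has a single real-valued score, with higher scores making an agent more likely to act earlier.

```python
import math
import random


def sample_plackett_luce(scores, rng):
    """Sample an ordering of agent indices from a Plackett-Luce model.

    `scores` holds one hypothetical priority score per agent. At each step
    the next agent is drawn with probability proportional to exp(score)
    among the agents that have not been placed yet.
    """
    remaining = list(range(len(scores)))
    order = []
    while remaining:
        # Subtract the max score before exponentiating for numerical stability.
        top = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - top) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def log_prob_order(scores, order):
    """Log-probability of a full ordering under the same Plackett-Luce model."""
    remaining = list(order)
    logp = 0.0
    for agent in order:
        top = max(scores[i] for i in remaining)
        log_z = top + math.log(sum(math.exp(scores[i] - top) for i in remaining))
        logp += scores[agent] - log_z
        remaining.remove(agent)
    return logp


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [2.0, 0.5, -1.0]  # made-up scores for three agents
    order = sample_plackett_luce(scores, rng)
    print("sampled order:", order)
    print("log-probability:", log_prob_order(scores, order))
```

Agents would then act in the sampled order, each one conditioning on the actions already chosen by the agents before it. The log-probability function is included because a sampled order usually needs a likelihood if the scores are to be trained with a policy-gradient style objective; whether PMAT uses it in exactly this way is an assumption here.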