How can I scale MARL for large-team competitive games?
Learning Large-Scale Competitive Team Behaviors with Mean-Field Interactions
This paper introduces MF-MAPPO, an algorithm for training large teams of AI agents that compete and cooperate in simulated environments. It addresses the scalability limits of existing multi-agent RL methods via mean-field theory: instead of tracking every individual agent, interactions are approximated by probability distributions (mean fields) over agent states, which sharply reduces computational complexity.

For LLM-based multi-agent systems, the key design is the shared "minimally-informed" critic network, which receives only mean-field information as input, independent of any individual agent's state or action. As a result, the critic's size stays constant regardless of the number of agents. The algorithm also trains the competing teams simultaneously rather than in an alternating scheme, which yields more dynamic adaptation between opponents. Both properties are relevant to LLM-based agent systems, where efficient training and dynamic interaction are critical.
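To make the scaling argument concrete, here is a minimal sketch (not the paper's implementation) of a minimally-informed critic: the critic consumes only the empirical state distribution of a team, so its input dimension is fixed by the number of discretized states, not by the team size. The bin count, network sizes, and class names below are illustrative assumptions.

```python
import numpy as np

N_BINS = 8  # discretized state space (assumption for illustration)

def mean_field(agent_states, n_bins=N_BINS):
    """Empirical distribution of agents over discretized states."""
    counts = np.bincount(agent_states, minlength=n_bins)
    return counts / counts.sum()

class MeanFieldCritic:
    """Tiny MLP critic; input dim equals n_bins, independent of agent count."""
    def __init__(self, n_bins=N_BINS, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_bins, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, 1))

    def value(self, mu):
        # The critic sees only the mean field mu, never individual agents.
        h = np.tanh(mu @ self.w1)
        return float(h @ self.w2)

critic = MeanFieldCritic()
for n_agents in (10, 1_000, 100_000):
    states = np.random.default_rng(1).integers(0, N_BINS, size=n_agents)
    mu = mean_field(states)
    # Input shape is (N_BINS,) for every team size.
    print(n_agents, mu.shape, critic.value(mu))
```

Because the critic's parameters depend only on `N_BINS`, growing the team from 10 to 100,000 agents changes nothing about the network, which is the scalability property the paper attributes to its mean-field design.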