How can I build robust MARL agents with intermittent observations?
RMIO: A Model-Based MARL Framework for Scenarios with Observation Loss in Some Agents
This paper introduces RMIO, a model-based multi-agent reinforcement learning (MARL) framework designed for scenarios in which some agents temporarily lose their observations. A world model predicts the missing observations, a correction block refines those predictions using information from other agents, and a communication mechanism synchronizes state only when observation loss actually occurs, so agents can keep making stable decisions under incomplete information. RMIO further improves asymptotic performance through reward smoothing and a dual experience replay buffer.
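The paper's actual networks and training loop are not reproduced here; the following is a minimal sketch of the loss-triggered fallback pattern described above, with hypothetical placeholder components (`world_model_predict`, `correct`) standing in for the learned world model and correction block:

```python
import numpy as np

class RMIOStyleAgent:
    """Per-agent observation handling: use the real observation when it is
    available; otherwise fall back to a world-model prediction refined by
    peers' shared observations."""

    def __init__(self, obs_dim, rng=None):
        self.obs_dim = obs_dim
        self.last_obs = np.zeros(obs_dim)
        self.rng = rng or np.random.default_rng(0)

    def world_model_predict(self):
        # Placeholder: a real world model would be a learned dynamics
        # network predicting the next local observation from history.
        return self.last_obs + 0.01 * self.rng.standard_normal(self.obs_dim)

    def correct(self, predicted, peer_obs):
        # Placeholder correction block: blend the prediction with the
        # mean of the observations communicated by other agents.
        if not peer_obs:
            return predicted
        return 0.5 * predicted + 0.5 * np.mean(peer_obs, axis=0)

def step(agents, raw_obs):
    """raw_obs[i] is None when agent i's observation is lost."""
    available = [o for o in raw_obs if o is not None]
    obs_out = []
    for agent, o in zip(agents, raw_obs):
        if o is None:
            # Observation lost: communication is triggered only on this
            # branch, matching RMIO's loss-triggered synchronization.
            o = agent.correct(agent.world_model_predict(), available)
        agent.last_obs = o
        obs_out.append(o)
    return obs_out

agents = [RMIOStyleAgent(obs_dim=4) for _ in range(3)]
obs = [np.ones(4), None, np.ones(4)]  # agent 1 loses its observation
print(step(agents, obs))
```

The key design point this illustrates is that communication cost is paid only on the loss branch; agents with intact observations never trigger a state-synchronization round.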
Key points for LLM-based multi-agent systems:
- RMIO's world model is analogous to an LLM generating predictions from partial or noisy information supplied by other agents.
- The correction block, which refines predictions using information from other agents, demonstrates collaborative knowledge integration within a multi-agent system.
- The loss-triggered communication mechanism shows how controlled information exchange can improve robustness in LLM-based agents facing information scarcity or uncertainty (a minimal sketch follows below).
- Reward smoothing and experience replay could likewise benefit the training of LLM-based agents in complex, dynamic multi-agent environments.
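As a purely illustrative sketch of the third point above, the same loss-triggered pattern could be expressed for an LLM agent. Everything here is hypothetical: `robust_answer` and its callback parameters are invented for illustration and do not correspond to any API from the paper or from an LLM library:

```python
from typing import Callable, Optional

def robust_answer(
    local_context: Optional[str],
    predict_from_history: Callable[[], str],
    query_peers: Callable[[str], list[str]],
    merge: Callable[[str, list[str]], str],
) -> str:
    """Use local context when present; otherwise draft an answer from
    history, then refine it with peer responses. As in RMIO, peers are
    queried only on the fallback path."""
    if local_context is not None:
        return local_context
    draft = predict_from_history()             # analogue of the world model
    peer_info = query_peers("state request")   # communicate only on loss
    return merge(draft, peer_info)             # analogue of the correction block

# Usage with stub callbacks, simulating loss of the agent's own input:
answer = robust_answer(
    local_context=None,
    predict_from_history=lambda: "draft from past turns",
    query_peers=lambda msg: ["peer A's view", "peer B's view"],
    merge=lambda draft, peers: draft + " | refined with: " + "; ".join(peers),
)
print(answer)
```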