Can decentralized MADDPG improve multi-agent LLM apps?
Fully-Decentralized MADDPG with Networked Agents
This paper explores decentralized training for multi-agent reinforcement learning (MARL) with continuous action spaces, specifically adapting the MADDPG algorithm. It introduces "surrogate policies", with which each agent models the behavior of the other agents from its own local observations, so that training no longer requires access to the other agents' actual policies. Two variants that add networked communication are also presented, in which agents share critic parameters over a communication network via "hard" and "soft" consensus updates.
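A minimal sketch of the surrogate-policy idea in PyTorch. The class name, network layout, and the behavioral-cloning loss are assumptions for illustration, not the paper's exact formulation; the point is only that agent i predicts agent j's action from agent i's local observation.

```python
import torch
import torch.nn as nn

class SurrogatePolicy(nn.Module):
    """Agent i's local model of another agent j's policy (hypothetical layout).

    Maps agent i's *local* observation to a guess of agent j's action, so agent i
    never needs access to j's true policy or observation during training.
    """
    def __init__(self, local_obs_dim: int, other_action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(local_obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, other_action_dim), nn.Tanh(),  # continuous actions in [-1, 1]
        )

    def forward(self, local_obs: torch.Tensor) -> torch.Tensor:
        return self.net(local_obs)


def surrogate_update(surrogate: SurrogatePolicy,
                     optimizer: torch.optim.Optimizer,
                     local_obs: torch.Tensor,
                     observed_other_action: torch.Tensor) -> float:
    """One behavioral-cloning step: fit the surrogate to the actions agent i actually
    observed agent j take (an assumed training signal, not the paper's exact loss)."""
    pred = surrogate(local_obs)
    loss = nn.functional.mse_loss(pred, observed_other_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage idea: agent i's critic can then be evaluated with surrogate actions standing in
# for the other agents' true actions, keeping the whole update local to agent i:
# critic_input = torch.cat([local_obs, own_action, surrogate(local_obs)], dim=-1)
```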
Key points for LLM-based multi-agent systems:
- Surrogate policies, in which agents model each other's behavior from limited local information, map directly onto how LLM agents could reason about one another in a multi-agent environment.
- Decentralized training, central to this work, is crucial for scaling multi-agent LLM systems.
- The communication strategies (hard and soft consensus updates; see the sketch after this list) could inspire decentralized ways for LLM agents to share knowledge and coordinate.
- The focus on partially observable environments is especially relevant to LLM agents, which may have different access to information or different perspectives on a shared situation.
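A rough sketch of what "hard" versus "soft" consensus on critic parameters could look like over a communication graph. The function names, the plain neighborhood averaging, and the mixing coefficient are assumptions; the paper's exact update rules and weighting may differ.

```python
import torch
import torch.nn as nn
from typing import Dict, List, Sequence

@torch.no_grad()
def hard_consensus(critics: Sequence[nn.Module], neighbors: Dict[int, List[int]]) -> None:
    """'Hard' consensus (assumed form): each agent replaces its critic parameters
    with the plain average of its own and its neighbors' parameters."""
    snapshots = [[p.clone() for p in c.parameters()] for c in critics]
    for i, critic in enumerate(critics):
        group = [i] + list(neighbors[i])
        for k, p in enumerate(critic.parameters()):
            p.copy_(torch.stack([snapshots[j][k] for j in group]).mean(dim=0))

@torch.no_grad()
def soft_consensus(critics: Sequence[nn.Module], neighbors: Dict[int, List[int]],
                   mix: float = 0.1) -> None:
    """'Soft' consensus (assumed form): each agent only nudges its parameters
    toward the neighborhood average, keeping most of its own estimate."""
    snapshots = [[p.clone() for p in c.parameters()] for c in critics]
    for i, critic in enumerate(critics):
        group = [i] + list(neighbors[i])
        for k, p in enumerate(critic.parameters()):
            avg = torch.stack([snapshots[j][k] for j in group]).mean(dim=0)
            p.copy_((1.0 - mix) * snapshots[i][k] + mix * avg)


# Example: three agents on a line graph 0 - 1 - 2.
critics = [nn.Linear(8, 1) for _ in range(3)]
graph = {0: [1], 1: [0, 2], 2: [1]}
soft_consensus(critics, graph, mix=0.1)  # gentle mixing, run every round
hard_consensus(critics, graph)           # full averaging over each neighborhood
```

The same pattern (periodically averaging or blending parameters, memories, or summaries with neighbors rather than routing everything through a central coordinator) is the part most transferable to decentralized LLM agent systems.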