Can decentralized agents learn shared value functions?
Distributed Value Decomposition Networks with Networked Agents
This paper introduces Distributed Value Decomposition Networks (DVDN), a novel algorithm for training cooperative multi-agent reinforcement learning (MARL) systems under partial observability and decentralized communication. Instead of relying on the centralized training used by traditional Value Decomposition Networks (VDN), DVDN lets each agent learn locally: agents exchange temporal-difference information with their neighbors over a communication graph to approximate the joint value that centralized VDN would compute. This makes the approach suitable for real-world scenarios where central control isn't feasible. For homogeneous agents (identical capabilities), a variant of the algorithm adds gradient tracking, which nudges the agents toward consensus on a shared model.
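To make the mechanism concrete, here is a minimal PyTorch sketch of the idea behind decentralized value decomposition: each agent keeps its own Q-network, estimates the joint (summed) value by exchanging scalar Q-values with its neighbors, and takes a purely local gradient step. This is an illustration under simplifying assumptions, not the authors' implementation: the names (LocalQNet, decentralized_vdn_step) are hypothetical, and the joint value is approximated by a one-hop neighborhood sum rather than the consensus-style estimation described in the paper.

```python
import torch
import torch.nn as nn


class LocalQNet(nn.Module):
    """Per-agent utility network Q_i(o_i, a_i) over local observations only."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)


def decentralized_vdn_step(agents, optimizers, batch, neighbors, gamma=0.99):
    """One decentralized update over a communication graph.

    agents     : list of LocalQNet, one per agent
    optimizers : matching list of torch optimizers
    batch      : dict of per-agent tensors: obs [B, obs_dim], act [B, 1] (long),
                 rew [B], next_obs [B, obs_dim]
    neighbors  : neighbors[i] is the list of agent indices agent i can hear from
    """
    # 1) Each agent computes its local contribution to the decomposed value
    #    Q_tot = sum_i Q_i, using only its own observation.
    q_taken, q_next = [], []
    for i, agent in enumerate(agents):
        q_taken.append(agent(batch["obs"][i]).gather(1, batch["act"][i]).squeeze(1))
        with torch.no_grad():
            q_next.append(agent(batch["next_obs"][i]).max(dim=1).values)

    # 2) Each agent estimates the joint value by adding the (detached) Q-values
    #    received from its neighbors. A one-hop sum is used here purely for
    #    illustration; the paper instead estimates the joint temporal-difference
    #    target through repeated exchanges over the communication graph.
    losses = []
    for i in range(len(agents)):
        q_tot_est = q_taken[i] + sum(q_taken[j].detach() for j in neighbors[i])
        q_tot_next_est = q_next[i] + sum(q_next[j] for j in neighbors[i])
        target = batch["rew"][i] + gamma * q_tot_next_est
        losses.append(((target - q_tot_est) ** 2).mean())

    # 3) Purely local gradient steps: no central mixer or shared critic is needed.
    for opt, loss in zip(optimizers, losses):
        opt.zero_grad()
        loss.backward()
        opt.step()
```

For the homogeneous-agent variant, a gradient-tracking step would additionally mix each agent's gradient estimate and parameters with its neighbors' after the local update, so that all agents converge toward a single shared model.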
Key points for LLM-based multi-agent systems: DVDN enables decentralized training of cooperative LLM agents, allowing them to learn individual policies while still working towards a shared goal. This is especially relevant for LLM agents operating in dynamic environments where centralized control is impractical. Local communication and gradient tracking can improve efficiency and scalability compared to centralized training methods, and the ability to handle partially observable environments is crucial for real-world applications where agents rarely have complete information. DVDN thus offers a potential pathway for developing collaborative, communicative LLM-based agents in diverse web applications.