How can consensus guidance improve multi-agent exploration?
CPEG: Leveraging Consistency Policy with Consensus Guidance for Multi-agent Exploration
This paper introduces CPEG, a new method for improving exploration in cooperative multi-agent reinforcement learning (MARL). It addresses the challenge of agents getting stuck in suboptimal solutions, especially in environments with sparse rewards.
CPEG uses a "consistency policy", a single-step generative policy distilled from diffusion models, which keeps the ability to represent multimodal action distributions while sampling actions in one forward pass instead of a long denoising chain. It also introduces a "consensus learner" that helps agents cooperate by inferring a shared understanding of the global state from their individual local observations. This shared understanding, represented as a discrete code, conditions and guides the consistency policy. A self-reference mechanism is also incorporated to keep the policy from generating nonsensical actions early in training.

These components are relevant to LLM-based multi-agent systems because they tackle exploration and cooperation, two key challenges in that emerging field. The discrete consensus representation could be especially natural for LLMs, which excel at manipulating discrete tokens.
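To make the data flow concrete, here is a minimal PyTorch sketch. The module names (ConsensusLearner, ConsistencyPolicy), the network sizes, and the VQ-style nearest-neighbour codebook are illustrative assumptions, not the paper's implementation; the sketch only shows the pipeline described above: each agent encodes its local observation into a discrete consensus code, and a one-step policy generates an action conditioned on that code and a noise sample (the noise input is what allows multimodal action exploration).

```python
# Hypothetical sketch of CPEG's data flow; names, shapes, and the codebook
# mechanism are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class ConsensusLearner(nn.Module):
    """Maps an agent's local observation to a discrete consensus code by
    matching against a shared learnable codebook (VQ-style nearest neighbour),
    so agents ideally recover the same code for the same global state."""

    def __init__(self, obs_dim: int, code_dim: int = 32, n_codes: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, code_dim)
        )
        self.codebook = nn.Embedding(n_codes, code_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.encoder(obs)                         # (batch, code_dim)
        dist = torch.cdist(z, self.codebook.weight)   # distance to each code
        idx = dist.argmin(dim=-1)                     # discrete consensus index
        code = self.codebook(idx)
        # Straight-through estimator so the encoder still receives gradients.
        return z + (code - z).detach()


class ConsistencyPolicy(nn.Module):
    """One-step action generator: maps a noise sample to an action in a single
    forward pass, conditioned on the observation and the consensus code."""

    def __init__(self, obs_dim: int, act_dim: int, code_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + code_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs, code, noise):
        return self.net(torch.cat([obs, code, noise], dim=-1))


# Usage: each agent acts from its own observation plus the inferred code.
obs_dim, act_dim, batch = 24, 4, 8
learner = ConsensusLearner(obs_dim)
policy = ConsistencyPolicy(obs_dim, act_dim)
obs = torch.randn(batch, obs_dim)
code = learner(obs)
action = policy(obs, code, torch.randn(batch, act_dim))  # single-step sample
print(action.shape)  # torch.Size([8, 4])
```

Sampling different noise vectors for the same observation and code yields different candidate actions, which is how the single-step policy retains the multimodal exploration behaviour attributed to diffusion-style policies.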