How can I improve multi-agent RL exploration?
Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning
This paper addresses the underestimation of optimal actions in cooperative multi-agent reinforcement learning, particularly under the Centralized Training with Decentralized Execution (CTDE) paradigm. It proposes "Optimistic ε-Greedy Exploration," a strategy that augments standard ε-greedy with an optimistic network used to identify promising actions and sample them preferentially during training, improving the accuracy of the learned value functions. A minimal sketch of this selection rule appears below.
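To make the mechanism concrete, here is a minimal sketch of optimistic ε-greedy action selection. The function name, the softmax-over-optimistic-values sampling rule, and the `temperature` parameter are my assumptions for illustration; the paper's exact sampling scheme may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def optimistic_epsilon_greedy(q_main, q_optimistic, epsilon, temperature=1.0):
    """Sketch of optimistic ε-greedy (assumed form, not the paper's exact rule).

    With probability 1 - epsilon, exploit the main Q-estimates. Otherwise
    explore, but instead of sampling uniformly, bias the draw toward actions
    the optimistic network scores highly.
    """
    if rng.random() > epsilon:
        return int(np.argmax(q_main))  # exploit the main value estimate

    # Explore: softmax over optimistic estimates, so actions the optimistic
    # network considers promising are revisited more often.
    logits = np.asarray(q_optimistic, dtype=np.float64) / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy usage: action 2 is currently underestimated by the main network, but
# the optimistic estimate still ranks it highly, so exploration revisits it
# instead of letting the underestimate go uncorrected.
q_main = np.array([1.0, 0.8, 0.2])  # current (possibly pessimistic) estimates
q_opt = np.array([1.1, 0.9, 1.5])   # optimistic estimates
action = optimistic_epsilon_greedy(q_main, q_opt, epsilon=0.3)
```

The design point this illustrates: the exploration branch is where underestimated optimal actions get a second chance, so biasing that branch with an optimistic estimate targets exactly the actions whose values are most likely to be wrong.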
For LLM-based multi-agent systems, the key takeaway is the exploration strategy itself. By maintaining an optimistic estimate alongside the main value estimate, agents can explore the joint action space more effectively and avoid settling on suboptimal joint actions, which matters most when complex interactions create the risk of relative overgeneralization. A similar optimistic bias on exploration could benefit LLM agents collaborating in a shared environment, potentially leading to more effective communication and task completion.