How can LLMs improve multi-agent credit assignment?
Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment
This paper introduces a method for improving cooperative multi-agent reinforcement learning by using large language models (LLMs) to distribute rewards among agents, i.e., to solve the credit-assignment problem. Credit assignment is framed as an "agreement problem": the LLM judges when agents are effectively collaborating toward a common goal and rewards them accordingly. Two methods are proposed: LLM-MCA, in which the LLM performs per-agent credit assignment, and LLM-TACA, in which the LLM additionally issues explicit per-agent task assignments to further guide training. Both methods leverage the pattern-recognition and reasoning abilities of LLMs to encourage collaborative behavior, especially in settings with sparse rewards, and outperform existing baselines on several benchmarks, including a new benchmark, Spaceworld, which simulates multi-agent in-space assembly. As a byproduct, the work produces a novel dataset of agent trajectories annotated with per-agent rewards to support future offline multi-agent reinforcement learning research.
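To make the core idea concrete, the sketch below shows how an LLM acting as a centralized critic might decompose a sparse global team reward into per-agent feedback. This is a minimal illustration, not the paper's actual prompt or interface: the function llm_credit_assignment, the prompt wording, and the dummy_llm stub are all hypothetical stand-ins.

```python
import json
from typing import Callable, List, Sequence

def llm_credit_assignment(
    llm: Callable[[str], str],
    observations: Sequence[str],
    joint_actions: Sequence[str],
    global_reward: float,
) -> List[float]:
    """Ask an LLM to split a (possibly sparse) global team reward into
    per-agent rewards based on each agent's observed contribution."""
    n = len(observations)
    prompt = (
        "You are a centralized critic for a cooperative multi-agent team.\n"
        f"Per-agent observations: {list(observations)}\n"
        f"Per-agent actions: {list(joint_actions)}\n"
        f"Global team reward this step: {global_reward}\n"
        f"Reply with only a JSON list of {n} numbers, one per agent, "
        "scoring each agent's contribution to the common goal."
    )
    try:
        rewards = json.loads(llm(prompt))
    except (ValueError, TypeError):
        rewards = []
    if len(rewards) != n:
        # Fall back to an even split if the LLM output is malformed.
        rewards = [global_reward / n] * n
    return [float(r) for r in rewards]

# Usage with a dummy LLM that always splits the reward evenly.
dummy_llm = lambda prompt: "[0.5, 0.5]"
per_agent = llm_credit_assignment(
    dummy_llm,
    observations=["agent0 near part A", "agent1 idle"],
    joint_actions=["pick up part A", "wait"],
    global_reward=1.0,
)
print(per_agent)  # [0.5, 0.5] — one reward signal per agent's learner
```

The per-agent rewards returned by such a critic would then replace the shared global reward in each agent's update, which is what lets individualized feedback emerge even when the environment itself only provides a single sparse team reward.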