Can LLMs replace mixing networks in MARL?
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?
This paper introduces QLLM, a novel approach to credit assignment in multi-agent reinforcement learning (MARL) that uses large language models (LLMs) to replace traditional mixing networks. It argues that LLMs, through their prior knowledge and code generation abilities, can produce more efficient and interpretable credit assignment functions. Key to QLLM is a coder-evaluator framework: one LLM generates candidate credit assignment functions, termed TFCAF (training-free credit assignment function), and a second LLM evaluates and refines them through feedback, mitigating hallucinations. The result is a credit assignment function that requires no training and improves performance, especially in complex scenarios with high-dimensional state spaces. The approach is compatible with existing value-decomposition MARL algorithms.
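To make the idea concrete, below is a minimal, hypothetical Python sketch of a coder-evaluator loop producing a credit assignment function that stands in for a mixing network. The function names, prompts, and the example generated code are assumptions for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, Sequence


def coder_evaluator_loop(coder_llm: Callable[[str], str],
                         evaluator_llm: Callable[[str], str],
                         task_description: str,
                         rounds: int = 3) -> str:
    """Ask the coder LLM for candidate credit-assignment code, then let the
    evaluator LLM critique it; iterating reduces hallucinated or invalid code."""
    code = coder_llm(task_description)
    for _ in range(rounds):
        feedback = evaluator_llm(code)
        if feedback.strip() == "OK":  # evaluator accepts the candidate
            break
        code = coder_llm(task_description + "\nEvaluator feedback:\n" + feedback)
    return code


def build_tfcaf(code: str) -> Callable[[Sequence[float], Sequence[float]], float]:
    """Compile generated source into a callable Q_tot = f(agent_qs, state).
    In practice the exec should be sandboxed; shown bare here for structure."""
    namespace: Dict = {}
    exec(code, namespace)
    return namespace["credit_assignment"]


# Example of what a generated, training-free function might look like:
# it combines per-agent Q-values and the global state into a joint Q-value,
# replacing the role of a learned mixing network.
example_code = """
def credit_assignment(agent_qs, state):
    # Hypothetical generated rule: state-dependent weighted sum of agent utilities.
    weights = [abs(s) + 1.0 for s in state[:len(agent_qs)]]
    return sum(w * q for w, q in zip(weights, agent_qs))
"""

tfcaf = build_tfcaf(example_code)
q_tot = tfcaf([0.4, 0.9, 0.1], [0.2, -0.5, 1.3])  # joint Q-value for the team
```

Because the generated function is plain code rather than a trained network, it can simply be plugged in wherever a value-decomposition algorithm would otherwise query its mixing network.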