How can LLMs improve MARL credit assignment and observation?
LERO: LLM-driven Evolutionary framework with Hybrid Rewards and Enhanced Observation for Multi-Agent Reinforcement Learning
This paper introduces LERO, a framework that uses LLMs and evolutionary algorithms to improve multi-agent reinforcement learning (MARL). It tackles two central MARL challenges: credit assignment (determining which agent's actions contributed to a shared outcome) and partial observability (each agent sees only part of the environment).
LERO uses LLMs to generate two key components: hybrid reward functions (HRFs), which combine the shared team reward with individual shaping terms, and observation enhancement functions (OEFs), which add contextual features to each agent's raw observation. An evolutionary algorithm then refines these LLM-generated components over successive training cycles, selecting the variants that yield better agent performance and cooperation. The key innovation is pairing LLM generation with evolutionary refinement to produce better reward and observation functions for multi-agent systems. The approach is demonstrated with improved performance on cooperative navigation tasks.
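To make the two generated components concrete, here is a minimal sketch of what an HRF and an OEF might look like for a cooperative navigation task. This is an illustrative assumption, not the paper's actual generated code: the function names, the distance-based shaping term, and the weights `w_team`/`w_ind` are all hypothetical.

```python
import numpy as np

def hybrid_reward(team_reward, agent_pos, landmark_positions,
                  w_team=0.7, w_ind=0.3):
    """Hypothetical HRF: blend the shared team reward with an
    individual shaping term (negative distance to nearest landmark)."""
    individual = -min(np.linalg.norm(agent_pos - lm)
                      for lm in landmark_positions)
    return w_team * team_reward + w_ind * individual

def enhance_observation(obs, agent_pos, teammate_positions):
    """Hypothetical OEF: append teammates' positions relative to this
    agent, giving a partially observing agent extra team context."""
    rel = np.concatenate([p - agent_pos for p in teammate_positions])
    return np.concatenate([obs, rel])
```

The HRF addresses credit assignment by mixing a per-agent signal into the team reward; the OEF addresses partial observability by enlarging each agent's observation vector with relational features.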
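The evolutionary refinement step can be sketched as a simple select-and-mutate loop. This is a generic elitist scheme written under stated assumptions: in LERO the mutation step would query an LLM to rewrite the candidate function's code, whereas here a placeholder numeric perturbation of a hypothetical `w_team` weight stands in for that call.

```python
import random

def mutate(candidate, rng):
    # Placeholder mutation: jitter a numeric weight within [0, 1].
    # In the real framework an LLM would edit the function itself.
    w = candidate["w_team"] + rng.uniform(-0.1, 0.1)
    return {**candidate, "w_team": min(1.0, max(0.0, w))}

def evolve(population, fitness_fn, n_generations=3,
           elite_frac=0.5, seed=0):
    """Minimal elitist evolutionary loop: score candidate reward/
    observation functions, keep the top fraction, and refill the
    population with mutated copies of the elite."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(n_generations):
        scored = sorted(population, key=fitness_fn, reverse=True)
        elite = scored[: max(1, int(size * elite_frac))]
        population = (elite + [mutate(c, rng) for c in elite])[:size]
    return max(population, key=fitness_fn)
```

Because the unmodified elite survives each generation, the best fitness in the population never decreases; the fitness function here would correspond to an agent's training performance under the candidate component.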