How can I improve RL multi-vehicle training speed?
A Differentiated Reward Method for Reinforcement Learning based Multi-Vehicle Cooperative Decision-Making Algorithms
February 4, 2025
https://arxiv.org/pdf/2502.00352

This paper proposes a new reward function for training multi-agent reinforcement learning (MARL) models for cooperative autonomous driving in continuous traffic flow. The "differentiated reward" method computes rewards from changes in state (e.g., vehicle position, speed) over time rather than from absolute state values. This targets the weak, sparse reward signal in stable traffic scenarios, where absolute states vary little from step to step, and leads to faster and more stable learning. The key takeaway for LLM-based multi-agent systems is that this differentiated reward design could improve training effectiveness in similar environments, where state changes are more informative than absolute state values.
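To make the contrast concrete, here is a minimal Python sketch of the two reward designs for a group of vehicles. It assumes a hypothetical per-vehicle score built from speed and the gap to the leading vehicle. `state_score`, `TARGET_SPEED`, `SAFE_GAP` and the weights are illustrative inventions, not the paper's reward terms. The differentiated reward is taken simply as the change in that score between consecutive time steps, r_t = score(s_{t+1}) - score(s_t), which is a simplification of the method described in the paper.

```python
from typing import List, Tuple

# Hypothetical per-vehicle state: (speed in m/s, gap to leading vehicle in m).
State = Tuple[float, float]

TARGET_SPEED = 25.0  # assumed desired speed (m/s), illustrative only
SAFE_GAP = 10.0      # assumed minimum comfortable gap (m), illustrative only


def state_score(state: State) -> float:
    """Hypothetical absolute score of one vehicle's state (higher is better).

    Penalizes deviation from TARGET_SPEED and any gap shorter than SAFE_GAP.
    """
    speed, gap = state
    gap_penalty = max(0.0, SAFE_GAP - gap)
    return -abs(speed - TARGET_SPEED) - 0.5 * gap_penalty


def absolute_rewards(next_states: List[State]) -> List[float]:
    """Conventional design: each agent is rewarded by the value of its new state."""
    return [state_score(s) for s in next_states]


def differentiated_rewards(states: List[State], next_states: List[State]) -> List[float]:
    """Change-based design (sketch): score(s_{t+1}) - score(s_t) for each agent."""
    return [state_score(nxt) - state_score(cur) for cur, nxt in zip(states, next_states)]


if __name__ == "__main__":
    # Three vehicles in near-steady traffic: vehicle 0 speeds up toward the
    # target, vehicle 1 holds its state, vehicle 2 slows down.
    states = [(24.0, 20.0), (24.0, 20.0), (24.0, 20.0)]
    next_states = [(24.5, 20.0), (24.0, 20.0), (23.0, 20.0)]
    print(absolute_rewards(next_states))               # [-0.5, -1.0, -2.0]
    print(differentiated_rewards(states, next_states))  # [0.5, 0.0, -1.0]
```

Under the absolute design, the vehicle that improved still receives a negative reward, and the vehicle that held its state receives the same -1.0 at every step. The change-based version returns +0.5 for the improvement, 0.0 for no change and -1.0 for the deterioration, so the sign of each reward indicates whether the step helped. This is the kind of signal the summary describes as more informative when absolute states barely differ.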