How can LLMs improve USV swarm MARL policies?
Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm
This research introduces a method for incorporating human feedback into the training of multi-agent reinforcement learning (MARL) systems, targeting scenarios such as controlling a swarm of unmanned surface vehicles (USVs). It addresses a persistent challenge: aligning agent behavior with nuanced human preferences that are difficult to encode in hand-crafted reward functions. The key innovation is Agent-Level Feedback, which lets humans evaluate individual agents' performance rather than only the team as a whole.

This granular feedback is used to train a reward model (see the sketch below), which in turn fine-tunes the policy so the system adapts to human preferences more effectively. Rather than relying on human labelers, the system uses an LLM as the evaluator, simulating diverse human feedback across scenarios such as collision avoidance and task allocation. The result is finer-grained, more nuanced control of multi-agent systems, closer to human decision-making, and a demonstration of the practical value of combining LLMs with MARL.
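The summary above does not spell out how agent-level feedback becomes a reward model, so here is a minimal sketch of one standard approach: Bradley-Terry preference learning over per-agent trajectory segments. The class name AgentRewardModel, the segment shapes, and the pairwise-comparison protocol are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of agent-level preference-based reward modelling,
# assuming pairwise preferences over per-agent trajectory segments
# (Bradley-Terry style). All names and shapes here are illustrative.
import torch
import torch.nn as nn

class AgentRewardModel(nn.Module):
    """Scores a single agent's (observation, action) pairs."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim).
        # Sum per-step scores over the segment -> (batch,) segment return.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1).sum(dim=-1)

def preference_loss(model, seg_a, seg_b, prefs):
    """Bradley-Terry loss: prefs[i] = 1.0 if the evaluator (human or LLM)
    preferred agent segment A over segment B, else 0.0."""
    logits = model(*seg_a) - model(*seg_b)  # (batch,)
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)

# Usage: each batch element compares two segments of the *same* agent,
# labelled at the agent level rather than for the whole team.
model = AgentRewardModel(obs_dim=16, act_dim=4)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

obs_a, act_a = torch.randn(32, 50, 16), torch.randn(32, 50, 4)
obs_b, act_b = torch.randn(32, 50, 16), torch.randn(32, 50, 4)
prefs = torch.randint(0, 2, (32,)).float()  # stand-in for evaluator labels

loss = preference_loss(model, (obs_a, act_a), (obs_b, act_b), prefs)
opt.zero_grad()
loss.backward()
opt.step()
```

The preference labels themselves come from the LLM evaluator. A hypothetical sketch of that step follows, assuming the OpenAI chat completions API and textual summaries of each agent's segment; the prompt wording and the A/B protocol are assumptions for illustration, not the paper's actual setup.

```python
# A hypothetical LLM-as-evaluator step: given text summaries of two
# agent segments, ask the model which better satisfies a criterion
# (e.g. collision avoidance or task allocation). Illustrative only.
from openai import OpenAI

client = OpenAI()

def llm_preference(summary_a: str, summary_b: str, criterion: str) -> float:
    """Returns 1.0 if segment A is preferred under the criterion, else 0.0."""
    prompt = (
        f"Criterion: {criterion}\n"
        f"Segment A: {summary_a}\n"
        f"Segment B: {summary_b}\n"
        "Which segment better satisfies the criterion? Answer 'A' or 'B'."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content.strip()
    return 1.0 if answer.startswith("A") else 0.0
```

Labels produced this way feed directly into `preference_loss` above, so the reward model can be trained at scale without a human rating every agent in every episode.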