Can LLMs build adaptable Hanabi-playing agents?
A GENERALIST HANABI AGENT
This paper introduces R3D2, a new AI agent for the cooperative card game Hanabi, designed to be more flexible and adaptable than previous approaches. It tackles a common failure mode: agents overfit to the specific teammates and game configurations (e.g., the number of players) seen during training, which hinders their ability to cooperate with unfamiliar agents or in different settings.
Key points for LLM-based multi-agent systems:

R3D2 represents the game state and actions as text, which facilitates generalization across different game configurations and promotes knowledge transfer. Because the action space is also text-based and dynamic, the same agent can collaborate with agents trained on different game settings; a rough illustration of such an encoding appears after this summary. This text-based approach allows a simpler self-play training regimen while still achieving robust zero-shot coordination.

The paper also evaluates several language models directly on Hanabi, highlighting their limitations as players and thereby motivating R3D2's reinforcement-learning approach. Finally, it introduces "variable-player learning," a multi-agent variant of multi-task learning in which the number of players can change during training (also sketched below), enabling a single agent to generalize across diverse gameplay scenarios.
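To make the text-based idea concrete, here is a minimal sketch of how a Hanabi observation and its legal moves might be serialized as strings. The string format, function names, and card notation below are illustrative assumptions, not the paper's actual encoding; the point is only that a text representation stays valid regardless of how many players sit at the table.

```python
# Hypothetical sketch of a text-based Hanabi observation and action space.
# R3D2's exact format may differ; this only illustrates how text makes
# states and actions independent of the player count.

def encode_observation(fireworks, hint_tokens, life_tokens, partner_hands):
    """Serialize a (partial) Hanabi state as plain text."""
    lines = [
        f"fireworks: {' '.join(f'{c}{n}' for c, n in fireworks.items())}",
        f"hints: {hint_tokens}  lives: {life_tokens}",
    ]
    # One line per visible partner hand; works for any number of players.
    for name, hand in partner_hands.items():
        lines.append(f"{name} holds: {' '.join(hand)}")
    return "\n".join(lines)

def legal_actions_as_text(hand_size, colors, ranks, partners):
    """Enumerate legal moves as text; the list grows or shrinks with the table."""
    actions = [f"play card {i}" for i in range(hand_size)]
    actions += [f"discard card {i}" for i in range(hand_size)]
    for p in partners:
        actions += [f"hint {p} color {c}" for c in colors]
        actions += [f"hint {p} rank {r}" for r in ranks]
    return actions

obs = encode_observation(
    fireworks={"red": 2, "blue": 1},
    hint_tokens=6, life_tokens=3,
    partner_hands={"Bob": ["R3", "B1", "Y5", "W2", "G4"]},
)
print(obs)
print(legal_actions_as_text(5, ["red", "blue"], [1, 2, 3, 4, 5], ["Bob"]))
```

Because actions are strings scored by the policy rather than indices into a fixed output head, adding or removing a partner simply changes the list of candidate strings rather than the network architecture.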
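The following sketch shows the variable-player-learning idea at the training-loop level: the player count is resampled every episode so one self-play policy must cover all table sizes. `DummyEnv`, the fixed placeholder action, and the episode mechanics are stand-ins invented for illustration; the paper's actual training setup is more involved.

```python
import random

# Minimal sketch of "variable-player learning": the number of players is
# resampled each episode so a single policy generalizes across 2-5 player
# Hanabi. DummyEnv is a placeholder, not the real Hanabi environment.

class DummyEnv:
    """Placeholder environment: random episode length, random reward."""
    def __init__(self, players):
        self.players = players
        self.steps_left = random.randint(5, 20)

    def reset(self):
        return f"new game with {self.players} players"

    def step(self, action):
        self.steps_left -= 1
        done = self.steps_left == 0
        return "obs-text", random.random(), done

def train_variable_player(num_episodes, player_counts=(2, 3, 4, 5)):
    for episode in range(num_episodes):
        n = random.choice(player_counts)   # resample the table size
        env = DummyEnv(players=n)
        obs = env.reset()
        done = False
        while not done:
            action = "play card 0"         # a learned policy would choose here
            obs, reward, done = env.step(action)
        # a real agent would perform its learning update at this point

train_variable_player(num_episodes=3)
```

Treating the player count as just another task dimension is what lets the resulting agent be dropped into games of unfamiliar size, rather than training one specialist network per configuration.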