How can LLMs represent traffic scenes for multi-vehicle collaboration?
GITSR: Graph Interaction Transformer-based Scene Representation for Multi Vehicle Collaborative Decision-making
This research proposes GITSR, a framework for multi-vehicle collaborative decision-making in mixed traffic, where connected and automated vehicles (CAVs) share the road with human-driven vehicles (HDVs). GITSR represents the scene with a Graph Interaction Transformer, which combines agent-centric local scene reconstruction with a graph neural network (GNN) that captures spatial interactions between vehicles. The resulting representation feeds a multi-agent deep Q-network (MADQN) reinforcement learning module that makes the driving decisions. For LLM-based multi-agent systems, the key takeaway is the division of labor: a Transformer for scene understanding, a GNN for modeling agent interactions, and reinforcement learning for decision-making, which together form a practical multi-agent architecture for complex scenarios. The agent-centric representation, in which the scene is reconstructed relative to each vehicle, is highlighted as helpful for understanding complex interactions, although it adds computational cost in larger-scale simulations. Combining local and global information processing through Transformers and GNNs offers a potential blueprint for LLM-based multi-agent systems in other application domains.
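The two representation ideas in the summary, agent-centric reconstruction and graph-based interaction, can be illustrated with a minimal sketch. This is not GITSR's implementation: the function names (`to_agent_frame`, `adjacency`, `aggregate`), the dictionary layout of a vehicle, the distance-threshold rule for graph edges, and the mean aggregation are all assumptions made for illustration, standing in for the learned Transformer and GNN layers described in the paper.

```python
import math

def to_agent_frame(ego, other):
    """Position of `other` in ego's frame (origin at ego, x-axis along ego's heading)."""
    dx = other["x"] - ego["x"]
    dy = other["y"] - ego["y"]
    cos_h, sin_h = math.cos(ego["heading"]), math.sin(ego["heading"])
    # Rotate the world-frame offset by -heading.
    return (cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy)

def local_scene(vehicles, ego_idx):
    """Agent-centric reconstruction: every vehicle expressed relative to one ego."""
    ego = vehicles[ego_idx]
    return [to_agent_frame(ego, v) for v in vehicles]

def adjacency(vehicles, radius):
    """Hypothetical edge rule: connect vehicles within `radius` metres (self-loops included)."""
    n = len(vehicles)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = math.hypot(vehicles[i]["x"] - vehicles[j]["x"],
                              vehicles[i]["y"] - vehicles[j]["y"])
            if dist <= radius:
                adj[i][j] = 1
    return adj

def aggregate(features, adj):
    """One round of mean aggregation over neighbours (a stand-in for a learned GNN layer)."""
    out = []
    for i, row in enumerate(adj):
        nbrs = [features[j] for j, connected in enumerate(row) if connected]
        dim = len(features[i])
        out.append([sum(f[k] for f in nbrs) / len(nbrs) for k in range(dim)])
    return out

# Illustrative data only: two nearby vehicles and one far away.
vehicles = [
    {"x": 0.0, "y": 0.0, "heading": 0.0, "speed": 10.0},
    {"x": 10.0, "y": 0.0, "heading": 0.0, "speed": 8.0},
    {"x": 0.0, "y": 100.0, "heading": math.pi / 2, "speed": 12.0},
]
adj = adjacency(vehicles, radius=30.0)
mixed = aggregate([[v["speed"]] for v in vehicles], adj)
```

Calling `local_scene` once per vehicle is what makes the agent-centric view costly: with N vehicles it produces N reconstructed scenes of N entries each, which is consistent with the computational burden the summary notes for larger simulations. In a full pipeline, the aggregated per-vehicle features would be the input to the decision-making module.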