How can I optimize agent guidance for dynamic MAPF?
Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding
This paper explores optimizing traffic flow in Lifelong Multi-Agent Path Finding (LMAPF), a problem where multiple agents navigate a map with dynamically assigned goals. The core idea is to use a learned online guidance policy that adjusts the map's edge weights in real time based on observed traffic patterns, thereby improving throughput (goals reached per timestep). This contrasts with static offline approaches, which fix the edge weights before execution. The online policy, optimized via CMA-ES, is integrated with two LMAPF algorithms: PIBT (direct planning) and GPIBT (guide-path planning). Experiments demonstrate that the online policy surpasses offline guidance and human-designed policies in throughput, especially in scenarios with shifting task distributions. Moreover, the online policy can mitigate deadlocks in PIBT.
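To make the mechanism concrete, here is a minimal sketch of an online guidance policy that raises edge weights on congested edges. This is an illustration under simplifying assumptions, not the paper's method: the paper learns the policy parameters with CMA-ES, whereas the coefficient `alpha` below is fixed by hand, and the names `GuidancePolicy`, `record_move`, and `update_weights` are hypothetical.

```python
# Sketch: linear traffic-to-weight guidance rule (hypothetical API).
# The paper optimizes such policy parameters with CMA-ES; `alpha` is
# hand-set here purely for illustration.
from collections import defaultdict

class GuidancePolicy:
    def __init__(self, alpha=2.0, base_weight=1.0):
        self.alpha = alpha                # penalty per unit of observed traffic
        self.base_weight = base_weight    # cost of an unused edge
        self.traffic = defaultdict(int)   # edge -> usage count this window

    def record_move(self, edge):
        """Record that an agent traversed `edge` in the current window."""
        self.traffic[edge] += 1

    def update_weights(self):
        """Return updated edge weights: congested edges become costlier,
        steering subsequent plans (e.g. PIBT's step choices) away from
        high-traffic regions."""
        weights = {e: self.base_weight + self.alpha * c
                   for e, c in self.traffic.items()}
        self.traffic.clear()              # start a fresh observation window
        return weights

# Usage: one heavily used edge vs. one lightly used edge.
policy = GuidancePolicy(alpha=2.0)
for _ in range(5):
    policy.record_move(((0, 0), (0, 1)))  # five traversals of this edge
policy.record_move(((3, 3), (3, 4)))      # one traversal of this edge
weights = policy.update_weights()
# Busy edge: 1 + 2*5 = 11; quiet edge: 1 + 2*1 = 3.
```

The key design point this sketch captures is the feedback loop: traffic statistics from the recent window feed into the next round of edge weights, which is what distinguishes online guidance from a one-shot offline weight assignment.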
For LLM-based multi-agent systems, this research highlights the potential of learned adaptive guidance policies for coordinating agent behavior dynamically. Learned policies offer greater flexibility than predefined rules and adapt to changing circumstances. Such dynamic coordination is especially relevant for complex, evolving scenarios that require real-time responsiveness, as demonstrated by the paper's results on shifting task distributions and deadlock mitigation.