How can parallel decoding improve multi-agent warehouse routing?
Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding
February 17, 2025
https://arxiv.org/pdf/2502.10233
This paper introduces MAHAM (Multi-Agent Hierarchical Attention Model), a new algorithm for optimizing picker routes in mixed-shelves warehouses. The goal is to minimize the longest route taken by any picker (the min-max objective), which shortens overall order fulfillment time.
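To make the min-max objective concrete, here is a minimal sketch that compares it with a total-distance objective on a toy instance. The coordinates, the plan names, and the helper functions `route_length`, `min_max_cost`, and `min_sum_cost` are illustrative assumptions, not taken from the paper.

```python
from math import dist

def route_length(route):
    """Euclidean length of a route given as an ordered list of (x, y) points."""
    return sum(dist(a, b) for a, b in zip(route, route[1:]))

def min_max_cost(routes):
    """Min-max objective: the length of the longest route among all pickers."""
    return max(route_length(r) for r in routes)

def min_sum_cost(routes):
    """Total travelled distance, shown only for comparison."""
    return sum(route_length(r) for r in routes)

# Hypothetical toy instance: one depot and four pick locations on a line.
depot = (0, 0)

# Plan A: one picker visits everything, the other stays at the depot.
one_picker_plan = [
    [depot, (1, 0), (2, 0), (-1, 0), (-2, 0), depot],
    [depot, depot],
]

# Plan B: the same locations split evenly between two pickers.
balanced_plan = [
    [depot, (1, 0), (2, 0), depot],
    [depot, (-1, 0), (-2, 0), depot],
]

for name, plan in [("one picker", one_picker_plan), ("balanced", balanced_plan)]:
    print(name, "max:", min_max_cost(plan), "sum:", min_sum_cost(plan))
```

In this toy case both plans travel the same total distance, but the balanced plan halves the longest route, which is the quantity the min-max objective targets.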
Key points for LLM-based multi-agent systems:
- Hierarchical and parallel decoding: MAHAM chooses, for all agents at once, which location to visit and which item to pick, which enables coordination. Typical autoregressive models instead plan for one agent at a time.
- Sequential action selection within parallel decoding: While decisions are made in parallel, actions are executed sequentially to avoid conflicts (e.g., two pickers trying to grab the same item). This is managed via a learned ranking of which agents should act first.
- Agent context encoder: Encodes each picker's state (location, remaining capacity, and current tour length) to inform its decisions. It also includes a ranking-based positional encoding and self-attention for inter-agent communication.
- Parameter sharing: Increases efficiency and improves generalization by reusing parameters in the encoder's cross-attention mechanism.
- Self-supervised learning: The model trains on pseudo-optimal solutions drawn from its own best-performing outputs, progressively improving over time.
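The interplay between parallel decisions and sequential conflict handling can be sketched as follows. This is a simplified illustration under stated assumptions, not MAHAM's implementation: the per-agent scores and priorities are hard-coded placeholders for what the paper's model would learn, a conflicting picker simply waits for the step, and `parallel_decode_step` and all data names are hypothetical.

```python
def parallel_decode_step(scores, priority, supply):
    """One decoding step for all agents.

    scores:   {agent: {item: score}}  placeholder for model outputs
    priority: {agent: float}          placeholder for a learned ranking;
                                      higher values act first
    supply:   {item: units available}
    Returns (assigned, remaining), where assigned maps each agent to an
    item or to None if its proposal lost a conflict.
    """
    # 1) Parallel proposal: every agent picks its best-scoring item
    #    without looking at the other agents' choices.
    proposals = {agent: max(s, key=s.get) for agent, s in scores.items()}

    # 2) Sequential resolution: apply proposals in priority order, so a
    #    lower-ranked agent cannot take an item that is already exhausted.
    remaining = dict(supply)
    assigned = {}
    for agent in sorted(proposals, key=priority.get, reverse=True):
        item = proposals[agent]
        if remaining.get(item, 0) > 0:
            remaining[item] -= 1
            assigned[agent] = item
        else:
            assigned[agent] = None  # assumption: the agent waits this step
    return assigned, remaining


# Hypothetical example: p1 and p2 both want the single unit on shelf_a.
scores = {
    "p1": {"shelf_a": 0.9, "shelf_b": 0.1},
    "p2": {"shelf_a": 0.8, "shelf_b": 0.2},
    "p3": {"shelf_a": 0.3, "shelf_b": 0.7},
}
priority = {"p1": 0.2, "p2": 0.9, "p3": 0.5}
supply = {"shelf_a": 1, "shelf_b": 1}

assigned, remaining = parallel_decode_step(scores, priority, supply)
print(assigned)
print(remaining)
```

Here p2 outranks p1, so p2 receives the contested item and p1 is left unassigned for this step. A fuller version might let the losing agent fall back to its next-best feasible item instead of waiting.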