How to speed up LLM agent simulations?
AI METROPOLIS: SCALING LARGE LANGUAGE MODEL-BASED MULTI-AGENT SIMULATION WITH OUT-OF-ORDER EXECUTION
This paper introduces AI Metropolis, a simulation engine that makes large language model (LLM)-based multi-agent simulations faster and more efficient. Existing simulations are slow because agents are synchronized more conservatively than necessary, which limits how many LLM requests can be processed concurrently. AI Metropolis removes this bottleneck with "out-of-order execution": independent agents progress at different speeds according to their individual workloads and dependencies. The engine tracks the true dependencies between agents with a spatiotemporal dependency graph, groups interdependent agents into clusters, and prioritizes requests by simulation time step so that work on the simulation's critical path is served first. These techniques substantially improve performance, approaching the theoretical optimal throughput. The key ideas for LLM-based systems are optimizing LLM inference throughput, eliminating false dependencies through runtime dependency tracking, and prioritizing requests on the critical path.
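The core idea can be sketched in code. The following is a minimal toy model, not AI Metropolis's actual implementation: all names, the interaction radius, and the per-agent latency field are our assumptions. It illustrates the dependency rule (an agent may only advance past a simulation step once every spatially nearby agent has reached that step, while distant agents impose no constraint) and shows how out-of-order execution lets a far-away agent run ahead of a slow one.

```python
import heapq
import math

RADIUS = 2.0  # assumed interaction radius: closer agents may influence each other

class Agent:
    def __init__(self, name, x, y, latency):
        self.name, self.x, self.y = name, x, y
        self.latency = latency  # stand-in for this agent's per-step LLM call time
        self.step = 0           # simulation time step this agent has reached

def neighbors(a, agents):
    return [b for b in agents if b is not a
            and math.hypot(a.x - b.x, a.y - b.y) < RADIUS]

def can_advance(a, agents):
    # True dependency check: a may move from step s to s+1 only when every
    # nearby agent has reached step s. Distant agents never block a.
    return all(b.step >= a.step for b in neighbors(a, agents))

def simulate(agents, total_steps):
    # Event queue keyed by wall-clock completion time, modeling LLM calls
    # that finish asynchronously rather than in lockstep rounds.
    clock, events, waiting = 0.0, [], set()

    def issue(i):  # "send one LLM request" for agent i
        heapq.heappush(events, (clock + agents[i].latency, i))

    for i, a in enumerate(agents):
        issue(i) if can_advance(a, agents) else waiting.add(i)

    trace = []
    while events:
        clock, i = heapq.heappop(events)
        a = agents[i]
        a.step += 1
        trace.append((round(clock, 1), a.name, a.step))
        # a's progress may unblock waiting neighbors (runtime dependency tracking)
        for j in list(waiting):
            if can_advance(agents[j], agents):
                waiting.discard(j)
                issue(j)
        if a.step < total_steps:
            issue(i) if can_advance(a, agents) else waiting.add(i)
    return trace
```

With agents A and B close together (B slow) and C far away, C finishes all its steps while B is still on its first: C never waits on B, while A correctly blocks until its neighbor B catches up. The per-step priority here is implicit in the dependency rule; a fuller sketch would also order the ready queue by simulation step so laggards on the critical path are served first.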