How can scarce resources be allocated efficiently in a multi-agent system?
Finite-Horizon Single-Pull Restless Bandits: An Efficient Index Policy For Scarce Resource Allocation
This paper introduces Single-Pull Restless Multi-Armed Bandits (SPRMABs), a variant of the restless multi-armed bandit problem in which each arm can be pulled at most once, i.e., a resource such as an intervention or selection can be applied to each option (arm) only a single time over the horizon. It proposes an index-based policy, the Single-Pull Index (SPI), designed to allocate resources efficiently under this single-pull constraint, especially when resources are scarce.
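As a rough illustration only (the paper's actual SPI index computation is not reproduced here), a single-pull index policy can be thought of as: compute an index per arm, activate the top arms within the per-round budget, and permanently retire any arm that has been pulled. The sketch below assumes precomputed index values and a hypothetical function name `single_pull_index_policy`.

```python
import numpy as np

def single_pull_index_policy(indices, already_pulled, budget):
    """Greedy selection step for a single-pull restless bandit.

    indices:        per-arm index values (higher = more valuable); how these
                    are computed is the core of the SPI method and is only
                    assumed as given here.
    already_pulled: boolean array marking arms whose single pull is spent.
    budget:         number of arms that may be activated this round.
    Returns the ids of the arms to pull this round.
    """
    candidates = np.where(~already_pulled)[0]  # arms still eligible for their one pull
    if candidates.size == 0 or budget <= 0:
        return np.array([], dtype=int)
    # Rank eligible arms by their index, descending, and keep the top `budget`.
    order = candidates[np.argsort(indices[candidates])[::-1]]
    return order[:budget]
```

The single-pull constraint is enforced by the `already_pulled` mask: once an arm is activated, it is excluded from all future rounds.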
For LLM-based multi-agent systems, SPRMABs offer a framework for deciding which agents should receive a limited budget of prompts or other interventions when interacting with the same agent more than once is impractical or undesirable. SPI could prioritize these interactions by expected reward, providing a mechanism for efficient resource use and improved overall system performance in settings with fairness or single-interaction constraints, as sketched below.
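Reusing the `single_pull_index_policy` sketch above, the snippet below shows how such a policy might pick which agents receive one of a limited number of prompts in a round. The agent names and `expected_gain` scores are hypothetical stand-ins for whatever index values the SPI policy would actually assign.

```python
import numpy as np

agent_ids = ["planner", "coder", "critic", "researcher"]
expected_gain = np.array([0.8, 0.3, 0.6, 0.9])    # hypothetical index values per agent
prompted = np.array([False, True, False, False])  # "coder" already used its one prompt

chosen = single_pull_index_policy(expected_gain, prompted, budget=2)
print([agent_ids[i] for i in chosen])             # e.g. ['researcher', 'planner']
```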