How can agents efficiently share hints in multi-armed bandits?
Heterogeneous Multi-Agent Bandits with Parsimonious Hints
This paper studies how "hints" (for example, predictions from an LLM) can make multi-agent decision-making more efficient, particularly when agents must coordinate to avoid conflicts such as several agents trying to access the same resource. It introduces algorithms for both centralized (single decision-maker) and decentralized (independent agents) systems, with the aim of minimizing both the number of hints used and the "regret" (the gap between the reward of the ideal strategy and the reward actually achieved). For LLM-based multi-agent systems, the key takeaway is that LLMs can supply predictive hints that let agents learn optimal strategies faster and with fewer costly direct interactions. Minimizing hint usage matters because each LLM query may itself be costly. The decentralized algorithms are especially interesting: they explore how agents can share what they have learned without a central coordinator, using methods analogous to collision-based communication.
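To make the notions of conflict, regret, and hint cost concrete, here is a minimal Python sketch of a toy centralized setting. It is not the paper's algorithm. The reward matrix `MEANS`, the helper names, and the hint model (a hint is assumed to be one Bernoulli sample of an agent-arm reward, obtained without pulling the arm) are all illustrative assumptions.

```python
import random
from itertools import permutations

# Hypothetical mean rewards: MEANS[agent][arm]. Agents are heterogeneous,
# so the same arm can be worth different amounts to different agents.
MEANS = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]


def matching_value(means, assignment):
    """Total expected reward when agent i pulls arm assignment[i]."""
    return sum(means[agent][arm] for agent, arm in enumerate(assignment))


def best_matching(means):
    """Brute-force the best collision-free assignment of agents to arms."""
    n_agents, n_arms = len(means), len(means[0])
    # permutations() only yields tuples of distinct arms, so no two agents collide.
    candidates = permutations(range(n_arms), n_agents)
    return max(candidates, key=lambda a: matching_value(means, a))


def regret(means, assignment):
    """Gap between the optimal matching's value and the chosen matching's value."""
    if len(set(assignment)) != len(assignment):
        raise ValueError("collision: two agents chose the same arm")
    optimal = best_matching(means)
    return matching_value(means, optimal) - matching_value(means, assignment)


def estimate_from_hints(means, hints_per_pair, rng):
    """Estimate every agent-arm mean from hints only (assumed hint model).

    Returns the estimated matrix and the number of hints consumed, which is
    the quantity a parsimonious algorithm would try to keep small.
    """
    estimates, hints_used = [], 0
    for row in means:
        est_row = []
        for mu in row:
            samples = [1 if rng.random() < mu else 0 for _ in range(hints_per_pair)]
            est_row.append(sum(samples) / hints_per_pair)
            hints_used += hints_per_pair
        estimates.append(est_row)
    return estimates, hints_used


if __name__ == "__main__":
    print("optimal matching:", best_matching(MEANS))
    print("regret of (0, 2):", regret(MEANS, (0, 2)))
    estimates, used = estimate_from_hints(MEANS, 200, random.Random(0))
    print("matching from hints:", best_matching(estimates), "using", used, "hints")
```

In this toy instance both agents prefer arm 0, so each agent acting greedily would collide. The best collision-free assignment sends agent 0 to arm 1 and agent 1 to arm 0. The sketch spends hints uniformly on every agent-arm pair, which is the wasteful baseline; the point of a parsimonious scheme is to request hints only where they are likely to change the chosen matching.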