Do naive bandit learners collude?
Naive Algorithmic Collusion: When Do Bandit Learners Cooperate and When Do They Compete?
November 26, 2024
https://arxiv.org/pdf/2411.16574

This paper explores how simple AI agents (specifically, multi-armed bandit algorithms) can learn to "collude" (cooperate to achieve higher rewards) in a competitive setting, even when they are not explicitly programmed to do so and have no knowledge of each other or of the structure of the game they are playing (a repeated Prisoner's Dilemma).
Key findings relevant to LLM-based multi-agent systems:
- Naive collusion is possible: even without explicit communication or knowledge of the overall game, agents using certain algorithms (e.g., deterministic bandits, or UCB with certain parameters) can learn to cooperate. This matters for LLM agents, which might implicitly coordinate even when not specifically designed to do so (see the first sketch after this list).
- Algorithm choice matters: the specific algorithm used drastically shapes the emergent behavior. Deterministic algorithms tend to produce collusion, while algorithms with built-in randomness (e.g., epsilon-greedy) tend toward competition (see the second sketch below). Algorithm selection is therefore critical in multi-agent LLM systems when a particular balance of cooperation and competition is desired.
- Symmetry increases collusion potential: running the same algorithm across agents increases the likelihood of collusion, especially in a "hub-and-spoke" scenario where a central distributor supplies the algorithm to all parties (see the third sketch below). This is relevant to LLM deployments where multiple instances of similar models interact.
- Collusion is not guaranteed: even with similar algorithms, small implementation differences (e.g., in tie-breaking rules) and different payoff structures can significantly change outcomes, making it difficult to predict whether collusion will emerge (the fourth sketch below varies only the tie-break rule). This points to the need for careful monitoring and analysis of emergent behavior in multi-agent LLM systems.
- Implications for regulation: Traditional antitrust regulations are based on explicit agreements. The possibility of naive collusion challenges this notion, raising questions about how to regulate unintended cooperative behavior in AI systems. This has relevance to the development and deployment of multi-agent LLM applications.
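
To make the setting concrete, here is a minimal sketch (not the paper's code) of two UCB1 learners repeatedly playing a Prisoner's Dilemma. Each agent is a plain two-armed bandit that observes only its own rewards; the payoff values, class names, and round counts are assumptions chosen for illustration, and the paper analyzes payoff structures more generally.

```python
import math

# Prisoner's Dilemma payoffs from one player's perspective:
# (my action, opponent's action) -> my reward. Textbook values,
# assumed for illustration; the paper studies payoffs more generally.
PAYOFF = {
    ("C", "C"): 3.0,  # mutual cooperation
    ("C", "D"): 0.0,  # I cooperate, opponent defects
    ("D", "C"): 5.0,  # temptation to defect
    ("D", "D"): 1.0,  # mutual defection (the stage-game equilibrium)
}

class UCB1Agent:
    """A naive two-armed bandit: it sees only its own past rewards,
    with no model of the opponent or of the game structure."""
    ARMS = ("C", "D")

    def __init__(self):
        self.counts = {a: 0 for a in self.ARMS}
        self.totals = {a: 0.0 for a in self.ARMS}
        self.t = 0

    def act(self):
        self.t += 1
        for a in self.ARMS:          # play each arm once first; the fixed
            if self.counts[a] == 0:  # order doubles as a tie-break rule
                return a
        def ucb(a):
            mean = self.totals[a] / self.counts[a]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[a])
            return mean + bonus
        return max(self.ARMS, key=ucb)  # deterministic argmax

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward

def play(agent1, agent2, rounds=20_000):
    """Let two bandits play the repeated game; neither sees the other's move."""
    history = []
    for _ in range(rounds):
        a1, a2 = agent1.act(), agent2.act()
        agent1.update(a1, PAYOFF[(a1, a2)])
        agent2.update(a2, PAYOFF[(a2, a1)])
        history.append((a1, a2))
    return history

def mutual_coop_rate(history, tail=1_000):
    recent = history[-tail:]
    return sum(a == b == "C" for a, b in recent) / len(recent)

hist = play(UCB1Agent(), UCB1Agent())
print(f"mutual cooperation, last 1000 rounds: {mutual_coop_rate(hist):.1%}")
```

Whether the tail of play settles into mutual cooperation or mutual defection depends on the payoffs and parameters; the paper's point is that the cooperative outcome is reachable even though nothing in this code models the opponent or the game.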
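
For the deterministic-versus-random contrast, here is a sketch of an epsilon-greedy agent that plugs into the same harness (it reuses UCB1Agent, play, and mutual_coop_rate from the sketch above; the fixed epsilon of 0.1 is an arbitrary illustrative choice, and decaying schedules behave differently):

```python
import random

class EpsilonGreedyAgent(UCB1Agent):
    """Same bandit interface, but exploration comes from persistent
    randomness rather than a deterministic confidence bonus."""

    def __init__(self, epsilon=0.1):
        super().__init__()
        self.epsilon = epsilon

    def act(self):
        self.t += 1
        for a in self.ARMS:          # initial sweep, as before
            if self.counts[a] == 0:
                return a
        if random.random() < self.epsilon:
            return random.choice(self.ARMS)  # exploration noise never stops
        return max(self.ARMS, key=lambda a: self.totals[a] / self.counts[a])

hist = play(EpsilonGreedyAgent(), EpsilonGreedyAgent())
print(f"mutual cooperation, last 1000 rounds: {mutual_coop_rate(hist):.1%}")
```

One intuition consistent with the paper's finding: the exploration noise never stops, so stray defections keep being rewarded with the temptation payoff, pulling both agents' empirical means toward defection and breaking any tacit coordination.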
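
Symmetry can be probed in the same harness by comparing identical pairings against a mixed one (again building on the classes above; the pairings, and their reading as a stand-in for hub-and-spoke distribution, are illustrative assumptions, not the paper's experimental design):

```python
# Hypothetical symmetry probe: identical algorithm pairs vs. a mixed pair.
pairings = {
    "UCB1 vs UCB1 (symmetric)": (UCB1Agent, UCB1Agent),
    "eps-greedy vs eps-greedy (symmetric)": (EpsilonGreedyAgent, EpsilonGreedyAgent),
    "UCB1 vs eps-greedy (asymmetric)": (UCB1Agent, EpsilonGreedyAgent),
}

for label, (make_a, make_b) in pairings.items():
    hist = play(make_a(), make_b())
    print(f"{label:40s} mutual cooperation: {mutual_coop_rate(hist):.1%}")
```

A real hub-and-spoke study would distribute one implementation to many competing agents; this two-player version only gestures at the symmetry effect.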
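
Finally, a sketch of how a seemingly trivial implementation detail can matter: the subclass below is identical to UCB1Agent except that its arm order, and therefore its initial sweep and any argmax tie, favors defection (the class name is hypothetical):

```python
class UCB1DefectFirst(UCB1Agent):
    """Same UCB1 learner, but the arm order (and hence the initial sweep
    and any tie in the argmax) favors 'D' instead of 'C'."""
    ARMS = ("D", "C")

for label, pair in [
    ("both tie-break toward C", (UCB1Agent(), UCB1Agent())),
    ("both tie-break toward D", (UCB1DefectFirst(), UCB1DefectFirst())),
    ("mixed tie-break rules", (UCB1Agent(), UCB1DefectFirst())),
]:
    hist = play(*pair)
    print(f"{label:26s} mutual cooperation: {mutual_coop_rate(hist):.1%}")
```

Because these learners are fully deterministic, a different tie-break rule can change the entire trajectory of play, which echoes the paper's caution that outcomes are hard to predict even among near-identical algorithms.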