Can Q-learning agents avoid collusion in congestion games?
Cycles and collusion in congestion games under Q-learning
February 27, 2025
https://arxiv.org/pdf/2502.18984

This paper investigates how Q-learning agents behave in a multi-agent Braess paradox game, a simplified traffic-routing scenario where individual incentives conflict with optimal system performance. It finds that Q-learning can produce emergent cyclical behavior reminiscent of "Edgeworth cycles," in which agents coordinate on outcomes better than the Nash equilibrium. These outcomes are not stable from an incentive perspective, however: when agents can choose their learning parameters strategically, they are incentivized to deviate from the cooperative settings, which degrades system performance.
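To make the setup concrete, here is a minimal sketch of stateless Q-learning agents repeatedly routing through the classic Braess network. The cost functions, number of agents, learning rate, constant exploration rate, and step count are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
import numpy as np

# Hedged sketch: stateless Q-learning on the classic Braess network.
# All numbers below are illustrative assumptions, not the paper's setup.

rng = np.random.default_rng(0)

N = 100            # number of agents (assumed)
ACTIONS = 3        # 0 = top (O-A-D), 1 = bottom (O-B-D), 2 = shortcut (O-A-B-D)
ALPHA = 0.1        # learning rate (assumed)
EPSILON = 0.05     # constant exploration rate, as in a continual-learning setup
STEPS = 20000

def route_costs(counts):
    """Per-route travel cost given how many agents chose each route.

    Link costs (illustrative): O-A and B-D scale with load, A-D and O-B are
    fixed at 1, and the Braess shortcut A-B is free.
    """
    load_oa = (counts[0] + counts[2]) / N   # agents on link O-A
    load_bd = (counts[1] + counts[2]) / N   # agents on link B-D
    return np.array([load_oa + 1.0,          # top route
                     1.0 + load_bd,          # bottom route
                     load_oa + load_bd])     # shortcut route

Q = np.zeros((N, ACTIONS))                   # one stateless Q-table per agent
avg_cost = []

for t in range(STEPS):
    explore = rng.random(N) < EPSILON
    actions = np.where(explore, rng.integers(ACTIONS, size=N), Q.argmax(axis=1))

    counts = np.bincount(actions, minlength=ACTIONS)
    costs = route_costs(counts)

    # Q-update with reward = minus the cost of the chosen route
    idx = np.arange(N)
    Q[idx, actions] += ALPHA * (-costs[actions] - Q[idx, actions])
    avg_cost.append(costs[actions].mean())

# In this network the Nash equilibrium (everyone on the shortcut) costs 2.0
# per agent, while the system optimum (half top, half bottom) costs 1.5.
# Sustained averages below 2.0 indicate the better-than-Nash cycles discussed above.
print("mean cost over last 1000 steps:", np.mean(avg_cost[-1000:]))
```

Stateless (single-state) Q-learning is the simplest variant that exhibits this kind of dynamics; the paper's agents may condition on richer state, so treat this only as a toy model of the phenomenon.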
Key points for LLM-based multi-agent systems:
- Continual learning and exploration: The study uses a continual-learning setup with a constant exploration rate rather than a train-then-converge regime, which may be more relevant to LLM agents that must keep adapting to new information in dynamic environments.
- Parameter tuning as a meta-game: The paper frames parameter selection as a strategic game and shows that cooperative parameter settings are not incentive compatible. This underscores the difficulty of designing robust multi-agent systems in which individual agents may be motivated to deviate from desired behavior (see the deviation check sketched after this list).
- Information sharing and delays: The study explores the impact of information sharing (through the β parameter) and demonstrates that full information sharing can hinder the emergence of cooperative cycles. This is relevant to LLM-based systems where controlling the flow and delay of information between agents is crucial.
- Relevance to other domains: The paper draws parallels to Bertrand competition, suggesting that similar incentive issues and the potential for unintended collusion might arise in other multi-agent settings where LLMs could be deployed, particularly in competitive scenarios like pricing or resource allocation.
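To illustrate the meta-game framing from the second bullet, the sketch below reuses the same illustrative Braess simulation but lets a single agent unilaterally deviate from a "cooperative" parameter profile. The specific parameter values and the candidate deviation are assumptions chosen for illustration, not results reported in the paper.

```python
import numpy as np

def long_run_costs(epsilons, alphas, steps=20000, n_agents=50, seed=0):
    """Run the illustrative Braess Q-learning simulation with per-agent
    parameters; return each agent's average cost over the final quarter of steps."""
    rng = np.random.default_rng(seed)
    n = n_agents
    Q = np.zeros((n, 3))
    totals = np.zeros(n)
    tail = steps // 4
    for t in range(steps):
        explore = rng.random(n) < epsilons
        actions = np.where(explore, rng.integers(3, size=n), Q.argmax(axis=1))
        counts = np.bincount(actions, minlength=3)
        load_oa = (counts[0] + counts[2]) / n
        load_bd = (counts[1] + counts[2]) / n
        costs = np.array([load_oa + 1.0, 1.0 + load_bd, load_oa + load_bd])
        idx = np.arange(n)
        Q[idx, actions] += alphas * (-costs[actions] - Q[idx, actions])
        if t >= steps - tail:
            totals += costs[actions]
    return totals / tail

n = 50
coop = {"eps": 0.05, "alpha": 0.1}   # "cooperative" parameter profile (assumed)
dev  = {"eps": 0.01, "alpha": 0.5}   # candidate unilateral deviation (assumed)

# Baseline: every agent uses the cooperative parameters.
base = long_run_costs(np.full(n, coop["eps"]), np.full(n, coop["alpha"]))

# Meta-game deviation: agent 0 switches parameters while everyone else stays put.
eps = np.full(n, coop["eps"]); eps[0] = dev["eps"]
alp = np.full(n, coop["alpha"]); alp[0] = dev["alpha"]
deviated = long_run_costs(eps, alp)

print("agent 0 cost, cooperative profile:", base[0])
print("agent 0 cost, after deviating:    ", deviated[0])
# If deviating lowers agent 0's cost, the cooperative parameter profile is not
# an equilibrium of the meta-game, which is the incentive problem the paper raises.
```

The same deviation check generalizes to any parameter (learning rate, exploration schedule, information-sharing weight): a parameter profile is only credible if no single agent can lower its own long-run cost by changing its settings.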