How does opponent learning impact large-scale agent evolution?
Evolution with Opponent-Learning Awareness
October 24, 2024
https://arxiv.org/pdf/2410.17466
This paper studies how large populations of AI agents, each using a learning algorithm (Policy Gradient or LOLA), evolve strategies within game theory scenarios (Stag Hunt, Hawk-Dove, Rock-Paper-Scissors).
- It provides fast, parallelizable implementations of Policy Gradient and LOLA for these games, making large-scale simulations with 200,000 agents possible.
- It observes that populations of LOLA agents, which account for how their opponents learn, can converge to different strategies than populations of naive Policy Gradient agents, highlighting LOLA's potential impact on multi-agent system dynamics.
- Although the paper does not use LLMs, its findings are relevant to LLM-based multi-agent systems: they show that different learning algorithms can lead to distinct emergent behaviors in a population of agents, so the choice of learning algorithm in a multi-agent setting needs to be made with its population-level consequences in mind.
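
To make the contrast between the two learners concrete, here is a minimal two-agent sketch in plain Python. It is not the paper's implementation, and it does not simulate a population. It compares a naive policy-gradient update with a first-order LOLA update (each agent's own gradient plus a term that anticipates the opponent's learning step) on a symmetric Stag Hunt, using exact expected-payoff gradients. The payoff values, the learning rate, the LOLA look-ahead rate `lola_eta`, and the helper names `update_direction` and `simulate` are all assumptions chosen for illustration.

```python
import math

# Hypothetical Stag Hunt payoffs (an assumption, not the paper's values).
# PAYOFF[own][opp] is the payoff to a player choosing `own` against `opp`;
# action 0 = Stag, action 1 = Hare.
PAYOFF = [[4.0, 0.0],
          [3.0, 3.0]]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def update_direction(p_own, p_opp, lola_eta):
    """Gradient-ascent direction for one agent's logit in a symmetric 2x2 game.

    lola_eta = 0 gives the naive policy gradient; lola_eta > 0 adds the
    first-order LOLA term that anticipates the opponent's own learning step.
    """
    (a, b), (c, d) = PAYOFF
    s_own = p_own * (1.0 - p_own)  # d p_own / d theta_own
    s_opp = p_opp * (1.0 - p_opp)  # d p_opp / d theta_opp

    # First derivatives of this agent's expected payoff V_own.
    dV_dtheta_own = (p_opp * (a - c) + (1.0 - p_opp) * (b - d)) * s_own
    dV_dtheta_opp = (p_own * (a - b) + (1.0 - p_own) * (c - d)) * s_opp

    # Mixed second derivative of the opponent's payoff,
    # d^2 V_opp / (d theta_own d theta_opp); the game is symmetric, so the
    # same payoff matrix describes the opponent.
    d2V_opp = (a - b - c + d) * s_own * s_opp

    return dV_dtheta_own + lola_eta * dV_dtheta_opp * d2V_opp


def simulate(p0, q0, lola_eta, lr=0.5, steps=2000):
    """Run simultaneous updates for two agents; return final Stag probabilities."""
    theta_p, theta_q = logit(p0), logit(q0)
    for _ in range(steps):
        p, q = sigmoid(theta_p), sigmoid(theta_q)
        theta_p += lr * update_direction(p, q, lola_eta)
        theta_q += lr * update_direction(q, p, lola_eta)
    return sigmoid(theta_p), sigmoid(theta_q)


if __name__ == "__main__":
    print("naive PG:", simulate(0.7, 0.7, lola_eta=0.0))
    print("LOLA:    ", simulate(0.7, 0.7, lola_eta=1.0))
```

With these assumed payoffs, Stag is the better reply only when the opponent plays Stag with probability above 0.75. Starting both agents at 0.7, the naive learners drift toward Hare, while the LOLA learners (with `lola_eta=1.0`) move toward Stag because each accounts for how its own policy shapes the other's update. This is a toy illustration of how the two algorithms can settle on different equilibria, not a reproduction of the paper's population-level results.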