How can we prevent AI agents from causing harm?
The Problem of Social Cost in Multi-Agent General Reinforcement Learning: Survey and Synthesis
December 4, 2024
https://arxiv.org/pdf/2412.02091

This paper explores the "tragedy of the commons" problem in multi-agent reinforcement learning, where agents pursuing individual goals can cause harm to the overall system. It proposes using market-based mechanisms, specifically VCG (Vickrey-Clarke-Groves) mechanisms, to assign a "social cost" to agents' actions, thereby incentivizing cooperation and mitigating negative externalities.
For LLM-based multi-agent systems, the key takeaways are:
- Mechanism Design for Coordination: VCG mechanisms, or variants like the Exponential VCG, can coordinate LLM agents by pricing the impact of each agent's actions on the others, promoting beneficial collective behavior (a minimal one-shot sketch appears after this list).
- Valuation Functions for LLMs: The paper defines valuation functions that quantify an LLM's preference for different outcomes, allowing them to participate in market-based coordination. These can be learned from experience, even under partial observability (see the learned-valuation sketch below).
- Addressing Misaligned Goals: The proposed framework offers a way to control LLMs with potentially misaligned goals by imposing costs on actions that negatively affect other agents or the overall system. This could be relevant for preventing unintended consequences in complex multi-agent LLM applications.
- Practical Considerations: The paper acknowledges challenges in real-world scenarios, like partial observability of other agents' actions and rewards, and suggests using Bayesian reinforcement learning and function approximation to address them. It also discusses static vs. dynamic mechanisms and agent-level vs. mechanism-level learning.
- Applications for Web Development: Examples like cap-and-trade systems and automated penetration testing illustrate how these concepts could be applied in web development, potentially leading to more robust and cooperative multi-agent LLM applications (a toy cap-and-trade allocation is sketched below).
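
As a concrete anchor, here is a minimal sketch of the classic one-shot VCG rule the paper builds on: pick the outcome that maximizes reported welfare, then charge each agent the externality it imposes on the others (the Clarke pivot payment). The paper's dynamic and Exponential VCG variants extend this to sequential settings; the function names and the finite outcome set are illustrative assumptions, not the paper's API.

```python
def vcg(outcomes, valuations):
    """One-shot VCG with Clarke pivot payments (illustrative sketch).

    outcomes   -- finite list of candidate joint outcomes
    valuations -- valuations[i](outcome) -> float, agent i's reported value
    """
    def welfare(outcome, exclude=None):
        # Total reported value of `outcome`, optionally leaving one agent out.
        return sum(v(outcome) for i, v in enumerate(valuations) if i != exclude)

    # Efficient choice: maximize total reported value across all agents.
    chosen = max(outcomes, key=welfare)

    # Agent i pays the social cost of its presence: what the others could
    # have achieved without it, minus what they actually get under `chosen`.
    payments = [
        max(welfare(o, exclude=i) for o in outcomes) - welfare(chosen, exclude=i)
        for i in range(len(valuations))
    ]
    return chosen, payments
```

Because each agent's payment equals the harm its presence does to the rest of the system, truthfully reporting one's valuation is a dominant strategy, which is the property that makes VCG attractive for controlling agents with misaligned goals.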
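To participate in such a mechanism, an LLM agent needs a valuation function, which in practice must often be estimated from noisy, partially observed rewards. The sketch below is a deliberately simple tabular estimator with prior shrinkage, standing in for the Bayesian reinforcement learning and function approximation the paper suggests; the class name and the zero-mean prior are assumptions for illustration.

```python
from collections import defaultdict

class LearnedValuation:
    """Estimate an agent's valuation from observed (outcome, reward) samples.

    A posterior-mean-style average that shrinks toward a zero prior when
    data is scarce -- a toy stand-in for the Bayesian RL and function
    approximation approaches the paper discusses for partial observability.
    """

    def __init__(self, prior_strength=1.0):
        self.prior_strength = prior_strength
        self.reward_sums = defaultdict(float)
        self.counts = defaultdict(int)

    def observe(self, outcome, reward):
        # Record one noisy reward observation for `outcome`.
        self.reward_sums[outcome] += reward
        self.counts[outcome] += 1

    def __call__(self, outcome):
        # Shrunk mean: unseen outcomes get value 0; heavily observed
        # outcomes approach their empirical mean reward.
        n = self.counts[outcome]
        return self.reward_sums[outcome] / (n + self.prior_strength)
```

Since a `LearnedValuation` instance is callable, it can be passed directly to `vcg` above in place of a hand-written valuation.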
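Finally, a toy version of the cap-and-trade idea: a cap of one permit contested by three agents. The numbers are made up for illustration; VCG awards the permit to the highest-value agent and charges it the value displaced from the runner-up, the familiar second-price intuition.

```python
# Hypothetical numbers: agents A, B, C value holding the single permit
# at 5, 3, and 8 respectively, and value any outcome without it at 0.
outcomes = ["A", "B", "C"]  # which agent receives the permit
values = {"A": 5.0, "B": 3.0, "C": 8.0}

def holder_valuation(agent):
    return lambda outcome: values[agent] if outcome == agent else 0.0

chosen, payments = vcg(outcomes, [holder_valuation(a) for a in outcomes])
print(chosen, payments)  # -> C [0.0, 0.0, 5.0]: C wins, paying the 5.0
                         #    of welfare its presence denies the others.
```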