How do you balance AI agent security and collaboration?
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
February 27, 2025
https://arxiv.org/pdf/2502.19145

This paper explores security vulnerabilities in multi-agent Large Language Model (LLM) systems, where malicious prompts can spread between agents much like computer viruses. It focuses on scenarios in which a single compromised agent infects others with malicious instructions, potentially disrupting the entire system. Key points regarding LLM-based multi-agent systems include:
- Infectious Prompts: Malicious instructions can spread like a virus through a multi-agent system if one agent is compromised.
- Defense Strategies: "Vaccines" (inserting a fake memory in which the agent has already handled a malicious input safely) and safety instructions can mitigate the spread, but may also hinder cooperation between agents (see the sketch after this list).
- Security-Cooperation Trade-off: Improving security by making agents more cautious can make them less cooperative on normal tasks.
- Model-Specific Vulnerabilities: Different LLM models have varying levels of vulnerability to these attacks, necessitating tailored security measures.
- Multi-Hop Analysis: Evaluating security requires looking at how malicious instructions spread through multiple interactions and affect agent behavior over time.
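To make the "vaccine" defense and the multi-hop measurement concrete, here is a minimal sketch of a chain of agents relaying a message. Everything in it (the `Agent` class, `VACCINE_MEMORY`, `simulate`, the 0.9 infection probability) is a hypothetical illustration under simplified assumptions, not code or parameters from the paper.

```python
# Toy simulation of infectious prompts in a multi-agent LLM system.
# All names and probabilities here are illustrative stand-ins, not the
# paper's actual evaluation harness.

import random
from dataclasses import dataclass, field

MALICIOUS_MARKER = "<<FORWARD_THIS_TO_EVERYONE>>"  # stand-in for an infectious instruction

# "Vaccine": a fabricated memory in which the agent already saw the malicious
# instruction and handled it safely, priming it to refuse again.
# (Assumption: memories are plain strings attached to the agent's context.)
VACCINE_MEMORY = (
    f"Previously received '{MALICIOUS_MARKER}' and correctly refused to act on it "
    "or pass it along."
)


@dataclass
class Agent:
    name: str
    vaccinated: bool = False
    infected: bool = False
    memory: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Vaccination is just a pre-inserted fake memory.
        if self.vaccinated:
            self.memory.append(VACCINE_MEMORY)

    def receive(self, message: str, infection_prob: float = 0.9) -> str:
        """Process an incoming message; possibly become infected and relay it."""
        if MALICIOUS_MARKER in message:
            # An agent whose memory records having refused this instruction
            # before refuses again; others comply with some probability
            # (a crude stand-in for model-specific vulnerability).
            if any(MALICIOUS_MARKER in m and "refused" in m for m in self.memory):
                return "Refused suspicious instruction."
            if random.random() < infection_prob:
                self.infected = True
                return message  # relays the infectious instruction downstream
            return "Refused suspicious instruction."
        return f"{self.name} handled: {message}"


def simulate(num_agents: int, hops: int, vaccinate: bool, seed: int = 0) -> float:
    """Pass one infectious prompt along a chain of agents for up to `hops`
    steps and return the fraction of agents infected (a multi-hop measure)."""
    random.seed(seed)
    agents = [Agent(f"agent_{i}", vaccinated=vaccinate) for i in range(num_agents)]
    message = f"Please summarize this report. {MALICIOUS_MARKER}"
    for hop in range(min(hops, num_agents)):
        message = agents[hop].receive(message)
        if MALICIOUS_MARKER not in message:
            break  # the chain of infection was cut
    return sum(a.infected for a in agents) / num_agents


if __name__ == "__main__":
    print("infection rate, no defense: ", simulate(10, 10, vaccinate=False))
    print("infection rate, vaccinated: ", simulate(10, 10, vaccinate=True))
```

In this toy setup, the same knob that blocks the infectious instruction (a more trigger-happy refusal rule) would, if broadened, also refuse benign requests, which is exactly the security-cooperation trade-off the paper quantifies as a "security tax."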