Can multi-agent NLP stop prompt injection?
PROMPT INJECTION DETECTION AND MITIGATION VIA AI MULTI-AGENT NLP FRAMEWORKS
March 17, 2025
https://arxiv.org/pdf/2503.11517This paper proposes a multi-agent framework to detect and mitigate prompt injection attacks against LLMs. It uses a pipeline of specialized agents (Front-End Generator, Guard/Sanitizer, Policy Enforcer, and KPI Evaluator) built on open-weight Meta Llama models and communicating via structured JSON messages based on the OVON standard. Key points for LLM-based multi-agent systems include: specialized agents for distinct tasks, layered defense for robustness, OVON for standardized communication and metadata exchange, injection-specific KPIs (ISR, POF, PSR, CCS, and TIVS) for evaluation, and the potential for dynamic agent integration and automated agent design in future systems.