How can I build trustworthy LLM agents?
A Survey on Trustworthy LLM Agents: Threats and Countermeasures
This paper surveys trustworthiness issues in Large Language Model (LLM)-based agents and Multi-Agent Systems (MAS). It introduces the TrustAgent framework, categorizing trustworthiness issues by agent module (brain, memory, tool) and interaction type (agent-to-agent, agent-to-environment, agent-to-user), covering attacks, defenses, and evaluations. Key points for LLM-based multi-agent systems include: (1) Novel attack vectors like infectious attacks spreading through MAS, prompt injection in multi-turn dialogues exploiting memory, backdoors triggered by multi-agent interactions, and tool manipulation for external attacks. (2) Defense strategies leveraging multi-agent collaboration for alignment, filtering, and topological defenses against attack propagation. (3) The need for dynamic evaluations reflecting complex agent-environment interactions and the importance of considering multi-agent trust dynamics beyond individual agent behavior.