How can I measure risk in an AI multi-agent system?
FREE ENERGY RISK METRICS FOR SYSTEMICALLY SAFE AI: GATEKEEPING MULTI-AGENT STUDY
This paper proposes using the Free Energy Principle (FEP) from Active Inference to measure and mitigate risk in multi-agent AI systems. It introduces a Cumulative Risk Exposure (CRE) metric, estimated via Monte Carlo simulation, that lets stakeholders express preferences and tolerances for undesirable outcomes. A "gatekeeper" monitors each agent's planned actions and computes the CRE of those plans over simulated futures; if the CRE exceeds a threshold, the gatekeeper intervenes and replaces the agent's policy with a safer alternative. The approach is demonstrated in a simulated autonomous-vehicle environment, where gatekeepers improve overall system safety by preventing risky driving behaviors.
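A minimal sketch of this gatekeeping loop, assuming a black-box `simulate_rollout(policy, horizon)` environment model and a stakeholder-defined `is_undesirable(state)` predicate (both hypothetical names, not the paper's API). CRE is estimated as the expected count of undesirable states across Monte Carlo rollouts; if it crosses a threshold, the gatekeeper substitutes a fallback policy.

```python
import numpy as np

def cumulative_risk_exposure(policy, simulate_rollout, is_undesirable,
                             n_rollouts=1000, horizon=20):
    """Estimate CRE as the expected number of undesirable states per rollout."""
    exposures = []
    for _ in range(n_rollouts):
        trajectory = simulate_rollout(policy, horizon)   # list of simulated states
        exposures.append(sum(is_undesirable(s) for s in trajectory))
    return float(np.mean(exposures))

def gatekeep(agent_policy, fallback_policy, simulate_rollout, is_undesirable,
             cre_threshold=0.05):
    """Keep the agent's own policy if its estimated CRE is within tolerance,
    otherwise substitute the safer fallback policy."""
    cre = cumulative_risk_exposure(agent_policy, simulate_rollout, is_undesirable)
    return agent_policy if cre <= cre_threshold else fallback_policy
```

The threshold and rollout counts here are illustrative; in practice they would be set from stakeholder risk tolerances and the cost of simulation.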
Key points for LLM-based multi-agent systems:
- FEP and CRE offer a first-principles approach to AI safety by embedding stakeholder preferences into a probabilistic framework and providing a quantifiable risk metric.
- Gatekeepers can function as a safety layer, evaluating and overriding LLM-generated actions if they are deemed too risky based on simulated outcomes.
- Preference priors and temperature parameters provide a flexible and interpretable way to specify safety constraints and risk tolerance for LLMs (see the sketch after this list).
- Information sharing between gatekeepers yields a form of collective intelligence, further improving decision-making by accounting for interactions between agents. For LLM-based systems, this could be implemented by sharing context or internal representations among agents.
- This approach shifts the focus from complex world modeling (required by some other safety frameworks) to preference specification, which is potentially simpler and more adaptable across applications.
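To make the preference-prior and temperature idea concrete, here is a minimal sketch assuming preferences are encoded as a softmax over stakeholder-assigned outcome scores; the outcome labels, scores, and function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def preference_prior(outcome_scores, temperature=1.0):
    """Softmax over stakeholder scores; lower temperature concentrates probability
    on preferred outcomes, encoding a tighter risk tolerance."""
    logits = np.asarray(outcome_scores, dtype=float) / temperature
    logits -= logits.max()              # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Illustrative outcomes: safe lane keep, close pass, collision.
scores = [2.0, 0.0, -5.0]
print(preference_prior(scores, temperature=0.5))   # tight tolerance
print(preference_prior(scores, temperature=2.0))   # looser tolerance
```

Under this kind of encoding, adjusting a single temperature parameter trades off how strongly the prior penalizes undesirable outcomes, which is what makes the safety specification interpretable to stakeholders.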