How can I explain AI car behavior causally?
Generating Causal Explanations of Vehicular Agent Behavioural Interactions with Learnt Reward Profiles
This paper proposes a method for generating causal explanations of interactions between autonomous vehicles (specifically, why one vehicle's action caused another to react a certain way). It learns a "reward profile" for each vehicle, representing its motivations at a given moment (e.g., prioritizing speed, safety, or lane changes). Combined with counterfactual simulation ("what if the other vehicle hadn't done X?"), this profile is used to determine causal links between actions and to generate human-readable explanations such as "Red overtaking caused Green to slow down, as Green wishes to prioritize safety."
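To make the mechanics concrete, here is a minimal Python sketch of the idea as summarized above (not the paper's actual code): fit a linear reward profile from observed behaviour features, then declare a causal link when the profile's preferred action changes between the factual and counterfactual scenes. All feature names, numbers, and helper functions are illustrative assumptions.

```python
# Minimal sketch, assuming a linear reward profile over hand-picked features.
import numpy as np

FEATURES = ["speed", "safety", "lane_keeping"]  # assumed motivation features

def fit_reward_profile(phi: np.ndarray, returns: np.ndarray) -> np.ndarray:
    """Least-squares fit of weights w such that returns ~= phi @ w."""
    w, *_ = np.linalg.lstsq(phi, returns, rcond=None)
    return w

def best_action(profile: np.ndarray, candidate_phis: dict) -> str:
    """Pick the action whose feature vector scores highest under the profile."""
    return max(candidate_phis, key=lambda a: candidate_phis[a] @ profile)

# Toy data: feature rows for Green's past actions and the returns it obtained.
phi = np.array([[1.0, 0.2, 0.9],   # maintained speed
                [0.3, 0.9, 0.8],   # slowed for a merging car
                [0.4, 0.8, 0.1]])  # braked hard
returns = np.array([0.5, 0.9, 0.6])
profile = fit_reward_profile(phi, returns)  # Green's inferred motivations

# Factual scene: Red is overtaking. Counterfactual: Red stays in its lane.
# Each dict maps Green's candidate actions to their feature vectors.
factual = {"slow down": np.array([0.2, 0.9, 0.9]),
           "keep speed": np.array([1.0, 0.1, 0.9])}
counterfactual = {"slow down": np.array([0.2, 0.5, 0.9]),
                  "keep speed": np.array([1.0, 0.8, 0.9])}

# Causal link: Green's preferred action differs across the two scenes.
if best_action(profile, factual) != best_action(profile, counterfactual):
    motive = FEATURES[int(np.argmax(profile))]
    print(f"Red overtaking caused Green to {best_action(profile, factual)}, "
          f"as Green prioritizes {motive}.")
```

The counterfactual comparison is independent of how the profile is learned, so `fit_reward_profile` could be swapped for a richer model without changing the rest of the loop.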
Key points for LLM-based multi-agent systems: the reward profile learning, implemented here with simple linear regression, could be replaced by more sophisticated LLM-based motivation modeling. The counterfactual reasoning component aligns well with LLMs' ability to generate and analyze diverse hypothetical scenarios, offering a potential avenue for improving explainability in multi-agent systems (a hypothetical sketch follows). The focus on generating causal explanations also fits naturally with efforts to build more transparent and trustworthy AI systems.
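A hypothetical sketch of that swap, not something from the paper: an LLM takes over both motivation inference and counterfactual comparison. The `query_llm` function is a stand-in for whatever chat-completion client is available, and the prompt structure is an assumption.

```python
# Hypothetical sketch: replacing the linear reward model with an LLM that
# infers motivations and reasons over counterfactual scenes directly.

def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; wire up your own client here."""
    raise NotImplementedError

def explain_interaction(scene: str, observed: str, counterfactual: str) -> str:
    """Ask the LLM to compare factual and counterfactual rollouts and state
    the causal link in plain language."""
    prompt = (
        f"Scene: {scene}\n"
        f"Observed reaction: {observed}\n"
        f"Counterfactual scene: {counterfactual}\n"
        "1. Infer the reacting vehicle's likely motivations.\n"
        "2. Predict its behaviour in the counterfactual scene.\n"
        "3. If the behaviour differs, state the causal explanation."
    )
    return query_llm(prompt)

# Example call (illustrative inputs):
# explain_interaction(
#     scene="Two-lane road; Red overtakes Green on the left.",
#     observed="Green slows down.",
#     counterfactual="Red stays in its lane.",
# )
```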