How can I infer agent goals from observations using deep RL?
Goal Recognition using Actor-Critic Optimization
This paper introduces DRACO (Deep Recognition using Actor-Critic Optimization), a novel method for Goal Recognition (GR) that uses deep reinforcement learning to infer an agent's goal from its observed actions. Unlike traditional symbolic GR, DRACO handles continuous state/action spaces and noisy observations by learning goal-conditioned policies directly from environment interaction data, replacing costly real-time planning with learned neural networks that represent how an agent would behave when pursuing each candidate goal. To infer the goal, DRACO estimates how well the observed actions match each goal's learned policy using two new metrics derived from the Wasserstein distance and the Z-score.
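As a rough illustration of this inference step, the sketch below compares observed actions against each goal-conditioned policy's action distribution and ranks candidate goals by a Wasserstein-style distance. This is a minimal, assumed reconstruction, not DRACO's actual implementation: the names `infer_goal`, `policies`, and `action_distribution` are hypothetical, and the paper's own metrics may differ in detail.

```python
# Hypothetical sketch of goal inference via distance between observed actions
# and goal-conditioned policies. Names are illustrative, not DRACO's API.
import numpy as np
from scipy.stats import wasserstein_distance

def infer_goal(observations, policies):
    """Return the candidate goal whose learned policy best explains the observations.

    observations: list of (state, action_index) pairs from the observed agent.
    policies: dict mapping each candidate goal to an object exposing
              action_distribution(state) -> 1-D array of probabilities over a
              discretized action space (an assumed interface).
    """
    scores = {}
    for goal, policy in policies.items():
        distances = []
        for state, action in observations:
            probs = policy.action_distribution(state)
            # One-hot "distribution" on the observed action, compared against
            # the policy's predicted action distribution for the same state.
            observed = np.zeros_like(probs)
            observed[action] = 1.0
            support = np.arange(len(probs))
            distances.append(
                wasserstein_distance(support, support,
                                     u_weights=observed, v_weights=probs))
        # Lower mean distance => this goal's policy explains the behavior better.
        scores[goal] = float(np.mean(distances))
    best_goal = min(scores, key=scores.get)
    return best_goal, scores
```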
Key points relevant to LLM-based multi-agent systems: DRACO illustrates a shift in goal recognition away from symbolic reasoning towards data-driven approaches, which aligns with the strengths of LLMs as data-driven learners. The idea of learning goal-conditioned policies, analogous to agents learning distinct behaviors for different objectives, is directly relevant. The use of distance metrics to compare observed behavior against learned policies suggests a mechanism for evaluating LLM-generated agent actions against expected behavior (see the sketch below). The emphasis on robustness to noisy and incomplete observations is also important for real-world LLM agent deployments.
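The following sketch shows one way a Z-score-style check, in the spirit of the paper's second metric, could flag whether observed (e.g. LLM-generated) actions look like those a reference policy would produce. The function and parameter names are assumptions for illustration, not part of the paper.

```python
# Hypothetical sketch: Z-score check of observed behavior against a reference
# policy's own behavior statistics. Names are illustrative only.
import numpy as np

def behavior_z_score(observed_log_probs, rollout_log_probs):
    """Compare the mean log-probability of observed actions under a reference
    policy to the distribution of that statistic over the policy's own rollouts.

    observed_log_probs: log pi(a|s) for each observed (state, action) pair.
    rollout_log_probs: per-trajectory mean log-probabilities collected from
                       trajectories generated by the reference policy itself.
    """
    observed_mean = np.mean(observed_log_probs)
    mu, sigma = np.mean(rollout_log_probs), np.std(rollout_log_probs)
    # Strongly negative scores indicate the observed behavior is unlikely
    # under the reference policy (i.e. it deviates from expected behavior).
    return (observed_mean - mu) / (sigma + 1e-8)
```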