How can DPO improve multi-guided diffusion for realistic traffic scenarios?
Direct Preference Optimization-Enhanced Multi-Guided Diffusion Model for Traffic Scenario Generation
February 19, 2025
https://arxiv.org/pdf/2502.12178

This paper introduces MuDi-Pro, a method for generating realistic and controllable traffic scenarios using a diffusion model. It addresses the challenge of balancing realism with the ability to guide the behavior of multiple agents (vehicles) in a simulation. MuDi-Pro uses a novel training strategy involving multi-task learning and direct preference optimization (DPO) to fine-tune a diffusion transformer model. This enables the model to learn a prior based on real-world driving data, adapt to various guide inputs (e.g., desired trajectories, traffic rules), and generate diverse, realistic scenarios that adhere to user preferences.
Key points for LLM-based multi-agent systems:
- Guided Sampling: MuDi-Pro integrates multiple guidance signals within a single model through a conditional layer, treating each guide much like a task in multi-task learning (see the first sketch after this list). This is highly relevant to LLM agents, where complex instructions or goals can be decomposed into sub-tasks.
- Direct Preference Optimization (DPO): Instead of reinforcement learning from human feedback, MuDi-Pro fine-tunes with DPO on preference pairs derived from guidance scores (see the second sketch after this list). This aligns with emerging trends in LLM training, where DPO is used to align model outputs with user intentions, and it simplifies feedback collection because no explicit human evaluation is needed.
- Classifier-Free Sampling (CFS): MuDi-Pro uses CFS to combine future-conditioned and non-future-conditioned predictions, offering explicit control over how much future information influences the generated behaviors (see the third sketch after this list). This is analogous to prompt engineering in LLMs, where the level of detail in a prompt steers agent behavior.
- Scene-Level Diffusion: Learning at the scene level helps the model capture interactions between multiple agents, which is crucial for multi-agent applications of LLMs. The diffusion model's ability to generate diverse samples is also valuable for exploring different multi-agent interaction patterns.
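As a concrete illustration of the multi-guide conditioning idea, the sketch below shows a transformer block whose normalization is modulated by a learned embedding of the currently active guide, in the spirit of conditioning one shared model on many guidance tasks. The class name, dimensions, and FiLM-style modulation are assumptions for illustration; the paper's exact conditional-layer design may differ.

```python
# Sketch of a guide-conditioned transformer block (illustrative, not MuDi-Pro's exact layer).
import torch
import torch.nn as nn

class GuideConditionedLayer(nn.Module):
    """Self-attention block whose normalization is modulated by the active guide signal."""

    def __init__(self, d_model: int, n_guides: int, n_heads: int = 4):
        super().__init__()
        # One learned embedding per guide type (e.g. target speed, goal waypoint, no-collision).
        self.guide_emb = nn.Embedding(n_guides, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model, elementwise_affine=False)
        # Guide-dependent scale and shift (FiLM-style) applied after normalization.
        self.to_scale_shift = nn.Linear(d_model, 2 * d_model)

    def forward(self, x: torch.Tensor, active_guide: torch.Tensor) -> torch.Tensor:
        # x: (batch, agents * timesteps, d_model); active_guide: (batch,) guide indices.
        cond = self.guide_emb(active_guide)                    # (batch, d_model)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        h = self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        attn_out, _ = self.attn(h, h, h)
        return x + attn_out
```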
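The DPO objective itself is standard; what changes here is where the preference labels come from (guidance scores instead of human raters). Below is a minimal sketch of the loss, assuming per-sample log-probabilities of the preferred and dispreferred scenarios are available from the policy and a frozen reference model; for a diffusion model these terms would in practice be replaced by denoising-loss surrogates, as in Diffusion-DPO.

```python
# Sketch of a DPO loss over guidance-ranked scenario pairs (illustrative signature).
import torch
import torch.nn.functional as F

def dpo_loss(logp_policy_win, logp_policy_lose, logp_ref_win, logp_ref_lose, beta: float = 0.1):
    """Push the fine-tuned model toward the guidance-preferred sample, relative to a frozen reference."""
    # Implicit reward margin: how much more the policy prefers the winner than the reference does.
    margin = beta * ((logp_policy_win - logp_ref_win) - (logp_policy_lose - logp_ref_lose))
    return -F.logsigmoid(margin).mean()
```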
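Classifier-free sampling comes down to blending two denoiser outputs with a single weight. The sketch below assumes a `denoiser` callable and a null conditioning token for the "no future information" case; both the signature and the weight value are illustrative assumptions, not the paper's API.

```python
# Sketch of classifier-free sampling between future-conditioned and unconditioned predictions.
import torch

@torch.no_grad()
def cfs_prediction(denoiser, x_t, t, future_cond, null_cond, w: float = 1.5):
    eps_cond = denoiser(x_t, t, cond=future_cond)   # prediction using future information
    eps_uncond = denoiser(x_t, t, cond=null_cond)   # prediction ignoring future information
    # w = 0 ignores the future, w = 1 reproduces the conditional prediction,
    # w > 1 amplifies the influence of the future conditioning.
    return eps_uncond + w * (eps_cond - eps_uncond)
```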