How can I safely optimize multi-agent control with unknown dynamics?
DISCRETE GCBF PROXIMAL POLICY OPTIMIZATION FOR MULTI-AGENT SAFE OPTIMAL CONTROL
This paper introduces DGPPO, a method for training safe multi-agent systems whose agents learn to perform tasks while avoiding unsafe states. It addresses a key limitation of existing methods, which rely on known dynamics or pre-existing expert policies. DGPPO learns a safety certificate (a discrete graph control barrier function, DGCBF) jointly with the agents' policies, allowing them to adapt to changing environments and limited sensing capabilities. This is also relevant to LLM-based multi-agent systems, where complex dynamics and partial observability are common. The learned safety function provides more robust and adaptable safety guarantees than hard-coded rules, and jointly training the policy and safety function yields better overall performance than methods that learn these two components separately.
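To make the core idea concrete, here is a minimal sketch of the discrete-time control barrier function condition and a combined training objective that penalizes violations of it alongside the task loss. All names (`h_curr`, `alpha`, the hinge penalty form) are illustrative assumptions for exposition, not the paper's actual implementation:

```python
# Sketch (assumed formulation): safe set = {x : h(x) >= 0}, and the
# discrete CBF condition h(x_{k+1}) >= (1 - alpha) * h(x_k) for
# alpha in (0, 1]. Satisfying it at every step keeps trajectories safe.

def dcbf_satisfied(h_curr: float, h_next: float, alpha: float = 0.5) -> bool:
    """Check the discrete-time CBF condition at one transition."""
    return h_next >= (1.0 - alpha) * h_curr


def combined_loss(task_loss: float, h_curr: float, h_next: float,
                  alpha: float = 0.5, weight: float = 10.0) -> float:
    """Task objective plus a hinge penalty on DCBF violations.

    The penalty is zero when the safety condition holds and grows
    linearly with the size of the violation, so the policy and the
    learned certificate h can be optimized jointly.
    """
    violation = max(0.0, (1.0 - alpha) * h_curr - h_next)
    return task_loss + weight * violation
```

For example, with `alpha = 0.5` a step from `h = 1.0` to `h = 0.6` satisfies the condition, while a drop to `h = 0.3` incurs a penalty proportional to the shortfall. In the actual method, `h` would be a learned graph neural network over each agent's local observations rather than a scalar oracle.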