Can LLMs improve distributed AI training?
First Field-Trial Demonstration of L4 Autonomous Optical Network for Distributed AI Training Communication: An LLM-Powered Multi-AI-Agent Solution
This paper demonstrates a multi-AI-agent system called AutoLight for automating control of a complex optical network used in distributed AI training. It shows a significant improvement over single-agent approaches and a naive multi-agent system.
Key points for LLM-based multi-agent systems: AutoLight utilizes a hierarchical structure of "Planner" and "Task" agents powered by LLMs and uses a novel "Chain of Identity" (CoI) method for inter-agent communication. CoI ensures consistent identity and context, using formatted handoffs, pseudo-SystemMessage injection, and pre-execution declarations. This structured communication overcomes limitations of single-agent and naive multi-agent approaches in complex, multi-domain scenarios. The demonstration shows that specialized tools within each agent and the structured communication of CoI greatly improve task completion rates in a real-world-emulated distributed AI training network.