How can LLMs learn optimal multi-agent task decomposition?
Learning Symbolic Task Decompositions for Multi-Agent Teams
This paper introduces LOTaD (Learning Optimal Task Decompositions), a method for automatically finding the best way to divide a complex task among multiple AI agents in a cooperative setting. It uses "reward machines" to represent the overall task and its possible sub-tasks. LOTaD learns which decomposition is most efficient by sampling candidate decompositions during training and observing agent performance under each one. As a result, it learns both the optimal task breakdown and the best policy for each agent to perform its assigned sub-task.
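To make the reward-machine formalism concrete, here is a minimal sketch in Python: a finite-state machine whose transitions fire on symbolic events and emit rewards. The class, state names, and event labels are illustrative assumptions, not taken from the LOTaD implementation.

```python
class RewardMachine:
    """Minimal reward machine: a finite-state machine over symbolic events.

    Hypothetical sketch for illustration; not the LOTaD codebase.
    """

    def __init__(self, transitions, initial_state, accepting_states):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state
        self.accepting = accepting_states

    def step(self, event):
        """Advance on a symbolic event; unmatched events leave the state unchanged."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

    def is_done(self):
        return self.state in self.accepting


# Example task: "pick up the key, then open the door" as a two-step machine.
rm = RewardMachine(
    transitions={
        ("u0", "got_key"): ("u1", 0.0),
        ("u1", "opened_door"): ("u2", 1.0),
    },
    initial_state="u0",
    accepting_states={"u2"},
)
rm.step("opened_door")  # ignored: the door event fires no transition before the key
rm.step("got_key")      # advances u0 -> u1, reward 0.0
rm.step("opened_door")  # advances u1 -> u2, reward 1.0; task complete
```

A decomposition then amounts to handing each agent its own (smaller) machine whose sub-goals jointly cover the team-level machine.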
Key points for LLM-based multi-agent systems:
- LOTaD's task-conditioned policy architecture lets a single model learn and generalize across the sub-tasks of different decompositions, making learning more sample-efficient.
- Its handling of dependent agent dynamics (where one agent's actions affect another's) is directly relevant to complex multi-agent scenarios in which LLMs must coordinate.
- Learning the decomposition automatically removes the need for manual task division, which is crucial for scaling complex LLM-based multi-agent applications.
- Reward machines as a symbolic task representation could be adapted to leverage the symbolic reasoning capabilities of LLMs.
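The "try decompositions and observe performance" idea above can be sketched as a bandit over candidate decompositions. The UCB-style rule below is a generic sketch under my own assumptions; the paper's actual selection mechanism may differ, and all names here are hypothetical.

```python
import math

def select_decomposition(stats, t, c=2.0):
    """Pick the next candidate decomposition to train on.

    stats maps decomposition id -> (times_tried, mean_return);
    t is the total number of selections made so far.
    Generic UCB sketch, not the paper's exact rule.
    """
    best, best_score = None, -math.inf
    for d, (n, mean) in stats.items():
        # Untried decompositions get priority; otherwise mean + exploration bonus.
        score = math.inf if n == 0 else mean + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = d, score
    return best

def update(stats, d, episode_return):
    """Incrementally update the running mean return for decomposition d."""
    n, mean = stats[d]
    stats[d] = (n + 1, mean + (episode_return - mean) / (n + 1))

# Example: three candidate splits of the same team-level task.
stats = {"split_A": (0, 0.0), "split_B": (0, 0.0), "split_C": (0, 0.0)}
for t, ret in enumerate([0.2, 0.9, 0.1, 0.8], start=1):
    d = select_decomposition(stats, t)
    update(stats, d, ret)  # in practice, ret comes from a training episode
```

After each decomposition has been tried once, the rule concentrates training on the splits with the highest observed returns while still occasionally revisiting the others.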