How well do LLMs generate complex workflows?
BENCHMARKING AGENTIC WORKFLOW GENERATION
October 11, 2024
https://arxiv.org/pdf/2410.07869
This paper introduces WORFBENCH, a benchmark for evaluating how well Large Language Models (LLMs) can generate workflows, that is, break complex tasks down into smaller, executable steps. It finds that while LLMs are good at generating linear sequences of steps, they struggle with more realistic scenarios in which steps run in parallel and depend on one another, so that the workflow has to be represented as a graph.
Key takeaways for LLM-based multi-agent systems:
- Generating graph-based workflows for complex tasks is more challenging than simple linear sequences.
- Current LLMs are not yet adept at generating practical, graph-structured workflows, even with training.
- Workflows can significantly improve LLM agent performance by acting as structured prior knowledge or enhancing Chain-of-Thought prompting.
- Workflows can reduce inference time by allowing parallel task execution and shortening the planning process.
- For LLM agents to excel at workflow generation, integrating world knowledge or world models is crucial.
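
To make the difference between a linear and a graph-structured workflow concrete, here is a minimal Python sketch. It is not WORFBENCH's data format or evaluation code; the step names and the `parallel_stages` helper are hypothetical and chosen only for illustration. The workflow is stored as a dependency graph, and steps are grouped into stages whose members do not depend on one another and could therefore run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each key is a step, each value is the set of
# steps that must finish before it can start.
workflow = {
    "boil_water": set(),
    "chop_vegetables": set(),
    "cook_pasta": {"boil_water"},
    "make_sauce": {"chop_vegetables"},
    "combine": {"cook_pasta", "make_sauce"},
}


def parallel_stages(graph):
    """Group steps into stages; steps within one stage are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are cyclic
    stages = []
    while sorter.is_active():
        # Steps whose dependencies have all been marked done.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(workflow)
for number, steps in enumerate(stages, start=1):
    print(f"stage {number}: {steps}")
```

In this toy example a purely linear plan would need five sequential steps, while the graph view needs only three stages, because the first two pairs of steps can run side by side. This is the kind of saving the inference-time takeaway refers to.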