How can agents best manage video editing tools?
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
This paper introduces SPAgent, a system that automates video generation and editing by coordinating open-source AI models as specialized tools. An MLLM agent interprets the user's request, plans execution steps according to predefined principles, and selects the best model for each step. The key design choice is decoupling intent recognition from execution planning and model selection, which lets SPAgent handle diverse inputs (text, images, videos) and complex, multi-step editing requests. Automatic quality assessment also lets it evaluate and incorporate new video generation and editing models without manual intervention.
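To make the decoupled design concrete, here is a minimal Python sketch of the three stages described above: intent recognition, rule-based planning, and score-based model selection. It is an illustration under stated assumptions, not SPAgent's actual implementation. Every name in it (`Intent`, `recognize_intent`, `PLAN_RULES`, `ModelRegistry`, the step and model names, the scores) is hypothetical, and the intent step is a simple rule standing in for the MLLM.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Intent:
    task: str          # hypothetical task label, e.g. "text_to_video"
    prompt: str        # the user's request, carried forward to later stages
    modalities: tuple  # which input types the user supplied


def recognize_intent(request: str, inputs: dict) -> Intent:
    """Stand-in for the MLLM intent step: a rule on the supplied inputs."""
    modalities = tuple(sorted(k for k, v in inputs.items() if v is not None))
    if "video" in modalities:
        task = "video_edit"
    elif "image" in modalities:
        task = "image_to_video"
    else:
        task = "text_to_video"
    return Intent(task=task, prompt=request, modalities=modalities)


# Hypothetical planning rules: each intent maps to an ordered list of steps.
# These stand in for the "predefined principles" and are assumptions.
PLAN_RULES = {
    "text_to_video": ["text_to_image", "image_to_video"],
    "image_to_video": ["image_to_video"],
    "video_edit": ["video_to_video"],
}


@dataclass
class ModelRegistry:
    # step name -> {model name: quality score from an automatic assessment}
    scores: dict = field(default_factory=dict)

    def register(self, step: str, model: str, score: float) -> None:
        """Add or update a model for a step, with its assessed score."""
        self.scores.setdefault(step, {})[model] = score

    def select(self, step: str) -> str:
        """Return the highest-scoring model registered for a step."""
        candidates = self.scores.get(step)
        if not candidates:
            raise LookupError(f"no model registered for step {step!r}")
        return max(candidates, key=candidates.get)


def plan(intent: Intent, registry: ModelRegistry) -> list:
    """Expand an intent into (step, model) pairs using the planning rules."""
    return [(step, registry.select(step)) for step in PLAN_RULES[intent.task]]
```

Because planning only looks at the `Intent` and the registry, a newly assessed model can be added with `registry.register(...)` and will be picked up by the next call to `plan` if its score is the highest for that step, without touching the intent or planning code.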