How can LLMs optimize multi-agent data science workflows?
SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science
April 1, 2025
https://arxiv.org/pdf/2503.23314
This paper introduces SPIO, a framework using LLMs to improve automated data science pipelines. SPIO uses multiple agents specializing in data preprocessing, feature engineering, model selection, and hyperparameter tuning. These agents propose multiple strategies, which are then refined and either selected (SPIO-S) or ensembled (SPIO-E) by an LLM-based optimization agent. Key LLM aspects include: orchestrating the multi-agent system, generating and ranking candidate plans based on data and task descriptions, enabling dynamic workflow adaptation via iterative feedback, and improving predictive performance through both single best-path selection and ensemble strategies.
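The difference between single best-path selection (SPIO-S) and ensembling (SPIO-E) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `CandidatePlan` structure, the `rank_plans`, `select_best`, and `ensemble_top_k` helpers, and the toy scores are all hypothetical. In particular, `rank_plans` simply sorts by a validation score as a stand-in for the LLM-based optimization agent, and the ensemble is assumed to be a plain average of the top-k plans' predictions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CandidatePlan:
    """One end-to-end pipeline proposal (hypothetical structure)."""
    name: str
    predict: Callable[[Sequence[float]], float]  # maps a feature row to a prediction
    val_score: float  # assumed ranking signal; higher is better


def rank_plans(plans: List[CandidatePlan]) -> List[CandidatePlan]:
    # Stand-in for the LLM-based optimization agent: sort by validation score.
    return sorted(plans, key=lambda p: p.val_score, reverse=True)


def select_best(plans: List[CandidatePlan]) -> CandidatePlan:
    # Selection in the spirit of SPIO-S: keep only the top-ranked plan.
    if not plans:
        raise ValueError("no candidate plans to select from")
    return rank_plans(plans)[0]


def ensemble_top_k(plans: List[CandidatePlan], k: int = 2) -> Callable[[Sequence[float]], float]:
    # Ensembling in the spirit of SPIO-E: average the top-k plans' predictions.
    top = rank_plans(plans)[:k]
    if not top:
        raise ValueError("need at least one plan and k >= 1")

    def predict(row: Sequence[float]) -> float:
        return sum(p.predict(row) for p in top) / len(top)

    return predict


# Toy candidates with constant predictions, purely for illustration.
plans = [
    CandidatePlan("plan_a", lambda row: 0.25, val_score=0.80),
    CandidatePlan("plan_b", lambda row: 0.75, val_score=0.90),
    CandidatePlan("plan_c", lambda row: 0.5, val_score=0.70),
]

best = select_best(plans)
ensemble_predict = ensemble_top_k(plans, k=2)
```

Here `best` is the single highest-ranked plan, while `ensemble_predict` combines the two highest-ranked plans. Swapping `rank_plans` for a call to an LLM that ranks plans from data and task descriptions would leave the rest of the sketch unchanged.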