How can LLMs build collaborative data agents?
A Survey on Large Language Model-based Agents for Statistics and Data Science
This paper surveys data agents powered by Large Language Models (LLMs), tracing their evolution, capabilities, and applications in simplifying complex data analysis tasks. It categorizes existing data agents by framework (planning, reasoning, reflection), user interface, knowledge integration, and system design, with an emphasis on multi-agent collaboration, in which agents with specialized expertise work together. Key points for LLM-based multi-agent systems include: the LLM serves as the core engine for reasoning and code generation; planning methods range from linear sequences to hierarchical graphs, gaining adaptability at the cost of added complexity; reflection and self-correction are implemented through feedback loops and iterative code revision; multi-agent systems delegate sub-tasks according to agent specialization to improve efficiency; and knowledge integration draws on tool access, knowledge bases, and in-context learning. The paper also discusses challenges and future directions for LLM-based data agents, such as handling multimodal data, integrating with other large models, and developing a robust package ecosystem to support broader adoption in statistical analysis.
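To make three of these ideas concrete (a linear plan, delegation by specialization, and a reflection loop), here is a minimal Python sketch. It is not taken from the surveyed paper or from any particular framework: the names `Agent`, `Coordinator`, `cleaner`, `modeler`, and `check` are hypothetical, and the handlers are plain functions standing in for real LLM calls so that the example runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; a real agent would call an LLM.
Handler = Callable[[str, str], str]  # (task, feedback) -> output


@dataclass
class Agent:
    name: str
    skill: str
    handler: Handler


@dataclass
class Coordinator:
    agents: Dict[str, Agent] = field(default_factory=dict)
    max_revisions: int = 2  # assumed cap on reflection rounds

    def register(self, agent: Agent) -> None:
        # One agent per skill keeps delegation a simple dictionary lookup.
        self.agents[agent.skill] = agent

    def run(
        self,
        plan: List[Tuple[str, str]],
        check: Callable[[str], str],
    ) -> List[Tuple[str, str]]:
        """Execute a linear plan of (skill, task) steps in order."""
        results = []
        for skill, task in plan:
            agent = self.agents[skill]        # delegate by specialization
            output = agent.handler(task, "")  # first draft, no feedback yet
            for _ in range(self.max_revisions):
                feedback = check(output)      # reflection: critique the draft
                if not feedback:
                    break                     # empty feedback means accepted
                output = agent.handler(task, feedback)  # revise
            results.append((agent.name, output))
        return results


def cleaner(task: str, feedback: str) -> str:
    # Stand-in for an LLM that writes data-cleaning code.
    return f"cleaned[{task}]"


def modeler(task: str, feedback: str) -> str:
    # Stand-in for an LLM that writes modeling code; it only adds a
    # random seed after the critic asks for one.
    if "seed" in feedback:
        return f"model[{task}; seed=0]"
    return f"model[{task}]"


def check(output: str) -> str:
    # Stand-in critic: returns feedback text, or "" if the output passes.
    if output.startswith("model[") and "seed" not in output:
        return "set a random seed"
    return ""


coordinator = Coordinator()
coordinator.register(Agent("Cleaner", "cleaning", cleaner))
coordinator.register(Agent("Modeler", "modeling", modeler))

plan = [
    ("cleaning", "drop missing rows"),
    ("modeling", "fit linear regression"),
]
results = coordinator.run(plan, check)
for name, output in results:
    print(f"{name}: {output}")
```

In this sketch the plan is a flat list, which corresponds to the simplest (linear) end of the planning spectrum described above; a hierarchical variant would replace the list with a graph of dependent steps. The critic is a fixed rule here, whereas in an LLM-based system it would typically be execution feedback or a second model call.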