Large Language Model-based Data Science Agent: A Survey
The rapid advancement of Large Language Models (LLMs) has driven novel applications across diverse domains, with LLM-based agents emerging as a crucial area of exploration. This survey presents a comprehensive analysis of LLM-based agents designed for data science tasks, summarizing insights from recent studies. From the agent perspective, we discuss the key design principles, covering agent roles, execution, knowledge, and reflection methods. From the data science perspective, we identify key processes for LLM-based agents, including data preprocessing, model development, evaluation, and visualization. Our work offers two key contributions: (1) a comprehensive review of recent developments in applying LLM-based agents to data science tasks; (2) a dual-perspective framework that connects general agent design principles with the practical workflows in data science.
In this survey, we examine LLM-based data science agents from two complementary perspectives: agent design and data science application. From the agent design perspective, we summarize key architectural paradigms—including single-agent systems, collaborative multi-agent structures, and dynamic agent generation— and analyze core components such as agent roles, execution strategies, knowledge integration, and reflection mechanisms.
3 Analysis from Agent Perspective
Large Language Model (LLM)-based agents have emerged as powerful tools in various domains, particularly in data science. The design and functionality of LLM-based agents can be understood through their basic components, which include agent role, execution structure, knowledge, and reflection.
Agent role (see §3.1). LLM agents are allocated different roles, which allows them to split the main task and focus on specific subtasks. Diverse agent roles are presented in previous works, ranging from single-agent systems handling all tasks independently to multi-agent systems with specialized roles such as developers, testers, and planners.
Execution structure (see §3.2). The execution structure defines how agents manage task allocation, task execution, user interaction, and error handling. It covers dynamic planning, where agents adjust plans based on real-time feedback; fixed workflows with predefined task sequences; and plan-then-execute frameworks that separate strategy formulation from task execution.
External Knowledge (see §3.3). Knowledge sources allow agents to access and integrate external information, enhancing their capabilities in specific domains. LLM-based agents augment their knowledge through external databases, retrieval-based approaches, and API calls.
Reflection (see §3.4). Reflection methods provide feedback information for LLM-based agents to improve performance and adapt to complex environments. Techniques include agent feedback for self-correction, model metrics feedback for optimization, code error handling for reliability, and history window mechanisms for long-term learning.
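The four components above can be sketched as a minimal agent skeleton. This is an illustrative sketch only: the class and method names are our own and do not correspond to any surveyed framework, and the LLM call is replaced by a string template.

```python
from dataclasses import dataclass, field

@dataclass
class DataScienceAgent:
    """Toy skeleton of the four components: role, execution, knowledge, reflection."""
    role: str                                        # agent role (e.g. "planner", "tester")
    knowledge: dict = field(default_factory=dict)    # external knowledge store (§3.3)
    history: list = field(default_factory=list)      # feedback window for reflection (§3.4)

    def execute(self, task: str) -> str:
        # Execution structure (§3.2): here, a trivial fixed workflow.
        result = f"[{self.role}] completed: {task}"
        self.history.append(result)                  # record outcome for later reflection
        return result

    def reflect(self) -> str:
        # Reflection (§3.4): summarize recent outcomes to guide the next step.
        return "; ".join(self.history[-3:])

agent = DataScienceAgent(role="planner")
agent.execute("clean the dataset")
agent.execute("train baseline model")
print(agent.reflect())
```

In a real system, `execute` would wrap an LLM call and `reflect` would feed the history back into the prompt; the skeleton only fixes where each component lives.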
Agent structure. The structure of LLM-based agents can be categorized into several types, each with its advantages and challenges:
With a manager: In this structure, a central agent manages and controls all other agents. Software engineering-style agents and client-server agents mainly have this structure. For example, in the AutoML-GPT Trirat et al. (2024) framework, a central LLM serves as the controller, managing the entire pipeline by integrating specialized agents for subtasks such as model design and hyperparameter tuning.
Without a manager: Each agent operates independently and solves tasks autonomously. Minimum-function agents mainly adopt this structure, since all agents with minimum function share the same position. An example of this is the MASAI Arora et al. (2024) framework, which utilizes decentralized agents that collaborate on machine learning and data science tasks by sharing results, without a central management system.
Hierarchical managers: A higher-level agent controls lower-level agents in a layered structure. Hierarchically generated agents and part of the software engineering-style agents mainly adopt this structure. An example can be found in hierarchical agent generation, such as in EvoMAC Hu et al. (2024b), where a parent agent dynamically creates child agents to handle specific subtasks during runtime.

Agent relationship. The relationship between agents within the system can vary significantly depending on the design philosophy:
Compete: Agents work against each other to complete a task, often in the coder-reviewer paradigm (similar to the adversarial setup in GANs). In frameworks like MASAI Arora et al. (2024), agents engage in competitive strategies to address machine learning and data science challenges. The competition between reviewers and coders allows iterative refinement of the code. Some works also let multiple agents propose competing plans.
Collaborate: Agents work together toward a shared objective. For example, in the MAGIS Tao et al. (2024) framework, agents assume different roles like Manager, Developer, and QA Engineer to collaborate on resolving GitHub issues. Their tasks are divided to ensure modular development, with continuous collaboration between agents.
Hybrid: Agents alternate between competing and collaborating based on the task requirements. For instance, in AutoCodeRover Zhang et al. (2024e), agents work together to localize faults and generate patches, but may compete in terms of optimizing solutions or strategies based on the specific issue at hand.
Agent role task allocation. LLM-based agents can allocate tasks in either a static or dynamic manner:
Static Task Allocation: In some systems, agents are assigned a fixed set of tasks that they perform. For example, in the Data Director Hong et al. (2024) framework, agents follow a static task allocation, where the tasks are predefined and agents work through structured stepwise execution.
Dynamic Task Allocation: Tasks are allocated based on the real-time needs and feedback from the system or environment. An example of dynamic task allocation can be seen in the EvoMAC Hu et al. (2024b) framework, where agents adjust dynamically based on environmental feedback, creating or dismissing agents as needed to refine or expand their tasks.
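The contrast between the two allocation modes can be sketched with a toy dispatcher. The agent names, task names, and the error-rate feedback signal are all hypothetical, chosen only to make the distinction concrete.

```python
# Static allocation: a predefined task -> agent mapping, fixed before execution.
STATIC_PLAN = {
    "preprocess": "DataAgent",
    "train": "ModelAgent",
    "report": "VizAgent",
}

def allocate_static(task: str) -> str:
    # The plan never changes at runtime.
    return STATIC_PLAN[task]

def allocate_dynamic(task: str, feedback: dict) -> str:
    # Dynamic allocation: reroute (or spawn) an agent based on runtime feedback.
    if feedback.get("error_rate", 0.0) > 0.2:
        return "DebugAgent"        # bring in a specialist when quality drops
    return STATIC_PLAN.get(task, "GeneralAgent")

print(allocate_static("train"))                        # always ModelAgent
print(allocate_dynamic("train", {"error_rate": 0.5}))  # rerouted to DebugAgent
```

Frameworks like EvoMAC go further than this sketch by creating and dismissing agents at runtime, but the decision point is the same: allocation reads environmental feedback instead of a fixed table.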
Agent Role Task Granularity. The granularity of tasks assigned to agents influences both the precision and complexity of their execution:
Coarse Granularity: Some agents are given broader, less detailed tasks. For example, in AutoML-GPT Trirat et al. (2024), the central controller agent coordinates the entire machine learning pipeline, handling coarse-grained tasks such as managing the overall workflow rather than focusing on the details of each individual task.
Fine Granularity: In other cases, tasks are broken down into smaller units for more specific execution. For example, in MapCoder Islam et al. (2024a), multiple agents collaborate, each handling a fine-grained task such as code generation, debugging, or retrieval. This detailed task assignment ensures high accuracy but incurs more computational overhead.
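The granularity trade-off can be illustrated by decomposing one coarse task into fine-grained subtasks. The decomposition table below is illustrative (loosely inspired by MapCoder's retrieval/generation/debugging split, not its actual implementation).

```python
# Coarse granularity: one agent owns the whole pipeline as a single task.
COARSE = ["build an ML pipeline"]

# Fine granularity: the same work split into specialist subtasks.
FINE = {
    "build an ML pipeline": [
        "retrieve similar examples",
        "generate code",
        "debug code",
    ]
}

def decompose(tasks: list[str], fine: bool) -> list[str]:
    if not fine:
        return tasks                  # coarse: hand tasks over as-is
    # Fine: expand each task into its subtasks -- more precise assignments,
    # but more agent invocations and coordination overhead.
    return [sub for t in tasks for sub in FINE.get(t, [t])]

print(decompose(COARSE, fine=False))  # one coarse task
print(decompose(COARSE, fine=True))   # three fine-grained subtasks
```

The list lengths make the overhead visible: each extra subtask typically means an extra agent call, which is the computational cost noted above.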