Large Language Model Agents Are Not Always Faithful Self-Evolvers

Paper · arXiv 2601.22436 · Published January 30, 2026

Self-evolving large language model (LLM) agents continually improve by accumulating and reusing past experience, yet it remains unclear whether they faithfully rely on that experience to guide their behavior. We present the first systematic investigation of experience faithfulness—the causal dependence of an agent’s decisions on the experience it is given—in self-evolving LLM agents. Using controlled causal interventions on both raw and condensed forms of experience, we comprehensively evaluate four representative frameworks across 10 LLM backbones and 9 environments. Our analysis uncovers a striking asymmetry: while agents consistently depend on raw experience, they often disregard or misinterpret condensed experience, even when it is the only experience provided. This gap persists across single- and multi-agent configurations and across backbone scales. We trace its underlying causes to three factors: the semantic limitations of condensed content, internal processing biases that suppress experience, and task regimes where pretrained priors already suffice. These findings challenge prevailing assumptions about self-evolving methods and underscore the need for more faithful and reliable approaches to experience integration.

The emergence of self-evolving agents represents a pivotal step in the development of autonomous systems capable of continuous learning and adaptation (Zhao et al., 2024b; Dou et al., 2025; Silver & Sutton, 2025). Unlike traditional static paradigms, these agents dynamically gather, store, and reuse experiences from their interactions with the environment to inform future decisions (Gao et al., 2025; Cai et al., 2025; Bell et al., 2025; Hendrycks et al., 2025).

At the center of this paradigm is the use of experience. Such experience generally falls into two categories: raw and condensed (Hu et al., 2025; Zhang et al., 2025b). As illustrated in the left part of Figure 1, raw experiences capture concrete historical traces, such as successful trajectories from similar tasks, that agents can directly reference or replay (Zhao et al., 2024a; Zhang et al., 2025a). Condensed experiences, by contrast, are distilled from those traces and encode transferable insights, including abstract plans or failure heuristics (Ouyang et al., 2025; Wang et al., 2025). Despite their central role, prior work has focused mainly on how such experiences are stored or represented, leaving it unclear whether agents actually and faithfully leverage them to improve performance. To address this, we present the first systematic investigation into the faithfulness of experience utilization in self-evolving LLM agents, organized around two core research questions (RQs).
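For concreteness, the minimal Python sketch below shows one way the two forms of experience could be represented in an agent's memory; the class and field names are illustrative placeholders and do not correspond to the schemas of any framework cited above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RawExperience:
    """A concrete historical trace, e.g., a successful trajectory from a similar task."""
    task: str
    trajectory: List[str]  # interleaved thoughts, actions, and observations
    success: bool

@dataclass
class CondensedExperience:
    """A transferable insight distilled from one or more raw traces."""
    insight: str  # e.g., an abstract plan or a failure heuristic
    source_tasks: List[str] = field(default_factory=list)
```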

We begin by systematically examining (RQ1): Is the performance improvement of self-evolving agents faithfully attributable to their use of past experiences? (§3 & §4). To answer this, we introduce a suite of controlled causal interventions targeting both raw and condensed experiences, and assess how such perturbations affect downstream behavior. To illustrate this, Figure 1 shows a motivating example in which raw and condensed experiences are perturbed in different ways. We define experience faithfulness as the extent to which an agent’s behavior is causally grounded in its input experience—i.e., if perturbing the experience leads to significant behavioral changes, we consider the agent to have faithfully used it. Our evaluation spans four representative self-evolving frameworks, encompassing both offline (Zhao et al., 2024a) and online (Ouyang et al., 2025) paradigms, across single-agent and multi-agent settings (Zhang et al., 2025a). We benchmark 10 diverse LLM backbones across 9 environments, including reasoning, web interaction, and embodied decision-making, providing comprehensive coverage of both model families and application settings.
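To make the intervention logic concrete, the sketch below shows one way such a faithfulness measure could be computed: the agent is run on the same tasks with the original and a perturbed experience buffer, and the average behavioral change is reported. The `agent` and `behavioral_distance` callables and the perturbation modes are hypothetical placeholders assumed for illustration, not the paper's actual implementation.

```python
import random
from typing import Callable, List

def perturb(experience: List[str], mode: str = "shuffle") -> List[str]:
    """Apply a simple intervention to an experience buffer (placeholder perturbations)."""
    if mode == "shuffle":  # destroy ordering while keeping content
        return random.sample(experience, k=len(experience))
    if mode == "remove":   # ablate the experience entirely
        return []
    raise ValueError(f"unknown perturbation mode: {mode}")

def faithfulness_score(
    agent: Callable[[str, List[str]], str],          # maps (task, experience) -> behavior
    behavioral_distance: Callable[[str, str], float],
    tasks: List[str],
    experience: List[str],
    mode: str = "shuffle",
) -> float:
    """Average behavioral change under an experience intervention.

    Larger values indicate that the agent's behavior is causally grounded in
    the experience it was given, i.e., higher experience faithfulness.
    """
    distances = []
    for task in tasks:
        original = agent(task, experience)
        perturbed = agent(task, perturb(experience, mode))
        distances.append(behavioral_distance(original, perturbed))
    return sum(distances) / len(distances)
```

Under this reading, a large score for raw-experience perturbations together with a near-zero score for condensed-experience perturbations would correspond to the asymmetry reported in §4.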

We first show that agents are consistently more faithful to raw experiences than to condensed ones when both are present, exhibiting substantial behavioral changes under raw experience perturbations but not under condensed ones (§4.1). We further demonstrate that this lack of faithfulness to condensed inputs persists even when raw experience is entirely absent, indicating that the problem is not due to competition or overshadowing (§4.2). When we extend our analysis to collaborative multi-agent settings, this asymmetry remains: agents reliably exploit raw trajectories while largely ignoring the semantic content of condensed summaries (§4.3). Finally, this faithfulness disparity proves robust across model scales: while larger models achieve higher overall performance, they still fail to meaningfully ground their behavior in condensed experience (§4.4). These findings reveal a core limitation of current self-evolving agents: although they benefit from accumulated experience, they nonetheless display pronounced faithfulness failures—most notably in how they utilize condensed experience.

These findings naturally lead to our second question: (RQ2) Why do self-evolving agents often fail to faithfully leverage condensed experiences? (§5) We trace this to a cascading triad of causes rooted in the three core components of self-evolving systems. First, condensed experiences themselves are often semantically limited—many encode only vague heuristics or generic summaries, lacking the specificity required to guide behavior (§5.1). Second, even when relevant content is present, agents often fail to utilize it due to internal processing biases (Mohsin et al., 2025) that favor local contextual signals over retrieved information (§5.2). Finally, the structure of the task further compounds this issue: for certain task types, such as knowledge-intensive benchmarks, agents often succeed by relying solely on their pretrained semantic priors (Shi et al., 2024), reducing the marginal utility of retrieved experience and diminishing the model’s incentive to incorporate external guidance at all (§5.3).

In summary, our findings challenge the common assumption that self-evolving agents faithfully leverage their accumulated experiences. Despite performance gains, agents often ignore or misuse condensed experience, revealing a significant gap between utility and faithfulness. Our study provides a principled framework to diagnose this issue and underscores the need for more reliable and interpretable mechanisms for experience-driven adaptation in LLM agents.