Can agents learn better from their failures than from their successes?
Does storing reasoning strategies extracted from both successful and failed experiences improve agent learning compared to tracking only successes or raw trajectories? This matters because failures offer preventative lessons that successes alone cannot teach.
ReasoningBank (2509.25140) departs from prior agent-memory work along two axes at once. First, it stores strategy-level reasoning hints rather than reusable workflows, instance-level concepts, or raw trajectories. Second, it draws those strategies from both successful and failed experiences, with success or failure judged by the agent itself without ground-truth labels. The combination matters because each axis on its own underperforms the joint version.
The strategy-level abstraction is what differentiates it from agent-workflow-memory approaches, which store procedural sequences. A reusable workflow says "to find a place's zip code, first search by name, then extract location, then look up zip." A strategy says "when an entity attribute is requested, identify which lookup primitive returns it most directly; chain only when a single primitive cannot suffice." Strategies generalize across tasks; workflows generalize across instances of the same task.
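One way to picture the difference is as two record types. The sketch below is illustrative only: the `Workflow` and `Strategy` classes, their fields, and the keyword-based `applies_to` check are assumptions made for this note, not structures taken from the ReasoningBank paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Procedural memory: a fixed step sequence tied to one task type (hypothetical)."""
    task_type: str
    steps: tuple[str, ...]

    def applies_to(self, task_type: str) -> bool:
        # A workflow is reusable only for other instances of the same task.
        return task_type == self.task_type


@dataclass(frozen=True)
class Strategy:
    """Strategy-level memory: a condition plus a principle (hypothetical)."""
    condition_keywords: frozenset[str]
    principle: str

    def applies_to(self, task_description: str) -> bool:
        # Toy matching: the strategy fires when any condition keyword appears.
        words = set(task_description.lower().split())
        return bool(words & self.condition_keywords)


zip_workflow = Workflow(
    task_type="find_zip_code",
    steps=("search by name", "extract location", "look up zip"),
)
attribute_strategy = Strategy(
    condition_keywords=frozenset({"zip", "phone", "price", "address"}),
    principle=(
        "Identify which lookup primitive returns the attribute most directly; "
        "chain only when a single primitive cannot suffice."
    ),
)
```

In this toy setup the workflow matches only the `find_zip_code` task type, while the strategy also matches a request such as "find the phone number of the museum", which is the cross-task generalization the paragraph describes.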
The failure-inclusion is what differentiates it from systems that only store successful trajectories. Failed experiences contribute preventative lessons — strategies that look promising but fail under specific conditions. The agent abstracts both into actionable principles. This addresses a known gap: success-only memory teaches what worked but never what to avoid.
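A minimal sketch of what failure-inclusive memory could look like when it is turned into prompt hints. The `MemoryItem` fields, the `render_hints` helper, and the two example lessons are hypothetical, chosen only to show that a success-only view drops the "what to avoid" section entirely.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """One distilled lesson; field names are illustrative assumptions."""
    title: str
    content: str
    source: str  # "success" or "failure", as judged by the agent itself


def render_hints(items: list[MemoryItem], include_failures: bool = True) -> str:
    """Format memory items as prompt hints, grouped by outcome."""
    worked = [f"- {m.title}: {m.content}" for m in items if m.source == "success"]
    pitfalls = [f"- {m.title}: {m.content}" for m in items if m.source == "failure"]
    sections = ["Strategies that worked:"] + worked
    if include_failures and pitfalls:
        sections += ["Pitfalls to avoid:"] + pitfalls
    return "\n".join(sections)


bank = [
    MemoryItem(
        "Prefer direct lookup",
        "Use the primitive that returns the requested attribute directly.",
        "success",
    ),
    MemoryItem(
        "Do not trust the first result",
        "Top hits can be a different entity with the same name; verify first.",
        "failure",
    ),
]
```

Calling `render_hints(bank, include_failures=False)` mimics success-only memory: the preventative lesson about same-name entities never reaches the agent.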
The deeper finding is memory-aware test-time scaling (MaTTS). Scaling test-time compute generates more rollouts per task; more rollouts generate diverse experiences; diverse experiences provide richer contrastive signals for distilling higher-quality memory; better memory guides subsequent scaling toward more promising rollouts. Memory and compute compound rather than substitute. This is a different scaling law from the parameter scaling law — accuracy improves with cumulative interaction history, not just with one-time training compute.
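The loop described above can be sketched as a toy simulation. Everything here is an assumption for illustration: the stub agent, the `prefer:`/`avoid:` string encoding of strategies, the three made-up actions, and the `succeeded` flag, which stands in for the agent's self-judgment. It is not the paper's MaTTS procedure, only the shape of the feedback loop.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    action: str
    succeeded: bool  # stands in for the agent's self-judgment


def run_rollout(task: dict, memory: set[str], index: int) -> Rollout:
    """Stub agent: choose one action, steered by remembered strategies."""
    # Drop actions that memory marks as pitfalls.
    candidates = [a for a in task["actions"] if f"avoid:{a}" not in memory]
    # Prefer actions that memory marks as having worked before.
    preferred = [a for a in candidates if f"prefer:{a}" in memory]
    pool = preferred or candidates
    # Different rollout indices explore different actions (diversity).
    action = pool[index % len(pool)]
    return Rollout(action, succeeded=(action == task["good_action"]))


def distill(rollouts: list[Rollout]) -> set[str]:
    """Turn judged rollouts into 'prefer' and 'avoid' strategy strings."""
    return {
        f"{'prefer' if r.succeeded else 'avoid'}:{r.action}" for r in rollouts
    }


def run_with_scaling(tasks: list[dict], k: int) -> tuple[set[str], int]:
    """Run k rollouts per task, updating shared memory after each task."""
    memory: set[str] = set()
    solved = 0
    for task in tasks:
        rollouts = [run_rollout(task, memory, i) for i in range(k)]
        memory |= distill(rollouts)
        solved += any(r.succeeded for r in rollouts)
    return memory, solved


ACTIONS = ["guess_from_snippet", "chain_three_searches", "direct_lookup"]
TASKS = [{"actions": ACTIONS, "good_action": "direct_lookup"} for _ in range(3)]
```

With these toy tasks, `run_with_scaling(TASKS, k=1)` solves only the last task, because each single rollout contributes one lesson, while `run_with_scaling(TASKS, k=3)` solves all three: the first task's wider exploration fills memory, and memory then steers every later rollout to the action that works.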
The implicit theory is that agents become more capable not by accumulating data but by accumulating judged distinctions. The self-judgment step is doing the work. ReasoningBank can label its own successes and failures because the agent has access to task-grounded signals (did the search return useful results? did the action achieve the subtask?); labels emerge from interaction rather than from annotation. This makes the approach scalable in deployment, not just in training.
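To make the idea of labels from interaction concrete, here is a small sketch in which a trajectory is labeled from signals the agent observes while acting. The `Step` fields mirror the two questions in the paragraph above; the decision rule in `self_judge` is a toy assumption that stands in for whatever judge the agent actually uses, which this note does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    returned_useful_results: bool  # e.g. did the search return anything usable?
    achieved_subtask: bool         # e.g. did the action complete its subtask?


def self_judge(steps: list[Step]) -> str:
    """Label a trajectory from interaction signals alone (toy rule, an assumption)."""
    if not steps:
        return "failure"
    # Success here means the final step achieved its subtask
    # and no step along the way came back without useful results.
    ok = steps[-1].achieved_subtask and all(s.returned_useful_results for s in steps)
    return "success" if ok else "failure"


good_run = [
    Step("search museum", True, True),
    Step("open details page", True, True),
]
bad_run = [
    Step("search museum", False, False),
    Step("open first hit", True, False),
]
```

No annotated label appears anywhere in the inputs: both `good_run` and `bad_run` are judged purely from what the agent saw while acting, which is why the same mechanism can keep running after deployment.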
The result reframes the relationship between memory and inference compute. Prior work treated them as separate dimensions; ReasoningBank shows they are coupled, and their coupling is itself a scaling axis.
Paper: ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Related concepts in this collection
- Can agents learn reusable sub-task routines from past experience?
  Do web agents fail at long-horizon tasks because they cannot extract and reuse workflows shared across similar problems? This explores whether sub-task abstraction enables skill accumulation rather than task-by-task problem solving.
  AWM stores procedural workflows; ReasoningBank abstracts higher to strategies that span tasks
- Can frozen language models continually improve through memory structure alone?
  If agents can't update parameters, what form of textual memory lets them keep learning across trials and transfer to new tasks without retraining?
  CLIN stores causal abstractions; ReasoningBank's strategy abstractions are a strategic cousin operating without environment-specific causal structure
- Can agents learn from failure without updating their weights?
  Explores whether language models can improve through trial and error by storing reflections in episodic memory rather than fine-tuning. This matters because it suggests a fundamentally different path to agent adaptation.
  Reflexion uses raw episodic reflection; ReasoningBank distills across episodes into transferable strategies
- Does agent memory degrade when continuously consolidated?
  Can consolidating agent experiences into summaries actually harm long-term performance? Research on ARC-AGI tasks suggests continuous memory updates may reduce capability below the no-memory baseline.
  direct tension: ReasoningBank claims consolidation works when done over strategies-with-conditions; faulty-memory paper shows consolidation regresses below baseline; resolution may be in *what* gets abstracted
Original note title
distilling reasoning strategies from both successes and failures outperforms raw trajectories — and creates synergy with test-time scaling