Agentic Systems and Planning

Can agents learn better from their failures than successes?

Does storing reasoning strategies extracted from both successful and failed experiences improve agent learning compared to tracking only successes or raw trajectories? This matters because failures offer preventative lessons that successes alone cannot teach.

Note · 2026-05-18 · sourced from Memory

ReasoningBank (2509.25140) departs from prior agent-memory work along two axes at once. First, it stores strategy-level reasoning hints rather than reusable workflows, instance-level concepts, or raw trajectories. Second, it draws those strategies from both successful AND failed experiences — judged by the agent itself without ground-truth labels. The combination matters because each axis on its own underperforms the joint version.
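
A minimal sketch of what such a memory store could look like. It assumes each entry carries a title, a one-line description, strategy content, and a flag recording whether it came from a self-judged success or failure. The class names and fields are hypothetical. The token-overlap retrieval is also a stand-in: a real system would more plausibly use embedding similarity. The point is only that both outcome types land in the same retrievable bank.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryItem:
    """One strategy-level hint. The schema is a hypothetical illustration."""
    title: str
    description: str
    content: str
    source: str  # "success" or "failure", as judged by the agent itself


@dataclass
class StrategyBank:
    """Hypothetical store that mixes lessons from successes and failures."""
    items: List[MemoryItem] = field(default_factory=list)

    def add(self, item: MemoryItem) -> None:
        if item.source not in ("success", "failure"):
            raise ValueError("source must be 'success' or 'failure'")
        self.items.append(item)

    def retrieve(self, query: str, k: int = 2) -> List[MemoryItem]:
        # Stand-in for embedding similarity: Jaccard overlap of lowercase
        # tokens between the query and each item's title + description.
        q = set(query.lower().split())

        def score(item: MemoryItem) -> float:
            t = set(f"{item.title} {item.description}".lower().split())
            union = q | t
            return len(q & t) / len(union) if union else 0.0

        return sorted(self.items, key=score, reverse=True)[:k]


# Usage: one lesson distilled from a success, one from a failure.
bank = StrategyBank()
bank.add(MemoryItem(
    "Prefer direct lookups", "entity attribute requested lookup",
    "Identify which primitive returns the attribute directly; chain only if needed.",
    "success"))
bank.add(MemoryItem(
    "Check pagination before concluding", "search results list incomplete",
    "A failed run stopped at page one; look for further pages before answering.",
    "failure"))
top = bank.retrieve("which lookup returns the entity attribute", k=1)
```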

The strategy-level abstraction is what differentiates it from agent-workflow-memory approaches, which store procedural sequences. A reusable workflow says "to find a place's zip code, first search by name, then extract location, then look up zip." A strategy says "when an entity attribute is requested, identify which lookup primitive returns it most directly; chain only when a single primitive cannot suffice." Strategies generalize across tasks; workflows generalize across instances of the same task.
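
A toy sketch can make the difference in scope concrete. Both record types below are hypothetical. The keyword trigger is a deliberately crude stand-in for however a real system decides relevance. What the sketch shows is that the workflow is keyed to one task type, while the strategy attaches to a condition that recurs across task types.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Workflow:
    """Procedural memory (hypothetical): a fixed step sequence for one task type."""
    task_type: str
    steps: Tuple[str, ...]

    def applies_to(self, task_type: str) -> bool:
        # Reusable only for new instances of the same task type.
        return task_type == self.task_type


@dataclass(frozen=True)
class Strategy:
    """Strategy-level memory (hypothetical): a trigger condition plus a principle."""
    trigger: str
    principle: str

    def applies_to(self, task_description: str) -> bool:
        # Toy relevance test: the triggering condition appears in the task text.
        return self.trigger in task_description.lower()


zip_workflow = Workflow(
    "find_zip_code", ("search by name", "extract location", "look up zip code"))
lookup_strategy = Strategy(
    trigger="attribute",
    principle="Identify which lookup primitive returns the attribute most "
              "directly; chain only when a single primitive cannot suffice.")

tasks = [
    ("find_zip_code", "find the zip code attribute of a cafe"),
    ("find_opening_hours", "find the opening hours attribute of a museum"),
]
workflow_hits = [zip_workflow.applies_to(t) for t, _ in tasks]
strategy_hits = [lookup_strategy.applies_to(d) for _, d in tasks]
```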

Including failures is what differentiates it from systems that store only successful trajectories. Failed experiences contribute preventative lessons: strategies that look promising but fail under specific conditions. The agent abstracts both kinds of experience into actionable principles. This addresses a known gap: success-only memory teaches what worked but never what to avoid.
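
One way to picture failure inclusion is a distillation step that routes each self-judged trajectory to a different abstraction prompt depending on its outcome. The prompt wording, the `Lesson` type, and the `llm` callable are hypothetical placeholders, not the paper's prompts or API. The sketch only illustrates that failed runs are kept and turned into pitfalls rather than discarded.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical prompt wording, written for this illustration only.
SUCCESS_PROMPT = (
    "The trajectory below solved the task. State one transferable strategy "
    "that explains why it worked.\n\n{trajectory}"
)
FAILURE_PROMPT = (
    "The trajectory below failed the task. State one preventative lesson: "
    "which promising-looking move failed, and under what condition to avoid it."
    "\n\n{trajectory}"
)


@dataclass
class Lesson:
    kind: str  # "strategy" (from a success) or "pitfall" (from a failure)
    text: str


def distill(trajectory: str, succeeded: bool, llm: Callable[[str], str]) -> Lesson:
    """Abstract one self-judged trajectory into a strategy-level lesson."""
    template = SUCCESS_PROMPT if succeeded else FAILURE_PROMPT
    kind = "strategy" if succeeded else "pitfall"
    return Lesson(kind, llm(template.format(trajectory=trajectory)).strip())


def distill_all(runs: List[Tuple[str, bool]], llm: Callable[[str], str]) -> List[Lesson]:
    # A success-only memory would filter out runs with succeeded == False;
    # here both outcomes contribute a lesson.
    return [distill(traj, ok, llm) for traj, ok in runs]


# Offline stub standing in for a model call.
def stub_llm(prompt: str) -> str:
    if "failed the task" in prompt:
        return " Do not stop at the first results page. "
    return " Use the most direct lookup primitive. "


lessons = distill_all([("search -> open -> answer", True),
                       ("search -> answer", False)], stub_llm)
```

The stub keeps the example runnable offline. Swapping in a real model call would not change the routing logic.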

The deeper contribution is memory-aware test-time scaling (MaTTS). Scaling test-time compute produces more rollouts per task. More rollouts yield more diverse experiences, and diverse experiences supply richer contrastive signals for distilling higher-quality memory. Better memory then steers subsequent scaling toward more promising rollouts. Memory and compute compound rather than substitute for each other. This points to a scaling dimension distinct from parameter scaling: accuracy improves with cumulative interaction history, not only with one-time training compute.
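
The compounding loop can be sketched as follows. The sketch assumes a parallel flavor in which k rollouts of the same task are generated under the current memory and then contrasted against each other. The `rollout` and `contrast` callables are hypothetical hooks; in practice they would be an agent run and an LLM-based comparison. The toy stubs are rigged so that more memory raises the success rate, purely to make the feedback path visible.

```python
from typing import Callable, List, Tuple

Rollout = Tuple[str, bool]  # (trajectory text, self-judged success)


def matts_parallel(
    tasks: List[str],
    memory: List[str],
    rollout: Callable[[str, List[str], int], Rollout],
    contrast: Callable[[List[Rollout]], List[str]],
    k: int = 3,
) -> List[str]:
    """Hypothetical sketch of memory-aware test-time scaling (parallel flavor).

    Each task gets k rollouts conditioned on the current memory. The batch
    is contrasted, and distilled lessons are folded back into memory, so
    later tasks are scaled from a better starting point.
    """
    for task in tasks:
        batch = [rollout(task, memory, seed) for seed in range(k)]
        for lesson in contrast(batch):
            if lesson not in memory:  # avoid duplicate entries
                memory.append(lesson)
    return memory


# Toy stub: seed i succeeds when i < 1 + len(memory), so memory raises success.
def toy_rollout(task: str, memory: List[str], seed: int) -> Rollout:
    return f"{task}#{seed}", seed < 1 + len(memory)


def toy_contrast(batch: List[Rollout]) -> List[str]:
    # Only mixed batches (some successes, some failures) yield a lesson here.
    outcomes = [ok for _, ok in batch]
    if any(outcomes) and not all(outcomes):
        task = batch[0][0].split("#")[0]
        return [f"lesson:{task}"]
    return []


demo_memory = matts_parallel(["a", "b", "c"], [], toy_rollout, toy_contrast, k=3)
```

In the toy run, task `a` gets one success out of three and task `b` two out of three. Task `c` gets three out of three, so its uniform batch yields no new contrastive lesson. This mirrors, in miniature, why diverse rollouts matter for distillation.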

The implicit thesis: agents become more capable not by accumulating data but by accumulating judged distinctions. The self-judgment step does the work. ReasoningBank can label its own successes and failures because the agent has access to task-grounded signals (did the search return useful results? did the action achieve the subtask?). Labels therefore emerge from interaction rather than from annotation, which makes the approach scalable in deployment, not just in training.
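
The following is a minimal sketch of self-judgment from interaction signals. It assumes the agent logs per-step flags mirroring the two questions above. The `Step` fields and the threshold rule are hypothetical; the note does not say how ReasoningBank's judge is built, and an LLM reading the trajectory could fill the same role. What matters is that no ground-truth answer is consulted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """Signals the agent observes about its own run (hypothetical fields)."""
    action: str
    returned_results: bool  # did the search or tool return anything useful?
    subtask_achieved: bool  # did this action accomplish its subtask?


def self_judge(steps: List[Step], min_useful_fraction: float = 0.5) -> bool:
    """Label a trajectory success/failure from interaction signals alone.

    A run counts as a success only if its final step achieved its subtask
    and enough steps returned useful results along the way.
    """
    if not steps:
        return False
    useful = sum(s.returned_results for s in steps) / len(steps)
    return steps[-1].subtask_achieved and useful >= min_useful_fraction


good = [Step("search cafe", True, True), Step("open page", True, True)]
# Claims completion, but nothing useful was ever returned.
bad = [Step("search cafe", False, False), Step("guess answer", False, True)]
labels = [self_judge(good), self_judge(bad)]
```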

The result reframes the relationship between memory and inference compute. Prior work treated them as separate dimensions; ReasoningBank shows they are coupled, and their coupling is itself a scaling axis.


Paper: ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

Original note title

distilling reasoning strategies from both successes and failures outperforms raw trajectories — and creates synergy with test-time scaling