Can frozen language models learn without updating their parameters?
Agents built on frozen models cannot change their weights, so what kind of memory structure would let them keep improving across trials and carry what they learn to new tasks? The question challenges the assumption that continual learning requires parameter updates.
CLIN argues that the bottleneck for continual learning in language agents is not parameter updates but the structure of what gets remembered. Reflexion-style agents (see Can agents learn from failure without updating their weights?) maintain "helpful hints" — generic verbal reflections that work for the immediate trial but transfer poorly across tasks and environments. CLIN's wager is that a specific style of memory — causal abstractions of the form "opening doors may be necessary for movement between rooms" — produces durable, transferable knowledge because causal structure is what predicts which action to take next.
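To make the memory shape concrete, here is a minimal sketch of what one causal-abstraction entry could look like. CLIN's memory is free-form natural-language text; the structured fields below (action, outcome, relation, hedge) are only an illustrative decomposition of statements like the one quoted above, not the paper's actual data structure.

```python
from dataclasses import dataclass

# Hypothetical schema for one causal-abstraction memory entry.
# The field breakdown is an illustrative assumption; CLIN stores
# these abstractions as plain natural-language statements.
@dataclass
class CausalAbstraction:
    action: str               # e.g. "opening doors"
    outcome: str              # e.g. "movement between rooms"
    relation: str             # "necessary for" or "does not contribute to"
    hedge: str = "may be"     # uncertainty marker that can firm up over trials

    def to_text(self) -> str:
        """Render the entry as the natural-language line the agent reads."""
        return f"{self.action} {self.hedge} {self.relation} {self.outcome}"

entry = CausalAbstraction(
    action="opening doors",
    outcome="movement between rooms",
    relation="necessary for",
)
print(entry.to_text())
# -> opening doors may be necessary for movement between rooms
```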
Empirically, the wager pays off. On ScienceWorld, CLIN beats SOTA reflective agents like Reflexion by 23 absolute points on repeated trials. More importantly, it transfers: zero-shot performance on new environments improves by 4 points (13 for new tasks), and continued memory updates in the new setting add another 17 points (7 for new tasks). The causal-abstraction memory is therefore not just a within-task accelerator but a substrate for cross-environment generalization.
The conceptual move is to position language-model agents as a modern instantiation of action model learning, with the action model written in natural language and continually edited rather than learned as parameters. Useful causal knowledge persists across trials; unhelpful causal knowledge is dropped. This suggests a new architectural pattern: agents built on frozen models can still improve rapidly and continually if the memory representation is the right shape. The shape that matters is causal, not encyclopedic; that position pairs interestingly with Can agents learn reusable sub-task routines from past experience? (workflow-shaped memory) and Does state-indexed memory outperform high-level workflow memory for web agents? (state-action-shaped memory). The three notes target the same problem (what shape should agent memory take?) and disagree on the answer.
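As a sketch of that pattern, the loop below shows a frozen model whose only mutable state is a list of causal statements. The helpers `run_trial`, `generate_insights`, and `merge_memory` are hypothetical stubs standing in for LLM and environment calls, not CLIN's actual interface; the point is the control flow, which acts, reflects, and edits the text while the weights never change.

```python
from typing import List, Tuple

# Stubs standing in for LLM and environment calls; CLIN's actual prompts
# and interface differ. Only the control flow matters here: the model's
# weights never change between trials, only this memory text does.

def run_trial(task: str, memory: List[str]) -> Tuple[str, float]:
    """Act in the environment with the task and current memory in the prompt."""
    return f"trajectory for {task!r} using {len(memory)} insights", 0.0

def generate_insights(trajectory: str, score: float,
                      memory: List[str]) -> List[str]:
    """Distill the trajectory into candidate causal statements."""
    return ["opening doors may be necessary for movement between rooms"]

def merge_memory(memory: List[str], candidates: List[str]) -> List[str]:
    """Keep insights that helped; drop ones the latest trial contradicted."""
    return list(dict.fromkeys(memory + candidates))  # dedupe, keep order

def continual_loop(task: str, memory: List[str], n_trials: int) -> List[str]:
    for _ in range(n_trials):
        trajectory, score = run_trial(task, memory)                # 1. act
        candidates = generate_insights(trajectory, score, memory)  # 2. reflect
        memory = merge_memory(memory, candidates)                  # 3. edit memory
    return memory

print(continual_loop("boil water", memory=[], n_trials=3))
```

The same loop doubles as the transfer mechanism: carrying `memory` into a new environment yields the zero-shot gains reported above, and continuing to run the loop there yields the further adaptation gains.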
Source: Action Models
Related concepts in this collection
- Can agents learn reusable sub-task routines from past experience?
  Does extracting and abstracting sub-task workflows from previous trajectories enable web agents to build complex skills compositionally? This matters because it could explain why agents fail at long-horizon tasks despite strong reasoning abilities.
  tension with: CLIN says causal-rule memory transfers; AWM says abstracted workflow-routine memory transfers; both make transferability the criterion but pick different memory shapes.
- Does state-indexed memory outperform high-level workflow memory for web agents?
  Should procedural memory for web agents be organized around specific environment states and actions, or abstracted into higher-level workflows? This matters because web automation demands precise, context-sensitive recall that workflows might lose.
  tension with: PRAXIS says local state-action memory beats both abstracted workflows and causal rules for web environments; the three notes form a memory-granularity tension.
- Can agents learn from failure without updating their weights?
  Explores whether language models can improve through trial-and-error by storing reflections in memory rather than through gradient-based parameter updates. Tests if environmental feedback alone can drive learning.
  extends: Reflexion is the baseline CLIN improves on by 23 points; the contrast is generic-hint memory vs causal-rule memory.
- Why do LLM agents ignore condensed experience summaries?
  LLM agents faithfully learn from raw experience but systematically disregard condensed summaries of the same experience. This study investigates whether the problem lies in how summaries are made, how models process them, or whether models simply don't need them.
  complicates: agents systematically ignore condensed experience even when it's the only experience provided. CLIN's causal abstractions would qualify as condensed, so it is an open question whether CLIN's gains survive the faithfulness asymmetry.
- Can agents learn continuously through memory without updating weights?
  Explores whether LLM agents can adapt to new tasks and failures by retrieving and updating past experiences stored in memory, rather than requiring expensive parameter fine-tuning.
  complements: both demonstrate that memory-shape choices enable continual adaptation without parameter updates; CLIN uses causal rules, the case-based variant uses retrieval over episodic cases.
Original note title: causal abstractions in dynamic textual memory let frozen-model agents continually improve — outperforming Reflexion by 23 points without parameter updates