What domain properties determine whether causal rules transfer to new agents?
This explores what has to be true about a task or environment for learned causal patterns to carry over to a different agent — rather than staying locked to the conditions where they were learned.
The corpus suggests the answer is less about the rules themselves and more about how they were acquired and where the grounding lives. The sharpest constraint comes from how an agent learned in the first place: agents trained on static expert demonstrations never interact with an environment, so their competence is capped by the curator's imagination and their 'rules' are really frozen traces that don't transfer to scenarios the demonstrations never covered ("Can agents learn beyond what their training data shows?"). By contrast, knowledge embedded through reinforcement rather than supervised imitation appears to transfer better because the model internalizes coherent reasoning structure instead of token-level correctness ("Can reinforcement learning embed domain knowledge more effectively than supervised fine-tuning?"), and RL can even surface complex domain reasoning from nothing but simple accuracy rewards ("Can simple rewards alone teach complex domain reasoning?"). The property that travels, in other words, is reasoning that was earned by interaction, not copied.
A second determinant is whether the causal structure is genuinely load-bearing or just decorative. Fine-tuning can quietly weaken the causal link between an agent's reasoning steps and its answers: the chain still gets written but less reliably drives the output, so what looks like a transferable rule is performative rather than functional ("Does fine-tuning disconnect reasoning steps from final answers?"). This matters for transfer because a rule that isn't actually doing causal work in the source agent has nothing to carry. Establishing whether a rule is real requires pairing representational evidence (the feature is there) with causal verification (the feature does something); neither alone is enough to claim a mechanism ("Can we understand LLM mechanisms with only representational analysis?").
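One way to make the load-bearing question concrete is an early-termination probe: truncate the reasoning chain at every point and check how often the answer stays the same. The sketch below is a minimal illustration under stated assumptions, not the procedure used in the cited note. The `answer_fn` interface and both toy agents are hypothetical stand-ins for a real model call.

```python
from typing import Callable, Sequence

# Hypothetical interface: given a question and the reasoning steps written
# so far, return the agent's final answer.
AnswerFn = Callable[[str, Sequence[str]], str]


def truncation_invariance(answer_fn: AnswerFn, question: str,
                          steps: Sequence[str]) -> float:
    """Fraction of proper prefixes of the chain that give the full-chain answer.

    A value near 1.0 means the answer barely depends on the chain
    (performative reasoning); a value near 0.0 means the chain is doing
    causal work (functional reasoning).
    """
    full_answer = answer_fn(question, steps)
    prefixes = [steps[:k] for k in range(len(steps))]  # lengths 0 .. n-1
    if not prefixes:
        return 1.0
    unchanged = sum(answer_fn(question, p) == full_answer for p in prefixes)
    return unchanged / len(prefixes)


def functional_agent(question: str, steps: Sequence[str]) -> str:
    # Toy agent whose answer is the running total of the steps it has seen.
    return str(sum(int(s) for s in steps))


def performative_agent(question: str, steps: Sequence[str]) -> str:
    # Toy agent that ignores its own chain and always emits the same answer.
    return "6"


steps = ["1", "2", "3"]
functional_score = truncation_invariance(functional_agent, "1+2+3?", steps)
performative_score = truncation_invariance(performative_agent, "1+2+3?", steps)
```

Both toy agents give the same answer on the full chain, so output accuracy alone cannot tell them apart; only the intervention on the chain does. The same harness could in principle be extended to paraphrasing and filler substitution by swapping how the perturbed chains are generated.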
The third property is grounding: where the relevant information actually sits. Personas extracted from domain documents transfer across evaluation tasks precisely because they're anchored in real stakeholder perspectives rather than arbitrary roles ("Can personas extracted from documents generalize across evaluation tasks?"). The flip side shows up when grounding is missing: LLMs look socially competent when one model omnisciently controls every party, then fail once agents hold private information the model can't see, revealing that the 'rules' depended on shortcuts that don't exist in the new setting ("Why do LLMs fail when simulating agents with private information?"). So information asymmetry and hidden state are domain properties that break transfer, while document-grounded structure supports it.
There's also a limit on what 'causal rules' can even capture, which bounds what transfers. Causal belief networks model causal reasoning well but can't represent associative, analogical, or emotion-driven shifts ("Can causal models alone capture how humans actually reason?"), so in domains where those other channels dominate, a causal rule was never the full story to begin with. And LLMs appear to inherit human-like causal biases (weak explaining-away, Markov violations) from training-data statistics ("Do large language models make the same causal reasoning mistakes as humans?"), meaning a 'rule' that's really a statistical regularity will transfer only to domains with matching statistics, not to ones requiring true causal inference.
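For readers unfamiliar with explaining-away, the normative pattern can be worked out on a small collider network C1 → E ← C2. The sketch below computes it by brute-force enumeration; the prior of 0.3, the causal strength of 0.8, and the noisy-OR combination rule are illustrative assumptions, not values from the cited notes.

```python
from itertools import product

P_CAUSE = 0.3   # assumed prior probability of each cause
STRENGTH = 0.8  # assumed probability that a present cause produces the effect


def p_effect(c1: int, c2: int) -> float:
    """Noisy-OR: the effect occurs unless every present cause fails."""
    return 1 - (1 - STRENGTH * c1) * (1 - STRENGTH * c2)


def joint(c1: int, c2: int, e: int) -> float:
    """Joint probability of one full assignment in the collider network."""
    prior = (P_CAUSE if c1 else 1 - P_CAUSE) * (P_CAUSE if c2 else 1 - P_CAUSE)
    pe = p_effect(c1, c2)
    return prior * (pe if e else 1 - pe)


def posterior_c1(**evidence: int) -> float:
    """P(C1 = 1 | evidence), summing over all worlds that match the evidence."""
    num = den = 0.0
    for c1, c2, e in product((0, 1), repeat=3):
        world = {"c1": c1, "c2": c2, "e": e}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(c1, c2, e)
        den += p
        num += p * c1
    return num / den


effect_only = posterior_c1(e=1)           # roughly 0.60
effect_and_c2 = posterior_c1(e=1, c2=1)   # roughly 0.34
c2_only = posterior_c1(c2=1)              # stays at the 0.3 prior
```

Under these assumptions, learning that C2 is present drops belief in C1 from about 0.60 to about 0.34, which is the explaining-away effect. Observing C2 without the effect leaves C1 at its prior, which is what the Markov condition requires. A reasoner showing weak explaining-away would keep the second number closer to the first.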
The quietly surprising thread: the most reliable way to make rules transfer may be to stop storing them in weights at all. Agents that externalize state, skills, and protocols into a harness layer don't have to re-solve the same problem in each new context ("Where does agent reliability actually come from?"), and memory-based online RL lets an agent adapt continually through case and tool memory without touching parameters ("Can agents learn continuously from experience without updating weights?"). When a rule lives in an inspectable memory rather than entangled weights, transfer to a new agent becomes a matter of handing over the memory, which reframes the whole question from 'does this domain allow transfer' to 'did we put the rule somewhere transferable.'
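The "handing over the memory" idea can be sketched as a small case memory that serializes to JSON and is loaded by a second agent. This is a hypothetical illustration only: `Case`, `CaseMemory`, and the word-overlap retrieval are assumptions made for the example and are not the design of AgentFly or of any particular harness.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Case:
    situation: str
    action: str
    reward: float


@dataclass
class CaseMemory:
    """Externalized experience: inspectable, serializable, weight-free."""
    cases: List[Case] = field(default_factory=list)

    def add(self, situation: str, action: str, reward: float) -> None:
        self.cases.append(Case(situation, action, reward))

    def retrieve(self, situation: str) -> Optional[Case]:
        """Return the successful case sharing the most words with the query."""
        query = set(situation.lower().split())
        best, best_key = None, (0, float("-inf"))
        for case in self.cases:
            if case.reward <= 0:
                continue  # failed cases are stored but never suggested
            overlap = len(query & set(case.situation.lower().split()))
            key = (overlap, case.reward)
            if overlap > 0 and key > best_key:
                best, best_key = case, key
        return best

    def to_json(self) -> str:
        return json.dumps([asdict(c) for c in self.cases])

    @classmethod
    def from_json(cls, payload: str) -> "CaseMemory":
        return cls([Case(**d) for d in json.loads(payload)])


# The source agent accumulates experience.
source = CaseMemory()
source.add("valve stuck closed", "cycle power then reopen", 1.0)
source.add("valve stuck closed", "force open", -1.0)
source.add("sensor reads zero", "recalibrate sensor", 1.0)

# Transfer is a hand-off of the serialized memory, with no parameter updates.
target = CaseMemory.from_json(source.to_json())
hint = target.retrieve("the valve is stuck")
```

Here `hint.action` is the successful valve procedure learned by the source agent, available to the target agent without any retraining. A realistic system would likely replace word overlap with embedding similarity, but the transfer step would still be a data hand-off rather than a training run.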
Sources (11 notes)
Agents trained on static expert datasets cannot learn from their own failures or generalize beyond demonstrated scenarios because they never interact with environments during training. Competence is capped by what curators imagined, not by agent capacity.
RLAG rewards both answer accuracy and explanation rationality by cycling between augmented and unaugmented generation, progressively internalizing coherent knowledge structures. This outperforms SFT because it prioritizes reasoning quality over token-level correctness.
Medical AI systems and o3 demonstrate that sophisticated domain reasoning emerges naturally from RL training on difficult problems with only basic accuracy signals, without requiring explicit chain-of-thought distillation from teacher models.
Three faithfulness tests show fine-tuned models generate reasoning chains that less reliably influence final outputs. Early termination, paraphrasing, and filler substitution all produce invariant answers more often after fine-tuning, suggesting reasoning becomes performative rather than functional.
Representational analysis alone identifies correlations without causation; causal analysis alone shows behavioral effects without explaining them. Only paired methods—locating candidate features representationally, then verifying causally—produce complete mechanistic claims.
MAJ-EVAL automatically extracts stakeholder personas from domain documents via semantic clustering and orchestrates structured three-phase debate, achieving reproducible evaluation that transfers across tasks like summarization and dialogue without manual redesign. The approach grounds personas in real stakeholder perspectives rather than arbitrary roles.
Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.
Causal belief networks excel at modeling causal reasoning but cannot represent associative links, analogical mappings, or emotion-driven belief shifts. The GenMinds framework itself acknowledges this as a tractable starting point rather than a complete theory.
LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.
Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.
AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.