INQUIRING LINE

Can graph topology represent successful trajectory clusters more effectively than skill libraries?

This explores a head-to-head: when an agent succeeds, is it better to capture *why* it succeeded by encoding runs as graph structure (paths, trees, edges) or by distilling them into a reusable library of skills? The corpus suggests these are complementary representations rather than rivals.


This reads the question as a contest between two ways of remembering what worked: keep the *shape* of successful runs (graph topology — which steps branched where, which paths converged) or boil them down into named, reusable *skills*. The corpus doesn't stage that fight directly, but it gives you both camps and a surprising reason they may not be opponents at all.

The strongest skill-library evidence comes from "Should successful and failed episodes be processed differently?", where SkillRL treats successes and failures *asymmetrically*: successful episodes are kept as concrete demonstrations, while failures are abstracted into lessons. This beats uniform consolidation, reaching state-of-the-art results on complex tasks while using substantially less context. The key word is *abstraction*: a skill library compresses experience into something portable, deliberately throwing away the structural detail of how runs branched and connected in order to save memory. That's its strength and its bet.
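To make the asymmetry concrete, here is a minimal sketch of a skill library that stores successes verbatim and reduces failures to one-line lessons. It is not SkillRL's implementation: the `Episode` and `SkillLibrary` classes, the `abstract_failure` rule, and the prompt layout are all hypothetical, and a real system would presumably summarize a failed run with a model rather than keep only its last step.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    task: str                      # task family, used as the retrieval key
    steps: list[tuple[str, str]]   # (observation, action) pairs, in order
    success: bool


def abstract_failure(ep: Episode) -> str:
    """Hypothetical abstraction: keep only the final decision, phrased as a rule."""
    obs, action = ep.steps[-1]
    return f"avoid '{action}' when '{obs}'"


@dataclass
class SkillLibrary:
    # Successes are stored verbatim; failures are stored only as short lessons.
    demos: dict[str, list[list[tuple[str, str]]]] = field(default_factory=dict)
    lessons: dict[str, list[str]] = field(default_factory=dict)

    def add(self, ep: Episode) -> None:
        if ep.success:
            self.demos.setdefault(ep.task, []).append(list(ep.steps))
        else:
            self.lessons.setdefault(ep.task, []).append(abstract_failure(ep))

    def context_for(self, task: str, max_demos: int = 1) -> str:
        """Render up to `max_demos` concrete demos plus every lesson for `task`."""
        parts = []
        for demo in self.demos.get(task, [])[:max_demos]:
            parts.append("DEMO: " + " | ".join(f"{o} -> {a}" for o, a in demo))
        for lesson in self.lessons.get(task, []):
            parts.append("LESSON: " + lesson)
        return "\n".join(parts)


lib = SkillLibrary()
lib.add(Episode("open-door",
                [("door closed", "find key"), ("holding key", "unlock"), ("unlocked", "open")],
                success=True))
lib.add(Episode("open-door",
                [("door closed", "push"), ("door still closed", "kick")],
                success=False))
print(lib.context_for("open-door"))
# DEMO: door closed -> find key | holding key -> unlock | unlocked -> open
# LESSON: avoid 'kick' when 'door still closed'
```

The design choice to notice is what gets discarded: a failed run costs one line of context no matter how long it was, while a success keeps its full step order but nothing about how it relates to other runs.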

The graph-topology camp argues that the structure you'd throw away is exactly the reward signal. "Can trajectory structure replace hand-annotated process rewards?" shows that tree topology, expert-aligned actions, and tool-call positions can *substitute* for annotated process reward models: the shape of the trajectory tells you which steps were good without anyone labeling them. "Can tree search replace human feedback in LLM training?" makes the same move: tree search 'naturally ranks solution paths by success,' so the branching structure itself is the cluster of what worked. And "Can reasoning topologies be formally classified as graph types?" insists this isn't a metaphor. CoT, ToT, and GoT are literally path graphs, trees, and arbitrary directed graphs, and GoT's in-degree > 1 (a node that merges several parent thoughts) buys divide-and-conquer synthesis that a tree cannot express, let alone a linear skill record. So topology doesn't just store successes; it stores *relationships between* successes that a flat library would flatten away.
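One way to see how tree shape alone can yield step-level rewards: score each node by the fraction of successful leaves beneath it, then reward each step by how much it raised that success rate. This is a generic Monte Carlo estimate over a search tree, assumed for illustration; it is not the exact formulation used by Tree-GRPO or AlphaLLM, and the `Node`, `value`, and `step_rewards` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    action: str
    children: list["Node"] = field(default_factory=list)
    outcome: Optional[float] = None  # leaves only: 1.0 = success, 0.0 = failure


def leaves(node: Node) -> Iterator[Node]:
    """Yield every leaf (finished run) under `node`."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def value(node: Node) -> float:
    """Fraction of successful leaves under `node` (a Monte Carlo success estimate)."""
    outcomes = [leaf.outcome for leaf in leaves(node)]
    if any(o is None for o in outcomes):
        raise ValueError("every leaf needs an outcome")
    return sum(outcomes) / len(outcomes)


def step_rewards(root: Node) -> dict[str, float]:
    """Dense reward per step: value(child) - value(parent), keyed by the action path."""
    rewards: dict[str, float] = {}

    def walk(node: Node, path: tuple[str, ...]) -> None:
        parent_value = value(node)
        for child in node.children:
            child_path = path + (child.action,)
            rewards[" > ".join(child_path)] = value(child) - parent_value
            walk(child, child_path)

    walk(root, ())
    return rewards


# Only the four final outcomes are labeled.
tree = Node("root", [
    Node("A", [Node("A1", outcome=1.0), Node("A2", outcome=1.0)]),
    Node("B", [Node("B1", outcome=0.0), Node("B2", outcome=1.0)]),
])
for path, r in step_rewards(tree).items():
    print(f"{path}: {r:+.2f}")
# A: +0.25
# A > A1: +0.00
# A > A2: +0.00
# B: -0.25
# B > B1: -0.50
# B > B2: +0.50
```

Only the final outcomes were labeled, yet every intermediate step gets a signed reward: the step into `B` is penalized because half of its continuations failed.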

Here's the thing you didn't know you wanted to know: the two representations may need each other, and the reason is rooted in *retention*. "Why do trajectories matter more than individual examples for in-context learning?" finds that in-context learning for decision-making needs full or partial trajectories from the same environment level, not isolated examples, so pure skill abstraction (isolated lessons) can break the very generalization you wanted. Meanwhile, "Why do reasoning systems keep discovering new connections?" shows that iterative graph reasoning reaches a structurally stable phase yet keeps about 12% of its edges 'semantically surprising.' It never fully settles, which is great for discovery but bad if you want a stable, retrievable skill. A library converges; a graph keeps churning. That tension is the real answer to 'more effectively': it depends on whether you're optimizing for stable reuse or for continued discovery.
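The contrast between trajectory context and isolated examples can be sketched as two ways of assembling the same pool of experience into a prompt or training sequence. This is a hypothetical illustration, assuming each run is tagged with the environment level it came from; `Trajectory`, `bursty_context`, and `isolated_context` are invented names, not an API from the cited work.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    level: str                          # environment level the run came from
    steps: tuple[tuple[str, str], ...]  # (observation, action) pairs, in order


def bursty_context(pool: list[Trajectory], query_level: str, k: int,
                   rng: random.Random) -> list[Trajectory]:
    """k whole trajectories, all drawn from the query's own level."""
    same_level = [t for t in pool if t.level == query_level]
    if len(same_level) < k:
        raise ValueError(f"need {k} runs from {query_level!r}, found {len(same_level)}")
    return rng.sample(same_level, k)


def isolated_context(pool: list[Trajectory], k: int,
                     rng: random.Random) -> list[tuple[str, str]]:
    """Baseline: k single (observation, action) steps from any level, order discarded."""
    all_steps = [step for t in pool for step in t.steps]
    return rng.sample(all_steps, k)


pool = [
    Trajectory("maze-1", (("start", "left"), ("corridor", "up"), ("exit", "stop"))),
    Trajectory("maze-1", (("start", "left"), ("corridor", "down"), ("dead end", "back"))),
    Trajectory("maze-2", (("start", "right"), ("door", "open"))),
]
rng = random.Random(0)
bursty = bursty_context(pool, "maze-1", k=2, rng=rng)
isolated = isolated_context(pool, k=3, rng=rng)
```

The bursty version keeps each run's step order and its shared level intact; the isolated version returns steps whose source run and level are lost, which is the kind of abstraction the finding warns against.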

If you want to go further, "Can learned traversal policies beat exhaustive graph reading?" offers the pragmatic middle path. Rather than store the whole success graph *or* a thin skill list, Graph-O1 learns a *traversal policy* over the graph (using Monte Carlo Tree Search and reinforcement learning), keeping the topology but navigating it selectively to fit context limits. And "Can knowledge graphs teach models deep domain expertise?" shows the graph-first bet paying off elsewhere: fine-tuning a 32B model on 24,000 reasoning tasks built from medical knowledge-graph paths reached state-of-the-art performance across 15 medical domains, suggesting that structured composition matters more than scale, plausibly because it retains relationships a skill list would sever. The honest synthesis: topology wins when the *relationships between* successful steps carry the signal; skill libraries win when you need cheap, stable, portable reuse. The live research is mostly about learning to traverse the first to produce the second.
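A learned traversal policy can be approximated, for illustration, by best-first expansion under a node budget: a scoring function decides which neighbor to read next, and reading stops when the context budget is spent. This is a deliberately simpler stand-in for Graph-O1, which uses Monte Carlo Tree Search and reinforcement learning; here `score` is a fixed lookup table in place of a trained policy, and the graph and its node names are hypothetical.

```python
import heapq
from typing import Callable


def traverse(graph: dict[str, list[str]], start: str,
             score: Callable[[str], float], budget: int) -> list[str]:
    """Best-first reading of `graph`: always expand the highest-scoring queued node.

    `score` stands in for a learned traversal policy; reading stops once
    `budget` nodes (the context limit) have been visited.
    """
    frontier = [(-score(start), start)]  # heapq is a min-heap, so negate scores
    queued = {start}
    visited: list[str] = []
    while frontier and len(visited) < budget:
        _, node = heapq.heappop(frontier)
        visited.append(node)
        for nbr in graph.get(node, []):
            if nbr not in queued:
                queued.add(nbr)
                heapq.heappush(frontier, (-score(nbr), nbr))
    return visited


graph = {
    "task": ["plan", "retry", "log"],
    "plan": ["subgoal-1", "subgoal-2"],
    "retry": ["backoff"],
    "subgoal-1": ["merge"],
    "subgoal-2": ["merge"],  # "merge" has in-degree 2
}
scores = {"task": 1.0, "plan": 0.9, "subgoal-1": 0.8, "subgoal-2": 0.7,
          "merge": 0.95, "retry": 0.2, "log": 0.1, "backoff": 0.3}

print(traverse(graph, "task", lambda n: scores[n], budget=5))
# ['task', 'plan', 'subgoal-1', 'merge', 'subgoal-2']
```

With a budget of five nodes, the low-scoring `retry` and `log` branches are never read, while `merge` (reachable from two subgoals, the in-degree > 1 case) is kept. Swapping the lookup table for a trained model is what would make the policy *learned*.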


Sources (8 notes)

Should successful and failed episodes be processed differently?

SkillRL demonstrates that treating successful episodes as concrete demonstrations and failures as abstracted lessons achieves state-of-the-art performance on complex tasks while using substantially less context than uniform approaches. The asymmetry mirrors human expert reasoning and avoids the degradation seen in uniform consolidation methods.

Can trajectory structure replace hand-annotated process rewards?

Tree-GRPO, Supervised RL, and ToolPO each convert sparse outcome rewards into dense step signals by exploiting different structural features—tree topology, expert-aligned actions, and tool-call positions—eliminating the need for annotated process reward models.

Can tree search replace human feedback in LLM training?

AlphaLLM uses tree search outcomes and three critic models to derive dense reward signals equivalent to human-labeled feedback. Tree structure naturally ranks solution paths by success, replacing the annotation oracle that standard RLHF requires.

Can reasoning topologies be formally classified as graph types?

CoT, ToT, and GoT map precisely to path graphs, trees, and arbitrary directed graphs respectively. The topology is not metaphorical but defines actual computational structure—GoT's in-degree > 1 enables divide-and-conquer synthesis that trees cannot express.

Why do trajectories matter more than individual examples for in-context learning?

In-context learning for sequential decision-making requires full or partial trajectories from the same environment level, not just isolated examples. This structural property—trajectory burstiness—allows models to generalize across vastly different tasks without weight updates.

Why do reasoning systems keep discovering new connections?

Analysis shows iterative graph reasoning evolves toward a stable phase where semantic entropy persistently dominates structural entropy, with ~12% of edges remaining semantically surprising despite structural connection, fueling ongoing discovery.

Can learned traversal policies beat exhaustive graph reading?

Graph-O1 replaces whole-graph ingestion with step-by-step agentic navigation using Monte Carlo Tree Search and reinforcement learning. This approach fits within LLM context windows while learning domain-specific traversal policies, though it trades certainty about the full graph for decision-making under uncertainty.

Can knowledge graphs teach models deep domain expertise?

Fine-tuning a 32B model on 24,000 reasoning tasks derived from medical knowledge graph paths produces state-of-the-art performance across 15 medical domains, demonstrating that structured knowledge composition matters more than scale.

Next inquiring lines