Can single-hop knowledge automatically compose into multi-hop capability?

This explores whether a model that has memorized many individual facts (A→B, B→C) will spontaneously chain them into multi-step inferences (A→C), or whether composition has to be built in deliberately. The corpus answers from two angles: what happens inside the model, and what we engineer around it.

The most direct evidence in the collection says no, not without help. A controlled training study of how transformers learn to reason across steps found that multi-hop ability emerges in three distinct phases: first rote memorization, then generalization within familiar territory, then reasoning across unfamiliar combinations. Crucially, the second hop generalizes only when the model is explicitly exposed to compositional examples during training (How do transformers learn to reason across multiple steps?). Memorizing A→B and B→C separately does not reliably produce A→C on its own; the composition has to be taught. The same study found a telltale signature: successful reasoning shows up as entity representations clustering together by cosine similarity, which suggests composition is a learned geometric reorganization rather than a free byproduct of storing facts.
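The cosine-clustering signature can be made concrete with a small diagnostic. The sketch below assumes you have already extracted one hidden-state vector per entity and have a grouping you expect the representations to respect (for example, entities that resolve to the same bridge entity). The vectors, labels, and the `clustering_gap` helper are hypothetical illustrations, not the study's measurement code.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def clustering_gap(vectors, groups):
    """Mean within-group cosine minus mean between-group cosine.

    vectors: dict entity -> vector (hypothetical extracted representations)
    groups:  dict entity -> group label (hypothetical expected grouping)
    A large positive gap means the representations cluster by group.
    """
    within, between = [], []
    for a, b in combinations(vectors, 2):
        sim = cosine(vectors[a], vectors[b])
        (within if groups[a] == groups[b] else between).append(sim)
    return sum(within) / len(within) - sum(between) / len(between)


# Made-up toy vectors: two entities per group, pointing in similar directions.
vecs = {
    "e1": [1.0, 0.1, 0.0],
    "e2": [0.9, 0.2, 0.0],
    "e3": [0.0, 0.1, 1.0],
    "e4": [0.1, 0.0, 0.9],
}
labels = {"e1": "g1", "e2": "g1", "e3": "g2", "e4": "g2"}
print(round(clustering_gap(vecs, labels), 3))
```

A gap near zero or below it means the chosen grouping is not reflected in the geometry; tracking this number across training checkpoints is one way to look for the kind of reorganization the study describes.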

That finding reframes much of the retrieval work in the corpus as an end-run around the model's reluctance to compose internally. If you can't count on the weights to chain hops, you build the chaining into the surrounding structure. HippoRAG turns a corpus into a knowledge graph and runs Personalized PageRank, seeded from the concepts in the query, to traverse multi-hop paths in a single retrieval step. It matches iterative approaches at a fraction of the cost precisely because the graph encodes the connections the model won't make on its own (Can knowledge graphs enable multi-hop reasoning in one retrieval step?). The composition lives in the graph topology, not in the model's reasoning.
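Personalized PageRank is ordinary PageRank whose random restarts jump only to the query's seed nodes, so score concentrates on nodes reachable within a few hops of those seeds. Below is a minimal power-iteration sketch, assuming a toy undirected graph (edges listed in both directions), uniform weights over the seeds, and the classic damping value of 0.85. None of these settings are taken from HippoRAG, which also maps node scores back onto passages, a step omitted here.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration Personalized PageRank.

    graph:   dict node -> list of neighbour nodes
    seeds:   nodes matched from the query; restarts land only on these
    damping: probability of following an edge instead of restarting
    """
    seeds = set(seeds)
    nodes = list(graph)
    # Restart distribution: uniform over seeds (a simplification).
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Spread this node's mass evenly over its neighbours.
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # Dangling node: return its mass to the seeds.
                for s in seeds:
                    new[s] += damping * rank[n] / len(seeds)
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy facts: Alice works at Acme; Acme is headquartered in Paris.
# Berlin/Zeta form an unrelated component.
graph = {
    "Alice": ["Acme"],
    "Acme": ["Alice", "Paris"],
    "Paris": ["Acme"],
    "Berlin": ["Zeta"],
    "Zeta": ["Berlin"],
}
scores = personalized_pagerank(graph, seeds={"Alice"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node:7s} {score:.3f}")
```

Paris never appears in the query, yet it receives a nonzero score two hops from the seed, while the disconnected Berlin/Zeta component scores zero. The graph walk does the chaining in one pass.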

A plain graph, however, captures only pairwise links, and many real inferences bind three or more entities under a single joint constraint. Hypergraph memory addresses exactly this gap: instead of decomposing a relation into binary edges, it stores evidence as hyperedges that hold several entities together, so multi-step constraints accumulate coherently across retrieval steps rather than being flattened and lost (Can hypergraphs capture multi-hop reasoning better than graphs?). Here the representational structure itself does the composing. Other work pushes the chaining into control flow rather than the data structure: either separating query planning from answer synthesis so each hop is reasoned about cleanly without interference (Do hierarchical retrieval architectures outperform flat ones on complex queries?), or learning a traversal policy with tree search and reinforcement learning so the model navigates a graph step by step instead of ingesting it whole (Can learned traversal policies beat exhaustive graph reading?).
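The difference between a hyperedge and its pairwise decomposition shows up in a three-fact example. This toy store is not HGMem's data structure or retrieval procedure; it only illustrates the representational point. `HypergraphMemory` and `pairwise_query` are hypothetical names, and the pairwise baseline uses clique expansion (every hyperedge replaced by all of its binary edges).

```python
from itertools import combinations


class HypergraphMemory:
    """Toy evidence store: each hyperedge binds an arbitrary set of entities."""

    def __init__(self):
        self.hyperedges = []  # list of frozensets of entities

    def add(self, *entities):
        self.hyperedges.append(frozenset(entities))

    def query(self, *constraints):
        """Entities that co-occur with ALL constraints inside one hyperedge."""
        need = set(constraints)
        hits = set()
        for edge in self.hyperedges:
            if need <= edge:
                hits |= edge - need
        return hits


def pairwise_query(hyperedges, *constraints):
    """Same query after flattening each hyperedge into binary edges."""
    neighbours = {}
    for edge in hyperedges:
        for a, b in combinations(edge, 2):
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    # A candidate must touch every constraint, but possibly via different facts.
    sets = [neighbours.get(c, set()) for c in constraints]
    return set.intersection(*sets) - set(constraints)


mem = HypergraphMemory()
mem.add("Alice", "Paris", "2019")
mem.add("Alice", "Berlin", "2021")
mem.add("Bob", "Paris", "2021")

print(mem.query("Paris", "2021"))                       # {'Bob'}
print(pairwise_query(mem.hyperedges, "Paris", "2021"))  # both Alice and Bob
```

Asked who was in Paris in 2021, the hyperedge store returns only Bob. After flattening, Alice is adjacent to both Paris and 2021 through two different facts, so the joint constraint is lost and she becomes a false positive.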

The interesting tension across these notes is *where* composition has to be paid for. The transformer-emergence work says you pay during training, with explicit compositional examples, or you don't get robust multi-hop reasoning at all. The retrieval and graph work says you can instead pay at inference, by externalizing the connections into a graph, a hypergraph, a planner, or a learned navigation policy. There is even a parallel on the model-internals side: composing task-specific expert vectors at inference time also turns out to require deliberate machinery to mix them without interference, rather than emerging for free (Can models dynamically activate expert skills at inference time?). The through-line worth taking away: the corpus consistently treats composition as something engineered (taught, graph-encoded, or policy-learned) and almost never as something that falls out automatically from having the pieces.

Sources (6 notes)

How do transformers learn to reason across multiple steps?

Controlled training reveals transformers learn multi-hop reasoning in three phases: memorization, in-distribution generalization, and cross-distribution reasoning. Successful reasoning correlates with cosine clustering of entity representations, and second-hop generalization requires explicit compositional exposure during training.

Can knowledge graphs enable multi-hop reasoning in one retrieval step?

HippoRAG converts a corpus into a knowledge graph, then uses Personalized PageRank seeded from query concepts to traverse multi-hop paths in one step. It matches iterative retrieval while being 10-20x cheaper and 6-13x faster, and reports up to 20% better accuracy on multi-hop QA.

Can hypergraphs capture multi-hop reasoning better than graphs?

HGMem organizes retrieved evidence as hyperedges rather than flat lists or binary graphs, allowing three or more entities to bind into single relations without decomposition. This structure accumulates coherent knowledge across retrieval steps, trading representational complexity for constraint expressiveness.

Do hierarchical retrieval architectures outperform flat ones on complex queries?

Separating query planning from answer synthesis into distinct components reduces interference and improves multi-hop query performance. This architectural principle mirrors documented benefits of separating planning from execution in agent design.
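The planning/synthesis split can be sketched as three functions with deliberately narrow inputs. This assumes the question has already been parsed into a start entity and a chain of relations. In a real system the planner and synthesizer would be model calls; `plan`, `retrieve`, `synthesize`, and `answer` are hypothetical stand-ins, not the cited architecture's components.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    relation: str  # which relation to follow at this step


def plan(relations):
    """Planner: turns a parsed query into an ordered list of hops.

    Hypothetical stand-in: the relation chain is supplied directly
    instead of being produced by a model from the question text.
    """
    return [Hop(r) for r in relations]


def retrieve(facts, entity, hop):
    """Retriever: one single-hop lookup, unaware of the overall question."""
    return facts.get((entity, hop.relation))


def synthesize(start, trail):
    """Synthesizer: sees only the evidence trail, never the plan or corpus."""
    steps = " -> ".join(f"{rel}: {obj}" for rel, obj in trail)
    return f"{start} -> {steps}"


def answer(facts, start, relations):
    entity, trail = start, []
    for hop in plan(relations):
        nxt = retrieve(facts, entity, hop)
        if nxt is None:
            return None  # chain broken: no supporting fact for this hop
        trail.append((hop.relation, nxt))
        entity = nxt
    return entity, synthesize(start, trail)


facts = {("Alice", "employer"): "Acme", ("Acme", "headquarters"): "Paris"}
print(answer(facts, "Alice", ["employer", "headquarters"]))
```

The point is the interface: the retriever handles one hop at a time without seeing the full question, and the synthesizer sees only the evidence trail, so a missing hop surfaces as a broken chain instead of being papered over in the final answer.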

Can learned traversal policies beat exhaustive graph reading?

Graph-O1 replaces whole-graph ingestion with step-by-step agentic navigation using Monte Carlo Tree Search and reinforcement learning. This approach fits within LLM context windows while learning domain-specific traversal policies, though it trades certainty about the full graph for decision-making under uncertainty.
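Only the tree-search half of that approach is sketched here: UCT-style Monte Carlo Tree Search choosing the next hop from a start node, with a random walk standing in for the rollout policy. In Graph-O1 the traversal policy is trained with reinforcement learning; that training loop is not reproduced, and the graph, goal predicate, exploration constant `c=1.4`, and function names are all hypothetical.

```python
import math
import random


class Node:
    def __init__(self, entity, parent=None):
        self.entity = entity
        self.parent = parent
        self.children = {}  # neighbour entity -> Node
        self.visits = 0
        self.value = 0.0  # sum of rewards backed up through this node

    def depth(self):
        d, n = 0, self
        while n.parent is not None:
            d, n = d + 1, n.parent
        return d


def ucb(child, parent_visits, c=1.4):
    """UCB1 score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(graph, entity, steps, is_goal, rng):
    """Random walk standing in for a learned policy; reward 1 if a goal is hit."""
    for _ in range(steps):
        if is_goal(entity):
            return 1.0
        neighbours = graph.get(entity, [])
        if not neighbours:
            return 0.0
        entity = rng.choice(neighbours)
    return 1.0 if is_goal(entity) else 0.0


def mcts_next_hop(graph, start, is_goal, max_depth=3, iterations=300, seed=0):
    """Pick the next hop from `start` by UCT tree search over the graph."""
    rng = random.Random(seed)
    root = Node(start)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB while the current node is fully expanded.
        while (not is_goal(node.entity) and node.depth() < max_depth
               and node.children
               and len(node.children) == len(graph.get(node.entity, []))):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(ch, parent.visits))
        # 2. Expansion: add one untried neighbour as a new child.
        if not is_goal(node.entity) and node.depth() < max_depth:
            untried = [n for n in graph.get(node.entity, []) if n not in node.children]
            if untried:
                nxt = rng.choice(untried)
                node.children[nxt] = Node(nxt, parent=node)
                node = node.children[nxt]
        # 3. Simulation: random rollout for the remaining depth budget.
        reward = rollout(graph, node.entity, max_depth - node.depth(), is_goal, rng)
        # 4. Backpropagation: credit every node on the selected path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # The most-visited child of the root is the chosen hop.
    return max(root.children.values(), key=lambda ch: ch.visits).entity


graph = {
    "query": ["A", "B", "C"],
    "A": ["A1", "A2"],
    "B": ["B1", "answer"],
    "C": ["C1"],
}
print(mcts_next_hop(graph, "query", lambda e: e == "answer", max_depth=2))
```

Because the search only expands neighbours it actually visits, the agent never needs the whole graph in context; a learned policy would replace `rollout` to make simulations less wasteful.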

Can models dynamically activate expert skills at inference time?

Transformer² (Transformer-Squared) demonstrates that tuning only the singular values of weight matrices produces composable expert vectors that mix dynamically at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.
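Singular-value tuning can be illustrated on a single matrix: decompose W = U diag(s) V^T once, then treat an expert as a vector z that rescales the singular values, W' = U diag(s * z) V^T. This is a minimal NumPy sketch, assuming two hand-written expert vectors and fixed mixing weights; Transformer²'s actual training of z and its procedure for choosing mixing weights at inference are not reproduced, and the helper names are hypothetical.

```python
import numpy as np


def svd_components(weight):
    """Decompose a frozen weight matrix once: W = U @ diag(s) @ Vt."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def adapted_weight(u, s, vt, z):
    """Rescale singular values by expert vector z: W' = U @ diag(s * z) @ Vt.

    Broadcasting u * (s * z) scales column j of U by s_j * z_j,
    which equals U @ diag(s * z) without building the diagonal matrix.
    """
    return (u * (s * z)) @ vt


def mix_experts(experts, alphas):
    """Weighted combination of expert vectors: z = sum_k alpha_k * z_k."""
    return sum(a * z for a, z in zip(alphas, experts))


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))          # stand-in for a frozen layer weight
u, s, vt = svd_components(w)

z_math = np.array([1.2, 1.0, 0.8])   # hypothetical expert vectors
z_code = np.array([0.9, 1.1, 1.0])
z = mix_experts([z_math, z_code], [0.5, 0.5])
w_mixed = adapted_weight(u, s, vt, z)

# W' is linear in z, so mixing vectors equals mixing adapted matrices.
print(np.allclose(w_mixed, 0.5 * adapted_weight(u, s, vt, z_math)
                  + 0.5 * adapted_weight(u, s, vt, z_code)))  # True
```

Each expert costs only r numbers per matrix (r being the number of singular values), and an all-ones z recovers the original weights exactly, which makes the base model the neutral point of the expert space.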
