Can explicit linkers replace vector similarity for multi-step question answering?
This explores whether structured connections — symbolic rules, knowledge-graph edges, hypergraph links — can do the work that embedding-based vector similarity does (or fails to do) when a question needs several reasoning hops to answer.
The corpus has a clear and somewhat opinionated answer: vector similarity on its own is the wrong tool for multi-step QA, and explicit linking structures consistently beat it, though they complement it rather than fully replace it.
The case against similarity starts with what embeddings actually measure. They encode co-occurrence, so they score concepts that are *semantically close but role-distinct* as highly relevant, which works in demos but breaks down where an underspecified, multi-hop query has many wrong-but-associated candidates (Do vector embeddings actually measure task relevance?). This is the same crack that shows up when long-context models try to absorb retrieval entirely: they match similarity-based RAG on semantic lookup but collapse on relational queries that require joins across structured facts, and context length alone can't bridge the gap (Can long-context LLMs replace retrieval-augmented generation systems?). Multi-step QA is exactly the relational-join case, not the semantic-lookup case.
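To make the lookup-versus-join distinction concrete, here is a minimal sketch on a toy two-hop question. Everything in it is an assumption for illustration: the four facts are invented, and bag-of-words cosine similarity stands in for a learned embedding model so the example needs no dependencies. Real embeddings differ in detail, but the failure pattern the notes describe (an associated distractor outranking the fact the second hop needs) has the same shape.

```python
import math
from collections import Counter

# Hypothetical toy facts as (subject, relation, object) triples.
TRIPLES = [
    ("acme", "acquired", "bolt"),
    ("acme", "headquartered_in", "norway"),
    ("bolt", "headquartered_in", "chile"),   # associated with the question, but the wrong role
    ("bolt", "makes", "fasteners"),
]

QUESTION = "where is the company that acquired bolt headquartered"


def to_text(triple):
    """Render a triple as plain text, e.g. 'acme headquartered in norway'."""
    subject, relation, obj = triple
    return f"{subject} {relation.replace('_', ' ')} {obj}"


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(question):
    """Rank facts by similarity to the question (stand-in for vector retrieval)."""
    q = Counter(question.split())
    return sorted(
        TRIPLES,
        key=lambda t: cosine(q, Counter(to_text(t).split())),
        reverse=True,
    )


def two_hop_headquarters(target):
    """Explicit join: who acquired `target`, then where is that acquirer based."""
    acquirers = {s for s, r, o in TRIPLES if r == "acquired" and o == target}
    return [o for s, r, o in TRIPLES if r == "headquartered_in" and s in acquirers]


top2 = rank_by_similarity(QUESTION)[:2]
# top2 holds the first-hop fact and the distractor (bolt, headquartered_in, chile);
# the fact the second hop needs, (acme, headquartered_in, norway), ranks last.
answer = two_hop_headquarters("bolt")  # ['norway']
```

The join never consults similarity at all: it follows the `acquired` edge and then the `headquartered_in` edge, which is the relational structure the ranking has no way to express.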
Explicit linkers attack the problem from the other side. SymAgent derives symbolic rules from a knowledge graph's structure and uses them as *navigational plans*, aligning the natural-language question to the graph's topology and outperforming methods that lean on semantic similarity alone (Can symbolic rules from knowledge graphs guide complex reasoning?). Hypergraph memory pushes further: instead of flat retrieved lists or binary edges, it binds three or more entities into a single hyperedge, preserving joint constraints across retrieval steps so coherent knowledge accumulates rather than fragmenting at each hop (Can hypergraphs capture multi-hop reasoning better than graphs?). Both encode the *relationships* a multi-step answer depends on, which similarity scores throw away.
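The point about joint constraints can be illustrated with a small sketch. It is not the HGMem implementation; the facts, entity names, and both query functions are hypothetical, chosen only to show what is lost when a three-entity fact is decomposed into binary edges.

```python
from itertools import combinations

# Hypothetical ternary facts: each hyperedge binds a doctor, a drug, and a patient.
HYPEREDGES = [
    frozenset({"dr_lee", "aspirin", "patient_a"}),
    frozenset({"dr_lee", "ibuprofen", "patient_b"}),
    frozenset({"dr_kim", "aspirin", "patient_b"}),
]
DRUGS = {"aspirin", "ibuprofen"}

# Binary decomposition: every hyperedge is flattened into its pairwise edges.
BINARY_EDGES = {
    frozenset(pair) for edge in HYPEREDGES for pair in combinations(edge, 2)
}


def drugs_via_binary(doctor, patient):
    """Chain two binary hops: doctor-drug, then drug-patient."""
    return sorted(
        drug
        for drug in DRUGS
        if frozenset({doctor, drug}) in BINARY_EDGES
        and frozenset({drug, patient}) in BINARY_EDGES
    )


def drugs_via_hyperedges(doctor, patient):
    """Require doctor, drug, and patient to co-occur in one hyperedge."""
    return sorted(
        drug
        for edge in HYPEREDGES
        if {doctor, patient} <= edge
        for drug in edge & DRUGS
    )


binary_result = drugs_via_binary("dr_lee", "patient_b")     # ['aspirin', 'ibuprofen']
hyper_result = drugs_via_hyperedges("dr_lee", "patient_b")  # ['ibuprofen']
```

The binary version returns a spurious `aspirin` because it stitches together an edge from one fact (dr_lee and aspirin) with an edge from another (aspirin and patient_b). The hyperedge keeps the three entities bound together, so the stitched answer cannot arise.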
There's a deeper reason this matters. LLMs themselves reason through semantic association, not symbolic logic: when meaning is stripped from a task, performance collapses even when the correct rules sit right there in context (Do large language models reason symbolically or semantically?). So the model can't be trusted to silently reconstruct the link structure; the structure has to be made explicit and external. That's also why textual prompting alone often fails to override a model's strong priors during multi-hop integration (Why do language models ignore information in their context?).
But 'replace' is too strong. The more sophisticated framing in the corpus is *routing*, not substitution: StructRAG selects the knowledge-structure type (table, graph, algorithm, catalogue, or chunk) based on what the query demands, rather than forcing every question through one retrieval mode (Can routing queries to task-matched structures improve RAG reasoning?). Some sub-steps still want plain semantic retrieval; the relational hops want explicit links. Architectures that separate query planning from answer synthesis already improve multi-hop performance on their own (Do hierarchical retrieval architectures outperform flat ones on complex queries?). Taken together, this suggests the real answer isn't 'linkers instead of vectors' but a planner that knows when to navigate explicit structure and when similarity is good enough.
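A planner of that kind might look roughly like the sketch below. It is a deliberately simple stand-in: StructRAG's router is a trained model, whereas this one is a hand-written rule, and the `Step` type, `KNOWN_RELATIONS` set, and `route` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: the relations the explicit graph actually stores.
KNOWN_RELATIONS = {"acquired", "headquartered_in"}


@dataclass(frozen=True)
class Step:
    """One sub-step of a decomposed question."""
    description: str
    relation: Optional[str] = None  # set when the step is a relational hop


def route(step: Step) -> str:
    """Send relational hops to the graph; fall back to similarity search otherwise."""
    if step.relation in KNOWN_RELATIONS:
        return "graph"
    return "similarity"


# A hand-written plan for a two-hop question, standing in for a planner's output.
plan = [
    Step("find the passage that introduces the company called Bolt"),
    Step("find who acquired it", relation="acquired"),
    Step("find where the acquirer is based", relation="headquartered_in"),
]

routes = [route(step) for step in plan]  # ['similarity', 'graph', 'graph']
```

Planning (producing `plan`) and routing are kept separate from whatever finally synthesizes the answer, which mirrors the separation of query planning from answer synthesis described above.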
Sources (8 notes)
Embeddings encode co-occurrence patterns, making semantically close but role-distinct concepts highly similar. This works in simple demos but fails in production where underspecified queries have many wrong-but-associated candidates.
The LOFT benchmark shows that long-context language models (LCLMs) match RAG on semantic retrieval without explicit training, but cannot execute relational queries requiring joins across structured tables. Context length alone cannot bridge this gap.
SymAgent derives symbolic rules from KG structure using LLM reasoning to create navigational plans that align natural language with graph topology. This approach captures structural reasoning patterns explicitly, outperforming retrieval methods that rely on semantic similarity alone.
HGMem organizes retrieved evidence as hyperedges rather than flat lists or binary graphs, allowing three or more entities to bind into single relations without decomposition. This structure accumulates coherent knowledge across retrieval steps, accepting greater representational complexity in exchange for constraint expressiveness.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
StructRAG demonstrates that selecting the knowledge-structure type based on query demands, via a DPO-trained router choosing among tables, graphs, algorithms, catalogues, and chunks, improves knowledge-intensive reasoning over standard retrieval. The approach grounds this in cognitive load and cognitive fit theory from cognitive science.
Separating query planning from answer synthesis into distinct components reduces interference and improves multi-hop query performance. This architectural principle mirrors documented benefits of separating planning from execution in agent design.