How do logic units preserve procedural coherence better than chunks?
Can structured retrieval units with prerequisites, headers, bodies, and linkers maintain step-by-step coherence in how-to answers where fixed-size chunks fail? This matters because procedural questions require sequential logic and conditional branching that chunk-based RAG cannot support.
RAG systems overwhelmingly use fixed-size chunks as their retrieval granularity. This works acceptably for factoid "5W" questions (who, what, where, when, why) where the answer is localized. It fails systematically for "1H" questions — how-to questions — which require sequential, procedurally coherent answers where step ordering, prerequisites, and conditional branching matter.
THREAD proposes logic units (LUs) as an alternative retrieval granularity with four components:
- Prerequisite: information needed to understand the LU — domain terminology, abbreviations, constraints that must be met. Functions both as context supplement (preventing hallucination from decontextualized chunks) and as filter (excluding irrelevant LUs based on unmet constraints).
- Header: summary or intent description, used for indexing. Unlike chunks that index the entire content, headers enable intent-based retrieval — matching queries to the purpose of the LU rather than its surface content.
- Body: detailed content — specific actions, code blocks, instructions. The core material fed to the LLM generator.
- Linker: bridge to subsequent logic units. Specifies what comes next — multiple possibilities after taking an action, guiding retrieval of the next-step LU. This is the critical innovation: it enables dynamic, multi-step answer construction where each step's outcome determines the next retrieval.
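The four components above can be sketched as a small data structure. This is a minimal illustration, not code from the THREAD paper; the field names simply mirror the components described, and the example LU is invented for demonstration.

```python
from dataclasses import dataclass, field

# Illustrative logic unit (LU); field names follow the four components
# described above, not any official THREAD implementation.
@dataclass
class LogicUnit:
    prerequisite: str   # terminology and constraints; context supplement and filter
    header: str         # summary / intent description, used for indexing
    body: str           # detailed actions or instructions fed to the LLM generator
    linker: list[str] = field(default_factory=list)  # headers of possible next-step LUs

# Hypothetical example: one step of a server-restart procedure, with a
# branching linker pointing at two possible next steps.
check_load = LogicUnit(
    prerequisite="Requires SSH access to the target server.",
    header="Check current server load",
    body="Run `uptime` and note the 1-minute load average.",
    linker=["Drain traffic before restart", "Restart the service directly"],
)
```

Note that the linker holds multiple candidates: which one is retrieved next depends on the outcome of executing the body, which is exactly what fixed-size chunks cannot express.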
The linker is what makes THREAD fundamentally different from chunk-based RAG. Chunks have no mechanism for specifying what should come next — retrieval of subsequent chunks relies on the same query or the generated partial answer, both of which degrade as the procedure progresses. Linkers provide explicit navigation between steps, enabling branching paths (if server load is high → do X; if normal → do Y).
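The branching behavior can be made concrete with a toy traversal loop. This is an assumed sketch of linker-guided answer construction, not the paper's implementation: `assemble_procedure`, the index layout, and the `choose_next` callback are all illustrative names.

```python
# Minimal sketch: each step's outcome selects the next LU via its linker,
# instead of re-querying the corpus with the original question.

def assemble_procedure(index, start_header, choose_next):
    """index: header -> LU dict; choose_next: picks one header from a linker list
    (in a real system this choice would reflect the observed outcome of the step)."""
    steps, header = [], start_header
    while header is not None:
        lu = index[header]
        steps.append(lu["body"])
        header = choose_next(lu["linker"]) if lu["linker"] else None
    return steps

index = {
    "Check server load": {"body": "Run uptime.",
                          "linker": ["Drain traffic", "Restart service"]},
    "Drain traffic": {"body": "Remove the node from the load balancer.",
                      "linker": ["Restart service"]},
    "Restart service": {"body": "systemctl restart app.", "linker": []},
}

# Simulated branch: load is high, so take the first option (drain first).
steps = assemble_procedure(index, "Check server load",
                           choose_next=lambda options: options[0])
# steps -> ["Run uptime.", "Remove the node from the load balancer.",
#           "systemctl restart app."]
```

Swapping the `choose_next` lambda for one that picks "Restart service" directly models the other branch (normal load), with no change to the stored LUs.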
This connects to the broader RAG failure mode. As argued in "Do vector embeddings actually measure task relevance?", the chunk+embedding approach fails doubly for procedural questions: embeddings can't capture sequential dependency, and chunks can't preserve it. Logic units address both by structuring retrieval around intent (header) and navigation (linker) rather than semantic similarity.
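The intent-versus-content distinction can be shown with a toy retriever. Word-overlap scoring stands in for a real embedding model here, and both LUs are invented examples; the point is only what gets indexed, the header (intent) rather than the body (surface content).

```python
# Toy intent-based retrieval: queries are matched against LU headers,
# not against the full body text that a chunk index would embed.

def overlap(a: str, b: str) -> int:
    """Crude relevance proxy: count shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))

lus = [
    {"header": "how to restart the web service safely",
     "body": "systemctl stop app; wait for connections to drain; systemctl start app"},
    {"header": "how to read service logs",
     "body": "journalctl -u app --since today"},
]

query = "how do I restart the service"
# Index on intent (header), not on surface content (body):
best = max(lus, key=lambda lu: overlap(query, lu["header"]))
# best["header"] -> "how to restart the web service safely"
```

A chunk index would embed the bodies, where shell commands like `systemctl` share little surface vocabulary with the user's question; the header states the purpose in the query's own register.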
Source: Question Answer Search
Related concepts in this collection
- Do vector embeddings actually measure task relevance?
  Vector embeddings rank semantic similarity, but RAG systems need topical relevance. When these diverge (as with king/queen versus king/ruler), does similarity-based retrieval fail in production?
  Relation: logic units address the task-relevance gap by indexing on intent (headers) rather than semantic similarity.
- What do enterprise RAG systems need beyond accuracy?
  Academic RAG benchmarks focus on question-answering accuracy, but enterprise deployments in regulated industries face five distinct requirements (compliance, security, scalability, integration, and domain expertise) that standard architectures don't address.
  Relation: logic units address the coherence and reliability requirements that enterprise RAG needs.
- When do graph databases outperform vector embeddings for retrieval?
  Vector similarity struggles with aggregate and relational queries that require traversing multiple entity connections. Can graph-oriented databases with deterministic queries solve this failure mode in enterprise domain applications?
  Relation: linkers in logic units implement a lightweight form of relational traversal within the document structure.
- Does question type determine the right retrieval strategy?
  Explores whether different non-factoid question types require distinct retrieval and decomposition approaches. This matters because standard RAG fails when applied uniformly to debate, comparison, and experience questions despite being effective for factoid queries.
  Relation: how-to questions are a specific non-factoid (NFQ) type requiring the procedural coherence that logic units provide.
Original note title: logic units with prerequisite-header-body-linker structure preserve document coherence that fixed-size chunking destroys for procedural how-to questions