How should headers index procedural intent differently from keyword chunking?

This explores how a document's headers should capture the *purpose* behind a procedure—what a step is for and what it depends on—rather than serving as keyword-matched cut points the way fixed-size chunking does.

The corpus is unusually direct on this: fixed-size chunking optimizes for surface keyword overlap, which systematically severs the dependencies that make a how-to procedure work. The most pointed answer comes from THREAD's four-part logic units (see "How do logic units preserve procedural coherence better than chunks?"), where a header is no longer just a topic label but one slot in a structure: prerequisite, header, body, linker. The header names the step, the prerequisite encodes what must already be true, and the linker explicitly routes to the next step or branch. That is the difference in a nutshell: keyword chunking asks "which passage looks similar to the query," while an intent-indexed header asks "what does this step do, and what does it require to fire."
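The four-part unit can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not THREAD's actual implementation: the `LogicUnit` class, the `walk` function, and the idea of representing prerequisites as a set of fact strings and linkers as an outcome-to-header mapping are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class LogicUnit:
    """Hypothetical four-part unit: prerequisite, header, body, linker."""
    header: str                                  # names what the step does
    body: str                                    # the instructions themselves
    prerequisite: frozenset = frozenset()        # facts that must already hold
    linker: dict = field(default_factory=dict)   # outcome -> header of next unit


def walk(units, start, facts, outcomes):
    """Follow linkers from `start`, refusing any unit whose prerequisite is unmet."""
    index = {unit.header: unit for unit in units}
    path, current = [], start
    while current is not None and current not in path:  # stop at the end or on a loop
        unit = index[current]
        missing = unit.prerequisite - frozenset(facts)
        if missing:
            raise ValueError(f"{unit.header!r} requires {sorted(missing)}")
        path.append(unit.header)
        # The observed outcome selects the branch; no matching link ends the walk.
        current = unit.linker.get(outcomes.get(unit.header))
    return path


units = [
    LogicUnit("Install the driver", "Run the installer.",
              frozenset({"admin rights"}),
              {"ok": "Restart the service", "error": "Check the logs"}),
    LogicUnit("Restart the service", "Restart it from the service manager.",
              frozenset({"admin rights"})),
    LogicUnit("Check the logs", "Open the installer log."),
]

path = walk(units, "Install the driver", {"admin rights"},
            {"Install the driver": "ok"})
print(path)
```

Retrieval here is a traversal rather than a similarity ranking: the header is only the entry point, and the prerequisite and linker decide whether a step applies and where to go next.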

The deeper reason this matters shows up when you look at how procedures get *indexed* elsewhere in the collection. PRAXIS finds that indexing web-agent procedures by environment *state* and local action pairs beats higher-level workflow abstractions, precisely because the click-by-click specifics (the conditions under which an action is valid) are what carry the reliability (see "Does state-indexed memory outperform high-level workflow memory for web agents?"). State is the runtime cousin of a prerequisite: both say "this only applies here." Keyword chunking erases that conditionality; intent-aware headers preserve it.
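To make "indexed by state" concrete, here is a minimal sketch, assuming states can be written as sets of string features. The `StateIndexedMemory` class and the feature names are hypothetical and are not taken from PRAXIS; the sketch only illustrates that an action is recalled when its recorded conditions hold, not when its text resembles a query.

```python
from collections import defaultdict


class StateIndexedMemory:
    """Hypothetical store mapping a set of state features to local actions."""

    def __init__(self):
        self._by_state = defaultdict(list)

    def record(self, state, action):
        # The key is the full set of conditions under which the action was valid.
        self._by_state[frozenset(state)].append(action)

    def recall(self, observed):
        """Return actions whose recorded conditions all hold in `observed`."""
        observed = frozenset(observed)
        hits = []
        for known, actions in self._by_state.items():
            if known <= observed:  # every recorded condition is satisfied
                hits.extend(actions)
        return hits


memory = StateIndexedMemory()
memory.record({"page:checkout", "cart:non-empty"}, "click #pay")
memory.record({"page:checkout", "cart:empty"}, "click #continue-shopping")

actions = memory.recall({"page:checkout", "cart:non-empty", "user:logged-in"})
print(actions)
```

Both recorded entries mention the checkout page, so a keyword match on "checkout" would surface both; the state check returns only the one whose conditions actually hold.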

There's a useful tension worth surfacing, though. Agent Workflow Memory argues for indexing at the *sub-task routine* level, abstracting away example-specific values so a step can be reused across contexts, with larger gains as train-test gaps widen (see "Can agents learn reusable sub-task routines from past experience?"). PRAXIS pushes the opposite way, toward concrete state. So "procedural intent" isn't one altitude: a good header has to name the reusable purpose while still pointing at the conditions that bind it. The art is holding both.
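One way to picture holding both altitudes is a routine whose steps have had their example-specific values replaced by placeholders, stored next to the conditions that bind it. The sketch below is an assumption-laden illustration, not Agent Workflow Memory's method: the `abstract` helper, the naive string replacement, and the `requires` field are all hypothetical.

```python
from string import Template


def abstract(step, values):
    """Replace example-specific values with named placeholders (naive string match)."""
    for name, value in values.items():
        step = step.replace(value, "${" + name + "}")
    return step


# A concrete trace from one episode, with the values that were specific to it.
trace = [
    "Type Boston into the destination field",
    "Pick 2024-05-01 in the date picker",
]
example_values = {"city": "Boston", "date": "2024-05-01"}

routine = {
    "header": "Search for a trip",           # the reusable purpose
    "requires": {"page:search-form"},        # the conditions that bind it
    "steps": [abstract(step, example_values) for step in trace],
}

# Reuse the routine in a new context by filling the placeholders.
reused = [Template(step).substitute(city="Lisbon", date="2025-01-15")
          for step in routine["steps"]]
print(routine["steps"])
print(reused)
```

The header and placeholders carry the reusable purpose, while `requires` keeps the concrete condition in view, so neither altitude is discarded.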

A cross-domain framing sharpens why headers-as-intent even works: LLM Programs treat a complex task as explicit algorithmic control flow that hands each model call only its step-relevant context, hiding everything else (see "Can algorithms control LLM reasoning better than LLMs alone?"). A header that indexes intent is doing the same information-hiding job at retrieval time: it lets the system pull *just* the step that matters and the link to the next, instead of dumping a similarity-ranked soup of passages. Rasa's reframing of dialogue understanding as generating commands rather than classifying intents makes a parallel move (see "Can command generation replace intent classification in dialogue systems?"): it treats meaning as pragmatics (what action is wanted) rather than semantics (what words match), which is exactly the shift from keyword chunking to intent indexing.
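The information-hiding idea can be sketched as a plain loop that decides what each call is allowed to see. This is a minimal illustration under assumptions: `run_program`, the `reads`/`writes` step fields, and the `fake_model` stub standing in for a real model call are all hypothetical names chosen for this example.

```python
def run_program(steps, state, call_model):
    """Run steps in order; each call sees only the keys its step declares."""
    state = dict(state)  # work on a copy so the caller's state is untouched
    transcript = []
    for step in steps:
        visible = {key: state[key] for key in step["reads"]}
        state[step["writes"]] = call_model(step["prompt"], visible)
        transcript.append((step["prompt"], sorted(visible)))
    return state, transcript


def fake_model(prompt, context):
    """Stub standing in for a model call; it reports which keys it was shown."""
    return f"{prompt}:{','.join(sorted(context))}"


steps = [
    {"prompt": "extract", "reads": ["question"], "writes": "entities"},
    {"prompt": "answer", "reads": ["entities"], "writes": "answer"},
]
final_state, transcript = run_program(
    steps, {"question": "q", "unrelated_notes": "hidden"}, fake_model
)
print(transcript)
```

The `unrelated_notes` entry never reaches either call, because no step declares it in `reads`. An intent-indexed header plays the same role at retrieval time by declaring what a step needs.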

The less obvious lesson across these notes is that keyword chunking rarely fails by retrieving the wrong topic; it usually gets the topic right. What it loses is the *edges*: the prerequisites, the state conditions, and the links between steps. Procedures live entirely in those edges, so an intent-indexing header is really a way of storing the graph, not just the nodes.

Sources (5 notes)

How do logic units preserve procedural coherence better than chunks?

THREAD replaces chunks with four-part logic units—prerequisite, header, body, linker—enabling dynamic multi-step retrieval for how-to questions. Linkers explicitly navigate between steps and branches, addressing both the semantic-vs-task-relevance gap in embeddings and the sequential dependency loss in chunk-based RAG.

Does state-indexed memory outperform high-level workflow memory for web agents?

PRAXIS shows that indexing procedures by environment state and local action pairs yields consistent accuracy and reliability gains across VLM backbones on the REAL benchmark, compared to higher-level workflow abstractions that lose click-by-click specifics.

Can agents learn reusable sub-task routines from past experience?

Agent Workflow Memory induces sub-task routines at finer granularity than full tasks, abstracts example-specific values, and compounds them hierarchically. This produces a 24.6% relative gain on Mind2Web and a 51.1% relative gain on WebArena, with larger gains as train-test gaps widen.

Can algorithms control LLM reasoning better than LLMs alone?

LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.

Can command generation replace intent classification in dialogue systems?

Rasa's dialogue understanding architecture generates domain-specific commands instead of classifying intents, eliminating annotation requirements, handling context naturally, and scaling without degradation—treating understanding as pragmatics rather than semantics.
