Does teaching question patterns before document training improve knowledge access?

Standard LLM training encodes documents first, then teaches QA patterns. But does this order matter? This note explores whether reversing the sequence, so that the model learns how knowledge gets queried before encoding it, could unlock better factual recall.

Synthesis note · 2026-06-03 · sourced from Training Fine Tuning

To keep an assistant current, the standard recipe is continued pretraining on new documents followed by instruction-tuning on QA pairs. The paper finds that this recipe falls short: LLMs trained this way struggle to answer questions even when the perplexity of the documents is minimized. The knowledge is encoded but not accessible. The diagnosis is a granularity mismatch: QA pairs are simple and direct, while documents weave many facts together intricately, so encoding document knowledge without knowing how it will be queried produces representations that don't surface under questioning.
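To make the gap concrete, here is a minimal sketch of the two measurements being contrasted: document perplexity and QA accuracy. The function names and the exact-match normalization are assumptions for illustration, not the paper's evaluation code. The point is only that the two numbers are computed from different things, so one can be driven down without the other improving.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def exact_match(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions answered exactly, after trimming and lowercasing.

    The normalization here is an assumption; real QA metrics vary.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    hits = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return hits / len(answers)
```

Perplexity is scored on the document tokens themselves, while exact match is scored on answers generated in response to questions, which is why a well-fit document does not guarantee an answerable one.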

The fix inverts the order. Pre-instruction-tuning (PIT) instruction-tunes on QA pairs before continued pretraining on documents, so the model learns how knowledge is accessed before it encodes the knowledge, and the encoding can then take the access pattern into account. The paper reports that PIT outperforms standard instruction-tuning on later factual recall.
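The two recipes differ only in the order of their phases, which a small sketch can make explicit. `Phase`, `standard_schedule`, `pit_schedule`, and `run_schedule` are hypothetical names, and `train_fn` is a placeholder for whatever training loop is actually used. Nothing here reproduces the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Phase:
    """One training phase (hypothetical structure, for illustration only)."""
    name: str
    data: str  # which dataset the phase trains on


def standard_schedule() -> List[Phase]:
    # Standard recipe: encode documents first, then learn QA patterns.
    return [
        Phase("continued_pretraining", "documents"),
        Phase("instruction_tuning", "qa_pairs"),
    ]


def pit_schedule() -> List[Phase]:
    # PIT: learn QA patterns first, then encode documents.
    return [
        Phase("pre_instruction_tuning", "qa_pairs"),
        Phase("continued_pretraining", "documents"),
    ]


def run_schedule(schedule: List[Phase], train_fn: Callable[[Phase], None]) -> List[str]:
    """Run phases in order and return the order in which datasets were seen."""
    seen = []
    for phase in schedule:
        train_fn(phase)  # placeholder: a real loop would update model weights here
        seen.append(phase.data)
    return seen
```

Passing a no-op such as `lambda phase: None` as `train_fn` is enough to confirm the ordering. In the PIT schedule the QA data is seen before the documents, so document encoding happens in a model that already knows the question format.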

The keeper is a principle about knowledge encoding: what the model learns from a document depends on whether it already knows how that knowledge will be retrieved. Encoding and access are coupled, not sequential. This connects to "Can we predict keyword priming before learning happens?" (how new facts get recruited) and to "Does procedural knowledge drive reasoning more than factual retrieval?". Together with PIT, both suggest that how knowledge is represented for use matters more than raw exposure.

Inquiring lines that use this note as a source 1

Can document repetition accidentally memorize sensitive information instead of learning?

Related concepts in this collection 3

Can we predict keyword priming before learning happens?
Exploring whether the degree to which newly learned keywords contaminate unrelated contexts can be predicted from measurable properties before training begins, and what mechanisms enable this prediction.
both concern how newly-learned facts become accessible, not just stored

Does procedural knowledge drive reasoning more than factual retrieval?
Explores whether models learn reasoning through general procedures across diverse documents rather than memorizing specific facts. This matters for understanding what pretraining data actually teaches models to reason.
both relocate value from raw document exposure to how knowledge is represented for use

Does repeated sensitive data in fine-tuning cause memorization?
When language models train on the same private or proprietary data multiple times, how much do they end up memorizing and leaking that information at inference time? Understanding this risk is critical for organizations fine-tuning on confidential datasets.
the flip side: document-encoding by repetition memorizes; PIT changes what encoding captures

Original note title

pre-instruction-tuning on QA pairs before training on documents improves knowledge acquisition by encoding how knowledge will be accessed first
