How do description-based identifiers bias language model output distribution?
This explores how giving a model identifiers that carry descriptive meaning, rather than opaque, neutral labels, tilts what it generates. The corpus has no paper using this exact phrase, but it has a lot on why semantically loaded inputs pull output toward the model's priors.
This reads as a question about a subtle design choice. When you label something with a *description* (a name that means something) instead of a neutral ID, you hand the model a semantic hook, and that hook biases what it produces. The collection has no note using the term "description-based identifiers" directly, but it covers the underlying mechanism in depth: descriptive labels activate the model's pre-existing associations, and those associations compete with, and often override, whatever you actually want.
The sharpest version of this is the finding that models fail to use the information in front of them when their training priors are strong (see "Why do language models ignore information in their context?"). A description-based identifier is essentially a prior trigger: the moment a label carries meaning, the model's parametric associations with that meaning fire, and when those associations are strong, textual prompting alone can't override them. The same ceiling shows up in prompt optimization: you can reorganize and surface what a model already knows, but you can't inject anything new through clever wording (see "Can prompt optimization teach models knowledge they lack?"). A descriptive identifier, then, can only activate regions of the existing distribution; it can't make the model treat the label as a blank slate.
Why does that bias the *distribution* specifically? Because the model is an autoregressive probability machine, and descriptive cues steer it toward high-probability completions even when the task calls for something rare (see "Can we predict where language models will fail?"). Worse, descriptive inputs can trigger template matching: the model recognizes a label as similar to something it has seen and emits a plausible, memorized pattern instead of reasoning from the specifics (see "Do large language models actually perform iterative optimization?"). So the bias isn't just a nudge; it can replace genuine computation with pattern recall.
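To make the competition between prior and context concrete, here is a minimal toy sketch. It assumes a deliberately simplified scoring rule that no real transformer uses: each candidate answer's score is the evidence from the prompt plus a weighted association triggered by the identifier's meaning, and a softmax turns scores into probabilities. The identifiers, answers, and numbers (`refund`, `upsell`, `prior_weight=3.0`) are hypothetical.

```python
import math


def softmax(logits):
    """Turn a dict of raw scores into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def answer_distribution(context_evidence, label_prior, prior_weight):
    """Toy model: score = in-context evidence + prior_weight * label association.

    context_evidence: how strongly the prompt's own content supports each answer.
    label_prior: associations the identifier's meaning triggers (empty for an opaque ID).
    prior_weight: how much parametric associations count relative to the context.
    """
    candidates = set(context_evidence) | set(label_prior)
    logits = {
        c: context_evidence.get(c, 0.0) + prior_weight * label_prior.get(c, 0.0)
        for c in candidates
    }
    return softmax(logits)


# Hypothetical task: the prompt's data supports "refund" as the answer.
context = {"refund": 2.0, "upsell": 0.0}

# Opaque identifier (e.g. "ACCT_7"): nothing for the prior to latch onto.
opaque = answer_distribution(context, {}, prior_weight=3.0)

# Descriptive identifier (e.g. "premium_customer"): the name itself evokes "upsell".
descriptive = answer_distribution(context, {"upsell": 1.5}, prior_weight=3.0)

print({k: round(v, 2) for k, v in sorted(opaque.items())})
# → {'refund': 0.88, 'upsell': 0.12}
print({k: round(v, 2) for k, v in sorted(descriptive.items())})
# → {'refund': 0.08, 'upsell': 0.92}
```

With the opaque label, the context-supported answer keeps most of the probability mass; with the descriptive label, the same context loses to the association the name evokes. Raising `prior_weight` stands in for a stronger training prior.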
The direction of the bias matters too, and it isn't neutral. When a description points at something underrepresented in training, the model routes it through dominant proxies: low-resource cultures are represented internally through high-resource ones, even when the surface answer looks fine (see "Do LLMs represent low-resource cultures through dominant cultural proxies?"). A descriptive identifier inherits whatever skew the training distribution had for that description. And because the model maintains a superposition of candidates and samples from it at generation time rather than committing to one (see "Do large language models actually commit to a single character?"), a descriptive label is best understood as *selecting a slice of that distribution*: it doesn't pin down a single answer, it reweights which answers are likely.
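The "reweight, don't pin" reading can be sketched as a toy Bayesian-style update. This assumes, hypothetically, that the model's candidate answers and their prior masses could be listed explicitly and that a label acts as a multiplicative affinity on them; the candidate names and weights are illustrative, not measured from any model.

```python
import random
from collections import Counter


def reweight(prior, label_affinity):
    """Multiply each candidate's prior mass by how strongly the label evokes it
    (default 1.0 = no effect), then renormalize so the result sums to 1."""
    weighted = {a: p * label_affinity.get(a, 1.0) for a, p in prior.items()}
    total = sum(weighted.values())
    return {a: w / total for a, w in weighted.items()}


def sample_answers(dist, n, seed=0):
    """Draw n answers independently, like regenerating the same response n times."""
    rng = random.Random(seed)
    answers = list(dist)
    weights = [dist[a] for a in answers]
    return Counter(rng.choices(answers, weights=weights, k=n))


# Hypothetical "superposition" held before the label is read,
# already skewed toward the dominant-proxy answer, as the training data was.
prior = {"dominant_proxy": 0.7, "local_specific": 0.2, "other": 0.1}

# A descriptive label that evokes the dominant framing doubles its weight.
posterior = reweight(prior, {"dominant_proxy": 2.0})

print({a: round(p, 2) for a, p in posterior.items()})
# → {'dominant_proxy': 0.82, 'local_specific': 0.12, 'other': 0.06}
print(sample_answers(posterior, n=1000))
```

The label amplifies the skew already present in the prior, but every candidate keeps nonzero probability, so repeated sampling still returns more than one answer.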
What you might not expect: the safest-seeming move, naming things meaningfully so they're human-readable, is exactly what surrenders control of the output distribution to the model's priors. Opaque identifiers carry little or nothing for the model to activate; descriptive ones carry all of their associations. If you want to go further on how that knowledge is stored as flowing activation rather than retrievable fact, the residual-stream note is the place to start (see "Do transformer models store knowledge or generate it continuously?").
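If the goal is to keep identifiers readable for people while denying the model their associations, one option is to alias descriptive names to opaque tokens before prompting and map them back afterward. A minimal Python sketch of that approach follows; `make_aliases`, `anonymize`, and `restore` are hypothetical helpers, not part of any library, and the `ID_n` naming scheme is an arbitrary choice.

```python
import re


def make_aliases(names, prefix="ID"):
    """Give each descriptive name an opaque token: ID_0, ID_1, ..."""
    return {name: f"{prefix}_{i}" for i, name in enumerate(names)}


def _substitute(text, mapping):
    """Replace whole-word occurrences of mapping's keys with their values."""
    if not mapping:
        return text
    # Longest keys first so overlapping names are matched in full.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def anonymize(text, aliases):
    """Swap descriptive names for opaque IDs before the text reaches the model."""
    return _substitute(text, aliases)


def restore(text, aliases):
    """Map opaque IDs in the model's output back to the human-readable names."""
    return _substitute(text, {alias: name for name, alias in aliases.items()})


aliases = make_aliases(["churn_risk_high", "loyal_customer"])
prompt = "Compare churn_risk_high with loyal_customer using only the table below."
masked = anonymize(prompt, aliases)
print(masked)
# → Compare ID_0 with ID_1 using only the table below.
print(restore(masked, aliases) == prompt)
# → True
```

Opaque tokens like `ID_0` are not perfectly neutral, since any token has some learned associations, but they presumably carry far less than a descriptive name. The substitution also assumes the original text never already contains the alias strings; a real pipeline would choose a prefix that cannot collide.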
Sources (7 notes)
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.
By framing LLMs as autoregressive probability machines, researchers predicted that tasks with low-probability target responses would be systematically harder, even when logically simple. Experiments confirmed these predictions on tasks such as reciting the alphabet backwards and counting letters.
Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.
Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.
Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.
Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.