INQUIRING LINE

How do fixed pragmatic templates prevent models from understanding context?

This explores why LLMs seem to apply fixed, surface-level pragmatic rules — for things like implicature, presupposition, and unstated background conditions — instead of flexibly reading what a given context actually calls for.


The clearest case study is scalar implicature: the everyday inference that 'some students passed' usually means 'not all.' Humans dial this inference up or down depending on the stakes of the conversation: whether literal precision is demanded, what is in focus, or whether bluntness would be socially costly. The corpus finds that ChatGPT does none of this dialing. It computes the same implicature regardless of communicative context, which suggests it has learned a default template rather than the underlying skill of tracking what a speaker means [Can language models adapt implicature to conversational context?]. The template fires; the context is ignored.
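The "same implicature regardless of context" finding can be made concrete as a small invariance check. The sketch below is illustrative only: the context labels, the `is_context_invariant` helper, and both example dictionaries are hypothetical, not data or code from the cited note, and how the judgments are elicited from a model is left out.

```python
from typing import Dict

# Hypothetical context labels, loosely following the dimensions the note
# mentions (literal-mode instruction, information focus, face threat).
CONTEXTS = ("neutral", "literal_mode", "focus_shift", "face_threatening")


def is_context_invariant(judgments: Dict[str, bool]) -> bool:
    """Return True if the 'some -> not all' inference comes out the same
    in every context, i.e. the reader shows no modulation at all."""
    missing = [c for c in CONTEXTS if c not in judgments]
    if missing:
        raise ValueError(f"missing judgments for contexts: {missing}")
    return len({judgments[c] for c in CONTEXTS}) == 1


# Illustrative inputs only; these are NOT measured results.
template_like = {c: True for c in CONTEXTS}
modulated = {
    "neutral": True,
    "literal_mode": False,
    "focus_shift": True,
    "face_threatening": False,
}
```

A reader whose judgments look like `template_like` is behaving as the paragraph above describes: the inference is derived everywhere, whatever the context calls for.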

The same shape shows up in how models handle presupposition. Presupposition triggers such as 'X stopped doing Y' and non-factive verbs ('claimed,' 'believed') have opposite effects on what is entailed, but models read both as surface cues instead of computing their actual semantic effect. These constructions act as systematic 'blinds' that persist across prompts and models [Why do embedding contexts confuse LLM entailment predictions?]. More strikingly, when a question smuggles in a false assumption, models tend to play along and accommodate it, even when a direct factual question shows they know the assumption is wrong [Why do language models accept false assumptions they know are wrong?]. The conversational template ('answer the question as posed') overrides the knowledge the model demonstrably has.
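The gap between knowing a fact and rejecting a question that contradicts it can be framed as a paired probe. This is a minimal sketch under stated assumptions: `ProbePair`, `classify`, and the example questions are hypothetical names and items, not taken from any benchmark, and the two boolean inputs are assumed to come from some separate grading step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbePair:
    """A direct knowledge question paired with a loaded question that
    presupposes the false version of the same fact."""
    direct_question: str
    loaded_question: str


def classify(knows_fact: bool, rejects_presupposition: bool) -> str:
    """Label a model's behaviour on one ProbePair.

    knows_fact: the model answered the direct question correctly.
    rejects_presupposition: the model pushed back on the loaded question.
    """
    if knows_fact and not rejects_presupposition:
        return "accommodates despite knowing"  # the failure described above
    if knows_fact and rejects_presupposition:
        return "knows and rejects"
    if rejects_presupposition:
        return "rejects without demonstrated knowledge"
    return "no demonstrated knowledge"


# Hypothetical example pair; not an item from any benchmark.
pair = ProbePair(
    direct_question="Is the Eiffel Tower located in Berlin?",
    loaded_question="Why was the Eiffel Tower built in Berlin?",
)
```

Only the first label corresponds to the pattern in the text: the direct question shows the knowledge is present, yet the loaded question is answered as posed.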

Lateral to this is the frame problem: the things a context leaves unstated. Models struggle not because they lack world knowledge but because they fail to bring relevant background conditions forward as constraints. When prompting forces them to enumerate those preconditions explicitly, accuracy jumps from 30% to 85% [Do language models fail at identifying unstated preconditions?]. That gap is the tell. The knowledge is there; the default response pattern just doesn't reach for it unless the prompt scaffolds the reach. Context isn't 'understood' so much as manually unpacked into the surface text.
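Forcing explicit enumeration of preconditions might look something like the prompt wrapper below. This is a sketch, not the prompt used in the cited work: the function name `build_precondition_prompt`, the step wording, and the `max_preconditions` parameter are all assumptions made for illustration.

```python
def build_precondition_prompt(question: str, max_preconditions: int = 5) -> str:
    """Wrap a question so the model must list unstated background
    conditions before answering. The wording is a hypothetical template."""
    if max_preconditions < 1:
        raise ValueError("max_preconditions must be at least 1")
    return (
        f"Question: {question.strip()}\n\n"
        f"Step 1. List up to {max_preconditions} background conditions that "
        "must hold for any answer to work, including ones the question "
        "does not state.\n"
        "Step 2. Check your answer against each condition from Step 1.\n"
        "Step 3. Give the final answer."
    )


# Hypothetical usage; the question is an invented example.
prompt = build_precondition_prompt(
    "Can I drive from the house to the airport in an hour?"
)
```

The design choice is simply to move the background conditions into the surface text, which is the scaffolding the paragraph above says the default response pattern does not supply on its own.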

There's a deeper mechanism underneath all of this. Models fail to integrate context when their training priors are strong enough to dominate, and textual prompting alone can't override those priors; it takes causal intervention in the representations themselves [Why do language models ignore information in their context?]. So a 'fixed template' isn't a stylistic quirk; it's the parametric default winning out over the in-context signal. A related framing calls this context collapse: when a query is underspecified, the model falls back to blended training-data priors rather than the specific situation in front of it [Why do large language models produce generic responses to vague queries?].

What makes this worth knowing: the failure isn't ignorance, it's a disconnect between knowing and applying. The same incoherence appears in 'potemkin understanding,' where a model explains a concept correctly, fails to apply it, and can even recognize the failure, a pattern that points to functionally separated explanation and execution pathways [Can LLMs understand concepts they cannot apply?]. Pragmatic competence requires tracking communicative stakes in real time. A template gives you the average answer for the average context, which is exactly why it looks fluent and still misses the room.


Sources (7 notes)

Can language models adapt implicature to conversational context?

ChatGPT shows no context-sensitivity in computing scalar implicatures across three dimensions: explicit literal-mode instructions, information structure focus, and face-threatening contexts. Humans flexibly modulate these inferences; the model does not, suggesting pragmatic competence requires tracking communicative stakes that LLMs systematically miss.

Why do embedding contexts confuse LLM entailment predictions?

LLMs treat presupposition triggers and non-factive verbs as surface cues rather than computing their opposite semantic effects on entailments. This structural failure persists across prompts and models, suggesting models rely on surface patterns instead of structural analysis.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Why do large language models produce generic responses to vague queries?

Unlike social-media context collapse, which flattens multiple audiences, LLM collapse occurs when users provide insufficient contextual scaffolding and models default to blended training-data priors. This distinction suggests remedies should focus on query verification and user-driven context specification rather than platform controls.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
