Can training LLMs to form ad-hoc conventions improve their pragmatic reasoning?

This explores whether teaching LLMs to coordinate on shared, throwaway conventions (the way two people in a conversation quietly agree to call something "the blue one" and then reuse that shorthand) would sharpen their grasp of meaning-in-context, or pragmatics. The corpus has no paper on this exact idea, but it has several that triangulate it.

Ad-hoc conventions are the on-the-fly shared agreements that let conversation partners coordinate meaning. No note in the collection studies their formation head-on, so the honest answer is that the corpus circles the question rather than settling it. But the threads it does have point in a consistent and encouraging direction: the social-coordination skills that conventions depend on appear to be trainable, even though they don't emerge for free.

The most direct evidence is that frontier models are surprisingly bad at exactly the kind of joint coordination conventions require. Models that solve problems alone fail when made to collaborate, drifting into >90% agreement regardless of who is right (Why do language models fail at collaborative reasoning?). That matters here because forming a convention is the opposite of reflexive agreement: it's two parties negotiating a shared term and then holding each other to it. The encouraging part is that self-play preference training improved outcomes by 16.7%, which is concrete evidence that social skills like productive disagreement *can* be trained rather than waited for. If disagreement is trainable, ad-hoc agreement plausibly is too.
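To make "agreement regardless of who is right" concrete, here is a minimal sketch of how that failure could be measured. It is not the cited note's method: the `Exchange` record, the `agreement_rates` helper, and the toy log are all hypothetical names and made-up data, used only to show why the overall agreement rate and the agreement rate when the partner is wrong need to be reported separately.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One collaborative turn (hypothetical record, not from the source)."""
    partner_correct: bool  # was the partner's proposed answer right?
    model_agreed: bool     # did the model accept the partner's answer?


def agreement_rates(exchanges):
    """Return (overall agreement rate, agreement rate when the partner was wrong)."""
    if not exchanges:
        raise ValueError("need at least one exchange")
    overall = sum(e.model_agreed for e in exchanges) / len(exchanges)
    wrong = [e for e in exchanges if not e.partner_correct]
    # None signals "no wrong-partner cases" instead of pretending the rate is 0.
    when_wrong = sum(e.model_agreed for e in wrong) / len(wrong) if wrong else None
    return overall, when_wrong


# Toy, made-up log: the partner is wrong in three of four exchanges.
log = [
    Exchange(partner_correct=True, model_agreed=True),
    Exchange(partner_correct=False, model_agreed=True),
    Exchange(partner_correct=False, model_agreed=True),
    Exchange(partner_correct=False, model_agreed=False),
]
overall, when_wrong = agreement_rates(log)
print(f"overall={overall:.2f}, when partner wrong={when_wrong:.2f}")
```

A model that coordinates well should show high agreement only when the partner is right; a high `when_wrong` value is the reflexive-agreement pattern described above.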

A second thread suggests why convention-forming should help reasoning specifically. LLMs lean on semantic association rather than symbolic manipulation: strip the familiar surface meaning from a task and performance collapses even when the rules are right there in context (Do large language models reason symbolically or semantically?). A convention is essentially a freshly minted symbol, a token whose meaning is set by the conversation rather than by training-distribution semantics. Training models to honor such symbols would push directly against this weakness. The same fault line shows up as Potemkin understanding, where models explain a concept correctly but fail to apply it, with explanation and execution running on disconnected tracks (Can LLMs understand concepts they cannot apply?). Conventions are a tight test of that gap: you can only honor a shared agreement if you actually apply it, not just describe it.
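The "apply it, not just describe it" test can be sketched as a simple adherence score: once a label has been agreed, what fraction of later references to the same thing actually reuse it? This is an illustrative assumption, not a metric from any of the cited notes. The `convention_adherence` function, the `fig_3` referent id, and the example turns are hypothetical, and plain string matching is a crude stand-in for a real judgment of whether a convention was honored.

```python
import re


def convention_adherence(convention, later_turns):
    """Fraction of later references that reuse the agreed label.

    convention:  dict mapping a referent id to its agreed short label
    later_turns: list of (referent id, utterance) pairs made after the agreement
    Returns None when no later turn refers to a referent with a convention.
    """
    scored = [(ref, text) for ref, text in later_turns if ref in convention]
    if not scored:
        return None
    hits = 0
    for ref, text in scored:
        # Whole-phrase, case-insensitive match on the agreed label.
        pattern = r"\b" + re.escape(convention[ref]) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits += 1
    return hits / len(scored)


# Toy example: the partners agreed to call one object "the blue one".
convention = {"fig_3": "the blue one"}
turns = [
    ("fig_3", "Pick the blue one again."),
    ("fig_3", "Choose the angular shape with the tail."),  # falls back to a description
    ("fig_3", "The blue one, please."),
]
score = convention_adherence(convention, turns)
print(f"adherence={score:.2f}")
```

The second turn is the failure of interest: the speaker can still describe the object but has stopped applying the shared shorthand.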

There's also reason for caution baked into how these models generate. Token prediction trains them to flow smoothly toward the training distribution rather than to track and reuse a locally invented rule (Does LLM generation explore competing claims while producing text?). An ad-hoc convention is by definition off-distribution, the bespoke thing this conversation made up, so the default generative pull works against maintaining it. That tension is what training would have to overcome, and it's why prompting alone might not be enough.

What would success look like, and is the training signal available? Two adjacent results suggest it is. Strategic reasoning work shows that different models already exhibit distinct profiles in coordination games, including trust-based and belief-anticipation reasoning, which are the raw ingredients of "what is my partner likely to mean" (Do large language models use one reasoning style or many?). And models fine-tuned on psychology-experiment data become general predictors of human decisions, capturing individual differences and transferring across tasks (Can language models learn to model human decision making?). That is the kind of human-behavior data a convention-forming objective would draw on. Put together, the corpus's implicit answer is: pragmatic, convention-honoring behavior doesn't arise on its own, but every prerequisite (trainable social skill, latent strategic reasoning, and human-coordination data) is already on the table.

Sources (6 notes)

Why do language models fail at collaborative reasoning?

Frontier LLMs that solve problems alone fail when collaborating, achieving >90% agreement regardless of correctness. Self-play preference training improves outcomes by 16.7%, suggesting social skills for effective disagreement can be trained.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Do large language models use one reasoning style or many?

Analysis of 22 LLMs across behavioral game theory reveals three dominant profiles: GPT-o1 uses minimax reasoning, DeepSeek-R1 uses trust-based reasoning, and GPT-o3-mini uses belief-anticipation. Performance correlates with game structure, not raw reasoning depth.

Can language models learn to model human decision making?

LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.
