INQUIRING LINE

Can targeted post-training teach AI systems to form ad-hoc linguistic conventions?

This explores whether post-training methods (like DPO or reward shaping) can teach AI to do the live, on-the-fly thing humans do in conversation — agree on shared words and references as they go — rather than just retrieving fixed patterns.


The corpus has a direct and encouraging answer at its center, and it gets more interesting when read against the notes that explain why the behavior is hard to get. The clearest 'yes' is lexical entrainment: humans converge on each other's vocabulary as a dialogue unfolds (call it a 'screen' once and you'll both keep saying 'screen,' not 'monitor'), and current models mostly don't do this. The finding in Why don't conversational AI systems mirror their users' word choices? is that DPO on coreference-identified preferences can teach models in-context convention formation, which is exactly the ad-hoc, this-conversation-only agreement the question asks about. So the headline is: yes, a narrow post-training signal can install this behavior.
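
A minimal sketch of what such a preference signal could look like, assuming a much simpler setup than the note describes: a single-word user term stands in for the coreference chain, the response that keeps that term is marked as preferred, and the pair is scored with the standard DPO loss. The names `entrainment_score` and `make_preference_pair` are hypothetical, the log-probabilities in the example are made-up placeholders, and the note does not say how its coreference-identified preferences are actually built.

```python
import math
import re
from typing import Optional, Tuple


def words(text: str) -> set:
    """Lowercased word set; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z']+", text.lower()))


def entrainment_score(user_term: str, response: str) -> float:
    """Hypothetical metric: 1.0 if the response reuses the user's (single-word) term, else 0.0."""
    return 1.0 if user_term.lower() in words(response) else 0.0


def make_preference_pair(user_term: str, a: str, b: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (chosen, rejected), preferring the response that keeps the user's term.

    Returns None when the heuristic cannot tell the two responses apart.
    """
    score_a, score_b = entrainment_score(user_term, a), entrainment_score(user_term, b)
    if score_a == score_b:
        return None
    return (a, b) if score_a > score_b else (b, a)


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard per-pair DPO loss, -log sigmoid(beta * margin), from sequence log-probabilities."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return math.log1p(math.exp(-beta * margin))


pair = make_preference_pair(
    "screen",
    "Try restarting the monitor first.",
    "Try restarting the screen first.",
)
print(pair)  # ('Try restarting the screen first.', 'Try restarting the monitor first.')
print(dpo_loss(policy_chosen=-12.0, policy_rejected=-11.0, ref_chosen=-12.5, ref_rejected=-11.0))
```

When the policy and reference assign the same log-probabilities, the loss is log 2; it falls below that as the policy shifts probability toward the response that reuses 'screen' relative to the reference model.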

What makes that finding land is *why* the behavior is missing by default. Two notes converge on the same culprit from different angles: the standard training signal rewards the wrong thing. Why don't language models develop conversation maintenance skills? argues that the implicit moves that keep talk flowing (repairing a confused reference, handing off a topic) are social actions, and models don't acquire them because training rewards information prediction, never relational work. Why do language models respond passively instead of asking clarifying questions? makes the parallel case that optimizing for immediate, next-turn helpfulness trains models to answer passively instead of actively negotiating meaning across turns, and that multi-turn-aware rewards, which estimate a response's long-term interaction value, fix it. Read together, these say the obstacle isn't capability but incentive: change what the post-training objective rewards (relational convergence, long-horizon interaction value) and the convention-forming behavior appears.
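
The contrast between the two reward signals can be sketched in a few lines. This is a toy illustration under stated assumptions, not CollabLLM's implementation: `multiturn_aware_reward` scores a candidate response by averaging a conversation-level score over simulated futures, and the toy user (who silently wants a CSV export), assistant, scorer, and myopic next-turn judge are all hypothetical stand-ins.

```python
import random
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def multiturn_aware_reward(
    conv: List[Turn],
    response: str,
    simulate_user: Callable[[List[Turn], random.Random], str],
    respond: Callable[[List[Turn]], str],
    score_conversation: Callable[[List[Turn]], float],
    horizon: int = 3,
    n_rollouts: int = 8,
    seed: int = 0,
) -> float:
    """Average a conversation-level score over simulated futures that follow `response`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        rollout = conv + [("assistant", response)]  # new list, so `conv` is untouched
        for _ in range(horizon):
            rollout.append(("user", simulate_user(rollout, rng)))
            rollout.append(("assistant", respond(rollout)))
        total += score_conversation(rollout)
    return total / n_rollouts


# --- Toy world (all hypothetical): the user silently wants a CSV export. ---
def toy_user(rollout: List[Turn], rng: random.Random) -> str:
    last = rollout[-1][1]
    if last.endswith("?"):
        return "CSV, please."  # a clarifying question surfaces the goal
    if "CSV" in last:
        return "Thanks."
    # After a wrong guess the user only sometimes says what they wanted.
    return "Actually, I need CSV." if rng.random() < 0.3 else "That's not it."


def toy_assistant(rollout: List[Turn]) -> str:
    goal_known = any(role == "user" and "CSV" in text for role, text in rollout)
    return "Here is the CSV export." if goal_known else "Here is the JSON export."


def toy_score(rollout: List[Turn]) -> float:
    # Fraction of the assistant's answers (not questions) that gave the user what they wanted.
    answers = [text for role, text in rollout if role == "assistant" and text.startswith("Here is")]
    return sum("CSV" in text for text in answers) / len(answers)


def toy_next_turn_reward(response: str) -> float:
    # Myopic judge: a direct answer looks more helpful than a question.
    return 0.0 if response.endswith("?") else 1.0


conv = [("user", "Can you export my data?")]
ask, guess = "Which format do you need?", "Here is the JSON export."
print(toy_next_turn_reward(ask), toy_next_turn_reward(guess))  # 0.0 1.0
print(multiturn_aware_reward(conv, ask, toy_user, toy_assistant, toy_score))  # 1.0
print(multiturn_aware_reward(conv, guess, toy_user, toy_assistant, toy_score))
```

The myopic judge prefers the immediate guess, while the rollout-based reward prefers the clarifying question, because only the question leads to conversations in which every answer is the one the user wanted.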

There's a deeper, less obvious move here too: convention-formation can be *grown* rather than imitated. Can language models learn skills without human supervision? describes a three-role self-play loop with no human labels: a Challenger escalates task difficulty, a Judge issues binary verdicts as reward, and both sides evolve through natural-language skill edits. The system manufactures its own feedback through repeated interaction, and the skills that emerge are conventions of its own making. That's a different mechanism from DPO on human preferences, but it answers the same question: an ad-hoc convention is something a training process can produce, not just copy.
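
The shape of such a loop can be shown with a deliberately tiny simulation. Only the roles come from the note (a Challenger that escalates difficulty, a Judge that returns a binary verdict, and natural-language skill edits); everything else is a toy assumption: difficulty is an integer, a solver's competence is just the number of skill notes it holds, it can learn only from tasks at most one level beyond its reach, and only the solver edits skills here. The note does not specify Ctx2Skill's generalization safeguard, so a hypothetical cap on how far the Challenger may escalate stands in for it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Solver:
    # Natural-language skill notes; in this toy, more notes means more competence.
    skills: List[str] = field(default_factory=list)

    def attempt(self, difficulty: int) -> bool:
        return len(self.skills) >= difficulty

    def learn_from_failure(self, difficulty: int) -> None:
        # Toy assumption: a failure teaches something only if the task is at most one level beyond reach.
        if difficulty <= len(self.skills) + 1:
            self.skills.append(f"strategy note for level-{difficulty} tasks")


def judge(solved: bool) -> int:
    """Binary verdict used as the reward signal."""
    return 1 if solved else 0


def self_play(rounds: int, step: int = 3,
              max_gap: Optional[int] = 1) -> Tuple[Solver, List[Tuple[int, int]]]:
    """Run the toy loop; max_gap=None disables the stand-in safeguard."""
    solver = Solver()
    difficulty = 1
    history: List[Tuple[int, int]] = []
    for _ in range(rounds):
        reward = judge(solver.attempt(difficulty))
        history.append((difficulty, reward))
        if reward:
            difficulty += step  # Challenger escalates after a success
            if max_gap is not None:
                # Hypothetical safeguard: never push more than max_gap levels past the solver.
                difficulty = min(difficulty, len(solver.skills) + max_gap)
        else:
            solver.learn_from_failure(difficulty)  # skill edit driven by the failure
    return solver, history


guarded, _ = self_play(10)
unguarded, _ = self_play(10, max_gap=None)
print(len(guarded.skills), len(unguarded.skills))  # 5 1
```

With the cap, the solver gains a skill note every other round; without it, the Challenger jumps past anything the solver can learn from and progress stalls, a small-scale version of the collapse the note warns about.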

Now the skeptical edge, which is where the reader earns something they didn't expect. Whether what's formed is a *convention* in the full human sense is contested in this corpus. Does AI generate genuine utterances or just text patterns? argues that AI output is 'event-residue' that users animate into a pseudo-exchange, so the convention may live only on the human side of the table. And Can prompt optimization teach models knowledge they lack? marks a boundary worth keeping in view: prompting only reorganizes what's already in the training distribution and cannot supply knowledge that is absent from it. The tension is productive. Entrainment is a *behavioral* skill (how to deploy language the model already has), not new knowledge, which is precisely why it is within reach of a targeted post-training nudge that changes the model's default behavior, rather than depending on per-conversation prompt tricks.

The synthesis: yes, targeted post-training can teach ad-hoc linguistic conventions, and there are at least three routes (preference-based DPO, multi-turn-aware reward shaping, and unsupervised self-play). The catch the corpus keeps surfacing is that you have to reward the *relational* side of language deliberately, because default training optimizes for information, not for the shared agreements that make conversation feel mutual.


Sources (6 notes)

Why don't conversational AI systems mirror their users' word choices?

Response generation models fail to adapt their vocabulary toward users' lexical choices, even though this kind of adaptation is central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Can language models learn skills without human supervision?

Ctx2Skill's three-role self-play loop manufactures missing feedback through internal signals: the Challenger escalates difficulty as curriculum, the Judge gives binary verdicts as reward, and both sides evolve via natural-language skill edits. Success requires balancing adversarial pressure against a generalization safeguard to prevent collapse.

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.
