Conversational AI Systems Reasoning and Knowledge Reasoning and Learning Architectures

Can models learn to ask clarifying questions without explicit training?

Do language models trained only on fully-specified problems spontaneously develop the ability to ask for missing information when facing underspecified tasks? This tests whether conversational problem-solving strategies emerge from meta-learning rather than direct instruction.

Note · 2026-05-18 · sourced from Training Fine Tuning

A surprising generalization result from the social meta-learning training paradigm. The training procedure uses only fully-specified problems — the student receives the complete problem statement from the first turn, and the teacher provides feedback during attempts to solve it. None of the training problems require the student to handle missing information. Yet the trained model performs significantly better on underspecified tasks at test time, where critical information is revealed only across multiple conversational turns.

The behavioral signature is specific: SML-trained models make fewer premature answer attempts and are more likely to ask for the information they need. They learn to recognize when they lack enough information to answer well and to extract that information from the conversation partner. This is the human pattern of "ask before answering when you're not sure" — emerging in an LLM that was never explicitly trained on the pattern.

The mechanism appears to be that SML training teaches the model a meta-strategy: use the conversation as a resource. This strategy generalizes from "use the conversation to refine an answer to a fully-specified problem" (training distribution) to "use the conversation to get missing information first, then answer" (test distribution). The student has learned not just to solicit corrective feedback but to model the conversation as a place where information flows.

The result can be sharpened with a two-stage training procedure called Q-priming. A preliminary SFT stage trains the model on dialogues where it has been explicitly prompted to ask questions, leveraging the teacher's private knowledge to generate good question examples. After Q-priming, online RL via SML refines the behavior further. The combined pipeline produces stronger clarifying-question behavior than either alone.

For conversational AI design, this is an existence proof: the structural skill of "ask before answering" can be installed via training rather than via runtime prompting. Systems that have struggled with the "LLM answers prematurely" failure mode can address it at the training level rather than relying on prompt engineering.

Related concepts in this collection

Concept map
13 direct connections · 114 in 2-hop network ·medium cluster Open in graph ↗

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere
Original note title

SML produces emergent clarifying-question behavior — models trained only on fully-specified problems learn to handle underspecified tasks by asking for missing information