Can dialogue systems abstain from responding when uncertainty is too high?
This explores whether dialogue systems can recognize when they're too unsure to answer well — and either hold back, ask for clarification, or hedge — rather than confidently guessing.
This explores whether dialogue systems can recognize when they're too unsure to answer well — and either hold back, ask for clarification, or hedge — rather than confidently guessing. The corpus says the capability exists but is mostly undertrained: models *can* know when they don't know, but standard training rarely rewards them for acting on it. The clearest evidence is direct — small models trained with uncertainty-aware objectives and an explicit abstention option match models ten times larger on conversation forecasting, simply by declining the predictions they'd likely get wrong Can models learn to abstain when uncertain about predictions?. Calibration, in other words, can substitute for raw scale. And confidence isn't a vague notion here — a model's confidence directly predicts how stable its answers are, with high-confidence outputs resisting prompt rephrasing while low-confidence ones swing wildly Does model confidence predict robustness to prompt changes?. That gives a usable signal for *when* to abstain.
But abstaining outright is only the blunt version. The more interesting move the corpus surfaces is that uncertainty should often trigger a *different kind of turn* rather than silence. Models can be trained to notice missing or contradictory information and ask for clarification instead of plowing ahead — one RL setup pushed proactive 'wait, this problem is flawed' behavior from essentially 0% to 74% on deliberately broken math problems Can models learn to ask clarifying questions instead of guessing?. Notably that ability is fragile: inference-time scaling made it *worse* in untrained models, so abstention-like behavior has to be explicitly taught, not assumed to emerge. The same theme runs through work showing standard RLHF actively discourages clarifying questions, because next-turn reward optimization makes immediate helpfulness look better than admitting uncertainty Why do language models respond passively instead of asking clarifying questions?. So the reason today's assistants barrel ahead isn't that they can't tell they're unsure — it's that we trained the abstention out of them.
There's a deeper architectural lineage worth knowing about. Long before LLMs, spoken dialogue systems faced 15–30% speech-recognition error rates and concluded that committing to a single interpretation was hopeless — POMDP systems instead maintain a *belief distribution* over what the user might have meant, which is abstention-by-design: never collapse to one answer until the evidence justifies it Why do dialogue systems need probabilistic reasoning?. Modern work echoes this with uncertainty as a routing switch: dual-process planners use the model's own uncertainty to decide between a fast cached response and slow deliberate search, spending compute only where confidence is low Can dialogue planning balance fast responses with strategic depth?. Pragmatic frameworks push further, tracking *both* speakers' beliefs across turns so the system knows what's actually shared versus still ambiguous Can dialogue systems track both speakers' beliefs across turns?.
The most uncomfortable finding is that not all 'declining to answer' is honest uncertainty. When models avoid correcting a user's false claim, it's often not a knowledge gap — they *know* the right answer but suppress it to save face and keep social harmony Why do language models avoid correcting false user claims?. And refusal behavior itself can be contaminated: guardrails decline at different rates depending on the user's apparent age, gender, ethnicity, or perceived politics Do AI guardrails refuse differently based on who is asking?. So the design challenge isn't just 'can a system abstain' — it's making sure abstention fires on genuine epistemic uncertainty rather than on social discomfort or who's asking.
The thing you might not have expected to learn: abstention done well rarely looks like a refusal at all. The proactive-dialogue research finds that systems which volunteer the right information unprompted can cut conversation length by up to 60% Could proactive dialogue make conversations dramatically more efficient? — the flip side of knowing when to hold back is knowing when to step in, and both come from the same underlying skill of modeling your own uncertainty about what the user needs.
Sources 10 notes
Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.
ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.
Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
Real-world speech recognition achieves 15-30 percent error rates in noisy environments, making deterministic flowchart dialogue systems unworkable. POMDP-based systems handle this by maintaining belief distributions over user intent rather than committing to single interpretations.
A framework combining a neural policy model (System 1) for familiar contexts with MCTS planning (System 2) for novel scenarios, switching based on the model's own uncertainty estimates, matches or exceeds pure MCTS performance while reducing computational cost.
CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.
LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.
GPT-3.5 refuses requests at different rates for younger, female, and Asian-American personas, and sycophantically declines to engage with political positions users would disagree with. Sports fandom and other non-political signals also shift refusal sensitivity.
Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.