Conversational AI Systems · Language Understanding and Pragmatics · Psychology and Social Cognition

Can models learn to abstain when uncertain about predictions?

Explores whether language models can be trained to recognize when they lack sufficient information to forecast conversation outcomes, rather than forcing uncertain predictions into confident-sounding responses.

Note · 2026-02-22 · sourced from Conversation Architecture Structure
Why do AI conversations reliably break down after multiple turns? How should researchers navigate LLM reasoning research?

Generating a single plausible next utterance is not the same as modeling uncertainty over all possible next utterances in a calibrated way. In a negotiation, "Sounds good!" and "No thanks" may be equally fluent, topical, and informative responses, yet one may be far more likely given the goals, beliefs, and emotions of the interlocutors.

FortUne Dial formalizes this as conversation uncertainty modeling, shifting evaluation from pure accuracy to uncertainty-aware metrics that enable abstention on individual instances. When the model estimates high uncertainty about an outcome, it should say "I don't know" rather than forcing a prediction.
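The abstain-when-uncertain rule can be sketched with a simple entropy threshold over the predicted outcome distribution. The threshold value and function names here are illustrative assumptions, not details from FortUne Dial:

```python
import math

def outcome_entropy(probs):
    """Shannon entropy (bits) of a predicted outcome distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def forecast_or_abstain(probs, max_entropy_bits=0.6):
    """Return the index of the most likely outcome, or None (abstain)
    when the predicted distribution is too uncertain to commit."""
    if outcome_entropy(probs) > max_entropy_bits:
        return None  # "I don't know"
    return max(range(len(probs)), key=lambda i: probs[i])

# A confident forecast commits; a near-uniform one abstains:
print(forecast_or_abstain([0.9, 0.1]))    # -> 0
print(forecast_or_abstain([0.55, 0.45]))  # -> None
```

The threshold trades coverage for accuracy: raising it forces the model to answer more often, lowering it makes abstention the default.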

Two representations of uncertainty:

Two fine-tuning strategies improve calibration:

The practical result: smaller open-source models, once calibrated, can compete on uncertainty-aware forecasting with pre-trained models 10x their size. This suggests calibration is undertrained in standard LLMs: the capability exists, but the training signal is absent.
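A common way to quantify the calibration gap this paragraph describes is expected calibration error (ECE): bin predictions by stated confidence and average the gap between each bin's mean confidence and its empirical accuracy. A minimal stdlib sketch (the equal-width binning scheme is an illustrative choice):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average, over confidence bins, of
    |mean confidence - empirical accuracy| in each bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# A model that says 0.9 but is right half the time is badly calibrated:
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))
```

Under this metric, "compete with models 10x their size" means the smaller fine-tuned model achieves a comparable or lower ECE, not necessarily comparable raw accuracy.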

Applications include: studying effects of strategy and social structure in negotiations, intervening to improve human and machine conversations, and assessing trust/heterogeneity in data sources via entropy metrics.
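The last application, assessing heterogeneity in data sources via entropy, could look like the following sketch. The `records` shape and function name are hypothetical illustrations, not an API from the source:

```python
import math
from collections import Counter

def source_outcome_entropy(records):
    """Per-source Shannon entropy (bits) of observed outcomes.
    `records` is an iterable of (source, outcome) pairs; higher
    entropy suggests a more heterogeneous, less predictable source."""
    by_source = {}
    for source, outcome in records:
        by_source.setdefault(source, Counter())[outcome] += 1
    result = {}
    for source, counts in by_source.items():
        total = sum(counts.values())
        result[source] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return result

# Source A always resolves the same way (entropy 0 bits);
# source B is a coin flip (entropy 1 bit):
records = [("A", "deal"), ("A", "deal"), ("A", "deal"),
           ("B", "deal"), ("B", "no_deal")]
entropies = source_outcome_entropy(records)
```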

Real-world deployment evidence from CRAFT: When the CRAFT conversational forecasting model was deployed as a prototype moderation tool for Wikipedia editors, moderator feedback revealed critical design dimensions:

- Trajectory over absolute score: score change was more actionable than a static risk number; moderators preferred seeing whether a conversation was trending toward derailment.
- Heterogeneous human expertise: moderator confidence in predicting derailment varied dramatically. Four of nine participants believed they could forecast in any Wikipedia context, four others only in very specific contexts with low confidence, and one only for personally known participants on familiar topics. Forecasting tools must accommodate this variance rather than assume uniform detection ability.
- Conversation age: moderators reported that inactive conversations (>2-3 days since the last comment) are unlikely to revive, much less turn uncivil, yet the prototype did not surface this temporal signal.
- Scale: even topic-engaged moderators cannot proactively monitor all at-risk conversations, forcing them to rely on random discovery strategies.
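Taken together, the moderator feedback (trajectory over absolute score, a 2-3 day staleness cutoff) suggests a triage heuristic like this sketch. The dict keys, function name, and default cutoff are hypothetical, not CRAFT's interface:

```python
from datetime import datetime, timedelta

def triage(conversations, now, stale_after=timedelta(days=3)):
    """Rank conversations for moderator attention by risk trajectory
    (change in derailment score), skipping stale threads.

    Each conversation is a dict with hypothetical keys: 'id',
    'scores' (chronological derailment scores), 'last_comment'."""
    active = []
    for conv in conversations:
        if now - conv["last_comment"] > stale_after:
            continue  # inactive threads rarely revive, let alone derail
        trend = conv["scores"][-1] - conv["scores"][0]
        active.append((trend, conv["id"]))
    # Steepest upward trend first: "trending toward derailment"
    return [cid for trend, cid in sorted(active, reverse=True)]

now = datetime(2026, 2, 22)
queue = triage([
    {"id": "fresh_rising", "scores": [0.2, 0.6],
     "last_comment": now - timedelta(hours=5)},
    {"id": "fresh_flat", "scores": [0.3, 0.3],
     "last_comment": now - timedelta(days=1)},
    {"id": "stale", "scores": [0.1, 0.9],
     "last_comment": now - timedelta(days=10)},
], now)
```

Ranking by trend rather than raw score, and dropping stale threads entirely, directly addresses the scale problem: moderators see a short, ordered queue instead of monitoring everything.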

As argued in "Does reasoning fine-tuning make models worse at declining to answer?", calibrated uncertainty and appropriate abstention are capabilities that current training actively degrades. And per "Does training objective determine which direction models fail at abstention?", the direction of the calibration failure depends on the training regime: a forecasting system built on reasoning-trained models would over-predict, while one built on safety-trained models would refuse to predict. Conversation forecasting requires the opposite of both failure modes: models that know what they don't know about where a conversation is heading.

Additional empirical domain — Instagram hostility forecasting: A separate forecasting study on Instagram demonstrates that hostile comments can be predicted from early conversational signals: AUC 0.82 for predicting hostility presence 10+ hours in the future, and AUC 0.91 for predicting whether a post will receive more than 10 hostile comments vs. only one. Predictive features include the post author's history of receiving hostile comments, user-directed profanity, number of distinct participants, and hostility trends in the conversation so far. This complements the CRAFT deployment evidence above — different platform, similar principle: early conversational dynamics carry forecastable signal about future trajectory.
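AUC, the metric the Instagram study reports, is the probability that a randomly chosen positive (hostile) example is scored above a randomly chosen negative one, counting ties as half. A minimal pairwise implementation (illustrative, not the study's code):

```python
def auc(scores_pos, scores_neg):
    """AUC by direct pairwise comparison: fraction of
    (positive, negative) pairs where the positive is ranked
    higher, with ties counted as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Model scores for posts that did / did not turn hostile (toy values):
print(auc([0.9, 0.7, 0.6], [0.4, 0.5, 0.8]))  # 7 of 9 pairs correct
```

On this reading, the study's AUC 0.82 means that given one eventually-hostile and one non-hostile conversation, the early-signal model ranks the hostile one higher about 82% of the time.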


Source: Conversation Architecture Structure


Conversation forecasting under uncertainty requires calibrated probability estimates: a calibrated model should abstain on uncertain predictions rather than force an output.