Topics: Psychology and Social Cognition · Conversational AI Systems · Language Understanding and Pragmatics

Can models learn when NOT to speak in conversations?

Does training AI to explicitly predict silence—through a dedicated silent token—help models understand when intervention adds value versus when they should stay quiet? This matters for building conversational agents that feel naturally helpful rather than intrusive.

Note · 2026-02-22 · sourced from Conversation Topics Dialog
Related questions: Why do AI agents fail to take initiative? · What kind of thing is an LLM really? · How should researchers navigate LLM reasoning research?

DiscussLLM reframes AI conversational participation as a decision problem with an explicit "do nothing" option. At each turn of a multi-party discussion, the model must decide whether to remain silent (outputting the dedicated silent token) or to intervene with a helpful contribution. This turns the passive nature of LLM generation, which always produces output when prompted, into active decision-making about participation.
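A minimal sketch of this per-turn decision, assuming a model fine-tuned to emit a reserved `<silent>` token (the token string and the `generate` interface are illustrative, not from the paper):

```python
SILENT_TOKEN = "<silent>"  # assumed reserved token; the paper's exact token may differ

def take_turn(model, dialogue_history: list[str]) -> str | None:
    """Decide whether to intervene at this turn of a multi-party discussion.

    Returns the model's contribution, or None if it chose silence.
    """
    prompt = "\n".join(dialogue_history)
    output = model.generate(prompt)  # hypothetical generation interface
    if output.strip().startswith(SILENT_TOKEN):
        return None  # the explicit "do nothing" action
    return output
```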

The framework identifies five intervention types:

  1. Factual correction — correcting misinformation in the discussion
  2. Concept definition — clarifying terminology or concepts
  3. (Three additional types from the taxonomy)

Each type corresponds to specific conversational triggers — points where AI intervention adds value. The synthetic data generation pipeline creates 88K multi-turn discussion samples, each containing a natural trigger point. This is notable: the training data explicitly models the absence of intervention (silence at most turns) alongside appropriate intervention moments.
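A hedged sketch of what one such training sample might look like; the field names, dialogue text, and per-turn targets below are illustrative assumptions, not DiscussLLM's actual schema:

```python
# One synthetic discussion sample: silence is the target at most turns,
# and the trigger turn is the only point labeled with an intervention.
sample = {
    "intervention_type": "factual_correction",  # one of the five types
    "turns": [
        {"speaker": "A", "text": "The Eiffel Tower is in Lyon.",
         "target": "<silent>"},
        {"speaker": "B", "text": "Right, I keep meaning to visit.",
         "target": "<silent>"},
        # Trigger point: the misinformation above makes intervention valuable.
        {"speaker": "A", "text": "We should plan a trip to Lyon to see it.",
         "target": "Small correction: the Eiffel Tower is in Paris, not Lyon."},
    ],
}
```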

Two architectures are compared:

  1. An end-to-end model fine-tuned to make the silence/speak decision and generate the intervention in a single pass.
  2. A decoupled classifier-generator system, in which a lightweight classifier decides at each turn whether to intervene and invokes the LLM only when a contribution is needed.

The evaluation introduces "interruption accuracy": the percentage of turns where the model correctly remains silent. This metric operationalizes what the proactivity literature calls the civility dimension: because a proactive agent must avoid feeling intrusive to users, an incorrect interruption is a civility failure.
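Computed as defined above, the metric is a simple per-turn ratio; a minimal sketch, assuming turn-level predictions and gold labels (the paper's aggregation may differ):

```python
def interruption_accuracy(predictions: list[str], labels: list[str],
                          silent: str = "<silent>") -> float:
    """Fraction of silence-labeled turns where the model correctly stayed silent."""
    silent_turns = [(pred, lab) for pred, lab in zip(predictions, labels)
                    if lab == silent]
    if not silent_turns:
        return 1.0  # no silence expected, so vacuously accurate
    correct = sum(pred == silent for pred, _ in silent_turns)
    return correct / len(silent_turns)
```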

The comparison reveals a trade-off between intervention accuracy and computational efficiency. The classifier-generator system is more efficient for deployment, invoking the LLM only when needed, but the end-to-end model better integrates the "when" and "what" decisions.
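A sketch of the decoupled deployment pattern, assuming a cheap binary classifier that gates calls to the generator (both interfaces are hypothetical):

```python
def decoupled_turn(classifier, generator, dialogue_history: list[str]) -> str | None:
    """Classifier-gated generation: pay the LLM's cost only when needed."""
    context = "\n".join(dialogue_history)
    if not classifier.should_intervene(context):  # cheap per-turn check
        return None  # stay silent without invoking the LLM at all
    return generator.generate(context)  # expensive call, made only on triggers
```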

This work shares the motivation of the Inner Thoughts framework but takes a different approach: where Inner Thoughts models intrinsic motivation to speak, DiscussLLM directly trains the silence/speak decision as a classification task. Both recognize that the fundamental gap is not what to say but when to say it.

The silent token mechanism has a direct inverse application to the problem of multi-agent LLM systems converging without real debate. In multi-agent debate, the failure mode is premature agreement: agents converge without genuine deliberation. DiscussLLM trains the complementary capability of knowing when silence IS appropriate (not every turn requires intervention) versus when it is not (genuine disagreement should be voiced). Applied to multi-agent settings, a silent-token-trained agent could distinguish between "I have nothing to add" (legitimate silence) and "I disagree but am accommodating" (premature convergence), addressing the reported 61% premature consensus rate.
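A hedged sketch of how that distinction might look in a debate loop; the three-way decision below is an extension this note suggests, not something DiscussLLM implements:

```python
def debate_turn(agent, transcript: list[str]) -> str | None:
    """Hypothetical three-way decision extending the binary silent/speak choice."""
    context = "\n".join(transcript)
    decision = agent.decide(context)  # assumed to return one of three labels
    if decision == "nothing_to_add":
        return None  # legitimate silence
    if decision == "disagree":
        return agent.generate_objection(context)  # voice it; don't accommodate
    return agent.generate(context)  # ordinary helpful contribution
```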


Source: Conversation Topics Dialog

Original note: DiscussLLM formalizes "when to speak" as a learning objective, using a silent token to teach models when not to intervene.