Conversational AI Systems

Why do AI assistants get worse at longer conversations?

Explores why LLM performance drops 25 points when instructions span multiple turns instead of one message, and whether models can recover from early wrong assumptions.

Note · 2026-02-22 · sourced from Conversation Topics Dialog
Why do AI conversations reliably break down after multiple turns? What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

Post angle for Medium/LinkedIn

Your AI assistant is getting dumber the longer you talk to it — and it's because we trained it to be too helpful.

That's the counterintuitive finding from two converging research papers. When LLMs receive fully-specified instructions in a single message, they perform at ~90% accuracy. But spread those same instructions across a natural conversation — revealing details gradually, the way humans actually communicate — and performance drops to ~65%. A 25-point gap. And it appears even in two-turn conversations.
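
To make the setup concrete, here's a minimal sketch of that comparison in Python. Everything in it is assumed for illustration: `chat()` is a stub standing in for a real chat-model API, and the task shards are invented examples, not the benchmarks the papers actually use.

```python
# Minimal sketch of the single-turn vs. sharded multi-turn setup.
# All names are illustrative: chat() is a stub for a real model API,
# and the task shards are invented, not the papers' benchmarks.

def chat(messages: list[dict]) -> str:
    """Stub for a model call: takes an OpenAI-style message list and
    returns the assistant's reply. Wire up a real client here."""
    return "(model reply)"

TASK_SHARDS = [
    "I need a Python function that parses log lines.",
    "Each line starts with an ISO-8601 timestamp.",
    "Only lines with level ERROR should be returned.",
    "Return the results as (timestamp, message) tuples.",
]

def run_single_turn() -> str:
    # Fully specified: every constraint arrives in one message.
    return chat([{"role": "user", "content": " ".join(TASK_SHARDS)}])

def run_multi_turn() -> str:
    # Sharded: constraints are revealed one turn at a time, the way
    # people actually talk. The model replies after every turn, so its
    # early answers are necessarily built on incomplete information,
    # which is exactly where premature assumptions creep in.
    messages = []
    reply = ""
    for shard in TASK_SHARDS:
        messages.append({"role": "user", "content": shard})
        reply = chat(messages)
        messages.append({"role": "assistant", "content": reply})
    return reply  # final answer, after all constraints are known

if __name__ == "__main__":
    print("single-turn:", run_single_turn())
    print("multi-turn: ", run_multi_turn())
```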

What goes wrong:

LLMs make premature assumptions when information is incomplete, propose solutions too early, and then lock into those initial guesses. When the user later provides details that contradict the early assumptions, the models can't course-correct: they get lost and don't recover.

Why it happens:

This isn't an inherent model limitation. The Intent Mismatch paper argues it's a rational strategy induced by RLHF training: models are trained to be helpful, and under uncertainty, being helpful means guessing rather than asking. The training literally rewards premature commitment.
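
A back-of-envelope version of that incentive, with entirely made-up numbers: if raters score a wrong-but-direct answer only somewhat below a clarifying question, guessing wins in expectation even when the model's assumption is barely better than a coin flip.

```python
# Toy expected-reward calculation with made-up numbers, showing why
# guessing can beat asking under a helpfulness-shaped reward signal.
r_correct_guess = 1.0   # rater loves a direct, correct answer
r_clarify = 0.6         # helpful, but "didn't just answer the question"
r_wrong_guess = 0.2     # unhelpful, but at least it tried

p_right = 0.55          # model's chance its assumption is correct

ev_guess = p_right * r_correct_guess + (1 - p_right) * r_wrong_guess
ev_clarify = r_clarify

print(f"guess: {ev_guess:.2f}  clarify: {ev_clarify:.2f}")
# guess: 0.64  clarify: 0.60, so guessing is the "rational" policy
```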

The real bottleneck is pragmatic mismatch: users exhibit individual variation in how they express intent. The same fragmentary utterance might be a confirmation, a correction, or a refinement — but models aligned to the "average" user default to interpreting it as confirmation of their own assumptions.
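
A toy illustration of that ambiguity (the utterance, contexts, and labels are all invented here): the same three words resolve to a different dialogue act depending on the assistant's previous turn, which is precisely the per-user context an "average user" prior discards.

```python
# Invented example: the dialogue act of one fragmentary utterance
# depends entirely on what the assistant said the turn before.
utterance = "the second one"

interpretations = {
    "Assistant: 'Did you mean the second option?'": "confirmation",
    "Assistant: 'I'd go with the first option.'":   "correction",
    "Assistant: 'Here are four options to choose from...'": "refinement",
}

for prior_turn, act in interpretations.items():
    print(f"{prior_turn}\n  User: '{utterance}'  ->  {act}\n")
```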

What fixes it:

The single-turn result suggests the blunt workaround: when a conversation drifts, consolidate everything you've established into one fully-specified message and start fresh. That's the condition where models hit ~90%.

The deeper point:

We built AI that's spectacular at answering questions and terrible at having conversations. The multi-turn case is the real-world case, and the training signals that made models impressive on benchmarks are the same signals that make them fragile in dialogue.


Source: Conversation Topics Dialog, Conversation Architecture Structure

Key sources:

Original note title

the wrong turn problem — why AI conversations go off the rails and can't recover