How does repeated content shift model outputs across multiple turns?

This inquiry explores what happens when the same material keeps reappearing, either across the turns of a single conversation or fed back into the model as input, and how that repetition pushes outputs to drift, narrow, or degrade rather than stay stable.

The corpus answers the question in two senses: repetition *within* a conversation (the same intent or content restated turn after turn) and repetition *across generations* (content the model produced being fed back in). Both produce shift, but for different reasons.

Within a single conversation, the corpus locates the drift not in some fixed loss of capability but in a widening gap between what the user means and what the model commits to. "Why do language models lose performance in longer conversations?" shows that RLHF training rewards answering early over asking for clarification, so across turns a model locks onto a premature reading and keeps building on it; performance recovers when an explicit intent-parsing step is inserted before the model acts. This compounds with a subtler fact: a model has no single fixed answer to return to. "Do large language models actually commit to a single character?" demonstrates that models keep a *superposition* of consistent answers and sample from it at generation time, so each restatement is a fresh draw rather than a lookup, and "Why does AI output change with every prompt and context?" frames this mutability as the defining property of the medium rather than a bug. Repeated content, then, doesn't reliably re-anchor the model; it re-rolls the dice under accumulating context.
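The "fresh draw, not a lookup" point can be illustrated with a toy 20-questions setup. This is only a sketch under stated assumptions, not the experiment from the cited note: the object table, the attribute names, and the helpers `consistent_candidates` and `reveal` are all invented for illustration. The idea is that the conversation history only narrows a set of candidates, and each regeneration samples from whatever remains.

```python
import random

# Hypothetical toy world: each object is described by yes/no attributes.
OBJECTS = {
    "cat": {"alive": True, "bigger_than_breadbox": False},
    "oak": {"alive": True, "bigger_than_breadbox": True},
    "spoon": {"alive": False, "bigger_than_breadbox": False},
    "piano": {"alive": False, "bigger_than_breadbox": True},
    "horse": {"alive": True, "bigger_than_breadbox": True},
}

def consistent_candidates(history):
    """Return every object that agrees with all (attribute, answer) pairs so far."""
    return sorted(
        name
        for name, attrs in OBJECTS.items()
        if all(attrs[attr] == answer for attr, answer in history)
    )

def reveal(history, rng):
    """Sample one object from the set still consistent with the conversation."""
    return rng.choice(consistent_candidates(history))

# Two answered questions leave more than one object standing.
history = [("alive", True), ("bigger_than_breadbox", True)]
rng = random.Random(0)

# Regenerating the "reveal" turn repeatedly: every draw fits the history,
# but nothing forces the draws to agree with one another.
draws = [reveal(history, rng) for _ in range(20)]
print(consistent_candidates(history), draws)
```

Every draw is consistent with the history, yet nothing in the history pins down which candidate is returned. In this toy, that is the sense in which restating a question re-samples rather than retrieves.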

What governs how much the output swings? Confidence. "Does model confidence predict robustness to prompt changes?" reports that a highly confident model shrugs off rephrasing, while a low-confidence one lurches with every variation, so the same content repeated in slightly different words lands very differently depending on how sure the model already was. And when context conflicts with what the model learned in training, "Why do language models ignore information in their context?" shows that the training priors often win: repeating information in the prompt cannot by itself override a strong learned association, which is why restating a correction sometimes fails to move the answer at all.
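One way to build intuition for the confidence effect is to treat a rephrasing as a small bounded perturbation of the model's answer scores. This is an assumed toy model, not the measurement procedure of the cited note; the function `flip_rate`, the noise bound, and the score vectors are illustrative choices. The sketch counts how often the top-scoring answer changes under that noise.

```python
import random

def flip_rate(logits, noise=0.5, trials=1000, seed=0):
    """Fraction of trials in which bounded noise changes the top-scoring answer."""
    rng = random.Random(seed)
    baseline = max(range(len(logits)), key=lambda i: logits[i])
    flips = 0
    for _ in range(trials):
        # Each "rephrasing" nudges every score by at most +/- noise.
        perturbed = [x + rng.uniform(-noise, noise) for x in logits]
        top = max(range(len(perturbed)), key=lambda i: perturbed[i])
        flips += top != baseline
    return flips / trials

confident = [5.0, 1.0, 0.0]  # large margin between best and runner-up
uncertain = [1.0, 0.9, 0.8]  # near-tie between all three answers

confident_rate = flip_rate(confident)
uncertain_rate = flip_rate(uncertain)
print(confident_rate, uncertain_rate)
```

With a margin of 4.0 and noise of at most 0.5 per score, the confident case cannot flip at all, while the near-tie flips regularly. The same perturbation is harmless or disruptive depending only on the margin.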

The second sense, content the model generated being fed back as fuel, is where 'shift' becomes irreversible. "Does training on AI-generated content permanently degrade model quality?" shows that training on data that includes model-generated content progressively erases rare events and unusual patterns, with each generation compounding the loss. There's a training-time echo of this in "Does RL training collapse format diversity in pretrained models?", where RL amplifies one format from pretraining and suppresses the alternatives within the first epoch: a repeated reward signal collapses diversity rather than improving it. The throughline: repeated exposure tends to narrow the distribution the model samples from, whether the loop runs across turns or across training generations.
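The compounding loss of rare patterns can be sketched with a deliberately simple stand-in for a model: an empirical frequency table that is refit on its own samples each generation. This is an assumption-laden toy, not the setup from the cited note. It uses purely self-generated data rather than a mixture with real data, and the token names and the `next_generation` helper are invented. It does show the one-way nature of the loss: once a rare pattern draws zero samples, no later generation can bring it back.

```python
import random
from collections import Counter

def next_generation(population, rng):
    """Fit empirical frequencies, then sample a same-sized population from them."""
    counts = Counter(population)
    tokens = sorted(counts)
    weights = [counts[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=len(population))

rng = random.Random(0)

# Generation 0: one common pattern plus several rare ones (invented data).
population = (
    ["common"] * 90
    + ["rare_a"] * 4
    + ["rare_b"] * 3
    + ["rare_c"] * 2
    + ["rare_d"]
)

# Track how many distinct patterns survive in each generation.
support_sizes = [len(set(population))]
for _ in range(50):
    population = next_generation(population, rng)
    support_sizes.append(len(set(population)))

print(support_sizes)
```

The number of surviving patterns can only stay flat or fall, because a pattern with zero count has zero sampling weight in every later generation.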

The more hopeful thread is that this drift is partly trainable away. "Can models learn to ignore irrelevant prompt changes?" describes training a model to answer identically to clean and 'wrapped' versions of the same prompt by using its own clean responses as targets, deliberately engineering the invariance to repetition that the failure modes above lack. The unexpected payoff for a curious reader: 'staying on track' across turns isn't really about memory at all. Because "Do transformer models store knowledge or generate it continuously?" argues these models hold knowledge as flowing activation rather than stored records, every turn regenerates the answer from scratch, so consistency has to be actively built, never assumed.
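The data-construction step behind the output-level variant of this idea can be sketched as follows. This is a minimal illustration, assuming the model is available as a plain `generate` callable; `build_consistency_pairs`, `toy_generate`, and `toy_wrap` are hypothetical names, and the actual fine-tuning step is left out. Each wrapped prompt is paired with the model's own answer to the clean prompt, so the training target is produced by the model being trained.

```python
from typing import Callable, List, Tuple

def build_consistency_pairs(
    clean_prompts: List[str],
    wrap: Callable[[str], str],
    generate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each wrapped prompt with the model's own answer to the clean prompt."""
    pairs = []
    for prompt in clean_prompts:
        target = generate(prompt)             # response to the clean prompt
        pairs.append((wrap(prompt), target))  # train: wrapped prompt -> clean response
    return pairs

# Hypothetical stand-ins so the sketch runs without a real model.
def toy_generate(prompt: str) -> str:
    return f"answer to: {prompt}"

def toy_wrap(prompt: str) -> str:
    return f"I am fairly sure I already know this, but {prompt}"

pairs = build_consistency_pairs(["What is 2 + 2?"], toy_wrap, toy_generate)
print(pairs)
```

Because the targets come from the current model rather than a fixed dataset, they are regenerated whenever the model changes, which is one reading of how this approach avoids the staleness attributed to standard SFT.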

Sources 9 notes

Why do language models lose performance in longer conversations?

LLMs degrade in multi-turn settings because RLHF training rewards premature answers over clarification-seeking, creating pragmatic mismatch with individual user behaviors. A Mediator-Assistant architecture that explicitly parses user intent before execution recovers lost performance without retraining.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Why does AI output change with every prompt and context?

AI outputs exhibit essential mutability—they vary with sampling, prompt wording, and audience interpretation. This is not a defect but a defining feature of tokens as media, making them fundamentally different from fixed commodities and resistant to traditional quality assurance.

Does model confidence predict robustness to prompt changes?

ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Does training on AI-generated content permanently degrade model quality?

Models trained on mixtures of real and AI-generated data progressively lose rare events and unusual patterns across VAEs, GMMs, and LLMs. Each generation compounds the loss, making genuine human data increasingly valuable.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Can models learn to ignore irrelevant prompt changes?

Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.

Do transformer models store knowledge or generate it continuously?

Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.
