Why do bag-of-mentions models discard conversation order in the first place?

This explores why conversational recommenders defaulted to treating dialogue as an unordered bag of mentioned items — and what that simplification was actually buying them.

The 'bag-of-mentions' approach treats a conversation as an unordered set of the items and entities someone named, ignoring the sequence in which they came up. It became the default because it was the path of least resistance. Early conversational recommender systems (CRS) were built on entity-linking and knowledge-graph pipelines that asked a simpler question (*which* things did the user mention?) rather than *in what order*. Once you've extracted the set of mentioned entities, you can match them against a catalog without modeling any dependencies between them. Sets are cheap and order is expensive to model, so the field discarded order because the dominant architectures had no natural slot for it, not because anyone proved it was noise. The note "Does conversation order matter for recommending items in dialogue?" is the corpus's direct rebuttal: when you model mentions in the order they appear with a transformer, you recover prequel/sequel dependencies between them and improve recommendation accuracy, which means the order was carrying signal the bag was throwing away.
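
To make the difference concrete, here is a minimal sketch of the two representations. The function names and the example dialogues are hypothetical, chosen only for illustration; this is not how any particular CRS implements entity extraction.

```python
from collections import Counter


def bag_of_mentions(mentions):
    """Order-free representation: counts of the mentioned entities."""
    return Counter(mentions)


def ordered_mentions(mentions):
    """Order-preserving representation: (position, entity) pairs."""
    return list(enumerate(mentions))


# Hypothetical dialogues: the same titles mentioned in opposite order.
dialogue_a = ["Alien", "Aliens", "Blade Runner"]
dialogue_b = ["Blade Runner", "Aliens", "Alien"]

# The bag cannot tell the two conversations apart; the sequence can.
same_bag = bag_of_mentions(dialogue_a) == bag_of_mentions(dialogue_b)
same_sequence = ordered_mentions(dialogue_a) == ordered_mentions(dialogue_b)
```

Any matching step that consumes only the bag receives identical input for both dialogues, so whatever signal the order carried is unavailable to it by construction.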

What's striking is that this isn't a quirk of old CRS pipelines: the order-blindness shows up even in modern LLMs that have no architectural excuse for it. The note "Why do language models ignore temporal order in ranking?" finds that LLMs *can* read preferences out of an interaction history but disregard temporal order by default, until a recency-focused prompt explicitly activates their latent sensitivity to it. So 'bag-of-mentions' is less a single broken model and more a recurring default: given a list of things a user touched, systems gravitate toward treating it as a flat set unless something forces them to honor sequence. The order isn't unrecoverable; it's just not activated.
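
As a rough illustration of what a recency-focused prompt might look like, the sketch below builds a ranking prompt with an optional line that points at the latest interaction. The function name, parameters, and prompt wording are all assumptions made for this example; the source note does not give the exact phrasing that was used.

```python
def build_ranking_prompt(history, candidates, recency_focused=False):
    """Assemble a plain-text ranking prompt from an interaction history.

    The wording is hypothetical; only the idea of calling out the most
    recent item comes from the note.
    """
    lines = ["The user interacted with these items, oldest first:"]
    lines += [f"{i}. {item}" for i, item in enumerate(history, start=1)]
    if recency_focused and history:
        # Explicitly name the latest interaction so order is not ignored.
        lines.append(f"Note that the most recent item is: {history[-1]}.")
    lines.append("Rank these candidates for the user: " + ", ".join(candidates))
    return "\n".join(lines)


plain_prompt = build_ranking_prompt(["Heat", "Ronin"], ["Collateral", "Drive"])
recency_prompt = build_ranking_prompt(
    ["Heat", "Ronin"], ["Collateral", "Drive"], recency_focused=True
)
```

The history is listed in the same order in both prompts; the only difference is whether the prompt tells the model that order matters.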

The deeper reason order gets dropped connects to what these systems were rewarded to do. The note "Can conversational recommenders recover lost preference signals from history?" points out that most CRS mine only the current dialogue session for preferences, discarding entire channels (item-level and user-level collaborative signals) that traditional recommenders rely on. A system that's already ignoring whole sources of preference structure is unlikely to fuss over the finer-grained structure of *ordering within* a session. Discarding order is one instance of a broader habit: compress the conversation down to whatever minimal representation the recommender's matching step can consume.

And there's a cost to that compression that the corpus maps from a different angle. The note "Does including all conversation history actually help retrieval?" shows that not all turns are equal: topic switches inject irrelevant context, and selecting the *right* turns beats dumping everything in. That is bad news for bag-of-mentions: a flat set can't tell an early, since-abandoned preference from the user's current intent, because it has erased the timeline that would let it down-weight stale mentions. The note "Why do language models fail in gradually revealed conversations?" sharpens this further: when systems collapse a gradually revealed conversation into a premature, structureless guess, they lock in early and can't recover. Order isn't just trivia about sequence; it's the scaffolding that tells you which mentions are still live.
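
One simple way to picture "down-weighting stale mentions" is an exponential recency decay over mention positions. This is a hand-rolled heuristic assumed for illustration, not the learned turn selection described in the source note; the function name and the `decay` parameter are hypothetical.

```python
def recency_weights(mentions, decay=0.5):
    """Weight each entity by decay ** (mentions elapsed since it last appeared).

    Requires the mention order, so it cannot be computed from a bag.
    """
    last_index = len(mentions) - 1
    weights = {}
    for position, entity in enumerate(mentions):
        # A later mention of the same entity overwrites the earlier weight.
        weights[entity] = decay ** (last_index - position)
    return weights


# Hypothetical session: the user opens with horror, then settles on comedy.
weights = recency_weights(["horror", "comedy", "comedy"])
```

Here the abandoned early preference ends up with a quarter of the weight of the current one, a distinction that a flat set of `{"horror", "comedy"}` has no way to express.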

So the honest answer to 'why discard it in the first place' is: because the modeling tools made sets cheap and sequences expensive, because the training signal never demanded order, and because nobody had shown the order was load-bearing until sequential models recovered measurable accuracy from it. The interesting twist the corpus leaves you with is that the order was never truly gone — in LLMs it's latent and promptable, and in CRS it's recoverable with a transformer over mentions. Bag-of-mentions didn't destroy the information so much as decline to look at it.

Sources (5 notes)

Does conversation order matter for recommending items in dialogue?

TSCR models items and entities in the order they appear in CRS dialogue, using transformers to learn dependencies between sequential mentions. This recovers information that bag-of-mentions approaches discard, improving recommendation accuracy on standard benchmarks.

Why do language models ignore temporal order in ranking?

LLMs can extract preferences from interaction histories but disregard temporal order by default. Recency-focused prompts and in-context examples activate latent order-sensitivity, improving ranking without retraining.

Can conversational recommenders recover lost preference signals from history?

Current conversational recommenders use only the active dialogue session to infer preferences, losing the item-CF and user-CF signals proven valuable in traditional recommenders. Integrating the current session, historical dialogues, and look-alike users, all conditioned on current intent, recovers essential user representation structure.

Does including all conversation history actually help retrieval?

Research shows that automatically selecting relevant previous turns improves retrieval effectiveness more than including all context. Topic switches inject irrelevant information; joint optimization of selection and retrieval beats both full-context baselines and human annotation.

Why do language models fail in gradually revealed conversations?

Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings because they lock into incorrect early guesses. Agent mitigations recover only 15-20% of this loss.
