Should recommenders discard old user data uniformly or selectively retain historical signals?

This explores whether a recommender should age out user history wholesale (treat old behavior as stale) or keep some signals while dropping others — and what the corpus says about telling durable preference from transient noise.

This question is really about whether 'old' and 'irrelevant' are the same thing. The corpus answers with a clear no: the consistent recommendation is selective retention, which preserves durable long-term signal while discounting only the transient, noisy parts. The sharpest statement of this comes from work arguing that recommenders need per-user concept drift modeling rather than population-level drift detection (Why do global concept drift methods fail for recommender systems?). Because each person's tastes shift on their own timescale for their own reasons, a global 'flush everything older than X' rule throws away signal for the stable user and keeps stale signal for the volatile one. The fix is to model staleness per user, not per clock.
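The difference between a clock-based flush and per-user staleness can be sketched in a few lines. This is a minimal illustration, not a method taken from the corpus. It assumes exponential decay as the staleness model and treats each user's half-life as given (the hypothetical `HALF_LIFE` table). Estimating those half-lives from a user's own drift history is the hard part and is left out.

```python
from dataclasses import dataclass


@dataclass
class Event:
    item_id: str
    age_days: float  # days since the interaction happened


def global_cutoff_weights(events, max_age_days):
    """Uniform policy: every user shares one clock-based cutoff (keep or drop)."""
    return [1.0 if e.age_days <= max_age_days else 0.0 for e in events]


def per_user_decay_weights(events, half_life_days):
    """Per-user policy: exponential decay with a user-specific half-life.

    half_life_days is assumed to be estimated elsewhere for each user,
    e.g. from how quickly that user's tastes have shifted in the past.
    """
    return [0.5 ** (e.age_days / half_life_days) for e in events]


# Hypothetical per-user half-lives: one stable user, one volatile user.
HALF_LIFE = {"stable_user": 365.0, "volatile_user": 14.0}

history = {
    "stable_user": [Event("jazz_album", 200.0)],
    "volatile_user": [Event("viral_clip", 60.0)],
}

if __name__ == "__main__":
    for user, events in history.items():
        flat = global_cutoff_weights(events, max_age_days=90)
        decayed = per_user_decay_weights(events, HALF_LIFE[user])
        print(user, flat, [round(w, 3) for w in decayed])
```

Under these assumptions, the 90-day cutoff zeroes out the stable user's 200-day-old event (decayed weight about 0.684) while giving the volatile user's 60-day-old event full weight (decayed weight about 0.051). That is exactly the mismatch a global flush produces.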

But before you can selectively retain, you have to decide what 'old data' even means, and several notes complicate the idea that older equals worse. Time-of-period modeling shows that what looks like aged data is often a recurring cycle: a hypernetwork conditioned on time-of-period recovers weekly and daily rhythms that simple change-point detection treats as noise (Why do recommendation systems miss recurring user preference patterns?). Through that lens, last Tuesday's behavior isn't expired; it may be the best predictor of next Tuesday. Meanwhile, the case for discarding *something* is real: the same user gives the same item ratings that differ by multiple stars across sessions, due to mood, anchoring, and rater idiosyncrasy (Why do the same users rate items differently each time?). So the discard target isn't old data; it's the noise riding on top of any data, old or new.
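One way to make 'recurring, not stale' concrete is to weight each past event by its age and also by how closely its time-of-week matches the moment being recommended for. This is a deliberately simple kernel stand-in, not HyperBandit: there is no hypernetwork here. The function names, the 2-hour bandwidth, and the 90-day half-life are illustrative assumptions.

```python
import math
from datetime import datetime, timedelta

HOURS_PER_WEEK = 7 * 24


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to one of 168 weekly slots (Monday 00:00 is slot 0)."""
    return ts.weekday() * 24 + ts.hour


def cyclic_distance(a: int, b: int, period: int = HOURS_PER_WEEK) -> int:
    """Slot distance on a circular week, so Sunday 23:00 sits next to Monday 00:00."""
    d = abs(a - b) % period
    return min(d, period - d)


def periodic_weight(event_ts: datetime, query_ts: datetime,
                    bandwidth_hours: float = 2.0,
                    half_life_days: float = 90.0) -> float:
    """Recency decay multiplied by a Gaussian kernel on time-of-week distance."""
    age_days = (query_ts - event_ts).total_seconds() / 86400.0
    recency = 0.5 ** (age_days / half_life_days)
    gap = cyclic_distance(hour_of_week(event_ts), hour_of_week(query_ts))
    phase = math.exp(-((gap / bandwidth_hours) ** 2))
    return recency * phase


if __name__ == "__main__":
    query = datetime(2024, 6, 11, 20)          # a Tuesday, 20:00
    last_tuesday = query - timedelta(days=7)   # same weekly slot, but older
    two_days_ago = query - timedelta(days=2)   # newer, but a Sunday evening
    print(periodic_weight(last_tuesday, query))
    print(periodic_weight(two_days_ago, query))
```

With these settings, the event from the same weekly slot seven days earlier keeps a weight of roughly 0.95, while the more recent Sunday event gets a weight that is effectively zero. A real system would learn the daily and weekly structure rather than fix a kernel by hand.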

The most interesting move in the corpus is to sidestep the retain-vs-discard framing by changing *what* you store. Instead of keeping raw past interactions (episodic memory) and deciding which to forget, you can abstract them into preference summaries (semantic memory) that compress many interactions into stable knowledge, and this consistently outperforms retrieving specific past events (Does abstract preference knowledge outperform specific interaction recall?). Notably, that same work finds that recency-based recall beats similarity-based recall, a quiet endorsement of *weighting* old data down rather than deleting it. The architectural version of the same idea isolates parameters per task in the stream, so that older patterns are preserved exactly while new parameters capture emerging tastes. That gives you an explicit stability-vs-plasticity dial instead of a blunt forget threshold (Can model isolation solve streaming recommendation better than replay?).
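The episodic-to-semantic idea can be illustrated with a running summary that folds each interaction into recency-decayed per-category scores, so the raw event log no longer has to be kept. This is an assumption-laden numeric stand-in: PRIME's semantic memory consists of preference summaries and parametric encodings, not this dictionary, and `PreferenceSummary` is a hypothetical name.

```python
from collections import defaultdict


class PreferenceSummary:
    """Running, recency-weighted preference profile.

    Raw events can be discarded once folded in; only the decayed
    per-category totals are stored.
    """

    def __init__(self, half_life_days: float = 30.0):
        self.half_life_days = half_life_days
        self.scores = defaultdict(float)

    def advance(self, days: float) -> None:
        """Age the whole summary by `days`: every score decays by the same factor."""
        factor = 0.5 ** (days / self.half_life_days)
        for category in self.scores:
            self.scores[category] *= factor

    def observe(self, category: str, weight: float = 1.0) -> None:
        """Fold one new interaction in; the event itself is not stored."""
        self.scores[category] += weight

    def profile(self) -> dict:
        """Normalized preference distribution over categories."""
        total = sum(self.scores.values())
        return {c: s / total for c, s in self.scores.items()} if total else {}


if __name__ == "__main__":
    summary = PreferenceSummary(half_life_days=30.0)
    summary.observe("jazz")
    summary.advance(30)  # a month passes: the jazz signal is halved, not deleted
    summary.observe("true_crime_podcast")
    print(summary.profile())
```

Because exponential decay composes multiplicatively, aging the running totals step by step gives the same scores as re-weighting the full raw history in one pass. In the example, the month-old jazz interaction is halved rather than deleted and still holds a third of the profile.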

There's also a structural reason uniform discarding backfires: history is where the collaborative-filtering signal lives. Conversational recommenders that rely only on the current session lose the item-similarity and look-alike-user channels that have proven valuable in traditional systems. You need current intent, historical dialogues, and similar users together, with history conditioned on the present rather than dumped (Can conversational recommenders recover lost preference signals from history?). And a purely mechanical caution: fixed-size hashed tables that let old IDs decay while new ones stream in accumulate collisions, and because interaction frequencies follow a power law, the harm concentrates on the highest-frequency users and items. Naive aging of the ID space therefore degrades quality precisely where you can least afford it (Why do hash collisions hurt recommendation models so much?).
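The hashing half of that argument can be explored with a toy simulation. It assumes uniform hashing (via `zlib.crc32`) into a fixed table of 1,000 rows, Zipf-distributed traffic with IDs named by frequency rank, and no eviction of old IDs. It is not Monolith's experiment. It reports how much traffic the head IDs carry and what fraction of them share a row with some other ID as the vocabulary grows.

```python
import zlib
from collections import Counter


def slot(item_id: str, table_size: int) -> int:
    """Deterministically hash an ID into one row of a fixed-size embedding table."""
    return zlib.crc32(item_id.encode("utf-8")) % table_size


def head_traffic_share(num_ids: int, head: int, zipf_s: float = 1.0) -> float:
    """Fraction of Zipf-distributed traffic carried by the `head` most frequent IDs."""
    weights = [1.0 / rank ** zipf_s for rank in range(1, num_ids + 1)]
    return sum(weights[:head]) / sum(weights)


def head_collision_rate(num_ids: int, table_size: int, head: int = 50) -> float:
    """Fraction of the `head` most frequent IDs whose row is shared with another ID.

    IDs are named by frequency rank, so item_1 .. item_{head} form the head.
    """
    ids = [f"item_{rank}" for rank in range(1, num_ids + 1)]
    occupancy = Counter(slot(i, table_size) for i in ids)
    shared = sum(1 for i in ids[:head] if occupancy[slot(i, table_size)] > 1)
    return shared / head


if __name__ == "__main__":
    TABLE_SIZE = 1000  # hypothetical fixed table that never grows
    for n in (200, 2000, 20000):  # the vocabulary keeps growing as new IDs arrive
        print(n, round(head_traffic_share(n, 50), 3), head_collision_rate(n, TABLE_SIZE))
```

Uniform hashing does not target popular IDs specifically. The point is that the head carries a large share of traffic (roughly 43% for the top 50 of 20,000 IDs under Zipf with exponent 1), and with a fixed table and a growing ID set, the head's collision rate can only stay flat or rise.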

The synthesis a curious reader might not expect: nobody in this corpus recommends uniform discarding, but the reason isn't sentimentality about old data — it's that 'old' is the wrong axis. The useful axis is durable-vs-transient (keep the durable, discount the transient), recurring-vs-one-off (cycles aren't stale), and raw-vs-abstracted (compress history into preference knowledge instead of hoarding or deleting events). Retention isn't a deletion policy; it's a weighting and representation problem.

Sources (7 notes)

Why do global concept drift methods fail for recommender systems?

User preferences shift on individual timescales for individual reasons, making population-level drift detection ineffective. Per-user temporal modeling that preserves long-term signals while discounting transient noise is required.

Why do recommendation systems miss recurring user preference patterns?

HyperBandit conditions a hypernetwork on time-of-period to generate user preference parameters, capturing weekly and daily cycles that change-point detection misses. This treats time itself as a context dimension, so matching time periods retrieve matching preference functions rather than treating each period as novel evidence.

Why do the same users rate items differently each time?

Amatriain et al. found that the same user gives substantially different ratings to the same item across sessions, shifting by multiple stars. This noise stems from temporal inconsistency, rater-specific biases, and anchoring effects—making ratings reflect both preference and rating-behavior rather than stable preference alone.

Does abstract preference knowledge outperform specific interaction recall?

The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Can model isolation solve streaming recommendation better than replay?

DEGC uses per-task parameter isolation to handle streaming recommendation, providing explicit stability-plasticity trade-offs that experience replay and knowledge distillation methods cannot match. This approach preserves older patterns exactly while allowing new parameters to capture emerging preferences.

Can conversational recommenders recover lost preference signals from history?

Current conversational recommender systems (CRS) use only the active dialogue session to infer preferences, losing the item-CF and user-CF signals proven valuable in traditional recommenders. Integrating the current session, historical dialogues, and look-alike users, conditioned on current intent, recovers essential user representation structure.

Why do hash collisions hurt recommendation models so much?

Monolith's empirical work shows that real recommendation systems have power-law distributed frequencies, causing collisions to accumulate precisely on the entities models need most accurate. Fixed-size hashed tables worsen this over time as new IDs arrive.
