What makes behavior relevance scoring against candidates more effective than fixed user profiles?

This explores why representing a user as a set of behaviors or personas scored fresh against each candidate item beats collapsing them into one fixed profile vector — and what the corpus says about where rigid profiles actually break.

The corpus's clearest answer comes from AMP-CF, which represents each user not as a single latent taste vector but as multiple personas, weighted dynamically by attention to the candidate being considered (see "Can modeling multiple user personas improve recommendation accuracy?"). The key move is that the user representation is recomputed at prediction time against each item, so a candidate cookbook activates the cooking persona and a candidate thriller activates the reading persona: the same person, scored differently per candidate. This candidate-conditional adaptation improves accuracy and, as a byproduct, explains itself. Each recommendation traces back to the specific persona it satisfied, which also yields diversity without the separate post-hoc reranking step a fixed profile would need (see "Can attention mechanisms reveal which user taste explains each recommendation?").
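The candidate-conditional idea can be sketched in a few lines. This is a minimal illustration, not AMP-CF's actual architecture: the dot-product attention, the softmax weighting, the two-dimensional embeddings, and every function name here are assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score_candidate(personas, item):
    """Return (score, persona weights) for one candidate item."""
    # Attention step: how relevant is each persona to this particular item?
    weights = softmax([dot(p, item) for p in personas])
    # The user vector is rebuilt per candidate as a weighted mix of personas.
    user_vec = [
        sum(w * p[d] for w, p in zip(weights, personas))
        for d in range(len(item))
    ]
    return dot(user_vec, item), weights

def fixed_profile_score(personas, item):
    # Baseline: collapse personas into one mean vector, ignoring the item.
    profile = [
        sum(p[d] for p in personas) / len(personas)
        for d in range(len(item))
    ]
    return dot(profile, item)

# Hypothetical embeddings: axis 0 stands for "cooking", axis 1 for "thrillers".
personas = [[3.0, 0.0], [0.0, 3.0]]
cookbook = [1.0, 0.0]
thriller = [0.0, 1.0]

cook_score, cook_weights = score_candidate(personas, cookbook)
thriller_score, thriller_weights = score_candidate(personas, thriller)
```

In this toy setup the cookbook puts most of the attention weight on the first persona and the thriller on the second, and the returned weights double as the explanation for each score. The fixed baseline averages both personas and scores either item lower than the candidate-conditional version does.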

The sharpest evidence for why fixed profiles fail isn't about average accuracy; it's about a specific failure mode. PRIME finds a U-shaped error curve in which substituting the *most similar* stored profile causes the steepest performance drop, an uncanny-valley effect: the model confidently applies a nearly-but-not-quite-right preference set, which does more damage than an obvious mismatch (see "Why do similar user profiles produce worse personalization errors?"). A fixed profile commits to one such representation and carries that confident wrongness into every candidate. Scoring behaviors against the candidate at hand keeps the system from over-committing, because relevance is decided locally, per item, rather than baked in globally.

There's a second, quieter reason hiding in how preferences get stored. PRIME also shows that abstracted, semantic preference summaries beat replaying specific past interactions and, counterintuitively, that recency-based recall beats similarity-based retrieval (see "Does abstract preference knowledge outperform specific interaction recall?"). PLUS pushes the same idea further: a learned *text* summary of a user conditions a reward model more effectively than an embedding vector, because text captures dimensions a frozen vector misses and stays interpretable (see learned-text-based-user-preference-summaries-condition-reward-models-more-effectiv). The pattern across both is that a static compressed representation throws away exactly the structure relevance scoring needs.

What the reader might not expect is how cheap the flexible alternative can be. Reward factorization (PReF) shows you don't need to retrain weights to personalize at all: ten adaptive questions are enough to infer a user's personal mix of reward coefficients at inference time (see "Can user preferences be learned from just ten questions?"). That reframes the whole comparison: a "fixed profile" isn't even the efficient choice. The efficient choice is a small set of reusable preference dimensions recombined per user and per candidate, the same logic as AMP-CF's personas, just expressed as reward components.
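A rough sketch of the coefficient-inference step may make this concrete. It is not PReF's published procedure: the ten questions below are hand-picked rather than chosen by active learning, the fit is a plain logistic (Bradley-Terry style) gradient ascent, and all names and numbers are hypothetical. Each question is represented as the difference in base-reward scores between two options, and the simulated user's hidden coefficients are used only to generate answers.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_coefficients(diffs, answers, steps=500, lr=0.1):
    """Fit per-user reward coefficients from pairwise answers.

    diffs[i] is base_scores(option A) - base_scores(option B);
    answers[i] is 1 if the user preferred A, else 0.
    """
    coeffs = [0.0] * len(diffs[0])
    for _ in range(steps):
        grad = [0.0] * len(coeffs)
        for d, y in zip(diffs, answers):
            # Gradient of the logistic log-likelihood for one comparison.
            residual = y - sigmoid(dot(coeffs, d))
            for k in range(len(coeffs)):
                grad[k] += residual * d[k]
        coeffs = [c + lr * g for c, g in zip(coeffs, grad)]
    return coeffs

def personalized_reward(coeffs, base_scores):
    # Base reward functions stay frozen; only the mixing weights are per-user.
    return dot(coeffs, base_scores)

# Ten hypothetical questions over two base reward dimensions.
questions = [
    [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.0, -1.0],
    [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0], [2.0, 1.0], [-2.0, -1.0],
]
hidden_user_mix = [1.0, -0.5]  # assumption: used only to simulate answers
answers = [1 if dot(hidden_user_mix, d) > 0 else 0 for d in questions]

coeffs = fit_coefficients(questions, answers)
```

Because the answers are pairwise preferences, only the direction of the coefficient vector is identifiable, not its scale. The fitted coefficients can then rescore any candidate through `personalized_reward` without touching model weights.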

One caution the corpus adds: candidate-conditional scoring is only as good as the signal feeding it. When user history is sparse, even a flexible model has nothing to score, which is why ERRA's aspect-aware retrieval augmentation is needed to inject richer signal before personalization can work at all (see "Can retrieval enhancement fix explainable recommendations for sparse users?"). So the lesson isn't "profiles bad, scoring good". It's that committing early to a single compressed representation discards the per-item flexibility, the interpretability, and the room to be wrong gracefully that scoring behaviors against candidates preserves.

Sources 7 notes

Can modeling multiple user personas improve recommendation accuracy?

AMP-CF separates user representation into latent personas weighted by attention to the candidate item. This candidate-conditional approach improves accuracy by adapting the user representation at prediction time and produces inherent explanations for why items were recommended.

Can attention mechanisms reveal which user taste explains each recommendation?

AMP-CF represents each user as multiple latent personas weighted dynamically by candidate item. This makes recommendations both diverse and interpretable—each suggestion traces to the specific persona preference it satisfies—without requiring post-hoc reranking.

Why do similar user profiles produce worse personalization errors?

PRIME shows a U-shaped error curve where most-similar profile replacements cause the steepest performance drops. The model confidently applies wrong preferences when profiles are nearly but not truly matched, an uncanny-valley effect more harmful than an obvious mismatch.

Does abstract preference knowledge outperform specific interaction recall?

The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Personalization then happens through inference-time reward alignment, without weight modification.

Can retrieval enhancement fix explainable recommendations for sparse users?

ERRA combines model-agnostic review retrieval with personalized aspect selection to address data sparsity that embedded methods cannot solve. Retrieval augmentation provides richer signal when user history is sparse, while aspect personalization ensures explanations match user context rather than generic defaults.
