Can users steer recommendations with natural language at inference?
Can recommendation systems let users specify their preferences in natural language at inference time, without retraining? This matters because it would let both new and existing users dynamically adjust what they are shown.
Sequential recommenders predict a user's next interaction from history. Recent work uses LLMs to extract preferences from reviews and feed them as auxiliary supervision during training, but this approach can't be steered at inference: the user's preferences are baked into the model weights, so a new user requires fine-tuning to be served well.
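To make the contrast concrete, here is a minimal sketch (an assumed setup, not any specific paper's code) of the training-time approach: preference text enters only as an auxiliary target in the loss, so once training ends there is no input slot through which a new or changed preference can reach the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqRecWithAuxPreferenceLoss(nn.Module):
    """Sequential recommender trained with preference text as auxiliary supervision."""

    def __init__(self, num_items: int, dim: int = 64):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.next_item_head = nn.Linear(dim, num_items)
        # Auxiliary head: predict an embedding of the LLM-extracted preference text.
        self.pref_head = nn.Linear(dim, dim)

    def forward(self, history_ids: torch.Tensor):
        hidden, _ = self.encoder(self.item_emb(history_ids))
        last = hidden[:, -1]  # representation of the user's interaction history
        return self.next_item_head(last), self.pref_head(last)

def training_loss(model, history_ids, next_item, pref_text_emb, aux_weight=0.1):
    logits, pref_pred = model(history_ids)
    rec_loss = F.cross_entropy(logits, next_item)     # next-item prediction
    aux_loss = F.mse_loss(pref_pred, pref_text_emb)   # preferences as a *training target*
    return rec_loss + aux_weight * aux_loss

# At inference the model takes only history_ids: there is no slot for preference
# text, so steering a user or serving a new one requires further fine-tuning.
```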
Preference discerning is a different paradigm. Instead of training the model to embody preferences, it conditions the generative recommender on user preferences as text in the model's context window at inference time. An LLM extracts preferences from user reviews and item-specific data, producing a textual description of what the user wants. This text is fed into the sequential recommender as in-context conditioning, alongside the interaction history.
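A minimal sketch of the idea, assuming generic `llm` and `recommender` objects with a `generate` method (hypothetical names, not the actual Mender implementation): the preference is a runtime string placed in the context alongside the interaction history.

```python
from dataclasses import dataclass

@dataclass
class UserContext:
    history: list[str]     # item identifiers the user interacted with, in order
    preference_text: str   # natural-language preferences extracted by an LLM

def extract_preferences(reviews: list[str], llm) -> str:
    """Distill a user's reviews into a short textual preference description."""
    prompt = (
        "Summarize what this user likes and dislikes, based on their reviews:\n"
        + "\n".join(reviews)
    )
    return llm.generate(prompt)

def build_context(ctx: UserContext) -> str:
    """Concatenate preference text and interaction history into one input."""
    return (
        f"User preferences: {ctx.preference_text}\n"
        f"Interaction history: {', '.join(ctx.history)}\n"
        "Next item:"
    )

def recommend(ctx: UserContext, recommender) -> str:
    """The generative recommender decodes the next item from the conditioned context."""
    return recommender.generate(build_context(ctx))
```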
This architectural shift unlocks several capabilities. Users can specify in natural language what they want or want to avoid ("more action, less romantic"). New users can be served without retraining by computing their preferences from minimal data and injecting them into the context. The system can also be evaluated on preference-following capability, not just next-item prediction: Mender's benchmark covers preference-based recommendation, sentiment following, fine-grained steering, coarse-grained steering, and history consolidation. State-of-the-art sequential recommenders fail on several of these axes because they have no mechanism to incorporate preferences they didn't train on; Mender succeeds because preferences are a runtime input, not a training target.
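Continuing the sketch above (again with placeholder `llm`, `recommender`, and `user_reviews` objects), steering and cold-start serving become edits to the context string rather than to the weights:

```python
# Existing user: steer at inference by appending an instruction to the preference text.
steered = UserContext(
    history=["item_101", "item_257", "item_033"],
    preference_text=extract_preferences(user_reviews, llm)
                    + " Right now: more action, less romantic.",  # user-supplied override
)
print(recommend(steered, recommender))

# New user: no retraining needed. Compute preferences from whatever minimal data
# exists (e.g., an onboarding sentence) and inject them into the context.
cold_start = UserContext(
    history=[],
    preference_text="Enjoys hard sci-fi and disaster documentaries.",
)
print(recommend(cold_start, recommender))
```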
The general lesson: making something a context input rather than a parameter target trades efficiency (longer prompts) for flexibility (runtime steering). For tasks where users know what they want better than the training set does, the trade is worth it.
Source: Recommenders Personalized
Related concepts in this collection
- Can language models bridge the gap between critique and preference?
  When users express what they dislike rather than what they want, can LLMs reliably transform those critiques into positive preferences that retrieval systems can actually use?
  complements: both let users steer recommendations via natural language at inference; preference discerning starts from positive preferences while critiques start from negative ones
- Can user preferences be learned from just ten questions?
  Explores whether adaptive question selection can efficiently infer user-specific reward coefficients without historical data or fine-tuning. This matters for scaling personalization without per-user model updates.
  complements: PReF and Mender both achieve inference-time alignment without fine-tuning — PReF via reward factorization, Mender via NL conditioning
- Can text summaries condition reward models better than embeddings?
  Exploring whether learning interpretable text-based summaries of user preferences outperforms embedding vectors for training personalized reward models in language model alignment.
  extends: text-based preference conditioning beats embedding conditioning at the reward-model level too — same insight in alignment
- Can conversational recommenders recover lost preference signals from history?
  Conversational recommenders abandoned item and user similarity signals when they shifted to dialogue-focused design. Can integrating historical sessions and look-alike users restore these channels without losing dialogue benefits?
  complements: NL preferences from reviews are a fourth preference channel — text-distilled preferences abstract over individual interactions
Original note title: preference discerning conditions sequential recommenders on natural-language preferences in context — letting users steer at inference without fine-tuning