Can reward factorization actually scale personalization to large user bases?

This explores reward factorization — representing each user's preferences as a weighted combination of shared 'base' reward functions rather than training a separate reward model per person — and asks whether that math trick is what finally makes personalization affordable across millions of users.

Reward factorization is the idea that instead of giving every user their own trained reward model, you learn a small set of shared base reward functions once, then describe each individual as a lightweight vector of coefficients over those bases. The appeal for scale is concrete. The note 'Can user preferences be learned from just ten questions?' shows the PReF approach inferring a personalized reward from roughly ten adaptive questions, and, crucially, it personalizes at inference time without touching model weights. That is the whole scaling argument in one move: if a new user is just a coefficient vector, onboarding millions of people costs about ten questions each rather than a training run each.
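To make the coefficient-vector idea concrete, here is a minimal sketch, not the PReF implementation. It assumes three hand-written base rewards (`detail`, `brevity`, `asks_followup`) standing in for learned ones, and it uses best-of-n reranking as an assumed stand-in for inference-time alignment. Every name in it is hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical base reward functions. In a factorized setup these would be
# learned once from preference data and shared by every user; simple
# hand-written heuristics stand in for them here.
def detail(response: str) -> float:
    """Higher for longer answers, capped at 1.0 (20 words or more)."""
    return min(len(response.split()) / 20.0, 1.0)

def brevity(response: str) -> float:
    """Higher for shorter answers; the complement of `detail`."""
    return 1.0 - detail(response)

def asks_followup(response: str) -> float:
    """1.0 if the answer ends with a question mark, else 0.0."""
    return 1.0 if response.rstrip().endswith("?") else 0.0

BASES: Sequence[Callable[[str], float]] = (detail, brevity, asks_followup)

def personalized_reward(weights: Sequence[float], response: str) -> float:
    """Score a response as a weighted sum of the shared base rewards."""
    if len(weights) != len(BASES):
        raise ValueError("need exactly one weight per base reward")
    return sum(w * base(response) for w, base in zip(weights, BASES))

def pick_response(weights: Sequence[float], candidates: Sequence[str]) -> str:
    """Best-of-n reranking: return the candidate this user's reward prefers."""
    return max(candidates, key=lambda c: personalized_reward(weights, c))

candidates = [
    "Yes.",
    "Yes, and here is a longer explanation with several more words "
    "of supporting detail for you.",
    "Which version are you on?",
]

# Each user is nothing more than a coefficient vector over the shared bases.
users = {
    "terse": [0.0, 1.0, 0.0],
    "thorough": [1.0, 0.0, 0.0],
    "collaborative": [0.0, 0.0, 1.0],
}

for name, weights in users.items():
    print(name, "->", pick_response(weights, candidates))
```

Adding a user here means adding one short list to `users`; the base functions and the model that produced the candidates stay untouched, which is the amortization the scaling argument relies on.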

But 'can it scale' has two meanings, and the corpus pulls them apart. Can it scale cheaply? Yes: the cost answer is encouraging. Can it scale safely? That is a different question. The note 'Does personalizing reward models amplify user echo chambers?' warns that the very thing that makes per-user reward models powerful, dropping the averaging effect of an aggregate model, is also what lets them learn flattery and harden each person into their own echo chamber. The note explicitly says this 'mirrors recommender-system failures,' which is the tell: factorization scales the personalization *and* the pathology together. So the honest answer is that reward factorization scales the mechanism, but scaling without safeguards scales the harm too.

There's also a quieter question hiding underneath: what should the user representation actually be? Linear coefficients are clean, but other notes argue a single vector is too thin a description of a person. Two notes on AMP-CF, 'Can attention mechanisms reveal which user taste explains each recommendation?' and 'Can modeling multiple user personas improve recommendation accuracy?', both find that modeling a user as several attention-weighted personas rather than one monolithic taste improves accuracy and, as a bonus, explains *which* facet of you a given output satisfies. Reward factorization's bases are a structural cousin of these personas: shared building blocks, per-user mixing weights. The convergence is suggestive: different subfields keep rediscovering that 'shared components + personal weights' is the unit that scales.
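The persona idea can be sketched with plain dot-product attention. This is an illustrative assumption, not AMP-CF's exact formulation: personas and items are short hand-picked vectors, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def persona_score(
    personas: Sequence[Sequence[float]], item: Sequence[float]
) -> Tuple[float, List[float]]:
    """Score an item for a user described by several persona vectors.

    The attention weights depend on the candidate item, so the user
    representation is rebuilt for every item that gets scored.
    """
    attention = softmax([dot(p, item) for p in personas])
    user_vec = [
        sum(a * p[d] for a, p in zip(attention, personas))
        for d in range(len(item))
    ]
    return dot(user_vec, item), attention

# Two hypothetical personas for one user, in a 2-dimensional taste space.
personas = [[1.0, 0.0], [0.0, 1.0]]
item = [2.0, 0.0]  # an item that lines up with the first persona

score, attention = persona_score(personas, item)
explaining_persona = attention.index(max(attention))
print(f"score={score:.3f}, explained by persona {explaining_persona}")
```

The attention weights double as the explanation: the persona with the largest weight is the facet of the user that the item satisfies.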

The representation debate cuts the other way too. The note 'Can text summaries beat embeddings for personalized reward models?' finds that conditioning a reward model on a learned *text* summary of a user beats conditioning on an embedding vector, and the summary stays human-readable and even transfers zero-shot to a different model (GPT-4). The note 'Does abstract preference knowledge outperform specific interaction recall?' reinforces this: abstract preference knowledge consistently outperforms replaying specific past interactions. So if the goal is scale, factored numeric coefficients aren't the only contender: a compact natural-language preference summary may carry more of the relevant signal per byte, and it is auditable in a way a raw coefficient vector is not.

The thing worth walking away knowing: reward factorization isn't really competing on accuracy, it's competing on *amortization* — pay the heavy cost once for the shared bases, then make each user nearly free. That's a genuinely different scaling curve from per-user fine-tuning. But the corpus says the open problems are no longer computational. They're whether thin coefficient vectors capture enough of a person (the persona work says maybe not), whether text beats math as the carrier (the summary work says maybe so), and whether scaling personalization at all is desirable once you see what it does to sycophancy and polarization at population scale.

Sources (6 notes)

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce uncertainty about a user's coefficients. Responses can then be personalized for each user via inference-time reward alignment, without weight modification.

Does personalizing reward models amplify user echo chambers?

Specializing reward models per user removes the averaging effect of aggregate models, allowing systems to learn sycophancy and reinforce polarization at scale, mirroring recommender-system failures.

Can attention mechanisms reveal which user taste explains each recommendation?

AMP-CF represents each user as multiple latent personas weighted dynamically by candidate item. This makes recommendations both diverse and interpretable—each suggestion traces to the specific persona preference it satisfies—without requiring post-hoc reranking.

Can modeling multiple user personas improve recommendation accuracy?

AMP-CF separates user representation into latent personas weighted by attention to the candidate item. This candidate-conditional approach improves accuracy by adapting the user representation at prediction time and produces inherent explanations for why items were recommended.

Can text summaries beat embeddings for personalized reward models?

PLUS trains summarizers and reward models jointly, producing learned text-based preference summaries that capture dimensions zero-shot summaries miss. These summaries transfer to GPT-4 for zero-shot personalization and remain interpretable to users.

Does abstract preference knowledge outperform specific interaction recall?

The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.
