Recommender Systems

Can aggregate reward models satisfy genuinely disagreeing users?

When users have conflicting preferences, do aggregate reward models face an impossible choice between satisfying the majority and sampling proportionally? What does this reveal about RLHF deployment?

Note · 2026-05-18 · sourced from Recommenders Personalized

A clean argument for why aggregate reward models cannot serve disagreement-heavy tasks. Consider a subjective question where 51% of the target audience prefer answer A and 49% prefer answer B. With a single reward model trained on aggregated preferences, the deployment has exactly two options. Pick A as the preferred answer: 49% of users are unhappy 100% of the time. Sample A and B proportionally to their preference rates: 100% of users are unhappy approximately half the time. Both options are unsatisfactory.
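The arithmetic behind both options can be checked directly. This sketch assumes every user strictly prefers one answer and counts a user as unhappy exactly when they are served the other one. `unhappiness` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction

def unhappiness(share_a, serve_a):
    """Per-group and population-wide probability of being served the non-preferred answer.

    share_a: fraction of users who strictly prefer A (the rest strictly prefer B).
    serve_a: probability that the single deployed policy returns A to any user.
    """
    unhappy_a_fans = 1 - serve_a   # A-preferrers lose whenever B is served
    unhappy_b_fans = serve_a       # B-preferrers lose whenever A is served
    overall = share_a * unhappy_a_fans + (1 - share_a) * unhappy_b_fans
    return unhappy_a_fans, unhappy_b_fans, overall

share_a = Fraction(51, 100)

# Option 1: always serve the majority answer A.
print(unhappiness(share_a, serve_a=Fraction(1)))
# Option 2: sample A and B in proportion to their preference rates.
print(unhappiness(share_a, serve_a=share_a))
```

Option 1 leaves A-preferrers never unhappy, B-preferrers always unhappy, and 49/100 of interactions unsatisfying overall. Option 2 gives 49/100 and 51/100 per group and 2499/5000 overall. Under these assumptions, proportional sampling does not lower total dissatisfaction. It slightly raises it and spreads it across every user.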

The structural problem is that aggregate reward models compress preference distributions into single scalars (or single rankings) that cannot represent disagreement. They reward what the majority prefers and incidentally suppress what the minority prefers. For tasks with high consensus this is fine — the majority preference is everyone's preference. For tasks with genuine disagreement — subjective evaluations, value-laden topics, creative judgment, cultural-context-dependent choices — aggregate models systematically exclude the minority view.
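A small calculation makes the compression concrete. This sketch assumes a Bradley-Terry-style pairwise reward model fitted to pooled A-versus-B comparisons. That is a common choice for RLHF reward models, but the note does not specify it, so treat it as an assumption. For a single pair, the maximum-likelihood fit sets sigmoid(r_A - r_B) equal to the pooled win rate of A, so the fitted reward gap is the logit of that rate. `pooled_reward_gap` and both example populations are hypothetical.

```python
import math

def pooled_reward_gap(users):
    """Bradley-Terry reward gap r_A - r_B fitted to pooled comparisons.

    users: list of (population_weight, prob_user_prefers_A).
    With one comparison pair, the MLE makes sigmoid(r_A - r_B) equal to the
    pooled win rate of A, so the gap is the logit of that rate.
    """
    total = sum(w for w, _ in users)
    win_rate_a = sum(w * p for w, p in users) / total
    return math.log(win_rate_a / (1 - win_rate_a))

polarized = [(0.51, 1.0), (0.49, 0.0)]  # two camps with firm, opposite preferences
lukewarm = [(1.0, 0.51)]                # everyone slightly favors A

print(pooled_reward_gap(polarized))
print(pooled_reward_gap(lukewarm))
```

Both calls return the same gap, log(51/49), which is roughly 0.04. The fitted scalar is identical whether the population is two camps with opposite convictions or one crowd that is nearly indifferent, so nothing trained against it can tell the two situations apart.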

This is not a quality problem with current reward models. It is a representational problem with the aggregation step itself. Even a perfect aggregate reward model would face this dilemma. The fix has to operate at a different level: reward models that can be specialized to individual users (or to user groups whose preferences cluster) rather than averaged across the population.
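The following is a minimal sketch of group-level specialization. It assumes group membership is already known for every comparison, whereas in practice it would have to be collected or inferred. Each group gets its own estimate, here just the group's empirical win rate for A, and each user is served whatever their own group ranks highest. `fit_group_preferences`, `serve`, and the data are hypothetical.

```python
from collections import defaultdict

def fit_group_preferences(comparisons):
    """Estimate, per user group, how often A beats B.

    comparisons: iterable of (group_id, winner) with winner in {"A", "B"}.
    Returns {group_id: win_rate_of_A}.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for group, winner in comparisons:
        totals[group] += 1
        if winner == "A":
            wins[group] += 1
    return {g: wins[g] / totals[g] for g in totals}

def serve(group_win_rates, group):
    """Return the answer this group's own estimate ranks highest (ties go to A)."""
    return "A" if group_win_rates[group] >= 0.5 else "B"

# Hypothetical data: 51 comparisons from camp "x" (all prefer A), 49 from camp "y" (all prefer B).
data = [("x", "A")] * 51 + [("y", "B")] * 49
rates = fit_group_preferences(data)
print(serve(rates, "x"), serve(rates, "y"))  # A B
```

With the 51/49 split, camp x is served A and camp y is served B, so no user receives their non-preferred answer. The per-group win rate stands in for a per-group reward model. A fuller version would replace it with a reward model trained on, or conditioned on, each group's comparisons.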

The implication extends beyond personalization. Whenever a system is deployed against a heterogeneous user base with genuinely divergent preferences, the standard "train one model to satisfy everyone" architecture cannot fully satisfy all of its users at once. The right architecture either splits per-user (personalization) or splits per-cluster (group-level adaptation). Aggregate reward modeling becomes appropriate only when the underlying preferences are actually unimodal, and that is a stronger assumption than RLHF deployments typically test.
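One rough way to probe this assumption before committing to an aggregate reward model is to measure how many users the pooled majority would consistently overrule. This approach assumes repeated comparisons per user are available. It is a heuristic proxy, not a formal test of unimodality, and `minority_share` is a hypothetical helper.

```python
from collections import Counter

def minority_share(user_choices):
    """Fraction of users whose own most-picked answer differs from the pooled majority.

    user_choices: {user_id: list of answers the user picked across repeated comparisons}.
    Counter.most_common orders equal counts by first occurrence, so ties are resolved arbitrarily.
    """
    pooled = Counter(a for picks in user_choices.values() for a in picks)
    pooled_majority = pooled.most_common(1)[0][0]
    overruled = sum(
        1 for picks in user_choices.values()
        if Counter(picks).most_common(1)[0][0] != pooled_majority
    )
    return overruled / len(user_choices)

# Hypothetical data: two firm camps vs. a population that leans the same way.
polarized = {u: (["A"] * 3 if u < 51 else ["B"] * 3) for u in range(100)}
consensus = {u: ["A", "A", "B"] for u in range(100)}

print(minority_share(polarized))  # 0.49
print(minority_share(consensus))  # 0.0
```

A value near 0 is consistent with the consensus case, where one aggregate model is plausible. A value well above 0 flags a group that a single reward model would leave unhappy every time.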

Original note title

aggregate reward models systematically exclude minority preferences — the dilemma of preferred answer or proportional sampling is a structural failure of one-size-fits-all RLHF