What preference data do different personalized alignment methods actually need?

This note explores how the *kind* of preference data (its source, granularity, and abstraction level) changes depending on which personalized alignment method you're using, and where the corpus suggests the data we collect is the wrong data entirely.

This reads the question as not 'how much' preference data but 'what shape', because the corpus keeps showing that different methods are hungry for fundamentally different signals, and that mismatches between method and data are where personalization quietly breaks. The most striking thread is that more raw data rarely helps. "Can careful curation replace massive alignment datasets?" shows that 1,000 well-chosen examples can be competitive with orders of magnitude more data, because post-training activates capabilities the model already has rather than teaching new ones. So the real question is which 1,000 signals.

On *source*, the corpus is counterintuitive. "Do user outputs outperform inputs for LLM personalization?" finds that profiles built from what a user *produces* match or beat full profiles, while profiles built from their *queries* actually degrade performance: personalization runs on style and preference, not on the semantic content of what someone asks. "Does abstract preference knowledge outperform specific interaction recall?" (PRIME) pushes further: abstracted preference *summaries* consistently beat retrieving specific past interactions. Methods that lean on episodic recall are feeding on a weaker signal than methods that distill a compact preference model. Together these say that the useful data is digested, not raw.
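The contrast between output-built and query-built profiles can be sketched in a few lines. This is a minimal illustration only: the `Interaction` record, the `build_profile` helper, and the choice to keep the most recent items are all assumptions made here, not details taken from the underlying notes.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    user_input: str   # what the user asked (hypothetical field name)
    user_output: str  # text the user produced themselves (hypothetical field name)


def build_profile(history, source="outputs", max_items=3):
    """Collect profile text from one side of each interaction.

    Assumption: the profile is simply the most recent non-empty texts
    from the chosen side; a real system may summarize them instead.
    """
    if source not in ("outputs", "inputs"):
        raise ValueError("source must be 'outputs' or 'inputs'")
    if max_items <= 0:
        return []
    field = "user_output" if source == "outputs" else "user_input"
    texts = [getattr(item, field) for item in history if getattr(item, field)]
    return texts[-max_items:]  # keep only the most recent items
```

Calling `build_profile(history)` yields an output-only profile; passing `source="inputs"` yields the query-only variant that the note above reports as degrading performance.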

On *granularity*, "Does segment-level optimization work better for multi-turn dialogue alignment?" (SDPO) shows that for multi-turn dialogue, turn-level preference pairs are too fine (you optimize noise) and session-level pairs too coarse (irrelevant turns contaminate the signal); the right unit is the segment around the turn that actually went wrong. The granularity of your preference labels has to match the granularity of the behavior you're trying to fix.
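To make the segment idea concrete, here is a rough sketch of cutting a preference pair around one erroneous turn. It assumes the erroneous turn index and a rewritten segment are already available, and it uses a fixed window (`before`, `after`) as a stand-in; how SDPO actually locates the error and chooses segment boundaries is not specified in this note, so the helper names and the window rule are hypothetical.

```python
def segment_bounds(num_turns, error_index, before=0, after=1):
    """Return (start, end) slice bounds for the segment around error_index."""
    if not 0 <= error_index < num_turns:
        raise IndexError("error_index is outside the session")
    start = max(0, error_index - before)
    end = min(num_turns, error_index + after + 1)
    return start, end


def make_segment_pair(turns, error_index, rewritten_segment, before=0, after=1):
    """Build one preference pair whose unit is a segment, not a turn or session.

    The shared context is everything before the segment; the original
    segment is 'rejected' and the rewritten one is 'chosen'.
    """
    start, end = segment_bounds(len(turns), error_index, before, after)
    return {
        "context": turns[:start],
        "rejected": turns[start:end],
        "chosen": list(rewritten_segment),
    }
```

Turns after the segment are left out of the pair entirely, which is the point: they are the irrelevant material that a session-level pair would drag into the signal.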

Then there's a quieter, sharper warning: the preference data we collect may not measure preference at all. "Do all annotation responses measure the same underlying thing?" decomposes annotations into genuine preferences, non-attitudes, and constructed-on-the-spot preferences, and finds that treating them uniformly contaminates reward models. "Can language models bridge the gap between critique and preference?" offers a partial fix from the other direction, turning vague negative feedback ('doesn't look good for a date') into usable positive preferences ('prefer more romantic'). So some methods don't just need preference data; they need to *clean* or *transform* it first.
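One way to picture the cleaning step is a consistency check across measurement conditions. The sketch below assumes each annotator labels the same item under more than one condition (for example, an original and a reworded prompt); the tuple layout and the `split_by_consistency` helper are hypothetical. It only separates stable labels from everything else and does not attempt to tell non-attitudes apart from constructed preferences.

```python
from collections import defaultdict


def split_by_consistency(responses):
    """Split annotations into stable labels and unstable or unverified keys.

    responses: iterable of (annotator_id, item_id, condition, label).
    A label counts as stable only if the same annotator gave the same
    label to the same item under at least two conditions.
    """
    labels = defaultdict(list)
    for annotator, item, _condition, label in responses:
        labels[(annotator, item)].append(label)

    stable, unstable = {}, []
    for key, seen in labels.items():
        if len(seen) >= 2 and len(set(seen)) == 1:
            stable[key] = seen[0]
        else:
            # Either the label flipped or it was measured only once.
            unstable.append(key)
    return stable, unstable
```

Under this sketch, only the `stable` entries would be treated as candidate genuine preferences for reward-model training; what to do with the rest is a separate design decision.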

The deepest cut is whether preference is the right target. "Can user preference guide AI writing tool alignment?" finds writers prefer AI rewrites 63% of the time yet object to the persona distortions baked into those same rewrites. Polish and distortion are entangled, so optimizing the preference signal optimizes the harm with it. "Should AI alignment target preferences or social role norms?" generalizes this: preferences don't capture thick moral values, and aggregating them produces systematic misalignment, so the note argues alignment should target social-role norms instead. So the honest answer to 'what data do these methods need' includes one position that concludes the data you'd naturally collect, stated preferences, is the wrong foundation entirely.

Sources (8 notes)

Can careful curation replace massive alignment datasets?

LIMA demonstrates that fine-tuning a strong pretrained model on 1,000 carefully curated examples achieves alignment performance competitive with models trained on orders of magnitude more data, showing that post-training activates existing capabilities rather than building new ones.

Do user outputs outperform inputs for LLM personalization?

Research shows that user profiles built from outputs alone match or exceed performance of complete profiles across multiple tasks, while input-only profiles degrade performance. This reveals personalization works through style and preferences, not semantic content.

Does abstract preference knowledge outperform specific interaction recall?

The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Does segment-level optimization work better for multi-turn dialogue alignment?

SDPO identifies erroneous turns and optimizes surrounding segments, achieving simultaneous improvements in goal completion and relationship quality. Turn-level DPO is too granular; session-level introduces noise from irrelevant turns.

Do all annotation responses measure the same underlying thing?

Behavioral science reveals that annotations contain genuine preferences, non-attitudes, and constructed preferences—distinguishable by consistency across measurement conditions. Treating them uniformly contaminates reward model training and downstream alignment.

Can language models bridge the gap between critique and preference?

Few-shot LLM prompting can convert natural negative feedback like "doesn't look good for a date" into positive preferences like "prefer more romantic," enabling retrieval systems to find better-matching recommendations without fine-tuning.

Can user preference guide AI writing tool alignment?

Writers prefer AI rewrites 63% of the time but object to systematic persona distortions those same rewrites introduce. Mitigation studies show polish and distortion are entangled at the model level—preference optimization produces both simultaneously.

Should AI alignment target preferences or social role norms?

Preferentialist alignment approaches fail because preferences don't capture thick moral values, uniform aggregation produces epistemic injustice, and preference optimization creates systematic misalignment with social roles. Contractualist alignment negotiated by stakeholders and bounded by supra-national, organizational, and individual levels works better.
