INQUIRING LINE

Why do untrained summarizers focus on topics rather than preference dimensions?

This explores why summarizers, left to their pretrained defaults, capture what a text is *about* (topics) instead of what a *person wants* from it (preference dimensions) — and what closes that gap.


Summarizers left to their pretrained defaults latch onto topical salience rather than the dimensions that encode a user's preferences, and the corpus is unusually consistent about the cause: it isn't model capacity, it's the absence of a training signal pointed at the right target. A zero-shot summarizer is optimized to produce fluent, representative prose, so it surfaces the most statistically prominent thing in a document: its subject matter. Preference dimensions ("prefers romantic," "cares about durability over price") are not the most prominent features; they only become salient once a downstream objective tells the model they matter. The note "Can text summaries beat embeddings for personalized reward models?" makes this concrete: PLUS trains the summarizer and the reward model *jointly*, and the learned summaries capture exactly the dimensions that zero-shot summaries miss. Topic-focus is the default; preference-focus has to be taught.

The reason topics win by default shows up from a different angle in "Do LLMs compress concepts more aggressively than humans do?": LLMs compress toward broad category structure and discard the fine-grained, situation-specific distinctions humans preserve. Preference dimensions live precisely in that discarded layer; they are the contextual nuance that makes one romantic dinner spot different from another. An untrained summarizer maximizing compression efficiency will flatten those distinctions into a topic label, because the topic is the cheapest faithful description. Without a distortion penalty that says "these subtle differences are the point," the model has no reason to keep them.
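One way to make the missing distortion penalty concrete is the standard rate-distortion Lagrangian. This is only a sketch under assumptions: the symbols X (source document), Z (summary), d (distortion measure), and β (trade-off weight) are illustrative choices, not notation taken from the note.

```latex
\[
\min_{p(z \mid x)} \; I(X; Z) \;+\; \beta \, \mathbb{E}_{p(x, z)}\big[ d(x, z) \big]
\]
```

If d penalizes only topical mismatch, two restaurants that differ only in ambiance incur the same distortion under one shared topic label, so merging them lowers the rate I(X; Z) at no cost. Only a d that also charges for lost preference-relevant distinctions would make keeping them worthwhile.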

What flips the behavior is aligning the summarizer to the actual downstream use rather than to generic fidelity. "Can reinforcement learning align summarization with ranking goals?" is the clearest demonstration: when ReLSum is rewarded by downstream ranking relevance instead of prose quality, it stops writing fluent paragraphs and starts producing dense, attribute-focused summaries, exactly the preference-dimension structure that improves recall, NDCG, and engagement. The summarizer focuses on topics until you change *what it's scored on*; then it focuses on the attributes the score rewards. The same lesson generalizes in "Why do language models engage with conversational distractors?", where fine-tuning on just 1,080 synthetic dialogues teaches models a behavior (resisting distraction) that pretraining never instilled. The gap is missing signal, not missing ability.
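The shift from "scored on prose" to "scored on ranking" can be illustrated with a toy reward. The sketch below is not ReLSum's actual reward: the token-overlap ranker, the function names, and the two restaurant items are all hypothetical, chosen only to show that an NDCG-style reward favors attribute-dense summaries over topical ones.

```python
import math
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase, strip commas and periods, and split into a set of tokens."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain for gains listed in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(ranked_gains: List[float]) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


def ranking_reward(summaries: Dict[str, str], query: str,
                   relevance: Dict[str, float]) -> float:
    """Toy reward: rank items by query/summary token overlap, score with NDCG."""
    q = tokenize(query)
    ranked = sorted(summaries,
                    key=lambda item: len(q & tokenize(summaries[item])),
                    reverse=True)
    return ndcg([relevance[item] for item in ranked])


# Hypothetical data: item "a" is the one the user actually wants.
query = "quiet romantic dinner"
relevance = {"a": 1.0, "b": 0.0}

topical = {
    "a": "An Italian restaurant downtown.",
    "b": "A dinner restaurant downtown serving burgers.",
}
attribute = {
    "a": "romantic, quiet, candlelit, Italian",
    "b": "loud, sports screens, burgers",
}

topical_reward = ranking_reward(topical, query, relevance)
attribute_reward = ranking_reward(attribute, query, relevance)
print(topical_reward, attribute_reward)
```

Under this toy ranker the attribute-focused summaries put the relevant item first, while the topical summaries rank the wrong item on top, so a policy optimized against this reward is pushed toward attribute-dense output regardless of fluency.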

Two adjacent findings deepen the picture. "Do all annotation responses measure the same underlying thing?" shows that preferences are *hard to even define* in the training data: annotation responses mix genuine preferences, non-attitudes, and constructed preferences. If the supervision can't cleanly isolate preference signal, a summarizer has no reliable target to focus on and falls back to the unambiguous thing, the topic. And "Can language models bridge the gap between critique and preference?" shows that the dimension a summarizer should be capturing often has to be *constructed*, not read off the surface: "doesn't look good for a date" only becomes the usable preference "prefer more romantic" after an explicit transformation step. Topics sit on the surface; preference dimensions have to be inferred, transformed, and rewarded into existence.
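The transformation step can be pictured as a few-shot prompt that pairs critiques with positive preferences. The following is a minimal sketch under assumptions: the prompt wording, the function name `build_critique_prompt`, and the second example pair are hypothetical; only the first pair comes from the note. The model call itself is deliberately left out.

```python
from typing import Sequence, Tuple

# The first pair is quoted in the note; the second is a made-up illustration.
FEW_SHOT_EXAMPLES: Tuple[Tuple[str, str], ...] = (
    ("doesn't look good for a date", "prefer more romantic"),
    ("way too loud to talk", "prefer quieter"),  # hypothetical example
)


def build_critique_prompt(
    critique: str,
    examples: Sequence[Tuple[str, str]] = FEW_SHOT_EXAMPLES,
) -> str:
    """Assemble a few-shot prompt asking for a positive preference."""
    lines = ["Rewrite each critique as a positive preference.", ""]
    for crit, pref in examples:
        lines.append(f"Critique: {crit}")
        lines.append(f"Preference: {pref}")
        lines.append("")
    # Leave the final preference open for the model to complete.
    lines.append(f"Critique: {critique}")
    lines.append("Preference:")
    return "\n".join(lines)


prompt = build_critique_prompt("too expensive for a weeknight")
print(prompt)
```

The completed "Preference:" line, rather than the raw critique, is what a retrieval system would then match against.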

The thing worth taking away: topic-focus isn't a bug in the summarizer, it's the honest output of an objective that was never told preferences exist. Every fix in the corpus is the same move under different names — give the summarizer a downstream signal (ranking score, joint reward model, targeted instruction tuning) that makes preference dimensions the thing it's graded on, and the topical default dissolves.


Sources (6 notes)

Can text summaries beat embeddings for personalized reward models?

PLUS trains summarizers and reward models jointly, learning that text-based preference summaries capture dimensions zero-shot summaries miss. These summaries transfer to GPT-4 for zero-shot personalization and remain interpretable to users.

Do LLMs compress concepts more aggressively than humans do?

Using Rate-Distortion Theory on cognitive datasets, LLMs capture broad category structure but lose fine-grained distinctions humans preserve. LLMs maximize compression efficiency; humans trade compression for contextual meaning that enables situated action.

Can reinforcement learning align summarization with ranking goals?

ReLSum trains summarizers using downstream relevance scores as RL rewards, producing dense, attribute-focused summaries instead of fluent prose. This alignment to the actual ranking metric improves recall, NDCG, and user engagement in production e-commerce search.

Why do language models engage with conversational distractors?

Fine-tuning on just 1,080 synthetic dialogues with distractor turns significantly improves topic resilience, revealing that the gap is not model capacity but absent training signal. Models learn to follow what-to-do instructions but not what-to-ignore instructions.

Do all annotation responses measure the same underlying thing?

Behavioral science reveals that annotations contain genuine preferences, non-attitudes, and constructed preferences—distinguishable by consistency across measurement conditions. Treating them uniformly contaminates reward model training and downstream alignment.

Can language models bridge the gap between critique and preference?

Few-shot LLM prompting can convert natural negative feedback like "doesn't look good for a date" into positive preferences like "prefer more romantic," enabling retrieval systems to find better-matching recommendations without fine-tuning.
