LLM Reasoning and Architecture · Reinforcement Learning for LLMs · Psychology and Social Cognition

Why does chain-of-thought reasoning fail for personalization?

Standard reasoning traces produce logically sound but personally irrelevant answers. This note explores why generic thinking doesn't anchor to user preferences and what might fix it.

Note · 2026-02-23 · sourced from Personalization

PRIME documents a two-layer failure in applying reasoning to personalization:

Layer 1: Generic CoT fails. Enabling standard chain-of-thought often underperforms the non-thinking baseline for personalization tasks. The uncustomized reasoning trace "merely scratches the surface, seeking broad answers rather than to-the-point, user-specific responses." Generic reasoning explores the problem space without being anchored to the specific user's preferences, values, or communication style — producing reasoning that is logically sound but personally irrelevant.
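
A minimal prompt-level sketch of the difference; the profile fields and wording are invented for illustration and are not PRIME's actual templates. The only change between the two requests is whether the reasoning is asked to route through the user's profile.

```python
# Illustrative user profile — field names and values are assumptions, not PRIME's schema.
user_profile = {
    "tone": "terse, no pleasantries",
    "expertise": "senior backend engineer",
    "preference": "code-first answers with concrete examples",
}

question = "How should I structure logging in a new microservice?"

# Generic CoT: explores the problem space broadly, with no anchor to this user.
generic_cot_prompt = (
    f"Question: {question}\n"
    "Think step by step, then answer."
)

# Personalized CoT: the reasoning itself is conditioned on the profile, so the
# intermediate thoughts stay anchored to this user's preferences and style.
profile_text = "\n".join(f"- {key}: {value}" for key, value in user_profile.items())
personalized_cot_prompt = (
    f"User profile:\n{profile_text}\n\n"
    f"Question: {question}\n"
    "Think step by step about what this specific user needs, referring to their "
    "profile, then answer in their preferred style."
)
```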

Layer 2: Fine-tuning destroys thinking capacity. The "fast thinking" training paradigm (direct input→output mapping) turns fine-tuned LLMs into specialist models overfitted to the target space. They lose the generalist capability of generating meaningful intermediate thoughts when prompted. A common error is token repetition — the model has been trained to shortcut directly to outputs and can no longer produce coherent intermediate reasoning. This is not a minor degradation — the model structurally cannot think anymore.
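
A schematic sketch of the two supervision formats; the field names and the `<think>` delimiter are chosen for illustration rather than taken from PRIME. Fast-thinking fine-tuning supervises only the direct input→output mapping, so the intermediate-reasoning span is never reinforced.

```python
# "Fast thinking" supervision: direct input -> output mapping; the model never
# practices producing an intermediate reasoning span during fine-tuning.
fast_thinking_example = {
    "input": "user profile + task description",
    "target": "final personalized response",
}

# What gets lost: a supervised intermediate-thought span. After enough direct-mapping
# updates, prompting the fine-tuned model for a trace tends to yield degenerate output
# (e.g. token repetition), because only the shortcut path was reinforced.
thinking_example = {
    "input": "user profile + task description",
    "target": "<think> what this user values, how the task maps onto their "
              "preferences ... </think> final personalized response",
}
```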

The fix: personalized self-distillation. The model generates its own personalized thinking traces (using its pre-fine-tuning generalist capability), then trains on those traces alongside the standard fine-tuning objective. This produces reasoning that is both user-specific (anchored to the individual's preferences) and deep (maintaining the capacity for intermediate thought). The self-distillation approach leverages the model's own capabilities rather than requiring external reasoning trace data.
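
A minimal sketch of that loop, assuming a HuggingFace transformers-style model and tokenizer; the function name, field names, prompt wording, and `<think>` tags are illustrative assumptions, not PRIME's actual interface.

```python
def build_self_distilled_dataset(base_model, tokenizer, examples, max_new_tokens=512):
    """Have the pre-fine-tuning (generalist) checkpoint write a personalized
    thinking trace per example, then pair that trace with the gold output."""
    distilled = []
    for ex in examples:
        prompt = (
            f"User profile:\n{ex['profile']}\n\n"
            f"Task: {ex['input']}\n"
            "Reason step by step about what this specific user wants, "
            "then draft the response."
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
        generated = base_model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Keep only the newly generated tokens (the trace), not the echoed prompt.
        trace = tokenizer.decode(
            generated[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        distilled.append({
            "input": ex["input"],
            # New target: the model's own personalized reasoning followed by the
            # ground-truth personalized output, so fine-tuning supervises both.
            "target": f"<think>{trace}</think>\n{ex['output']}",
        })
    return distilled
```

Fine-tuning then proceeds on the distilled targets alongside the standard input→output examples, so the user-specific signal and the capacity for intermediate thought are trained together.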

This finding extends the reasoning/judgment split documented elsewhere. As argued in When does explicit reasoning actually help model performance?, personalization is a clear case of "continuous nuanced judgment": matching preferences, style, and implicit expectations cannot be reduced to logical derivation steps. But PRIME shows the split is not absolute: personalized reasoning can help, provided the reasoning traces themselves are customized to the user.

The connection to Why does asking models to think first hurt performance? is structural: both findings demonstrate that thinking initially hurts but becomes helpful after the thinking process is adapted to the domain. In PRIME's case, self-distillation is the adaptation mechanism; in the TPO case, RL training is. The shared principle: raw thinking capability must be tuned to the domain before it adds value.


Source: Personalization
