Can personalized questions improve conversation quality in open-domain chat?
This synthesis explores whether having an AI ask personalized questions, not just answer them, makes open-domain conversations better. The corpus reframes 'quality' as a tension between gathering information, building rapport, and not annoying the user.
The collection's most useful move is to split the headline question in two: can questions personalize a conversation, and does asking them actually make the conversation *better*? On the first, the corpus is encouraging. The curiosity-reward work shows that a model can personalize in real time simply by being rewarded for reducing its uncertainty about who it is talking to. No pre-built user profile is required; the questions themselves do the work (see: Can conversations themselves personalize without user profiles?). A complementary line shows you don't need many questions at all: roughly ten well-chosen, maximally informative questions can pin down a user's personalized reward coefficients (see: Can user preferences be learned from just ten questions?). So 'ask to personalize' is not just plausible, it is efficient.
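The uncertainty-reduction idea above can be made concrete with a small sketch. The source only says the model is rewarded for reducing its uncertainty about the user type; the specific formulation here (a discrete belief over user types, a Bayesian update per reply, and reward equal to the drop in Shannon entropy) is an assumption chosen for illustration, and the user types and likelihood numbers are hypothetical.

```python
import math

def entropy(belief):
    """Shannon entropy (in bits) of a discrete belief over user types."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def bayes_update(belief, likelihood):
    """Posterior over user types after observing one user reply.

    likelihood[t] is P(reply | user type t). Values are hypothetical.
    """
    unnormalized = {t: belief[t] * likelihood[t] for t in belief}
    total = sum(unnormalized.values())
    return {t: v / total for t, v in unnormalized.items()}

def curiosity_reward(prior, posterior):
    """Assumed intrinsic reward: how much uncertainty the turn removed."""
    return entropy(prior) - entropy(posterior)

# Start maximally uncertain about a two-type user (hypothetical types).
prior = {"beginner": 0.5, "expert": 0.5}

# A well-aimed question draws a reply that discriminates between types.
informative = bayes_update(prior, {"beginner": 0.9, "expert": 0.1})

# A generic turn draws a reply that is equally likely from either type.
generic = bayes_update(prior, {"beginner": 0.5, "expert": 0.5})

reward_informative = curiosity_reward(prior, informative)
reward_generic = curiosity_reward(prior, generic)
```

Under these assumptions the generic turn earns no intrinsic reward, while the discriminating question earns a positive one, which is the pressure toward personalized questions that the note describes.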
But the quality of the *questions* turns out to be the whole ballgame. The ALFA work argues that 'ask good questions' is too vague to train on. It decomposes question quality into concrete attributes (clarity, relevance, specificity) and trains on attribute-specific preference pairs, which beats optimizing for a single quality score (see: Can models learn to ask genuinely useful clarifying questions?). In other words, a personalized question only helps if it is also a *good* question; personalization without that decomposition just generates more text.
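To illustrate what 'attribute-specific preference pairs' might look like as data, here is a minimal sketch. The three attribute names come from the note; the `PreferencePair` record, the `group_by_attribute` helper, and the example questions are hypothetical and are not ALFA's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass

# Attribute names taken from the note; everything else is illustrative.
ATTRIBUTES = ("clarity", "relevance", "specificity")

@dataclass(frozen=True)
class PreferencePair:
    attribute: str  # the single quality dimension this pair differs on
    context: str    # conversation so far
    chosen: str     # question that is better on that attribute
    rejected: str   # question that is worse on that attribute

def group_by_attribute(pairs):
    """Bucket pairs by attribute so each dimension gets its own signal."""
    grouped = defaultdict(list)
    for pair in pairs:
        if pair.attribute not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {pair.attribute}")
        grouped[pair.attribute].append(pair)
    return dict(grouped)

pairs = [
    PreferencePair(
        attribute="specificity",
        context="User: I want to get fitter.",
        chosen="How many days a week can you realistically train?",
        rejected="Can you tell me more?",
    ),
    PreferencePair(
        attribute="relevance",
        context="User: I want to get fitter.",
        chosen="Do you have any injuries I should plan around?",
        rejected="What is your favorite movie?",
    ),
]

grouped = group_by_attribute(pairs)
```

The design point is that each pair isolates one attribute, so a weak question is penalized for a nameable reason instead of a single opaque quality score.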
Here's the part a reader might not expect: the dominant way we train chat models actively suppresses question-asking. RLHF rewards confident single-turn answers, so models learn to skip clarifying questions and understanding checks. Grounding acts fall 77.5% below human levels, leaving under a quarter of what humans produce. This is an 'alignment tax': the bot looks helpful but quietly loses the thread across turns (see: Does preference optimization harm conversational understanding?). A parallel note frames this as something deeper than a training artifact: smooth conversation runs on implicit social maintenance, such as repair and topic hand-off, that models never learn because their training rewards predicting information, not doing relational work (see: Why don't language models develop conversation maintenance skills?). So personalized questions aren't just a feature to add; they're a corrective to a built-in deficit.
Then the corpus complicates 'quality' itself. Proactive dialogue, meaning volunteering relevant information instead of asking, cuts conversation turns by 60% in simulated medium-complexity domains, which is one strong definition of a better conversation (see: Could proactive dialogue make conversations dramatically more efficient?). That sits in real tension with question-asking, which *adds* turns. And longitudinally, personalization is double-edged: it builds trust and anthropomorphism over repeated sessions, but it simultaneously raises privacy concerns and user expectations, so each personalized question quietly raises the bar for the next answer (see: Does chatbot personalization build trust or expose privacy risks?). Personalization can also be a memory problem rather than a dialogue problem: PRIME finds that abstracted preference summaries personalize better than re-retrieving past interactions, suggesting you may not need to keep asking if you remember well (see: Does abstract preference knowledge outperform specific interaction recall?).
The synthesis, then: yes, personalized questions can improve open-domain chat, but only when the questions are individually well-formed (see: Can models learn to ask genuinely useful clarifying questions?), drive genuine personalization rather than decoration (see: Can conversations themselves personalize without user profiles?), and are balanced against proactivity, privacy, and memory so you ask the few questions that matter instead of interrogating the user (see: Can user preferences be learned from just ten questions?). The deeper lesson the corpus leaves you with is that today's alignment recipe trains *against* this skill. The gain comes not from bolting questions on but from undoing a tax we didn't know we were paying (see: Does preference optimization harm conversational understanding?).
Sources (8 notes)
Adding an intrinsic motivation reward for reducing uncertainty about user type during conversation enables personalization without pre-collected profiles. Tested in education and fitness domains with 20 user attributes, the approach balances helpfulness with strategic information gathering.
PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce uncertainty about a user's reward coefficients. Responses can then be personalized to that user via inference-time reward alignment, without modifying model weights.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically leaves grounding acts 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.
Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.
Longitudinal research shows personalization enhances trust and anthropomorphism but also amplifies privacy concerns and escalating user expectations. One-shot studies miss these temporal dynamics—each interaction raises the baseline, making failures more disappointing.
The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference-tuning methods.