SYNTHESIS NOTE
Conversational AI and Personalization · Psychology, Society, and Alignment · Reasoning, Retrieval, and Evaluation

Can models learn to ask better clarifying questions through self-improvement?

This note explores whether question-asking is a trainable skill that improves when models are rewarded for questions that lead to better answers. It matters because good clarifying questions could help AI systems handle underspecified user requests.

Synthesis note · 2026-06-03 · sourced from Self Refinement Self Consistency Feedback

Users often leave important aspects of a request unsaid. Asking questions could resolve the ambiguity, but models tend to ask poor questions. STaR-GATE applies self-improvement (STaR) to question-asking itself. It generates a synthetic dataset of 25,500 persona-task prompts, each simulating a Questioner conversing with a Roleplayer whose preferences are hidden. The Questioner asks questions to elicit those preferences and is then iteratively finetuned on the questions that increased the probability of high-quality responses, where a high-quality response is one generated by an Oracle with access to the Roleplayer's latent preferences. After two iterations of self-improvement, the Questioner asks better questions, and its responses are preferred over those of the initial model on 72% of tasks.
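The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the STaR-GATE implementation: `Conversation`, `select_for_finetuning`, `toy_score`, and the `baseline` threshold are hypothetical names introduced here. In the method as summarized, the score would be something like the probability the Questioner assigns to the Oracle's response given the conversation; the toy scorer below merely counts how many hidden preferences surfaced in the Roleplayer's answers. Keeping only the single best conversation per prompt is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Conversation:
    prompt_id: str        # which persona-task prompt this dialogue belongs to
    questions: List[str]  # Questioner turns
    answers: List[str]    # Roleplayer turns


def select_for_finetuning(
    candidates: List[Conversation],
    score_fn: Callable[[Conversation], float],
    baseline: float = 0.0,
) -> List[Conversation]:
    """Keep, per prompt, the best-scoring conversation if it beats the baseline.

    `score_fn` stands in for "how likely is the Oracle's response given this
    conversation"; `baseline` stands in for the score with no questions asked.
    Both are assumptions of this sketch.
    """
    best: Dict[str, Conversation] = {}
    for conv in candidates:
        current = best.get(conv.prompt_id)
        if current is None or score_fn(conv) > score_fn(current):
            best[conv.prompt_id] = conv
    # Discard prompts where no sampled conversation improved on the baseline.
    return [c for c in best.values() if score_fn(c) > baseline]


# Toy stand-in for the Roleplayer's latent preferences (hypothetical data).
HIDDEN_PREFERENCES = {"p1": {"vegetarian", "spicy"}, "p2": {"formal"}}


def toy_score(conv: Conversation) -> float:
    """Count hidden preferences that the Roleplayer's answers revealed."""
    text = " ".join(conv.answers).lower()
    return float(sum(pref in text for pref in HIDDEN_PREFERENCES[conv.prompt_id]))


candidates = [
    Conversation("p1", ["Do you want a recipe?"], ["Yes, please."]),
    Conversation(
        "p1",
        ["Any dietary needs or flavor preferences?"],
        ["I am vegetarian and like spicy food."],
    ),
    Conversation("p2", ["Should I write an email?"], ["Yes."]),
]

# Conversations that would be used as finetuning data for the next iteration.
selected = select_for_finetuning(candidates, toy_score)
```

In a full loop, the Questioner would be finetuned on `selected` and the sampling and filtering repeated for the next iteration. Prompt `p2` contributes nothing here because its only sampled conversation elicited no hidden preference.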

The key takeaway is that eliciting preferences is a trainable skill, improvable by self-play against simulated users: reward the questions that lead to better downstream answers, and question-asking improves without human-written question supervision. It targets the elicitation half of personalization, which prompt-stuffing and persona assignment skip.

This is a strong fit for Adrian's clarification/proactivity thread. It pairs with "Can models learn to ask clarifying questions without explicit training?" (emergent versus explicitly rewarded question-asking) and addresses the deficit named by "Why can't advanced AI models take initiative in conversation?": STaR-GATE trains the initiative that passive next-turn optimization suppresses.

Original note title

teaching a model to ask clarifying questions by self-improving on questions that elicit hidden preferences beats the base model