Can LLMs learn to ask for feedback during problem solving?
Explores whether language models can be trained to actively solicit corrective feedback mid-conversation rather than committing to single-turn answers. This matters because it could bridge the gap between fluent chat and genuine conversational learning.
LLMs often struggle to learn from corrective feedback within a conversation. They rarely solicit feedback proactively, even when faced with ambiguity, and their dialogues feel static and one-sided compared to human conversation. The paper "Learning to Learn from Language Feedback with Social Meta-Learning" takes inspiration from how children learn through social meta-learning (SML), the process of learning how to learn from others, and operationalizes it as a finetuning methodology for LLMs.
The methodology converts static tasks into interactive social learning problems. A math problem, normally framed as "produce a solution," becomes a pedagogical dialogue: a "student" model attempts to generate the solution over the course of a conversation, and a "teacher" model provides guidance. The student is the model being trained. The teacher can be a frozen instance of the same model or a stronger model. Critically, the teacher has access to privileged information — the correct answer or a verifier's output — that creates an information asymmetry the student must learn to exploit.
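The roles described above can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than the paper's implementation: the number-guessing task, the names `Task`, `teacher_reply`, `BisectingStudent`, and `run_episode`, and the hand-written student policy standing in for a trained model. What it shows is only the information asymmetry: the teacher reads the privileged answer, while the student sees nothing but the teacher's feedback.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A transcript is a list of (role, message) pairs.
Transcript = List[Tuple[str, str]]


@dataclass
class Task:
    prompt: str  # visible to both roles
    answer: int  # privileged: only the teacher may read this


def teacher_reply(task: Task, guess: int) -> str:
    """Hypothetical teacher: uses the privileged answer to give
    directional feedback without revealing the answer itself."""
    if guess == task.answer:
        return "correct"
    return "too low" if guess < task.answer else "too high"


class BisectingStudent:
    """Hypothetical student policy: never sees task.answer and narrows
    its search range using only the teacher's feedback."""

    def __init__(self, low: int, high: int) -> None:
        self.low, self.high = low, high
        self.last: Optional[int] = None

    def act(self, feedback: Optional[str]) -> int:
        if feedback == "too low":
            self.low = self.last + 1
        elif feedback == "too high":
            self.high = self.last - 1
        self.last = (self.low + self.high) // 2
        return self.last


def run_episode(task: Task, student: BisectingStudent,
                max_turns: int) -> Tuple[Transcript, bool]:
    """Alternate student attempts and teacher feedback until the
    student is correct or the turn budget runs out."""
    transcript: Transcript = []
    feedback: Optional[str] = None
    for _ in range(max_turns):
        guess = student.act(feedback)
        transcript.append(("student", str(guess)))
        feedback = teacher_reply(task, guess)
        transcript.append(("teacher", feedback))
        if feedback == "correct":
            return transcript, True
    return transcript, False


if __name__ == "__main__":
    task = Task(prompt="Find the hidden integer between 1 and 100.", answer=37)
    transcript, solved = run_episode(task, BisectingStudent(1, 100), max_turns=7)
    for role, message in transcript:
        print(f"{role}: {message}")
    print("solved:", solved)
```

In a real setup the student would be the model under training and the teacher a frozen or stronger model; the fixed bisection policy here only marks where the learned behavior would sit.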
The conversational reformulation does work that single-turn training cannot. It makes the student responsible for soliciting useful information from the teacher rather than producing a complete answer in one shot. It also creates problems that are solvable through dialogue but unsolvable in a single turn, which exposes the model to challenges beyond its in-context capability and rewards conversational skill rather than raw answer skill.
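A small numeric illustration of "solvable through dialogue but unsolvable in a single turn", again under assumed conditions: a toy hidden-integer task, a binary outcome reward assigned to the whole episode, and the hypothetical helpers `episode_reward` and `solved_count`. None of this is taken from the paper; it only shows how the same student can fail with a one-turn budget and succeed once it can use feedback.

```python
def episode_reward(secret: int, max_turns: int,
                   low: int = 1, high: int = 100) -> int:
    """Outcome reward for one toy episode: 1 if the student finds
    `secret` within `max_turns` guesses, else 0. The teacher's feedback
    is modeled inline as a comparison against the privileged secret."""
    for _ in range(max_turns):
        guess = (low + high) // 2   # student's attempt this turn
        if guess == secret:         # teacher: "correct"
            return 1
        if guess < secret:          # teacher: "too low"
            low = guess + 1
        else:                       # teacher: "too high"
            high = guess - 1
    return 0


def solved_count(max_turns: int) -> int:
    """Number of secrets in 1..100 solved within the given turn budget."""
    return sum(episode_reward(secret, max_turns) for secret in range(1, 101))


if __name__ == "__main__":
    print("single turn:", solved_count(1), "of 100 solved")
    print("seven turns:", solved_count(7), "of 100 solved")
```

With a one-turn budget this student solves only the single task whose answer matches its first guess; with seven turns it solves all 100. The reward attaches to the outcome of the conversation, so the only way to earn it on most tasks is to use the teacher's feedback.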
This is structurally distinct from standard supervised fine-tuning on multi-turn dialogues. SFT teaches the model to imitate dialogue patterns; SML teaches the model the meta-skill of using dialogue as a problem-solving resource. The difference shows up at test time: SFT-trained models reproduce conversational style; SML-trained models actively engage the conversation to extract information they need.
The implication for chat AI design: the gap between "fluent multi-turn responder" and "effective conversational learner" is bridged by training procedures that treat conversation as the learning environment rather than as a surface format to imitate. Single-turn benchmarks select for the former; SML-style training selects for the latter.
Related concepts in this collection
- Why does teacher-student information asymmetry enable learning signals?
  What role does privileged answer access play in making social meta-learning training work? Without asymmetric information, can a conversation between teacher and student function as pedagogy or only as parallel speculation?
  same paper, the mechanism that makes SML training informative
- Can models learn to ask clarifying questions without explicit training?
  Do language models trained only on fully-specified problems spontaneously develop the ability to ask for missing information when facing underspecified tasks? This tests whether conversational problem-solving strategies emerge from meta-learning rather than direct instruction.
  same paper, the generalization payoff
- Can structured argument prompts make LLM reasoning more rigorous?
  Does requiring language models to explicitly check warrants, backing, and rebuttals, rather than reasoning freely, improve reasoning quality and catch failures that standard step-by-step prompting misses?
  adjacent: another approach using structured questioning to improve reasoning
- Why do models fail at asking good questions during interaction?
  When models must actively seek information through questions rather than receive it passively, they struggle dramatically. This explores why GPT-4o plateaus at 35% accuracy and whether training or prompting can fix the underlying deficit.
  adjacent: separates the problem-solving skill from the question-asking skill
Original note title
social meta-learning teaches LLMs to learn from language feedback by converting static tasks into interactive pedagogical dialogues