Conversational AI Systems

Can LLMs learn to ask for feedback during problem solving?

Explores whether language models can be trained to actively solicit corrective feedback mid-conversation rather than committing to single-turn answers. This matters because it could bridge the gap between fluent chat and genuine conversational learning.

Note · 2026-05-18 · sourced from Training Fine Tuning

LLMs often struggle to learn from corrective feedback within a conversational context. They rarely solicit feedback proactively, even when faced with ambiguity, and their dialogues feel static and one-sided compared to human conversation. The paper "Learning to Learn from Language Feedback with Social Meta-Learning" takes inspiration from how children learn through social meta-learning (SML), the process of learning how to learn from others, and operationalizes this as a finetuning methodology for LLMs.

The methodology converts static tasks into interactive social learning problems. A math problem, normally framed as "produce a solution," becomes a pedagogical dialogue: a "student" model attempts to generate the solution over the course of a conversation, and a "teacher" model provides guidance. The student is the model being trained. The teacher can be a frozen instance of the same model or a stronger model. Critically, the teacher has access to privileged information — the correct answer or a verifier's output — that creates an information asymmetry the student must learn to exploit.
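The teacher-student setup described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the `Task`, `Episode`, `teacher_reply`, `BisectingStudent`, and `run_episode` names are hypothetical, the teacher's hints are reduced to "too low" / "too high" / "correct", and a simple bisection policy stands in for the student model being trained. The point is only the shape of the loop: the teacher sees the answer, the student does not, and the student has to use the feedback to get there.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Task:
    question: str
    answer: int  # privileged information: only the teacher reads this field


@dataclass
class Episode:
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, text)
    solved: bool = False


def teacher_reply(task: Task, attempt: int) -> str:
    """Hypothetical teacher: gives a hint derived from the privileged answer
    without revealing the answer itself."""
    if attempt == task.answer:
        return "correct"
    return "too low" if attempt < task.answer else "too high"


class BisectingStudent:
    """Toy stand-in for the student model: narrows an interval using feedback."""

    def __init__(self, low: int, high: int) -> None:
        self.low, self.high = low, high
        self.last: Optional[int] = None

    def act(self, feedback: Optional[str]) -> int:
        # Update beliefs from the teacher's last message, then try again.
        if feedback == "too low":
            self.low = self.last + 1
        elif feedback == "too high":
            self.high = self.last - 1
        self.last = (self.low + self.high) // 2
        return self.last


def run_episode(task: Task, student: BisectingStudent, max_turns: int = 12) -> Episode:
    """Turn a static task into a dialogue: the student attempts, the teacher guides."""
    episode = Episode()
    episode.turns.append(("teacher", task.question))
    feedback: Optional[str] = None
    for _ in range(max_turns):
        attempt = student.act(feedback)
        episode.turns.append(("student", str(attempt)))
        feedback = teacher_reply(task, attempt)
        episode.turns.append(("teacher", feedback))
        if feedback == "correct":
            episode.solved = True
            break
    return episode


task = Task(question="What is 17 * 23?", answer=391)
episode = run_episode(task, BisectingStudent(low=0, high=1000))
for speaker, text in episode.turns:
    print(f"{speaker}: {text}")
```

In a real setup both roles would be language models exchanging free-form text, and the teacher's prompt would carry the correct answer or a verifier's output. The sketch keeps only the information asymmetry and the turn structure.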

The conversational reformulation does work that single-turn training cannot. It makes the student responsible for soliciting useful information from the teacher rather than producing a complete answer in one shot. It also creates problems that are solvable through dialogue but unsolvable in a single turn, which exposes the model to challenges beyond its in-context capability and rewards conversational skill rather than raw answer-generation skill.
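One way to make "solvable through dialogue but unsolvable in a single turn" concrete is to score the same task under two turn budgets. The sketch below is an assumption-laden toy, not the paper's reward: `episode_reward`, `make_teacher`, and `make_student` are hypothetical names, the reward is a plain 0/1 outcome on the final attempt, and a bisection policy stands in for the student model.

```python
from typing import Callable, Optional

# Hypothetical interfaces: a student maps the latest feedback (None on the
# first turn) to an attempt; a teacher maps an attempt to a feedback string.
Student = Callable[[Optional[str]], int]
Teacher = Callable[[int], str]


def make_teacher(answer: int) -> Teacher:
    """Teacher with privileged access to the answer; it hints but never reveals."""
    def reply(attempt: int) -> str:
        if attempt == answer:
            return "correct"
        return "too low" if attempt < answer else "too high"
    return reply


def make_student(low: int, high: int) -> Student:
    """Toy student that narrows an interval based on teacher feedback."""
    state = {"low": low, "high": high, "last": None}

    def act(feedback: Optional[str]) -> int:
        if feedback == "too low":
            state["low"] = state["last"] + 1
        elif feedback == "too high":
            state["high"] = state["last"] - 1
        state["last"] = (state["low"] + state["high"]) // 2
        return state["last"]
    return act


def episode_reward(student: Student, teacher: Teacher, max_turns: int) -> float:
    """Assumed outcome reward: 1.0 if the student succeeds within the turn budget."""
    feedback: Optional[str] = None
    for _ in range(max_turns):
        feedback = teacher(student(feedback))
        if feedback == "correct":
            return 1.0
    return 0.0


# Same task, two budgets: one turn versus a ten-turn dialogue.
single_turn = episode_reward(make_student(0, 1000), make_teacher(391), max_turns=1)
dialogue = episode_reward(make_student(0, 1000), make_teacher(391), max_turns=10)
dialogue_only_task = single_turn == 0.0 and dialogue == 1.0
print(single_turn, dialogue, dialogue_only_task)
```

Under this framing, a task where `dialogue_only_task` is true gives credit only to a policy that uses the teacher's feedback, which is the kind of problem the paragraph above describes.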

This is structurally distinct from standard supervised fine-tuning (SFT) on multi-turn dialogues. SFT teaches the model to imitate dialogue patterns; SML teaches the model the meta-skill of using dialogue as a problem-solving resource. The difference shows up at test time: SFT-trained models reproduce conversational style, while SML-trained models actively engage in the conversation to extract the information they need.

The implication for chat AI design: the gap between "fluent multi-turn responder" and "effective conversational learner" is bridged by training procedures that treat conversation as the learning environment rather than as a surface format to imitate. Single-turn benchmarks select for the former; SML-style training selects for the latter.

Original note title

social meta-learning teaches LLMs to learn from language feedback by converting static tasks into interactive pedagogical dialogues