Can textual gradients generalize natural language feedback across computation graphs?

This explores the TextGrad idea: treating natural-language critique as a kind of 'gradient' that can be passed backward through a chain of components (prompts, tools, reasoning steps) the way numerical gradients flow through a neural net, telling each part how to improve. The corpus doesn't contain the specific 'textual gradient' framework, so it speaks to the premise more than the plumbing. On that premise, though, it converges hard: words carry information a scalar score throws away.

The strongest support is the finding that natural-language critique breaks through plateaus that further numerical reward can't move. When a model is stuck, a single reward number says 'wrong' but not *why*; a chain-of-thought critique says where the reasoning went off and how to fix it, and that's enough to recover correct solutions the numerical signal couldn't reach (see: Can natural language feedback overcome numerical reward plateaus?). That's exactly the bet a textual gradient makes: that the *content* of feedback, not just its sign and magnitude, is the useful part.
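To make the contrast concrete, here is a minimal sketch, not taken from any of the cited work. It assumes a toy task where each reasoning step is a line of the form 'a + b = c'. The `Critique` record and both functions are hypothetical names invented for illustration. The scalar reward only reports failure, while the critique says which step broke and what to change.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Critique:
    """Hypothetical feedback record: where the reasoning broke and how to fix it."""
    step_index: int
    problem: str
    suggestion: str


def check_steps(steps: List[str]) -> Optional[Critique]:
    """Toy critic: each step is 'a + b = c'; return a critique for the first wrong one."""
    for i, step in enumerate(steps):
        lhs, rhs = step.split("=")
        a, b = (int(x) for x in lhs.split("+"))
        if a + b != int(rhs):
            return Critique(i, f"{a} + {b} is not {int(rhs)}",
                            f"replace the result with {a + b}")
    return None


def scalar_reward(steps: List[str]) -> float:
    """Scalar signal: 1.0 if every step checks out, else 0.0. No location, no reason."""
    return 1.0 if check_steps(steps) is None else 0.0


steps = ["2 + 3 = 5", "5 + 4 = 10", "10 + 1 = 11"]
reward = scalar_reward(steps)    # says only that something is wrong
critique = check_steps(steps)    # says which step and how to repair it
```

The reward collapses the whole trace into one number; the critique keeps the step index and the suggested repair, which is the information the plateau result suggests a stuck model needs.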

The more interesting half of your question is the 'across computation graphs' part: can that feedback propagate through a multi-component system rather than just landing on one output? Here the corpus offers a working analogue: a self-play loop where the improvement signal literally *is* a natural-language edit. One role escalates difficulty, a neutral judge issues verdicts, and both sides evolve by rewriting their skills in plain language rather than by receiving a gradient (see: Can language models learn skills without human supervision?). That's a textual-gradient-shaped idea in everything but name: language-as-update, passed between cooperating parts. You can also see the same instinct in approaches that manufacture their own dense, step-level feedback: tree search assigning credit to intermediate solution paths (see: Can tree search replace human feedback in LLM training?), or models internalizing self-evaluation so the critique becomes part of the model rather than an external grader (see: Can models learn to evaluate their own work during training?).
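What 'propagating feedback across a graph' could look like mechanically is sketched below. This is an illustrative assumption, not the TextGrad API and not the method of any cited note. `Node`, `forward`, and `backward` are hypothetical names, and the `run` and `translate` callables are deterministic string stubs standing in for what would be model calls in a real system. Each node's `translate` plays the role of the chain rule: given feedback on the node's output, it produces feedback on its input.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One component in a linear chain (hypothetical structure, not a library API)."""
    name: str
    run: Callable[[str], str]         # forward computation: text in, text out
    translate: Callable[[str], str]   # feedback on this node's output -> feedback on its input
    feedback: List[str] = field(default_factory=list)


def forward(chain: List[Node], x: str) -> str:
    """Run each component in order, feeding every output to the next node."""
    for node in chain:
        x = node.run(x)
    return x


def backward(chain: List[Node], output_feedback: str) -> None:
    """Walk the chain in reverse: store feedback on each node, then rewrite it for the node upstream."""
    message = output_feedback
    for node in reversed(chain):
        node.feedback.append(message)
        message = node.translate(message)


# Stub components; in practice both callables would be prompts to a model.
chain = [
    Node("retriever",
         run=lambda q: f"notes({q})",
         translate=lambda fb: f"query should target: {fb}"),
    Node("writer",
         run=lambda notes: f"answer from {notes}",
         translate=lambda fb: f"notes should support: {fb}"),
]

output = forward(chain, "q")
backward(chain, "cite a source")
```

After the backward pass the writer holds the original critique and the retriever holds a version rewritten in terms of its own output. Whether that rewriting stays faithful over many hops is the open question, and the sketch does not settle it.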

There's a sharp cautionary note worth knowing, though. The whole appeal of 'gradients' is the image of smooth, iterative descent, but research shows LLMs don't actually *execute* iterative numerical procedures; they pattern-match problems to memorized templates and emit plausible-looking values instead of truly optimizing (see: Do large language models actually perform iterative optimization?). So a 'textual gradient' is a useful metaphor, not a literal one: the model isn't doing calculus on words. And there's a ceiling on what any feedback-in-language can buy you: if the underlying knowledge isn't in the model, no amount of clever prompting or critique injects it; feedback reorganizes what's there, it doesn't add what isn't (see: Can prompt optimization teach models knowledge they lack?).
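For contrast, this is what literal iterative descent looks like, as a minimal sketch on an assumed toy objective f(x) = (x - 3)^2 chosen purely for illustration. Every step computes an exact derivative and applies a numeric update, which is the kind of procedure the cited note says LLMs imitate rather than execute.

```python
def grad_descent(x0: float, lr: float = 0.1, steps: int = 50) -> float:
    """Minimize f(x) = (x - 3)^2 by repeatedly stepping against the derivative 2(x - 3)."""
    x = x0
    for _ in range(steps):
        x -= lr * 2.0 * (x - 3.0)   # exact gradient, exact update, every iteration
    return x


x_final = grad_descent(0.0)   # approaches the minimizer x = 3
```

A textual 'gradient' has no counterpart to the learning rate or the guaranteed shrinking error here, which is why the analogy is best read as a metaphor for direction, not a convergence argument.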

The thing you might not have known you wanted to know: the field is quietly building several *different* substitutes for the gradient, including language critique, confidence-as-signal (see: Can model confidence work as a reward signal for reasoning?), tree-search credit assignment, and self-play edits. They all share one motive, which is that a single reward number is too thin to teach a system *why* it failed. Textual gradients are one bet in that family; whether language can generalize *across* a computation graph is less a question of feasibility than of how reliably you can route a critique to the part that caused the error.

Sources (7 notes)

Can natural language feedback overcome numerical reward plateaus?

Critique-GRPO shows that models stuck on performance plateaus can generate correct solutions when given chain-of-thought critiques, revealing that numerical rewards lack critical information about why failures occur and how to improve.

Can language models learn skills without human supervision?

Ctx2Skill's three-role self-play loop manufactures missing feedback through internal signals: the Challenger escalates difficulty as curriculum, the Judge gives binary verdicts as reward, and both sides evolve via natural-language skill edits. Success requires balancing adversarial pressure against a generalization safeguard to prevent collapse.

Can tree search replace human feedback in LLM training?

AlphaLLM uses tree search outcomes and three critic models to derive dense reward signals equivalent to human-labeled feedback. Tree structure naturally ranks solution paths by success, replacing the annotation oracle that standard RLHF requires.

Can models learn to evaluate their own work during training?

Post-Completion Learning exploits unused sequence space after model output to train self-assessment capabilities during training while maintaining zero inference cost. The model learns to compute its own reward functions, internalizing evaluation rather than relying on external reward models.

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Can model confidence work as a reward signal for reasoning?

RLSF uses answer-span confidence to rank reasoning traces, creating synthetic preferences that strengthen step-by-step reasoning while reversing RLHF's calibration degradation—without requiring human labels or external verifiers.
