Do large language models actually perform iterative optimization?
Explores whether LLMs execute genuine numerical procedures like Newton-Raphson or instead pattern-match to memorized solution templates when solving constrained optimization problems.
The constraint-optimization study identifies the mechanism behind the 55-60% plateau directly: LLMs cannot actually perform Newton-Raphson iterations in their latent space, nor primal-dual updates, nor any other iterative numerical procedure that genuine optimization requires. When asked to do so, they fall back to what the paper calls "result guessing": recognizing the problem as similar to a standard power grid (or financial dataset, or security scenario) and emitting values that pattern-match what a valid solution should look like.
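To make concrete what "performing Newton-Raphson iterations" involves, here is a minimal sketch on a one-variable toy equation. The function names and the cubic are illustrative assumptions and are not taken from the study. The point is only that each iterate is computed from the previous one, so the answer cannot be read off from the problem's surface form.

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Solve f(x) = 0 by Newton-Raphson; return (root, list of iterates)."""
    x = x0
    iterates = [x]
    for _ in range(max_iter):
        step = f(x) / df(x)      # Newton step uses the current iterate only
        x = x - step             # next iterate depends on the previous one
        iterates.append(x)
        if abs(step) < tol:
            return x, iterates
    raise RuntimeError("Newton-Raphson did not converge")


# Toy problem (an assumption for illustration): x^3 - 2x - 5 = 0
def f(x):
    return x**3 - 2 * x - 5


def df(x):
    return 3 * x**2 - 2


root, iterates = newton_raphson(f, df, x0=2.0)
residual = abs(f(root))          # near zero only because the loop actually ran
```

A value emitted without running the loop, such as the starting point 2.0 itself, leaves a residual of 1 in this toy case, even though it is the right order of magnitude.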
The fallback is silent. The output is fluent, well-formatted, often plausible. It can pass surface-level inspection because the model has seen many examples of what answers in this domain look like. What it has not done is solve the problem. The constraint values are wrong in ways that physical or financial systems would actually reject.
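One way to see why a fluent answer can still be wrong is to check it against the constraints explicitly. The sketch below assumes a hypothetical three-generator dispatch problem with a power-balance constraint and per-unit output limits. All numbers and names are invented for illustration and do not come from the paper.

```python
def constraint_violations(dispatch, demand, limits, tol=1e-6):
    """Return human-readable constraint violations for a candidate dispatch."""
    violations = []
    # Power balance: total generation must equal demand (losses ignored here).
    imbalance = sum(dispatch) - demand
    if abs(imbalance) > tol:
        violations.append(f"power balance off by {imbalance:+.3f} MW")
    # Box constraints: each unit must stay within its (min, max) output range.
    for i, (p, (lo, hi)) in enumerate(zip(dispatch, limits)):
        if p < lo - tol or p > hi + tol:
            violations.append(f"unit {i} output {p} outside [{lo}, {hi}]")
    return violations


# Hypothetical system: (min, max) output per generator in MW, and total demand.
limits = [(10.0, 100.0), (20.0, 80.0), (0.0, 50.0)]
demand = 180.0

plausible_guess = [95.0, 60.0, 30.0]   # well-formatted, within limits, sums to 185
feasible = [90.0, 60.0, 30.0]          # within limits and sums to 180

guess_report = constraint_violations(plausible_guess, demand, limits)
feasible_report = constraint_violations(feasible, demand, limits)
```

Both candidates look equally reasonable on the page. Only the explicit residual check separates them, which is why surface-level inspection does not catch the fallback.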
This explains why scale, architecture, and training regime do not move the plateau. They improve the template but not the procedure. A larger model has seen more example solutions and can produce more convincing guesses. Reinforcement learning on outcome rewards reinforces the template-matching pattern. None of this installs the iterative-computation capability the problem requires.
The mechanism (pattern-matching against memorized solution shapes when genuine computation is required) generalizes beyond optimization. It is plausibly the same mechanism behind a class of mathematical-reasoning failures where models produce confidently wrong numerical answers that have the right shape. The category is "looks like a solution; is not derived from one."
Related concepts in this collection
- Do larger language models solve constrained optimization better?
  Explores whether scaling LLMs (through more parameters, better training, or reasoning extensions) improves their ability to satisfy constraints in real optimization problems like power grids and portfolios.
  same paper, the plateau this mechanism explains
- Do reasoning models actually beat standard models on optimization?
  Explores whether extended chain-of-thought in reasoning models delivers performance gains on constraint-satisfaction problems like power-grid optimization. Matters because reasoning models are treated as automatic upgrades, but the evidence may not support that claim.
  same paper, why extended CoT does not fix it
- Do fine-tuned language models actually learn optimization procedures?
  Can RL fine-tuning teach LLMs to solve constraint-optimization problems through genuine reasoning, or does it merely sharpen pattern-matching? Testing on out-of-distribution variants reveals the mechanism.
  same paper, the diagnostic that exposes the memorization
- Does chain-of-thought reasoning reveal genuine inference or pattern matching?
  Explores whether CoT instructions unlock real reasoning capabilities or simply constrain models to mimic familiar reasoning patterns from training data. This matters for understanding whether language models can actually reason abstractly.
  adjacent: imitation vs computation
- What do models actually learn from chain-of-thought training?
  When models train on reasoning demonstrations, do they memorize content details or absorb reasoning structure? Testing with corrupted data reveals which aspects of CoT samples actually drive learning.
  adjacent: form-over-content failure mode
Original note title
LLMs cannot execute iterative numerical methods in latent space and fall back to result guessing against memorized templates