
Do larger language models solve constrained optimization better?

Explores whether scaling LLMs—through more parameters, better training, or reasoning extensions—improves their ability to satisfy constraints in real optimization problems like power grids and portfolios.

Note · 2026-05-18 · sourced from Reasoning Architectures

When evaluated on real constrained-optimization problems (optimal power flow, financial portfolio constraints, cybersecurity feasibility), LLMs cluster around 55–60% constraint satisfaction across virtually all conditions tested. The plateau is robust to changes in architecture, parameter count, and training regime. Reasoning models, despite extended chain-of-thought, do not systematically beat their non-reasoning counterparts on these tasks.
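
The note does not say exactly how constraint satisfaction is scored. One plausible reading, assumed here, is the fraction of a problem's individual constraints that a proposed solution meets within a numerical tolerance. A minimal Python sketch of that metric follows. The names (`Constraint`, `satisfaction_rate`) are hypothetical, and the three-asset portfolio is illustrative rather than benchmark data:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Constraint:
    """One constraint, written as residual(x) <= 0 ("ineq") or residual(x) == 0 ("eq")."""
    name: str
    residual: Callable[[Sequence[float]], float]
    kind: str = "ineq"


def is_satisfied(c: Constraint, x: Sequence[float], tol: float = 1e-6) -> bool:
    r = c.residual(x)
    if c.kind == "eq":
        return abs(r) <= tol  # equality holds within tolerance
    return r <= tol  # inequality holds, allowing a small numerical slack


def satisfaction_rate(constraints: Sequence[Constraint], x: Sequence[float], tol: float = 1e-6) -> float:
    """Fraction of constraints met by candidate x (an assumed metric, not the benchmark's definition)."""
    if not constraints:
        return 1.0
    return sum(is_satisfied(c, x, tol) for c in constraints) / len(constraints)


# Illustrative 3-asset portfolio: budget sum(w) == 1, long-only w_i >= 0, position cap w_i <= 0.4.
n_assets = 3
cons: List[Constraint] = [Constraint("budget", lambda x: sum(x) - 1.0, "eq")]
cons += [Constraint(f"long_only_{i}", lambda x, i=i: -x[i]) for i in range(n_assets)]
cons += [Constraint(f"cap_{i}", lambda x, i=i: x[i] - 0.4) for i in range(n_assets)]

w = [0.5, 0.3, 0.25]  # hypothetical model-proposed weights: sum is 1.05, first weight exceeds the cap
print(f"{satisfaction_rate(cons, w):.3f}")  # 0.714 (5 of 7 constraints hold)
```

Under this reading a solution can score fairly well and still be infeasible: here five of seven constraints hold (about 71%), yet the weights break the budget and one position cap, so the portfolio could not actually be used.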

The flatness of the plateau is the finding. Most work on LLM capability assumes that the relevant axis is performance versus scale, and that closing a gap is a matter of training on more or better data. Constrained optimization does not behave that way. What sets this benchmark apart is that its problems require jointly interpreting structured input, doing multi-step arithmetic, satisfying interacting physical constraints, and converging to feasible solutions. On that joint task, the model class itself appears to be near a ceiling.

This is distinct from general reasoning benchmarks (MMLU, GPQA) and from logical reasoning benchmarks (ARC-AGI, SATBench, ZebraLogic). Those measure either broad knowledge or synthetic constraint puzzles. Real engineering optimization requires the model to execute iterative numerical procedures over physical constraints, and that procedural execution is where the plateau lives.
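
For a sense of what executing an iterative numerical procedure over constraints involves, here is a sketch of projected gradient descent for a long-only, fully invested minimum-variance portfolio. The covariance matrix is invented for illustration, and nothing in the note says the benchmark uses this particular method. The point is that each of hundreds of iterations needs exact arithmetic plus a projection step that restores feasibility:

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection onto {w : w_i >= 0, sum(w) == 1}, via the standard sort-based method."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:  # holds for j = 1..rho; keep the threshold at the largest such j
            theta = t
    return [max(x - theta, 0.0) for x in v]


def min_variance_weights(cov: List[List[float]], step: float = 1.0, iters: int = 500) -> List[float]:
    """Projected gradient descent on w^T cov w over the long-only, fully invested simplex.

    For guaranteed convergence, step should be at most 1 / (2 * largest eigenvalue of cov).
    """
    n = len(cov)
    w = [1.0 / n] * n  # feasible starting point: equal weights
    for _ in range(iters):
        # Gradient of w^T cov w is 2 * cov @ w (cov assumed symmetric).
        grad = [2.0 * sum(cov[i][k] * w[k] for k in range(n)) for i in range(n)]
        # Unconstrained step, then projection back onto the feasible set.
        w = project_to_simplex([w[i] - step * grad[i] for i in range(n)])
    return w


# Illustrative covariance: uncorrelated assets with variances 0.04, 0.09, 0.16.
cov = [
    [0.04, 0.0, 0.0],
    [0.0, 0.09, 0.0],
    [0.0, 0.0, 0.16],
]
weights = min_variance_weights(cov)
print([round(x, 3) for x in weights])
```

With a diagonal covariance the optimum is known in closed form (weights proportional to 1/variance, roughly 0.590, 0.262 and 0.148 here), so the iteration can be checked exactly. A numerical routine performs these steps mechanically; a language model emulating them in text has to get every intermediate value right.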

The deployment implication is sharp: telling executives that "LLMs will optimize the grid" or "LLMs will solve constrained portfolio problems" is currently an overclaim. The same finding suggests the productive direction is not "wait for the next model" but "change the paradigm" — restrict the LLM to abstraction tasks and hand numeric work to solvers.
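
The division of labor suggested above can be sketched on a toy single-bus economic dispatch problem, a heavily simplified relative of optimal power flow. The assumptions are as follows. The LLM's only job is to turn a text description into a structured formulation. That formulation is represented here by a hard-coded JSON string rather than a real model call, and its field names are hypothetical. The numeric work goes to a deterministic merit-order routine, which is exact only for this linear-cost, no-network case. A real OPF formulation would instead go to a proper LP or nonlinear solver:

```python
import json
from typing import Any, Dict

# Hypothetical output of the abstraction step: structure and parameters only, no arithmetic.
# In practice this string would come from a model call; it is hard-coded for the sketch.
LLM_FORMULATION = """
{
  "demand_mw": 250,
  "generators": [
    {"name": "hydro", "cost_per_mwh": 5,  "min_mw": 0,  "max_mw": 80},
    {"name": "gas",   "cost_per_mwh": 40, "min_mw": 20, "max_mw": 150},
    {"name": "coal",  "cost_per_mwh": 25, "min_mw": 0,  "max_mw": 100}
  ]
}
"""


def merit_order_dispatch(problem: Dict[str, Any]) -> Dict[str, float]:
    """Deterministic solver for single-bus dispatch with linear costs and min/max output limits.

    Greedy merit order is optimal for this simplified case only; network constraints
    (as in real optimal power flow) would need an LP or nonlinear solver instead.
    """
    gens = problem["generators"]
    demand = float(problem["demand_mw"])
    # Every unit must run at least at its minimum output.
    dispatch = {g["name"]: float(g["min_mw"]) for g in gens}
    remaining = demand - sum(dispatch.values())
    headroom = sum(g["max_mw"] - g["min_mw"] for g in gens)
    if remaining < 0 or remaining > headroom:
        raise ValueError("infeasible: demand lies outside the aggregate generator limits")
    # Serve the rest of the demand from the cheapest available headroom first.
    for g in sorted(gens, key=lambda unit: unit["cost_per_mwh"]):
        extra = min(remaining, g["max_mw"] - g["min_mw"])
        dispatch[g["name"]] += extra
        remaining -= extra
    return dispatch


problem = json.loads(LLM_FORMULATION)
plan = merit_order_dispatch(problem)
total_cost = sum(g["cost_per_mwh"] * plan[g["name"]] for g in problem["generators"])
print(plan, total_cost)
```

With the example data the routine dispatches hydro at 80 MW, coal at 100 MW and gas at 70 MW, exactly meeting the 250 MW demand. If the formulation step gets a number wrong, the error sits in a small, checkable artifact instead of being buried in generated arithmetic.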

Original note title

LLMs plateau at 55 to 60 percent constraint satisfaction on genuine optimization regardless of scale architecture or training