Reinforcement Learning for LLMs · LLM Reasoning and Architecture

Can energy minimization unlock reasoning without domain-specific training?

Can a gradient descent-based architecture achieve system 2 thinking across any modality or problem type using only unsupervised learning, without verifiers or reasoning-specific rewards?

Note · 2026-02-23 · sourced from Novel Architectures

Energy-Based Transformers (EBTs) represent a fundamentally different approach to inference-time scaling. Rather than producing each prediction in a single feedforward pass, EBTs are trained to assign an energy value (an unnormalized compatibility score, where lower energy corresponds to higher probability) to every pair of input and candidate prediction. Prediction is then reframed as gradient descent-based energy minimization run until convergence: the model iteratively refines a candidate by descending the energy landscape.
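To make the mechanics concrete, here is a minimal PyTorch sketch of prediction as energy minimization. Everything in it is illustrative rather than taken from the source: `EnergyNet` is a toy stand-in for the actual transformer scorer, and the dimensions, step count, and step size are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Toy stand-in for an Energy-Based Transformer's scorer: maps a
    (context, candidate) pair to a scalar energy. Lower energy means
    the candidate is a more compatible prediction for the context."""
    def __init__(self, dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 128), nn.SiLU(), nn.Linear(128, 1)
        )

    def forward(self, context, candidate):
        return self.mlp(torch.cat([context, candidate], dim=-1)).squeeze(-1)

def predict(energy_fn, context, num_steps=10, step_size=0.1):
    """Prediction as energy minimization: start from a random candidate
    and descend the energy landscape with respect to the candidate."""
    candidate = torch.randn_like(context).requires_grad_(True)
    for _ in range(num_steps):
        energy = energy_fn(context, candidate).sum()
        grad, = torch.autograd.grad(energy, candidate)
        # Update the *candidate*, not the weights: one "thinking" step.
        candidate = (candidate - step_size * grad).detach().requires_grad_(True)
    return candidate.detach()

model = EnergyNet()
context = torch.randn(4, 64)          # batch of 4 context embeddings
prediction = predict(model, context)  # iteratively refined candidates
```

Note what gradient descent is updating here: the candidate prediction, not the model weights. That is the architectural move that turns inference itself into an optimization process.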

This formulation lets System 2 thinking emerge from unsupervised learning alone, without the domain-specific scaffolding current approaches require: no chain-of-thought supervision, no verifiers, and no reasoning-specific rewards.
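One way such a model could be trained end to end without verifiers is to unroll the same descent during training and regress the converged prediction onto the observed continuation. This is a sketch under loud assumptions, reusing `EnergyNet` and the names from the block above; the loss, step size, and unrolling depth are illustrative choices, not details from the source.

```python
def training_step(energy_fn, context, target, optimizer,
                  num_steps=4, step_size=0.1):
    """Unroll the inference-time descent with create_graph=True so the
    loss can shape the whole energy landscape, then regress the final
    candidate onto the observed continuation (no labels, no rewards)."""
    cand = torch.randn_like(target).requires_grad_(True)
    for _ in range(num_steps):
        energy = energy_fn(context, cand).sum()
        grad, = torch.autograd.grad(energy, cand, create_graph=True)
        cand = cand - step_size * grad
    loss = ((cand - target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()   # backpropagates through the unrolled descent steps
    optimizer.step()
    return loss.item()

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
target = torch.randn(4, 64)  # observed next embedding (illustrative)
training_step(model, context, target, opt)
```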

The reported scaling results are striking: performance continues to improve as more gradient descent iterations are spent at inference, without any reasoning-specific training.

The deeper implication: current test-time scaling approaches are constrained by their dependence on either (a) verbalized reasoning chains requiring domain-specific training data, or (b) verifiable reward signals for RL-based approaches. EBTs bypass both constraints by making "thinking harder" an inherent property of the architecture — more gradient descent iterations at inference = more thinking, with the model's own energy function as the implicit verifier.
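Under this framing, the thinking budget is literally the iteration count. Continuing the sketch above (the best-of-N restart scheme is an assumption about how the energy can serve as a verifier, not a detail from the source):

```python
def predict_with_budget(energy_fn, context, num_steps, num_restarts=4):
    """Spend extra inference compute on more descent steps and more
    random restarts, then let the energy function act as the implicit
    verifier by keeping the lowest-energy candidate."""
    best, best_energy = None, None
    for _ in range(num_restarts):
        cand = predict(energy_fn, context, num_steps=num_steps)
        with torch.no_grad():
            e = energy_fn(context, cand).sum()
        if best_energy is None or e < best_energy:
            best, best_energy = cand, e
    return best

# Same model, same context; only the inference budget changes.
for steps in (2, 8, 32):
    pred = predict_with_budget(model, context, num_steps=steps)
```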

This challenges the implicit assumption in "Can non-reasoning models catch up with more compute?" EBTs are not "reasoning models" in the RL-trained sense, yet they scale with inference compute, because energy minimization is itself a form of iterative refinement that requires no explicit reasoning traces.



Original note title: energy-based transformers achieve system 2 thinking from unsupervised learning alone — modality and problem agnostic