How do transformers generate harder solutions when mostly trained on easier problems?
This explores how models trained mostly on easy examples can produce solutions to harder ones — the mechanics of going beyond the training distribution, and where that breaks down.
The corpus has two camps on this, and the tension between them is the interesting part.
The optimistic camp says transformers can bootstrap themselves upward. The clearest case is addition: a standard transformer trained only on short sums generalizes from 10-digit to 100-digit addition by generating its own solutions, keeping only the ones that pass a correctness filter, and retraining on them. Crucially, the gains are exponential across rounds rather than a slow linear creep (Can transformers improve exponentially by learning from their own correct solutions?). The easy problems are a launchpad: each round of self-filtered output becomes slightly harder training data for the next. A related route is architectural: looping a transformer's layers with shared parameters lets it extrapolate to deeper, unseen combinations that a fixed-depth model cannot handle, by effectively running 'more steps' of the same learned operation on a harder instance (Can looped transformers generalize to unseen knowledge combinations?).
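The generate, filter, retrain loop described above can be sketched in a few lines. This is a toy illustration only, not the method from the cited note: `toy_model_answer`, `verify`, and `self_improve` are hypothetical names, the "model" is a stub that is exact up to a reliable digit length and wrong about half the time one digit beyond it, and "retraining" is reduced to a threshold rule. The one-digit-per-round growth is an arbitrary choice of the sketch and says nothing about the exponential rate reported in the note.

```python
import random

def toy_model_answer(a: int, b: int, reliable_digits: int, rng: random.Random) -> int:
    """Hypothetical stand-in for a model: exact within its reliable length, noisy beyond it."""
    digits = max(len(str(a)), len(str(b)))
    if digits <= reliable_digits:
        return a + b
    # Past the reliable range, the stub is off by one about half the time.
    return a + b + rng.choice([0, 1])

def verify(a: int, b: int, answer: int) -> bool:
    """Correctness filter. Assumption: addition is cheap to check exactly."""
    return a + b == answer

def self_improve(start_digits: int, rounds: int, samples_per_round: int = 200, seed: int = 0) -> list[int]:
    """Return the reliable digit length after each round of generate, filter, retrain."""
    rng = random.Random(seed)
    reliable = start_digits
    history = [reliable]
    for _ in range(rounds):
        target = reliable + 1  # attempt problems one digit harder than the current range
        kept = []
        for _ in range(samples_per_round):
            a = rng.randrange(10 ** (target - 1), 10 ** target)
            b = rng.randrange(10 ** (target - 1), 10 ** target)
            answer = toy_model_answer(a, b, reliable, rng)
            if verify(a, b, answer):
                kept.append((a, b, answer))  # only verified solutions become training data
        # Toy "retraining": enough verified examples at the new length extends the reliable range.
        if len(kept) >= samples_per_round // 4:
            reliable = target
        history.append(reliable)
    return history

print(self_improve(start_digits=10, rounds=3))
```

The point of the structure is that the filter, not the generator, controls data quality: wrong answers at the harder length are discarded, so the next round trains only on correct examples that the model itself produced.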
A third mechanism doesn't change weights at all: an RL-tuned model can solve unseen problems inside a single context window, adapting from its own attempts within the episode the way a person learns from a few tries (Can transformers learn to solve new problems within episodes?). So 'harder solutions' can come from retraining on filtered output, from looping depth, or from in-context adaptation: three different doors to the same room.
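Of these three routes, looping depth is the easiest to show in miniature. The sketch below is an analogy rather than a transformer: the hypothetical `hop` function plays the role of one weight-tied block (the same "parameters", here a lookup table, on every pass), and `looped_forward` applies it a variable number of times. A chain that needs more hops than any fixed depth provides is resolved simply by looping more.

```python
def hop(state: str, edges: dict[str, str]) -> str:
    """One shared step: follow a single edge, or stay put if there is none."""
    return edges.get(state, state)

def looped_forward(start: str, edges: dict[str, str], n_loops: int) -> str:
    """Apply the same step n_loops times, as a weight-tied loop would."""
    state = start
    for _ in range(n_loops):
        state = hop(state, edges)
    return state

# Hypothetical relation chain: a -> b -> c -> d -> e
edges = {"a": "b", "b": "c", "c": "d", "d": "e"}

print(looped_forward("a", edges, 2))  # two hops reach "c"
print(looped_forward("a", edges, 4))  # four hops reach "e" with the same step function
```

Nothing new is learned between the two calls; the harder instance is handled by running the same operation for more iterations, which is the intuition behind depth extrapolation.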
But the skeptical camp warns that a lot of apparent 'harder solving' is recombination of easy pieces rather than genuinely new reasoning. Several notes find that transformers tend to reduce reasoning to matching memorized computation patterns, succeeding when a hard problem decomposes into familiar sub-pieces and failing sharply on truly novel compositions, with errors compounding as the chain lengthens (Do transformers actually learn systematic compositional reasoning?). The multi-hop work sharpens this: cross-distribution reasoning emerges only after distinct training phases, and the later hops generalize only if the model saw compositional examples during training, so pure easy-only exposure isn't always enough (How do transformers learn to reason across multiple steps?). And models often learn brittle task-specific shortcuts rather than a unified world model, which is exactly why they crack on inputs that look harder in an unfamiliar way (foundation-models-develop-task-specific-heuristics-rather-than-task-generalizable).
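The claim that errors compound as the chain lengthens has a simple back-of-the-envelope form. The sketch below assumes, purely for illustration, that each reasoning step succeeds independently with the same probability; the cited notes do not state this model, and real step errors are likely correlated. `chain_success` is a hypothetical helper name.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct, assuming independent steps."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps

# Even a 90%-reliable step leaves roughly a one-in-three chance of a clean 10-step chain.
for n in (1, 5, 10, 20):
    print(n, round(chain_success(0.9, n), 4))
```

Under this assumption, a model that looks strong on short, easy problems can fail most of the time on long compositions without any single step being much harder than what it was trained on.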
The wrinkle worth taking away: there's evidence that 'harder' and 'more effort' aren't reliably coupled in these models. Reasoning-trace length tracks how close a problem sits to the training distribution, not how difficult it is: the link between length and difficulty holds in-distribution but breaks entirely out-of-distribution (Does longer reasoning actually mean harder problems?). And models can detect a question's difficulty in their hidden states before reasoning, yet fail to act on that signal, over-thinking easy ones and under-committing on hard ones (Can models recognize question difficulty before they reason?). So the honest synthesis is: transformers reach harder solutions mainly by self-bootstrapping on verified-correct output, by recurrent depth, or by in-context adaptation, but whether that's real generalization or clever recombination depends heavily on whether the hard problem decomposes into patterns the easy training already taught.
Sources (8 notes)
Standard transformers generalize from 10-digit to 100-digit addition by repeatedly generating solutions, filtering for correctness, and retraining—showing exponential (not linear) out-of-distribution improvement across rounds without saturation.
Recurrent-depth transformers with shared parameters across iterations enable systematic generalization and depth extrapolation that vanilla transformers cannot achieve. This emerges through a sharp three-phase process: memorization, in-distribution generalization, then out-of-distribution generalization.
Llama 3.1 8B fine-tuned with RL exhibits emergent in-context reinforcement learning, solving unseen problems through within-episode adaptation at human-level sample efficiency. This meta-learning emerges from RL's training pressure combined with the transformer's context window, without weight updates.
Research shows transformers succeed on in-distribution tasks by memorizing computation subgraphs from training data, not by learning systematic rules. They fail drastically on novel compositions, with errors compounding across reasoning steps.
Controlled training reveals transformers learn multi-hop reasoning in three phases: memorization, in-distribution generalization, and cross-distribution reasoning. Successful reasoning correlates with cosine clustering of entity representations, and second-hop generalization requires explicit compositional exposure during training.
Inductive bias probes show transformers trained on orbital mechanics and games learn predictive patterns, not unified world structure. Fine-tuning reveals nonsensical, slice-dependent laws; circuit analysis shows arithmetic relies on range-matching heuristics, not algorithms.
Controlled A* maze experiments show trace length correlates with difficulty only in-distribution but decouples entirely out-of-distribution. Trace length primarily reflects recall of training schemas, not adaptive computation.
Linear probes successfully decode difficulty from LRM representations before reasoning begins, yet models still overthink simple questions. This reveals an action-commitment failure rather than a perception failure.