LLM Reasoning and Architecture

Can neural networks learn compositional skills without symbolic mechanisms?

In other words: is compositionality an architectural feature that must be built in, or an emergent property of scale?

Note · 2026-02-23 · sourced from MechInterp
What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

The question: do neural networks need explicit symbolic mechanisms to achieve compositionality, or does scaling suffice?

The answer: scaling data and model size yields compositional generalization in standard MLPs, with no architectural modifications, but under a critical condition: the training distribution must sufficiently cover the task space. Individual modules need not appear in isolation, but they must appear in enough distinct combinations for the model to extract them, as the toy example below illustrates.
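
A toy illustration of that coverage condition (the module names and the 2-of-4 task structure are invented for illustration, not taken from the source):

```python
from itertools import combinations

# Hypothetical task space: each task composes 2 of 4 primitive modules.
modules = ["negate", "double", "reverse", "sort"]
all_tasks = set(combinations(modules, 2))  # 6 possible compositions

# Training shows every module in at least two distinct combinations,
# never in isolation, and holds two compositions out entirely.
train = {("negate", "double"), ("double", "reverse"),
         ("reverse", "sort"), ("negate", "sort")}
held_out = all_tasks - train

for m in modules:
    seen = sum(m in task for task in train)
    print(f"{m}: appears in {seen} training compositions")
print("held out for generalization:", held_out)
```

Under the note's claim, a sufficiently large MLP trained on `train` should generalize to `held_out`, because each module's contribution is identifiable from the combinations in which it does appear.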

Three key contributions:

  1. Proof of representational capacity. MLPs can approximate a general class of compositional task families (hyperteachers) to arbitrary precision using a number of neurons that grows only linearly with the number of task modules. Memorizing every task instead requires exponential capacity, so the compositional solution is fundamentally more efficient.

  2. Linear decodability as a compositionality signature. When networks successfully generalize compositionally, the task constituents can be linearly decoded from hidden activations. This metric predicts failures in text-to-image models: when concepts cannot be linearly decoded, the model fails to compose them (see the probe sketch after this list).

  3. Scaling limits. Even so, performance deteriorates as the number of composed concepts grows. Because compositionality is multiplicative, even heavily scaled models hit composition limits: the exponential growth of the composition space eventually exceeds any finite training distribution.
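
A minimal sketch of the linear-decodability check from point 2, assuming hidden activations are available as a NumPy array; the synthetic data and probe setup are illustrative, not the paper's exact protocol:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_decodability(hidden, module_labels):
    """Fit a linear probe from hidden activations (n_samples, d_hidden)
    to one task-constituent label (n_samples,). High held-out accuracy
    suggests that constituent is linearly represented."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        hidden, module_labels, test_size=0.25, random_state=0)
    probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# Synthetic stand-in: activations that linearly encode a 4-way module id.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=1000)
directions = rng.normal(size=(4, 64))          # one direction per module
hidden = directions[labels] + 0.3 * rng.normal(size=(1000, 64))
print(f"probe accuracy: {linear_decodability(hidden, labels):.2f}")
```

By the note's hypothesis, probe accuracy near chance for a given concept would predict composition failures involving that concept.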

This directly addresses the question "Why do neural networks fail at compositional generalization?": the binding problem is solvable through scaling when training covers the task space, but remains unsolved for arbitrary novel compositions. The failure mode is not an inability to learn compositional structure but insufficient exposure to the combinatorial space.

The practical implication for LLMs: compositional generalization in language (novel sentence structures, new concept combinations) should improve with scale, but the tails of the combinatorial space will always remain sparsely covered, predicting continued failures on truly novel compositions. The arithmetic below makes the sparsity concrete.
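
A back-of-the-envelope calculation (the module count, composition depth, and training budget are invented for illustration):

```python
# m primitive modules composed to depth k, versus a fixed training
# budget of N distinct compositions: coverage collapses as k grows.
m, N = 50, 10_000_000
for k in range(1, 7):
    space = m ** k                 # ordered compositions of length k
    coverage = min(1.0, N / space)
    print(f"k={k}: {space:.2e} compositions, coverage <= {coverage:.2e}")
```

With these illustrative numbers, ten million training examples cover less than 0.1% of the depth-6 space, which is why the tails stay sparse no matter how the model is scaled.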

SKiC prompting: unlocking compositional generalization with few examples. Skills-in-Context (SKiC) prompting shows that compositional generalization can be unlocked with remarkably few examples (as few as two exemplars) when the prompt explicitly grounds each reasoning step on foundational skills. The SKiC prompt has three blocks: (1) the skills with instructions, (2) compositional examples showing how to combine the skills, (3) the problem; a schematic template follows below. This one-stage approach achieves near-perfect systematic generalization and is more general than decomposition-based methods, since it handles complex computation graphs that cannot be linearly decomposed. Intriguingly, SKiC also unlocks "latent potential": pre-existing internal skills from pretraining that standard prompting fails to activate. This confirms the training-coverage condition from a different angle: the model has compositional capacity from pretraining, but the prompt must explicitly invoke the skill-grounding structure to surface it. Source: Prompts Prompting.
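
A schematic of the three-block structure (the skills, worked example, and task are invented for illustration; this follows the block layout the note describes, not an exact prompt from the SKiC paper):

```python
# Hypothetical SKiC-style prompt for a last-letter concatenation task.
SKILLS = """\
Skill 1 (last_letter): given a word, return its final letter.
  Example: last_letter("apple") -> "e"
Skill 2 (concat): given two strings, return them joined.
  Example: concat("e", "t") -> "et"
"""

EXAMPLES = """\
Q: Concatenate the last letters of "cat hat".
A: last_letter("cat") -> "t"; last_letter("hat") -> "t";
   concat("t", "t") -> "tt". Answer: tt
"""

def skic_prompt(problem: str) -> str:
    # Block 1: skills with instructions. Block 2: compositional
    # examples grounding each step on a skill. Block 3: the problem.
    return f"{SKILLS}\n{EXAMPLES}\nQ: {problem}\nA:"

print(skic_prompt('Concatenate the last letters of "blue sky".'))
```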


