LLM Reasoning and Architecture

Do neural networks naturally break tasks into modular parts?

Can standard neural networks decompose complex tasks into separate subroutines implemented in distinct subnetworks, or do they only memorize input-output patterns? Understanding whether compositionality emerges from gradient-based learning matters for interpretability and generalization.

Note · 2026-02-23 · sourced from MechInterp
What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

Structural compositionality is the extent to which neural networks break down compositional tasks into subroutines and implement them in modular subnetworks. The alternative: matching inputs to learned templates without task decomposition.

The evidence supports compositionality. Using model pruning to isolate subnetworks, the study finds that trained models often break compositional tasks into modular subnetworks, each implementing a distinct subroutine, rather than merely matching inputs to templates.
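The pruning-based probe can be illustrated with a toy sketch. Everything here is fabricated for illustration: the weights are random, the modular layout is built in by hand, and the masks are hard-coded, whereas the actual research learns the masks. The point is only the logic of the test: if a pruned subnetwork reproduces one subroutine while ablating the other, the full model has localized that subroutine in a modular subnetwork.

```python
import numpy as np

def forward(x, W1, W2, mask1=None, mask2=None):
    # Two-layer ReLU network; optional binary masks prune weights,
    # leaving only an isolated subnetwork active.
    if mask1 is not None:
        W1 = W1 * mask1
    if mask2 is not None:
        W2 = W2 * mask2
    h = np.maximum(0.0, x @ W1)
    return h @ W2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 4))
W2 = np.zeros((4, 2))
W2[:2, 0] = 1.0   # output 0 ("subroutine A") reads hidden units 0-1
W2[2:, 1] = 1.0   # output 1 ("subroutine B") reads hidden units 2-3

x = rng.normal(size=(5, 4))
full = forward(x, W1, W2)

# Prune down to the candidate subnetwork for subroutine A:
# keep only the weights that create and read hidden units 0-1.
mask1 = np.zeros_like(W1); mask1[:, :2] = 1.0
mask2 = np.zeros_like(W2); mask2[:2, :] = 1.0
sub = forward(x, W1, W2, mask1, mask2)

print(np.allclose(sub[:, 0], full[:, 0]))  # True — subroutine A preserved
print(np.allclose(sub[:, 1], 0.0))         # True — subroutine B ablated
```

A real model gives no guarantee that such a mask exists; finding one (or failing to) is what makes this a measurement of structural compositionality rather than an assumption.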

The pretraining effect: models initialized with pretrained weights more reliably produce modular subnetworks than randomly initialized models. Self-supervised pretraining appears to create internal structure that is more amenable to compositional decomposition. This suggests that the representations learned during pretraining have a modular quality that fine-tuning can exploit.

This provides empirical evidence against the longstanding objection that neural networks are fundamentally non-compositional. The finding: "some simple pseudo-symbolic computations might be learned directly from data using standard gradient-based optimization techniques." Explicit symbolic mechanisms may be unnecessary: gradient-based optimization discovers compositional structure when the task demands it and pretraining provides a good initialization.

The result is not perfect: "most do not exhibit perfect task decomposition." Compositionality is partial and graded, not all-or-nothing. Some architecture-task combinations show stronger structural compositionality than others.
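Because decomposition is graded, it is natural to score it on a continuum rather than as a yes/no property. The metric below is my own illustration, not the paper's: it measures what fraction of the full model's subtask output a pruned subnetwork reproduces.

```python
import numpy as np

def decomposition_score(y_full, y_sub):
    # 1.0 means the subnetwork perfectly reproduces the full model's
    # output on the subtask; lower values mean the subroutine is only
    # partially localized in the subnetwork.
    err = np.linalg.norm(y_full - y_sub)
    return 1.0 - err / (np.linalg.norm(y_full) + 1e-12)

y_full = np.array([1.0, 2.0, 3.0])
print(decomposition_score(y_full, y_full))        # 1.0 — perfect decomposition
print(decomposition_score(y_full, 0.9 * y_full))  # ≈ 0.9 — partial decomposition
```

Under a score like this, "most do not exhibit perfect task decomposition" becomes a distribution of values below 1.0 across architecture-task combinations.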

This connects to the weight-sparsity finding: Can sparse weight training make neural networks interpretable by design? shows that enforcing sparsity produces clean decomposition. The structural compositionality paper shows that decomposition also emerges naturally, albeit imperfectly, from standard training. Sparsity amplifies a tendency that already exists.
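One generic way to enforce weight sparsity during training is an L1 penalty applied through its proximal operator, soft thresholding. This is a standard textbook device, not the specific method of either paper, but it shows the mechanism by which sparsity pressure zeroes out weak connections and leaves cleaner, more separable subnetworks.

```python
import numpy as np

def soft_threshold(W, lam):
    # Proximal step for an L1 penalty: shrink every weight toward zero
    # and clamp weights smaller than lam to exactly zero.
    return np.sign(W) * np.maximum(np.abs(W) - lam, 0.0)

W = np.array([0.05, -0.8, 0.3, -0.02])
print(soft_threshold(W, 0.1))  # small entries become exactly 0
```

Applied after each gradient step, this drives exact zeros rather than merely small weights, which is what makes the resulting decomposition readable.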




neural networks decompose compositional tasks into modular subnetworks without explicit symbolic mechanisms — pretraining encourages this