Do neural networks naturally break tasks into modular parts?
Can standard neural networks decompose complex tasks into separate subroutines implemented in distinct subnetworks, or do they only memorize input-output patterns? Understanding whether compositionality emerges from gradient-based learning matters for interpretability and generalization.
Structural compositionality is the extent to which a neural network breaks a compositional task down into subroutines and implements each in a modular subnetwork. The alternative is pattern matching: the network matches inputs to learned templates without decomposing the task at all.
The evidence supports compositionality. Using model pruning to isolate subnetworks (a minimal sketch of the procedure follows this list):
- Subnetworks that implement one subroutine can be identified
- Ablating a subnetwork harms its corresponding subroutine while leaving others largely intact
- This holds across multiple architectures (CNNs, transformers), tasks (vision, language), and scales
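To make the pruning-and-ablation logic concrete, here is a minimal, self-contained PyTorch sketch. The toy task, the two-headed MLP, and all hyperparameters are illustrative assumptions, not the original study's setup, and the mask-learning step is only in the spirit of the continuous-sparsification method: train one model on two subroutines, learn a sparse binary mask over the frozen weights that preserves subroutine A alone, then ablate that subnetwork and check whether A degrades while B survives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy compositional task (an illustrative assumption, not the paper's setup):
# input x in R^8; subroutine A is the sign of the sum of the first four
# features, subroutine B the sign of the sum of the last four.
def batch(n=256):
    x = torch.randn(n, 8)
    return x, (x[:, :4].sum(1) > 0).long(), (x[:, 4:].sum(1) > 0).long()

# Train a small MLP with two heads: logits[:, :2] answer A, logits[:, 2:] answer B.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    x, ya, yb = batch()
    out = model(x)
    loss = F.cross_entropy(out[:, :2], ya) + F.cross_entropy(out[:, 2:], yb)
    opt.zero_grad(); loss.backward(); opt.step()

# Subnetwork discovery: freeze the trained weights and learn a soft per-weight
# mask that preserves subroutine A alone, with an L1 penalty pushing the mask
# toward sparsity (loosely following continuous sparsification).
params = list(model.parameters())          # [W1, b1, W2, b2]
for p in params:
    p.requires_grad_(False)
mask_logits = [torch.full_like(p, 2.0, requires_grad=True) for p in params]

def masked_forward(x, hard=False):
    m = [(g > 0).float() if hard else torch.sigmoid(g) for g in mask_logits]
    w1, b1, w2, b2 = (p * mi for p, mi in zip(params, m))
    return F.relu(x @ w1.T + b1) @ w2.T + b2

mopt = torch.optim.Adam(mask_logits, lr=5e-2)
for _ in range(500):
    x, ya, _ = batch()
    sparsity = sum(torch.sigmoid(g).sum() for g in mask_logits)
    loss = F.cross_entropy(masked_forward(x)[:, :2], ya) + 1e-3 * sparsity
    mopt.zero_grad(); loss.backward(); mopt.step()

# Ablation test: zero out the discovered subnetwork (keeping everything else)
# and compare per-subroutine accuracy. Structural compositionality predicts a
# large drop on A while B stays largely intact.
def acc(logits, y):
    return (logits.argmax(1) == y).float().mean().item()

x, ya, yb = batch(4096)
with torch.no_grad():
    m = [(g > 0).float() for g in mask_logits]
    w1, b1, w2, b2 = (p * (1 - mi) for p, mi in zip(params, m))
    abl = F.relu(x @ w1.T + b1) @ w2.T + b2
    full = model(x)
    sub = masked_forward(x, hard=True)
print(f"full model:        A={acc(full[:, :2], ya):.2f}  B={acc(full[:, 2:], yb):.2f}")
print(f"A-subnet alone:    A={acc(sub[:, :2], ya):.2f}")
print(f"A-subnet ablated:  A={acc(abl[:, :2], ya):.2f}  B={acc(abl[:, 2:], yb):.2f}")
```

Because the toy subroutines read disjoint input features, a modular solution exists by construction; the substantive empirical finding is that the same ablation signature appears in realistic vision and language models.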
The pretraining effect: models initialized with pretrained weights produce modular subnetworks more reliably than randomly initialized models. Self-supervised pretraining appears to create internal structure that is more amenable to compositional decomposition, a modular quality in the learned representations that fine-tuning can then exploit.
This is empirical evidence against the longstanding objection that neural networks are fundamentally non-compositional. The finding: "some simple pseudo-symbolic computations might be learned directly from data using standard gradient-based optimization techniques." Explicit symbolic mechanisms may be unnecessary: gradient-based optimization discovers compositional structure when the task demands it and pretraining provides a good initialization.
The result is not perfect: "most do not exhibit perfect task decomposition." Compositionality is partial and graded, not all-or-nothing. Some architecture-task combinations show stronger structural compositionality than others.
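One way to read "partial and graded" quantitatively is an ablation-selectivity score: of the total accuracy lost when a subnetwork is ablated, what fraction falls on its target subroutine? This metric is an illustrative assumption, not one defined in the source.

```python
def selectivity(acc_full_target, acc_abl_target, acc_full_other, acc_abl_other):
    """Illustrative (assumed, not from the paper) ablation-selectivity score.

    drop_target: accuracy lost on the subroutine the subnetwork implements.
    drop_other:  collateral accuracy lost on the remaining subroutines.
    Near 1 = clean decomposition; near 0.5 = no decomposition at all.
    """
    drop_target = acc_full_target - acc_abl_target
    drop_other = acc_full_other - acc_abl_other
    total = drop_target + drop_other
    return drop_target / total if total > 0 else 0.0

# e.g. ablation costs 40 points on the target subroutine but only 5 elsewhere:
print(selectivity(0.95, 0.55, 0.95, 0.90))  # ~0.89: mostly, not perfectly, modular
```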
This connects to the weight-sparsity finding: the related note "Can sparse weight training make neural networks interpretable by design?" shows that enforcing sparsity produces clean decomposition. The structural compositionality paper shows that decomposition also emerges naturally, albeit imperfectly, from standard training; sparsity amplifies a tendency that already exists.
Source: MechInterp
Related concepts in this collection
- Can sparse weight training make neural networks interpretable by design?
  Explores whether constraining most model weights to zero during training produces human-understandable circuits and disentangled representations, rather than attempting to reverse-engineer dense models after training.
  Connection: sparsity amplifies the compositional decomposition that standard training already partially produces.
- Do base models already contain hidden reasoning ability?
  Explores whether reasoning capability emerges during pre-training as a latent feature rather than being created by post-training methods like reinforcement learning or fine-tuning.
  Connection: pretraining-induced modularity is part of the "latent capability" that minimal signals can activate.
- Can neural networks learn compositional skills without symbolic mechanisms?
  Asks whether neural networks need explicit symbolic architecture to compose learned concepts, or whether scaling alone enables compositional generalization; that is, whether compositionality is an architectural feature or an emergent property of scale.
  Connection: complementary evidence; scaling enables compositionality in behavior, while pruning reveals it in structure.
Original note title: neural networks decompose compositional tasks into modular subnetworks without explicit symbolic mechanisms — pretraining encourages this