Break It Down: Evidence for Structural Compositionality in Neural Networks

Paper · arXiv 2301.10884 · Published January 26, 2023

Though modern neural networks have achieved impressive performance in both vision and language tasks, we know little about the functions that they implement. One possibility is that neural networks implicitly break down complex tasks into subroutines, implement modular solutions to these subroutines, and compose them into an overall solution to a task — a property we term structural compositionality. Another possibility is that they may simply learn to match new inputs to learned templates, eliding task decomposition entirely. Here, we leverage model pruning techniques to investigate this question in both vision and language across a variety of architectures, tasks, and pretraining regimens. Our results demonstrate that models often implement solutions to subroutines via modular subnetworks, which can be ablated while maintaining the functionality of other subnetworks. This suggests that neural networks may be able to learn compositionality, obviating the need for specialized symbolic mechanisms.

Though neural networks have come to dominate most subfields of AI, much remains unknown about the functions that they learn to implement. In particular, there is debate over the role of compositionality. Compositionality has long been touted as a key property of human cognition, enabling humans to exhibit flexible and abstract language processing and visual processing, among other cognitive processes (Marcus, 2003; Piantadosi et al., 2016; Lake et al., 2017; Smolensky et al., 2022). According to common definitions (Quilty-Dunn et al., 2022; Fodor & Lepore, 2002), a representational system is compositional if it implements a set of discrete constituent functions that exhibit some degree of modularity. That is, blue circle is represented compositionally if the system can entertain the concept blue independently of the concept circle, and vice versa.

It is an open question whether neural networks require explicit symbolic mechanisms to implement compositional solutions, or whether they implicitly learn to implement compositional solutions during training. Historically, artificial neural networks have been considered non-compositional systems, instead solving tasks by matching new inputs to learned templates (Marcus, 2003; Quilty-Dunn et al., 2022). Neural networks’ apparent lack of compositionality has served as a key point in favor of integrating explicit symbolic mechanisms into contemporary artificial intelligence systems (Andreas et al., 2016; Koh et al., 2020; Ellis et al., 2023; Lake et al., 2017). However, modern neural networks, with no explicit inductive bias towards compositionality, have demonstrated successes on increasingly complex tasks. This raises the question: are these models succeeding by implementing compositional solutions under the hood (Mandelbaum et al., 2022)?

Our contributions are threefold:

  1. We introduce the concept of structural compositionality, which characterizes the extent to which neural networks decompose compositional tasks into subroutines and implement them modularly. We test for structural compositionality in several different models across both language and vision.

  2. We discover that, surprisingly, there is substantial evidence that many models implement subroutines in modular subnetworks, though most do not exhibit perfect task decomposition.

  3. We characterize the effect of unsupervised pretraining on structural compositionality in fine-tuned networks and find that pretraining leads to a more consistently compositional structure in language models.

This study contributes to the emerging body of work on “mechanistic interpretability” (Olah, 2022; Cammarata et al., 2020; Ganguli et al., 2021; Henighan et al., 2023) which seeks to explain the algorithms that neural networks implicitly implement within their weights. We make use of techniques from model pruning in order to gain insight into these algorithms. While earlier versions of these techniques have been applied to study modularity in a multitask setting (Csordás et al., 2021), our work is novel in that it applies the method to more complex language and vision models, studies more complex compositional tasks, and connects the results to a broader discussion about defining and measuring compositionality within neural networks.
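To make the pruning approach concrete, here is a minimal sketch of mask-based subnetwork discovery: freeze a trained layer's weights and learn a differentiable mask over them, so that optimizing the mask on data isolating a single subroutine carves out a candidate subnetwork. This illustrates the general technique rather than the paper's exact procedure, and all class and variable names are our own.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """A linear layer whose frozen, trained weights are gated by a learnable mask."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        # Freeze the trained weights; only the mask logits will be optimized.
        self.register_buffer("weight", linear.weight.detach().clone())
        self.register_buffer(
            "bias", linear.bias.detach().clone() if linear.bias is not None else None
        )
        # sigmoid(mask_logits) approximates a binary mask; initializing the
        # logits at 3.0 starts the mask near 1, i.e., the full network.
        self.mask_logits = nn.Parameter(torch.full_like(self.weight, 3.0))

    def forward(self, x, temperature: float = 1.0):
        mask = torch.sigmoid(self.mask_logits / temperature)
        return nn.functional.linear(x, self.weight * mask, self.bias)
```

Training only the mask logits on examples that isolate one subroutine, typically with a sparsity penalty and a temperature annealed toward zero so the mask hardens to binary, yields a subnetwork hypothesized to implement that subroutine.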

As discussed above, leading definitions of compositionality are stated in terms of a system's representations, not its behavior. That is, these definitions contrast compositional systems (which implement modular constituents) with noncompositional systems (which might, e.g., rely on learned templates). Poor performance on generalization studies does not differentiate these two types of systems, since even a definitionally compositional system might fail at these generalization tasks. For example, a Bayesian network that explicitly represents and composes distinct shape and color properties might nonetheless classify a blue circle as a red circle if it has a low prior for predicting the color blue and a high prior for predicting the color red.

Thus, in this work, we focus on evaluating the extent to which a model’s representations are structured compositionally. Consider the task described in Figure 1. In this task, a network learns to select the “odd-one-out” among four images. Three of them follow a compositional rule (they all contain two shapes, one of which is inside and in contact with the other). One of them breaks this rule. There are at least two ways that a network might learn to solve this type of compositional task. (1) A network might compare new inputs to prototypes or iconic representations of previously-seen inputs, avoiding any decomposition of these prototypes into constituent parts (i.e., it might implement a non-compositional solution). (2) A network might implicitly break the task down into subroutines, implement solutions to each, and compose these results into a solution (i.e., it might implement a compositional solution). In this case, the subroutines consist of a (+/- Inside) detector and a (+/- Contact) detector.
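To make the task structure explicit, the following sketch encodes the compositional rule symbolically. The property names (inside, contact) are assumptions for exposition; the networks themselves only ever see raw images and must induce any such decomposition on their own.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    inside: bool   # is one shape inside the other?
    contact: bool  # are the two shapes in contact?

def follows_rule(scene: Scene) -> bool:
    # The compositional rule is the conjunction of the two subroutines:
    # (+ Inside) AND (+ Contact).
    return scene.inside and scene.contact

def odd_one_out(scenes: list[Scene]) -> int:
    # Exactly one of the four scenes violates the rule; return its index.
    return next(i for i, s in enumerate(scenes) if not follows_rule(s))

# Example: the third scene breaks the rule (its shapes are not inside one another).
quartet = [Scene(True, True), Scene(True, True), Scene(False, True), Scene(True, True)]
assert odd_one_out(quartet) == 2
```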

If a model trained on this task exhibits structural compositionality, then we would expect to find a subnetwork that implements each subroutine within the parameters of that model. This subnetwork should compute one subroutine, and not the other (Figure 1, Bottom Right; “Subnetwork”), and it should be modular with respect to the rest of the network — it should be possible to ablate this subnetwork, harming the model’s ability to compute one subroutine while leaving the other subroutine largely intact (Figure 1, Bottom Right; “Ablation”). However, if a model does not exhibit structural compositionality, then it has only learned the conjunction of the subroutines rather than their composition. It should not be possible to find a subnetwork that implements one subroutine and not the other, and ablating one subnetwork should hurt accuracy on both subroutines equally (Figure 1, Bottom Center). This definition is related to prior work on modularity in neural networks (Csordás et al., 2021; Hod et al., 2022), but here we specifically focus on modular representations of compositional tasks.
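Schematically, the ablation test can be expressed as in the sketch below. The mask format, helper names, and evaluation sets are assumptions made for illustration; the key point is the asymmetric prediction: ablating the (+/- Inside) subnetwork should collapse accuracy on the Inside subroutine while leaving the Contact subroutine largely intact.

```python
import torch

@torch.no_grad()
def ablate(model: torch.nn.Module, subnetwork_mask: dict) -> None:
    """Zero out the weights belonging to a discovered subnetwork (in place)."""
    for name, param in model.named_parameters():
        if name in subnetwork_mask:
            # subnetwork_mask[name] is binary: 1 where the subnetwork lives.
            param.mul_(1.0 - subnetwork_mask[name])

def ablation_signature(model, inside_mask, eval_inside, eval_contact, evaluate):
    # Ablate the (+/- Inside) subnetwork, then test both subroutines.
    ablate(model, inside_mask)
    acc_inside = evaluate(model, eval_inside)    # prediction: near chance
    acc_contact = evaluate(model, eval_contact)  # prediction: largely intact
    return acc_inside, acc_contact
```

Under structural compositionality, acc_inside should drop toward chance while acc_contact stays high; under a template-matching solution, ablation should degrade both roughly equally.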

For some architecture/task combinations, the pattern of ablation results is statistically significant in favor of structural compositionality. See Figure 4 (C, D), where all base models appear to implement both subroutines in a modular fashion.

Across all language tasks, the ablation results indicate that models initialized with pretrained weights more reliably produce modular subnetworks than randomly initialized models.

Across a variety of architectures, tasks, and training regimens, we demonstrated that models often exhibit structural compositionality. Without any explicit encouragement to do so, neural networks appear to decompose tasks into subroutines and implement solutions to (at least some of) these subroutines in modular subnetworks. Furthermore, we showed that self-supervised pretraining can lead to more consistent structural compositionality, at least in the domain of language. These results bear on the longstanding debate over the need for explicit symbolic mechanisms in AI systems. Much recent work focuses on integrating symbolic and neural systems (Ellis et al., 2023; Nye et al., 2020). However, our results suggest that some simple pseudo-symbolic computations might be learned directly from data using standard gradient-based optimization techniques.

We view our approach as a tool for understanding when and how compositionality arises in neural networks, and plan to further investigate the conditions that encourage structural compositionality. One promising direction would be to investigate the relationship between structural compositionality and recent theoretical work on compositionality and sparse neural networks (Mhaskar & Poggio, 2016; Poggio, 2022). Specifically, this theoretical work suggests that neural networks optimized to solve compositional tasks naturally implement sparse solutions. This may serve as a starting point for developing a formal theory of structural compositionality in neural networks. Another direction might be to investigate the structural compositionality of networks trained using iterated learning procedures (Ren et al., 2019; Vani et al., 2020). Iterated learning simulates the cultural evolution of language by jointly training two communicating agents (Kirby et al., 2008). Prior work has demonstrated that iterated learning paradigms give rise to simple compositional languages. Quantifying the relationship between structural compositionality within the agents and the compositionality of the language that they produce would be an exciting avenue for understanding the relationship between representation and behavior.