On the Binding Problem in Artificial Neural Networks

Paper · arXiv 2012.05208 · Published December 9, 2020
Tags: LLM Architecture · Linguistics, NLP, NLU · Philosophy · Subjectivity

In this work, we argue that the underlying cause of neural networks' shortfall in human-level generalization is the binding problem: the inability of existing neural networks to dynamically and flexibly bind information that is distributed throughout the network. The binding problem affects their ability to form meaningful entities from unstructured sensory inputs (segregation), to maintain this separation of information at a representational level (representation), and to use these entities to construct new inferences, predictions, and behaviors (composition). Each of these aspects relates to a wealth of research in neuroscience and cognitive psychology, where the binding problem has been extensively studied in the context of the human brain. Based on these connections, we work towards a solution to the binding problem in neural networks and identify several important challenges and requirements.
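
To make the representational aspect concrete, the following minimal NumPy sketch (our own illustration, not code from the paper, assuming a simple additive encoding of distributed feature vectors) shows the well-known superposition problem: once two objects are encoded as unordered sums of their feature vectors, the representation still records which features are present but no longer records which features belong to which object.

```python
# Illustrative sketch of the binding problem at the representational level.
# Assumption (for illustration only): each object is encoded as the sum of
# random distributed feature vectors, and a scene is the sum of its objects.
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Hypothetical feature vocabulary: one distributed code per feature.
features = {name: rng.normal(size=dim) for name in
            ["red", "blue", "square", "circle"]}

def encode(*names):
    """Encode an object (or scene) as an unordered sum of feature vectors."""
    return sum(features[n] for n in names)

# Scene A: a red square and a blue circle.
scene_a = encode("red", "square") + encode("blue", "circle")
# Scene B: a blue square and a red circle (the feature-object bindings swapped).
scene_b = encode("blue", "square") + encode("red", "circle")

# The two scenes become indistinguishable: the feature-to-object assignment is lost.
print(np.allclose(scene_a, scene_b))  # True
```
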

We start our discussion by reviewing the importance of symbols as units of computation and highlighting several symptoms that point to the lack of emergent symbolic processing in existing neural networks. We argue that this is a major obstacle to achieving human-level generalization, and posit that the binding problem in connectionism is the underlying cause of this weakness. This section serves as an introduction to the binding problem and provides the necessary context for the subsequent in-depth discussion of its individual aspects in Sections 3 to 5.

Connectionism takes a different, brain-inspired approach to Artificial Intelligence that stands in contrast to symbolic AI and its focus on the conscious mind (Newell and Simon, 1981; Fodor, 1975). Rather than relying on hand-crafted symbols and rules, connectionist approaches such as neural networks focus on learning suitable distributed representations directly from low-level sensory data. In this way, neural networks have resolved many of the problems that haunted symbolic AI, including its brittleness when confronted with inconsistencies or noise, and the prohibitive amount of human engineering and interpretation required to apply such techniques to low-level perceptual tasks. Importantly, the distributed representations learned by neural networks are directly grounded in their input data, unlike symbols, whose connection to real-world concepts is entirely subject to human interpretation (see the symbol grounding problem; Harnad, 1990).

Empirically, agents are found to be fragile under distributional shift (Kansky et al., 2017; Zhang et al., 2018; Gamrian and Goldberg, 2019) and to require substantially more training data than humans (Tsividis et al., 2017). These failures at systematically reusing knowledge suggest that neural networks do not learn a compositional knowledge representation (although some mitigation is possible; Hill et al., 2019, 2020). In some cases, such as in vision, it may appear that object-level abstractions can emerge naturally as a byproduct of learning (Zhou et al., 2015).