Representation biases: will we achieve complete understanding by analyzing representations?
A common approach in neuroscience is to study neural representations as a means to understand a system—increasingly, by relating neural representations to the internal representations learned by computational models. However, recent work in machine learning (Lampinen et al., 2024) shows that learned feature representations may be biased to over-represent certain features and to represent others more weakly and less consistently. For example, simple (linear) features may be more strongly and more consistently represented than complex (highly nonlinear) features. These biases could pose challenges for achieving a full understanding of a system through representational analysis. In this perspective, we illustrate these challenges, showing how feature representation biases can lead to strongly biased inferences from common analyses such as PCA, regression, and RSA. We also present homomorphic encryption as a simple case study of the potential for strong dissociation between patterns of representation and computation. We discuss the implications of these results for representational comparisons between systems, and for neuroscience more generally.
A central approach in neuroscience is to analyze patterns of neural representation in order to learn about a system (Kriegeskorte and Diedrichsen, 2019). In particular, computational neuroscience has increasingly relied on relating patterns of neural activity to the internal representational structures of computational models (Churchland and Sejnowski, 1990; Kriegeskorte et al., 2008; Sucholutsky et al., 2023; Feather et al., 2025). However, there are philosophical questions about how to justify interpreting internal activity as representations (Shea, 2018; Fallon et al., 2023; Cao, 2022b), as well as conceptual and practical challenges to understanding a system by analyzing its internal activity or representations (Poldrack, 2006; Marom et al., 2009; Jonas and Kording, 2017; Ritchie et al., 2019; Sexton and Love, 2022; Dujmovic et al., 2024).
Here, we use case studies from recent work in machine learning to illustrate some practical challenges to understanding a system’s function by studying its internal representations (see Fig. 1 for an overview)—and discuss the implications for neuroscience. In particular, we focus on the results of Lampinen et al. (2024). In this work, the authors use controlled experiments to study the relationship between patterns of representation and computation in machine learning models. The authors identify substantial biases in the learned representations: some features are much more strongly represented than others, even when they play similar computational roles in the system’s behavior. These representation biases mean that common analytic methods—such as Principal Component Analysis (PCA), Representational Similarity Analysis (RSA; Kriegeskorte et al., 2008), and linear regression—may be biased towards capturing some computational features over others. Thus, the many neuroscience experiments that use methods like these to study a system’s representations or activity may provide a biased picture of its computations.
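To make this intuition concrete, the following is a minimal, hypothetical simulation (in Python with numpy and scikit-learn; it is a sketch of the general phenomenon, not the experiments of Lampinen et al., 2024, and all variable names and parameter values are assumptions). Two latent features drive a simulated population's activity: one with a strong linear projection, one with a weak nonlinear one. PCA and linear regression then recover the strongly represented feature far more readily, even though nothing in the simulation makes it more important for behavior.

```python
# Minimal illustrative sketch (hypothetical simulation, not the analysis of
# Lampinen et al., 2024): two features drive simulated "neural" activity,
# but one is represented much more strongly. PCA and linear decoding then
# favor the strongly represented feature.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_trials, n_units = 2000, 100

# Two latent task features, sampled independently.
feat_strong = rng.standard_normal(n_trials)   # strongly, linearly represented
feat_weak = rng.standard_normal(n_trials)     # weakly, nonlinearly represented

# Simulated population activity: the "strong" feature gets a high-gain linear
# projection; the "weak" feature enters only through a low-gain nonlinearity.
W_strong = rng.standard_normal(n_units)
W_weak = rng.standard_normal(n_units)
activity = (3.0 * np.outer(feat_strong, W_strong)
            + 0.3 * np.outer(np.tanh(feat_weak), W_weak)
            + 0.5 * rng.standard_normal((n_trials, n_units)))

# PCA: the top component aligns almost entirely with the strong feature.
pcs = PCA(n_components=2).fit_transform(activity)
print("|corr(PC1, strong)|:", abs(np.corrcoef(pcs[:, 0], feat_strong)[0, 1]))
print("|corr(PC1, weak)|:  ", abs(np.corrcoef(pcs[:, 0], feat_weak)[0, 1]))

# Linear decoding (regression): the strong feature is recovered far better,
# even though both features could matter equally for downstream behavior.
for name, feat in [("strong", feat_strong), ("weak", feat_weak)]:
    r2 = LinearRegression().fit(activity, feat).score(activity, feat)
    print(f"linear decoding R^2 ({name} feature): {r2:.2f}")
```

In this toy setting, an RSA-style analysis of the same activations would likewise be dominated by the strongly represented feature, since it contributes most of the variance underlying the pairwise distance structure.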