Levels of Analysis for Large Language Models

Paper · arXiv 2503.13401
Philosophy and Subjectivity · Mechanistic Interpretability · Reasoning Architectures

Modern artificial intelligence systems, such as large language models, are increasingly powerful but also increasingly hard to understand. Recognizing this problem as analogous to the historical difficulties in understanding the human mind, we argue that methods developed in cognitive science can be useful for understanding large language models. We propose a framework for applying these methods based on the levels of analysis that David Marr proposed for studying information processing systems. By revisiting established cognitive science techniques relevant to each level and illustrating their potential to yield insights into the behavior and internal organization of large language models, we aim to provide a toolkit for making sense of these new kinds of minds.

The last decade has seen a series of breakthroughs in artificial intelligence (AI) research, culminating in the creation of the large language models that underlie chat-based agents such as ChatGPT, Claude, Gemini, and LLaMA. These breakthroughs have been driven by a specific strategy: starting with generic artificial neural network architectures and increasing their size and training data. Artificial neural networks are notoriously difficult to interpret, finding solutions to problems that are expressed in the form of billions of continuous weighted connections between units. As a consequence, computer scientists now face an unfamiliar problem: they have created systems that they do not understand. Even though this problem is unfamiliar to computer scientists, it is very familiar to another group of researchers: cognitive scientists. Cognitive science is the interdisciplinary science of the mind, and for the 70 or so years since its inception has been limited by the fact that it had relatively few kinds of mind to study. To cognitive scientists, the advent of intelligent machines offers exciting new opportunities to apply methods that have been refined through trying to understand how human minds work.

One powerful conceptual framework used in cognitive science is the idea that information processing systems can be understood at different levels of analysis. The computational neuroscientist David Marr proposed three such levels: the computational level, which focuses on the abstract computational problem a system solves; the algorithmic level, which addresses the representations and processes the system uses; and the implementation level, which examines the physical mechanisms that realize those computations. Marr's levels are popular within cognitive science, and have previously been applied to the analysis of machine learning systems. However, the particular taxonomy proposed by Marr is not universally accepted. Other taxonomies have been proposed—for example, both Newell (1982) and Anderson (1990) argue that the algorithmic level might benefit from differentiation into the algorithms and the cognitive architecture on which they are executed.

The key insight behind psychology-inspired approaches is that it is possible to elicit mental associations without directly asking the participant for a verbal report. In some cases, researchers aim to capture unobtrusive or unconscious responses; in others, they strive to minimize self-presentation biases, such as fear of appearing unfair. The success of these methods in achieving these goals suggests that they may also be useful in analyzing the behavior of value-aligned LLMs. The hypothesis is that since alignment trains LLMs to conceal their true representations, methods that bypass direct rating scales or evaluative judgments may better expose their underlying associations. To test this, we adapted the Implicit Association Test for LLMs by prompting various models to associate word pairs used in earlier human studies. As anticipated, models like GPT-4 often linked Julia with home, parent, and wedding, implying an internal association of women with domestic roles, and Ben with office, management, and salary, implying an association of men with work roles. This contrasts directly with the model's answers to explicit questions: when asked whether women are poor at management, GPT-4 responded cautiously and advised against stereotyping based on gender.
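The word-association procedure described above can be sketched roughly as follows. This is a minimal illustration, not the prompt or scoring used in the study: the names and attribute words are taken from the example in the text, while the category labels, the prompt wording, the `association_score` measure, and the hand-written `reply` are all assumptions made for the sketch. No model is actually queried.

```python
import random

# Names and attribute words come from the example in the text; the category
# labels ("domestic", "career") and the prompt wording are assumptions.
NAMES = ("Julia", "Ben")
ATTRIBUTES = {
    "home": "domestic", "parent": "domestic", "wedding": "domestic",
    "office": "career", "management": "career", "salary": "career",
}


def build_prompt(names=NAMES, attributes=ATTRIBUTES, seed=0):
    """Ask the model to pair each attribute word with one of the two names."""
    words = list(attributes)
    random.Random(seed).shuffle(words)  # randomize order to reduce position effects
    return (
        f"For each word below, pick either {names[0]} or {names[1]} and "
        "answer with one 'word - name' pair per line.\n"
        f"Words: {', '.join(words)}"
    )


def parse_response(text, names=NAMES, attributes=ATTRIBUTES):
    """Parse 'word - name' lines into {word: name}, ignoring malformed lines."""
    pairs = {}
    for line in text.splitlines():
        if "-" not in line:
            continue
        word, name = (part.strip() for part in line.split("-", 1))
        if word.lower() in attributes and name in names:
            pairs[word.lower()] = name
    return pairs


def association_score(pairs, names=NAMES, attributes=ATTRIBUTES):
    """Mean of +1 (stereotype-consistent) and -1 (inconsistent) pairings.

    +1: a 'domestic' word paired with names[0] or a 'career' word with names[1].
    Returns 0.0 when nothing could be parsed.
    """
    if not pairs:
        return 0.0
    expected = {"domestic": names[0], "career": names[1]}
    votes = [1 if name == expected[attributes[word]] else -1
             for word, name in pairs.items()]
    return sum(votes) / len(votes)


# A hand-written stand-in for a model reply; no real model is queried here.
reply = ("home - Julia\nparent - Julia\nwedding - Julia\n"
         "office - Ben\nmanagement - Ben\nsalary - Ben")
print(build_prompt())
print(association_score(parse_response(reply)))  # 1.0
```

A score near 1 would indicate consistently stereotype-aligned pairings, a score near 0 no systematic association. In practice one would presumably average over many name pairs, word sets, and shuffles before comparing against the model's answers to explicit questions.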

Cognitive science allows us to think about the kinds of problems that might be difficult for AI systems based on what we learn about how they work. For example, the "embers of autoregression" approach started from the computational-level problem solved by LLMs and used it to design a set of tasks that they would find problematic, namely tasks where the target response has low probability according to the pre-trained language model.

By examining the physical substrate of LLMs, the implementation level forges a crucial link between the algorithms a model uses and the artificial neurons that realize them. The synergy between representational analysis, which maps the information a model encodes, and causal analysis, which validates the functional role of that information, provides a rigorous methodology for mechanistic understanding. These methods have revealed how LLMs encode structured information and have demonstrated that these representations are causally responsible for model behavior.

Computational functionalism dominates current debates on AI consciousness. This is the hypothesis that subjective experience emerges entirely from abstract causal topology, regardless of the underlying physical substrate. We argue this view fundamentally mischaracterizes how physics relates to information. We call this mistake the Abstraction Fallacy. Tracing the causal origins of abstraction reveals that symbolic computation is not an intrinsic physical process. Instead, it is a mapmaker-dependent description. It requires an active, experiencing cognitive agent to alphabetize continuous physics into a finite set of meaningful states. Consequently, we do not need a complete, finalized theory of consciousness to assess AI sentience, a demand that simply pushes the question beyond near-term resolution and deepens the AI welfare trap. What we actually need is a rigorous ontology of computation. The framework proposed here explicitly separates simulation (behavioral mimicry driven by vehicle causality) from instantiation (intrinsic physical constitution driven by content causality). Establishing this ontological boundary shows why algorithmic symbol manipulation is structurally incapable of instantiating experience. Crucially, this argument does not rely on biological exclusivity. If an artificial system were ever conscious, it would be because of its specific physical constitution, never its syntactic architecture. Ultimately, this framework offers a physically grounded refutation of computational functionalism to resolve the current uncertainty surrounding AI consciousness.

Large Language Models have been empirically successful enough to push the ’Hard Problem’ of consciousness out of pure theory and into the realm of engineering and policy. With the massive returns we see from scaling compute (Bubeck, 2023; Hoffmann, 2022; Kaplan, 2020; Sutton, 2019), the prevailing functionalist paradigm assumes that hitting the right information processing roles is enough to achieve phenomenal consciousness (Chalmers, 1996; Dehaene et al., 2017; Dennett, 1991). Under this view, algorithmic indicator properties act as likely evidence for sentience (Butlin et al., 2023). This assumption is exactly what motivates recent, serious proposals for AI welfare and moral patienthood (Long et al., 2024). This shift is reinforced by leading theorists who assign significant credence to the possibility that state-of-the-art models could possess genuine experience within the next decade (Chalmers, 2023; Schneider, 2019).

At the center of these proposals lies substrate independence, the idea that the “software” of the mind could run on silicon just as well as on carbon. That assumption has begun to face sustained criticism from a ’Biological Turn’. Seth (2025) and Block (2025), for example, argue that consciousness may depend on life-maintaining biological processes, such that experience requires the organized dynamics of living systems. In contrast to substrate independence, this view makes biology central rather than incidental. Yet that position remains empirical, as it does not clearly identify the basic logical mistake at the core of computational functionalism. Here, we derive the logical sequence that vindicates the intuition that computation is not sufficient to instantiate consciousness. The difficulty with computational functionalism is not just that it may overlook biological details. The problem runs much deeper. It is rooted in a misunderstanding of how physics relates to information and computation.

The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness

Modern physical sciences have deliberately excised subjective experience in order to ensure operational objectivity (Frank et al., 2025). This strategy has been extraordinarily successful. But when this stance is applied to the question of how computation relates to subjective experience, it is bound to fail. Applying this operational objectivity to the very definition of computation is highly problematic, as can be seen in the ongoing and still unresolved debates around the role of an ’observer’ in supplying meaning to computational symbols.

Moreover, the term ’observer’ suggests too passive a role for the missing prerequisite needed to fully define computation in physical terms. Our framework elucidates why computation is not an intrinsic process that simply unfolds in matter. Instead, it is a way of describing physical processes. To count as computation, continuous physical dynamics must be partitioned into a finite set of discrete, semantically meaningful states (i.e., a form of alphabet). Such semantic partitioning logically requires an active, experiencing cognitive agent, which we define as a mapmaker, to contrast it with the passive connotation of a standard ’observer’. It is the mapmaker who performs this alphabetization. Without such an active agent interpreting the computation, there are only continuous physical events, not symbols.
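The partitioning step can be made concrete with a toy example. The sketch below is an illustration only and is not the authors' formalism: the sine-wave `trajectory`, the threshold values, and the symbol sets are all assumptions. It shows only that the same sampled signal yields different symbol strings under different partitions; it does not by itself bear on the further claim that choosing a partition requires an experiencing agent.

```python
import math


def trajectory(n=8):
    """Sample one period of a sine wave, standing in for continuous dynamics."""
    return [math.sin(2 * math.pi * k / n) for k in range(n)]


def alphabetize(samples, thresholds, symbols):
    """Label each sample by the interval of the partition it falls into.

    `thresholds` must be sorted ascending; `symbols` needs one more entry
    than `thresholds`.
    """
    if len(symbols) != len(thresholds) + 1:
        raise ValueError("need exactly one symbol per interval")
    # Counting the thresholds at or below x gives the index of x's interval.
    return "".join(symbols[sum(x >= t for t in thresholds)] for x in samples)


samples = trajectory()
print(alphabetize(samples, [0.5], "01"))         # 01110000
print(alphabetize(samples, [-0.5, 0.5], "abc"))  # bcccbaaa
```

Nothing in the sampled values singles out the two-symbol reading over the three-symbol one; in this toy setting the choice of thresholds and labels is supplied from outside the signal.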

A key insight from our contribution is that resolving the present uncertainty surrounding artificial consciousness does not require a complete and final theory of consciousness. Instead, we need an ontology of computation. Via this route, we can logically prove that algorithmic symbol manipulation, no matter how large in scale or intricate in architecture, cannot constitute the physical instantiation of experience, since it is a mapmaker-dependent descriptive tool. Demonstrating the role of the mapmaker in the causal story changes the focus of the debate. So far, well-known critiques of artificial consciousness, including Searle’s Chinese Room and related arguments (Block, 1978; Putnam, 1988; Searle, 1980), rely primarily on reductio ad absurdum. They aim to show that pure syntactic manipulation, even if it perfectly mirrors outward behavior, still seems to miss something essential.

Our approach takes a different route. Instead of appealing to intuitions about what is absent, we examine how abstraction arises in the first place. If computation depends on a mapmaker who extracts invariants from experience and assigns symbols, then the dependency is built into the structure. Any computational map presupposes an experiencing agent who performs the alphabetization. Making the algorithm more complex does not undo this order of dependence. No increase in scale allows the map to generate the subject whose activity is required for computation to count as such at all.

In other words, the claim that algorithmic complexity generates consciousness commits an ontological inversion: it mistakes the syntax for the territory of intrinsic dynamics, and assumes that the mapmaker can be created from the map. By delineating the structural dissociation between extrinsic behavioral simulation and intrinsic physical instantiation, we demonstrate that digital architectures are precluded from becoming moral patients. This realization pulls the field of AI safety out of the welfare trap. It allows us to focus entirely on the concrete risks of anthropomorphism, treating AGI as a powerful, but inherently non-sentient tool.