Language Models’ Hall of Mirrors Problem: Why AI Alignment Requires Peircean Semiosis


This paper examines some limitations of large language models (LLMs) through the framework of Peircean semiotics. We argue that basic LLMs exist within a "hall of mirrors," manipulating symbols without indexical grounding or participation in socially-mediated epistemology. We then argue that newer developments, including extended context windows, persistent memory, and mediated interactions with reality, are moving Artificial Intelligence (AI) systems toward functioning as genuine Peircean interpretants, and we conclude that LLMs may be approaching this goal and that no fundamental barriers stand in the way. This lens reframes a central challenge for AI alignment: without grounding in the semiotic process, a model's linguistic encoding of goals may diverge from real-world values. By synthesizing Peirce's pragmatic view of signs, contemporary discussions of AI alignment, and recent work on relational realism, we illustrate a fundamental epistemological and practical challenge to AI safety and point to part of a solution.

The central question, then, is not whether LLMs can generate plausible text, but whether they can be situated within the kind of interpretive process that underwrites human understanding. We argue that while early LLMs exhibit fragments of Peircean Thirdness (generalization, abstraction), they lack the grounding of Secondness (direct encounter with reality), and only weakly approximate the intersubjective processes that allow interpretation to become socially meaningful. At the same time, despite lacking true Secondness and only partially exhibiting Thirdness, recent developments—such as extended context windows, persistent memory, retrieval-augmented generation, and tool-use—suggest a form of semiotic participation that goes beyond mere simulation, and one that will likely continue to evolve.

On the first level, Peirce identifies three essential components of a sign: (1) the sign itself (also called the representamen), (2) the object it refers to, and (3) the interpretant, which is the effect the sign has on a mind or system that interprets it. This initial triadic structure is not a simple stimulus-response mechanism; rather, it embeds meaning in an ongoing, evolving process of interpretation.

The sign (representamen) is the perceptible or conceptual form that conveys meaning—it can be a word, an image, a gesture, or any symbol that functions semiotically. A written word, for example, serves as a sign, but its meaning is not inherent in its form; it derives from its relation to an object and its effect on an interpretant.

The object is that which the sign refers to. Peirce distinguishes between the immediate object (how the object is conceived within the semiotic process) and the dynamic object (the real entity or concept that exists independently of the sign). This distinction complicates computational interpretations of semiotics, as meaning is not merely a function of symbol manipulation but of a relationship with an external reality.

The interpretant distinguishes Peirce’s model from simple representational accounts. The interpretant is the meaning or effect generated by the sign within a cognitive system, typically a human one. Importantly, this process is not static; each interpretant can itself become a new sign, creating a potentially endless chain of interpretation that contributes to knowledge formation. This recursive aspect is central to Peirce's concept of semiosis, where signs continuously generate new layers of meaning within a community of interpreters.

Semiosis is not just information-theoretic encoding and decoding of messages, but a dynamic and evolving process. Meaning does not reside in any single act of sign interpretation, but emerges through an iterative process that involves social, contextual, and historical dimensions. This is where the connection to AI becomes critical: LLMs process and generate linguistic signs, but it is unclear whether they function as interpretants in the Peircean sense. Of course, Peirce himself notes that “thought is not necessarily connected with a brain” (Peirce 1931), but this is certainly not enough to say that all syntactic manipulation is true semiosis; as Burch (2021) writes, “the interpretant is something ineliminably mental,” though what qualifies as “a mental act, a mental state, or a feature or quality of mind” must be clarified.

Markov-chain text generators certainly lack any indexical or iconic connections that would ground their outputs in external reality. This lack of direct referentiality is central to the "hall of mirrors" problem, though some AI architectures at least begin to overcome it, as we will discuss.

For basic LLMs, all such processes are transitory, with each session being independent and incapable of lasting semiosis. They also lack the dialectic between Secondness and Thirdness that characterizes human learning: they do not encounter brute facts in a world that resists incorrect interpretations, nor do they participate in socially-mediated processes of meaning formation. Instead, the semiosis of token generation is purely symbolic and recursive, remaining within the hall of mirrors rather than ever reaching grounded interpretation. Outside of a single session, their interpretants (responses) cannot inform their future representations. And even within a session, they fail to capture important parts of linguistic meaning (Asher et al. 2023) and hallucinate in incoherent ways (Asher and Bhar 2024).

If LLMs and other AI systems remain confined to discrete textual corpora without engaging in dynamic, feedback-rich interaction, they remain trapped in the hall of mirrors problem—pure Thirdness without the grounding of Secondness. However, if AI systems can be designed to interact persistently with both human interpretants and physical reality, relational realism suggests that they may begin approximating a form of semiotic participation beyond mere symbol manipulation.

That is, we are not just discussing functional systems we call LLMs, but an actual human-produced artifact. And Legg’s critique of LLMs maintains that word embeddings and neural architectures capture associative patterns, a step forward, but lack the semiotic mechanisms necessary for robust inquiry and truth-tracking.

Language models’ implicit understanding of the world reflects distorted mirrors presented by an undifferentiated mishmash of biased and partial views of reality, fiction, and non-truth-apt text.

Peirce’s concept of indexicality explains why LLMs fail to establish a genuine connection between their linguistic outputs and the world they describe, even when they appear to do so. Indexical signs—such as demonstratives (“this,” “that”), pronouns, and physical indicators (e.g., a pointing gesture or smoke indicating fire)—gain meaning by being causally or contextually linked to specific entities or events. Unlike symbols, which depend on conventional associations, indexical signs require referential grounding.

Although LLMs learn some types of referentiality and generate text that appears contextually appropriate, their words do not refer to external objects through any causal or experiential link. For example, if an LLM states, “It is raining in Paris,” the truth of this statement depends entirely on whether it corresponds to external meteorological conditions. Lacking perceptual access to Parisian weather, a simple LLM generates the statement based on statistical patterns in its training data. This is fundamentally different from a human saying the same sentence, especially after looking out a window while sitting in Paris.

Moreover, indexical grounding is not just about pointing at things—it involves being situated within an ongoing interpretative process, where signs are linked dynamically to external reality through perception and action. Even in cases where an isolated LLM produces statements that are factually correct, it lacks the ability to adjust its interpretations in response to direct environmental feedback. Its words appear referential, and may be true, but lack a direct causal tie to the world.

Searle (1980) argues by reductio; adapting his argument, we might say that if an LLM is ultimately a high-dimensional mapping from token sequences to probabilities, then it can do no more than simulate interpretation.

Peirce's semiotics, by contrast, explicitly allows that interpretants may be instantiated in any system capable of sustaining the triadic relation between sign, object, and interpretive effect. As he put it, "thought is not necessarily connected with a brain" (Peirce 1931). Unless we embrace epiphenomenalism, the challenge cannot be whether AI systems are “just” finite-state machines.

In Peirce’s view, what matters is what a system does, and how—specifically, whether it participates in a recursive, mediated process of interpretation involving indexical reference and social constraint. Finitude is not disqualifying; isolation is. Human minds are physically finite, but they are embedded in a world that resists them and a society that interacts with them. This embedding provides feedback from the world (Secondness) and shared, negotiated systems of meaning (Thirdness).

To evaluate LLMs as interpretants, we focus on the presence or absence of Secondness and socially-mediated Thirdness. The key question, then, is not “Can the model interpret?” but rather “Is the model embedded in a world?” and “Does that embedding satisfy semiotic criteria?” This reframing allows an empirical turn (Brey 2010). Rather than debating metaphysical claims or asserting impossibilities, we ask concrete questions: What kinds of feedback do LLMs receive? Can they act as stable interpretants across time and interaction? Do the architectures allow for the emergence of semiosis?

Murray Shanahan (2024) argues that LLMs create a powerful illusion of understanding, but that they do not think, understand, or reason like humans. He suggests that “the bare-bones LLM itself,” consisting of only the generative model, is what “really” drives the not-actually-cognition of these models. That bare-bones model “does not really know anything because all it does, at a fundamental level, is sequence prediction,” and it lacks the “special relationship propositional sequences have to truth.”

However, Retrieval-Augmented Generation (RAG) incorporates an external retrieval system—typically a vector database of documents or embeddings—allowing the model to “look up” relevant information in response to a query (Lewis et al., 2020). This can be framed as bolting on a kind of dynamic indexicality: the system does not reason about facts in the world, but it can retrieve representations of them, or check its outputs against the external data, updating its apparent knowledge without retraining.
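
To make the retrieve-then-generate idea concrete, here is a minimal sketch, not any particular library's API: `embed`, `generate`, and the in-memory document list are placeholder assumptions standing in for an embedding model, an LLM call, and a vector store.

```python
# Minimal retrieval-augmented generation sketch (illustrative only).
# `embed` and `generate` are assumed callables standing in for an
# embedding model and an LLM; neither names a real library API.
import math
from typing import Callable, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rag_answer(query: str,
               documents: List[str],
               embed: Callable[[str], List[float]],
               generate: Callable[[str], str],
               k: int = 3) -> str:
    # Rank stored documents by similarity to the query embedding.
    q_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    context = "\n".join(ranked[:k])
    # The model conditions on retrieved text rather than on its weights alone,
    # updating its apparent knowledge without retraining.
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."
    return generate(prompt)
```

Nothing in this loop gives the model perceptual contact with the world; it only conditions generation on retrieved text, which is why we frame it as apparent or dynamic indexicality rather than genuine Secondness.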

Similarly, directly addressing Secondness, tool-use and modular systems (e.g., ReAct, Toolformer, LangChain pipelines) allow models to interface with external APIs, calculators, or knowledge bases. These are hybrid systems, and from a semiotic perspective one might say this architecture introduces a kind of delegated Secondness—the model can trigger interactions with the world and incorporate the results into its next textual prediction.
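
A schematic version of such a pipeline, offered as a sketch rather than the actual interface of ReAct, Toolformer, or LangChain, interleaves model output with tool calls whose observed results feed into the next prediction; `generate` is again an assumed LLM callable, and the "CALL tool: argument" convention is invented here for illustration.

```python
# Schematic tool-use loop (a sketch, not any framework's real API).
# `generate` is an assumed LLM call that returns either a final answer
# or a tool request in the invented form "CALL <tool>: <argument>".
from typing import Callable, Dict

def tool_loop(question: str,
              generate: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    answer = ""
    for _ in range(max_steps):
        step = generate(transcript)
        if step.startswith("CALL "):
            # Delegated Secondness: the model triggers an interaction with
            # something outside its own text and then sees the result.
            name, _, arg = step[len("CALL "):].partition(": ")
            observation = tools.get(name, lambda a: f"unknown tool: {name}")(arg)
            transcript += f"\n{step}\nObservation: {observation}"
        else:
            answer = step  # final answer, informed by any prior observations
            break
    return answer
```

The point of the sketch is structural: what enters the transcript is the environment's answer rather than the model's expectation, which is the minimal sense in which the world can push back on the model's next prediction.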

Yes, this is built on a basis of initial token-prediction training, but the objection weakens greatly once a model has access to external reality in various forms. Building on this, in-context learning does allow for session-specific interaction, and during these sessions LLMs can be said to construct at least a pseudo-understanding. Lastly, the model is no longer trained merely to predict the next token of text; it is trained to act as an interactive partner in lengthy discussions.

Despite the lack of indexical grounding in LLMs, the implicit word-to-world mapping built via gradient-descent training on text yields a strikingly coherent world model.

From a Peircean perspective, this limits both Secondness and Thirdness: the model has no persistent commitments, and therefore does not integrate new signs into an evolving interpretive structure across time. To the extent that “learning” is local and reversible, the context window is a playground or theatre for simulating semiosis, not cumulative and collaborative interpretation.

The internal dialogue of reasoning models, iteratively reflective and reconsidering, certainly enables certain surface features of Thirdness—generalization, abstraction, inference—even if critics may claim this lacks the depth of mediated commitment that gives Thirdness its epistemic and normative traction.

Perhaps the most promising development is AI systems’ tool-use, which allows them to act, query APIs, read files, and modify environments both directly and indirectly via communication. Tool use introduces a minimal form of agency: the model can select among possible actions, receive feedback about outcomes, and adapt future choices accordingly. RLHF and similar human-in-the-loop training techniques also provide an indirect form of Secondness—resistance from outside the model’s symbolic world, albeit filtered through human preferences rather than physical reality. These forms of mediated interaction create what we might call proto-indexicality: the model’s outputs can now have causal effects in the world, and the world (via user feedback, external API results, or tool outputs) can feed back into the model’s subsequent behavior.
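
Read structurally, proto-indexicality is just the loop sketched below, where `policy` and `environment` are placeholders (not a specific model or system) for a model choosing actions and a world returning outcomes; what matters is only that the observed outcome, rather than the model's own prediction, enters the history that conditions the next choice.

```python
# Sketch of proto-indexicality: outputs act on an environment, and the
# observed outcomes condition later outputs. `policy` and `environment`
# are placeholders, not a specific model or system.
from typing import Callable, List, Tuple

def interaction_loop(policy: Callable[[List[Tuple[str, str]]], str],
                     environment: Callable[[str], str],
                     steps: int = 3) -> List[Tuple[str, str]]:
    history: List[Tuple[str, str]] = []
    for _ in range(steps):
        action = policy(history)           # chosen in light of past feedback
        outcome = environment(action)      # the world "pushes back" (Secondness)
        history.append((action, outcome))  # feedback enters future interpretation
    return history
```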

Even contact with reality does not guarantee alignment, though it does prevent some classes of failure. Peirce’s notion of Thirdness—the mediated, generalizing relation that links sign, object, and interpretant—is also essential to alignment. Specifically, Thirdness underwrites two critical properties of aligned systems: corrigibility, the ability to update or defer to corrective input, and alignment, the capacity to discover and maintain consistency with the goals of external actors. In Peircean terms, a system revises its interpretation if (and, per Peirce, only if) it participates in the recursive, mediated process of semiosis. Token prediction or symbolic manipulation is not enough; alignment requires interpretants embedded in feedback-rich interaction not just with reality, but with others.

Humans experience consequences of misunderstanding, and can come to recognize when semantic slippage or misgeneralization occurs. In contrast, without access to real-world feedback, LLMs cannot distinguish sign success from referential success.

Peirce’s fallibilist epistemology and his theory of the scientific community hold that truth is the ideal end of inquiry. Here, the argument goes further: cooperative positive engagement is the ideal end of what has recently been called socioaffective alignment (Kirk et al. 2025). An aligned AI system in this sense would be aligned with the epistemic and social community in which it is embedded. Of course, current systems do not fully succeed at engaging with reality. And lacking full Secondness and Thirdness, they are not Peircean interpretants, and cannot be aligned.

We then examined how modern models have begun to address these shortcomings yet still fall short of full Peircean semiosis. In the previous section, we further argued that any path to genuine AI alignment will require, among other things, systems that can function as interpretants in the Peircean sense. In this section, we argue that such systems are, at least in principle, possible.