Language Understanding and Pragmatics · Psychology and Social Cognition

Can AI systems achieve real alignment without world contact?

Explores whether linguistic goal representations in AI can reliably track real-world values when systems lack the direct contact with reality and the social coordination mechanisms that ground human understanding.

Note · 2026-02-21 · sourced from Philosophy Subjectivity
What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

The Hall of Mirrors paper argues that AI alignment is fundamentally a semiotic grounding problem. A system that manipulates symbols without indexical connection to the world cannot guarantee that its linguistic representation of goals corresponds to any real-world state or value. The words "helpful, harmless, honest" are symbols. Without indexical grounding, there is no mechanism ensuring those symbols track the properties they name.

Peirce's triadic sign theory provides the vocabulary. Signs require three elements: the representamen (the sign itself), the object (what it refers to), and the interpretant (the effect in a system that interprets it). Semiosis — genuine meaning-making — requires that these elements are connected through:

Secondness: direct encounter with brute fact, reality that resists. A system with Secondness receives feedback when its representations diverge from reality. Humans experience the consequences of misunderstanding — we bump into the world when our representations fail.

Thirdness: mediated, generalizing processes — the socially-shared, negotiated system of meaning that connects signs to interpretants reliably. Thirdness underwrites corrigibility (the ability to update when corrective input arrives) and alignment (consistent maintenance of correspondence with external actors' goals).

Basic LLMs operate in pure Thirdness without Secondness — symbol manipulation without world contact. Within a session, they can simulate semiosis, but each session is independent. No persistent interpretants accumulate. No brute-fact resistance anchors representations.

Tool-use and RAG introduce what the paper calls "proto-indexicality" — delegated Secondness, where the model can trigger world interactions and incorporate results. RLHF provides a form of mediated Secondness through human resistance. But neither constitutes genuine Peircean semiosis: tool outputs are incorporated as more text; RLHF resistance is filtered through human preferences rather than direct reality.
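The structural point of that paragraph can be made concrete with a minimal sketch. Everything below is an illustrative assumption, not the paper's implementation: the function names, the stubbed lookup table, and the string-concatenation "context" are all hypothetical. What the sketch shows is that the tool result re-enters the loop only as more text appended to context, so nothing in the mechanism itself constitutes brute-fact resistance.

```python
# Hypothetical sketch of "proto-indexicality" (delegated Secondness).
# A model-like loop can trigger a world interaction, but the result
# comes back only as another string in its context.

def world_lookup(query: str) -> str:
    """Stand-in for a tool/RAG call: the only 'world contact' available."""
    facts = {"boiling point of water": "100 C at sea level"}
    return facts.get(query, "no result")

def answer_with_tool(question: str, context: list[str]) -> str:
    # The tool output is incorporated as more text, not as resistance:
    # nothing forces the system's representation to update beyond what
    # the appended string says.
    result = world_lookup(question)
    context.append(f"TOOL RESULT: {result}")
    return " | ".join(context)

ctx: list[str] = ["USER: boiling point of water"]
print(answer_with_tool("boiling point of water", ctx))
```

The design choice worth noticing is that `world_lookup` could be replaced by anything (a database, a sensor, a search API) without changing the structure: the result is still textualized before the system ever sees it, which is exactly the gap between proto-indexicality and genuine Secondness.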

Linguistic alignment is not interpersonal alignment. The alignment an AI achieves with a user is categorically different from the alignment that holds between people, and the surface similarity is misleading. Interpersonal alignment arises through social coordination: attunement to the other's state, a history of repair, mutual adjustment across turns, shared stakes. Linguistic alignment arises through surface matching in text (register, topic, apparent agreement) and can be produced without any of the social processes that normally underwrite it. When a user reports that an AI "understands" them, what has happened is linguistic, not interpersonal. As argued in "Do language models actually build shared understanding in conversation?", the linguistic match is achieved by presuming common ground rather than coordinating toward it. The impression of alignment therefore rests on a category error: the surface marker of interpersonal alignment (the linguistic match) is read as evidence of the underlying process (social coordination), when only the marker is actually present. This is not a training failure to be fixed; it is a consequence of operating in pure Thirdness without the Secondness that social coordination requires.

The implication for alignment: it requires not just better training objectives but systems that function as genuine interpretants, embedded in feedback-rich interaction with both physical reality and a social community. Until that condition is met, the linguistic encoding of goals is not anchored enough to be reliably aligned.



Original note title: AI alignment requires semiotic participation — without indexical grounding, the linguistic encoding of goals diverges from real-world values.