Language Understanding and Pragmatics · Design & LLM Interaction · Psychology and Social Cognition

Can AI distinguish which differences actually matter?

Explores whether AI systems can perform the qualitative judgment that experts use to select relevant observations. Matters because confusing AI outputs with expert observation leads users to trust pattern-matching as if it were reasoning about what's important.

Note · 2026-03-26
What do language models actually know? What grounds language understanding in systems without embodiment?

Gregory Bateson defined information as "a difference which makes a difference." This deceptively simple formulation captures something essential about expertise that AI cannot perform: the act of selecting which differences matter.

When an expert observes a situation — a patient's symptoms, a market trend, a structural flaw in an argument — they are performing an act of qualitative selection. From the vast space of possible observations, they choose the ones that matter. This selection is not pattern-matching. It is judgment: the expert perceives differences and decides which ones make a difference to the problem at hand. The observation that makes a difference is an action of communication — it reports to the system (the community, the audience, the field) a change that moves understanding forward.

AI systems operate in a fundamentally different register. Since Do foundation models learn world models or task-specific shortcuts?, LLMs develop statistical heuristics tuned to pattern frequency, not to relevance. They can find patterns, connections, concepts, probabilities, and thresholds. But the differences that make a difference to an LLM are mathematical: quantitative, not qualitative. An LLM cannot decide that one pattern matters more than another in a way that reflects understanding of the domain. It can only decide that one pattern is more probable than another given its training distribution.
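The gap between "more probable" and "more important" can be made concrete with a toy sketch. The corpus and the frequency scorer below are hypothetical illustrations, not how any production model works; they show only that a frequency-based ranking has no slot for domain relevance.

```python
from collections import Counter

# Hypothetical toy corpus: the rare symptom is the clinically urgent one,
# but a frequency-based scorer has no way to know that.
corpus = (
    "the patient reports mild headache . "
    "the patient reports mild headache . "
    "the patient reports mild fatigue . "
    "the patient reports sudden vision loss ."
).split()

# Estimate P(next word | previous word) from raw bigram counts.
bigrams = Counter(zip(corpus, corpus[1:]))

def next_word_scores(prev):
    """Rank candidate next words purely by how often they followed `prev`."""
    counts = {b: c for (a, b), c in bigrams.items() if a == prev}
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# "headache" outscores "fatigue" only because it occurs more often in the
# corpus, not because it matters more to any diagnosis.
print(next_word_scores("mild"))
```

The scorer answers "which difference is frequent?"; it cannot even pose the question "which difference makes a difference here?"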

This is the observer problem. Knowledge is observation: it is information *about* something, *relevant for* someone, *reasonable because* of grounds. These relational connections are what let knowledge function as a map of a territory. The expert is an observer system: they observe the needs of an audience and the state of knowledge, and they apply those observations in the act of making recommendations. Crucially, the expert can also engage in self-observation, deliberately shaping their expertise to keep it suitable and relevant.

AI is not an observer. It generates responses from prompts. It holds no observations of a state of affairs: not of the field's knowledge, not of the user, not of the audience, not of the wider context. Since Should we call LLM errors hallucinations or fabrications?, this absence of observation is precisely what makes AI output fabrication: it produces text that has the form of observation without the epistemic process of observing.

The practical consequences are significant. Many users, including experts, do not have a mental model appropriate for LLMs. When experts make observations, they are being subjective in the productive sense — applying reason and judgment to information in order to choose what is important and relevant. Since Why do people trust AI outputs they shouldn't?, users interpret AI outputs through the same cognitive frameworks they use for human expert observations. But the outputs were produced by a different process entirely — one that mimics the form of observation without performing the selection that gives observation its value.

This connects to a deeper theoretical point about what LLMs can and cannot do with internal evaluation. Since Can LLMs generate more novel ideas than human experts?, the generative capacity of LLMs is not matched by evaluative capacity. They can produce more options than any human expert — but they cannot determine which options matter. The "differences that make a difference" are invisible to a system that operates on statistical association rather than qualitative judgment.

Even when LLMs apply internal judges, rubrics, or meta-reflections, these are simulations of selection — they have no means to qualify the relevance of their generations against the actual state of the domain, the needs of the audience, or the significance of the moment. The rubric can score surface features. It cannot judge importance.
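The point about rubrics can also be made concrete. The rubric below is a hypothetical toy, but it mirrors the structure of any surface-feature judge: it rewards length and keyword presence, so a vacuous passage stuffed with relevance vocabulary outscores a short, genuinely important observation.

```python
# Hypothetical toy rubric: scores only surface features (length, keywords),
# which is all a rubric without an observer's grasp of the domain can check.
def rubric_score(text, keywords=("relevant", "significant", "evidence")):
    """Score by surface features: word-count band plus keyword hits."""
    words = text.lower().split()
    length_ok = 1 if 10 <= len(words) <= 60 else 0
    keyword_hits = sum(1 for k in keywords if k in words)
    return length_ok + keyword_hits

# A padded, empty passage that happens to contain the rubric's keywords.
vacuous = ("this relevant and significant evidence is relevant to the "
           "significant question of what the evidence shows here")
# A short observation that actually makes a difference to the case.
pointed = "blood pressure dropped after the dose was doubled"

# The vacuous text outscores the pointed one.
print(rubric_score(vacuous), rubric_score(pointed))
```

The rubric is doing exactly what the paragraph above describes: scoring form, blind to importance.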


Source: inbox/Knowledge Custodians.md


AI cannot distinguish differences that make a difference — observation requires qualitative selection of relevance that quantitative pattern-matching cannot replicate