Can AI distinguish which differences actually matter?
Explores whether AI systems can perform the qualitative judgment that experts use to select relevant observations. Matters because confusing AI outputs with expert observation leads users to trust pattern-matching as if it were reasoning about what's important.
Gregory Bateson defined information as "a difference which makes a difference." This deceptively simple formulation captures something essential about expertise that AI cannot perform: the act of selecting which differences matter.
When an expert observes a situation (a patient's symptoms, a market trend, a structural flaw in an argument), they are performing an act of qualitative selection. From the vast space of possible observations, they choose the ones that matter. This selection is not pattern-matching. It is judgment: the expert perceives differences and decides which ones make a difference to the problem at hand. The observation that makes a difference is an act of communication: it reports to the system (the community, the audience, the field) a change that moves understanding forward.
AI systems operate in a fundamentally different register. Since Do foundation models learn world models or task-specific shortcuts?, LLMs develop statistical heuristics tuned to pattern frequency, not to relevance. They can find patterns, connections, concepts, probabilities, and thresholds. But the differences that make a difference to an LLM are mathematical: quantitative, not qualitative. An LLM cannot decide that one pattern matters more than another in a way that reflects understanding of the domain. It can only decide that one pattern is more probable than another given its training distribution.
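To make the contrast concrete, here is a minimal sketch (plain Python, with hypothetical logit values; no real model is being queried) of the only kind of "difference" a language model computes: a gap in probability between candidate continuations.

```python
import math

def softmax(logits):
    """Turn raw model scores into a probability distribution over tokens."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

# Hypothetical next-token logits following the prompt "The patient's fever is"
logits = {"mild": 3.1, "spiking": 2.4, "gone": 0.7}

probs = softmax(logits)
for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token}: {p:.3f}")

# The model can report that "mild" is more probable than "spiking".
# Nothing in this computation encodes whether that difference matters
# clinically -- only how often similar patterns occurred in training.
```

Ranking by probability is the entire repertoire: every comparison the model makes bottoms out in arithmetic like this, never in a judgment about what the difference means for the problem at hand.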
This is the observer problem. Knowledge is observation: it is information about something, relevant for someone, reasonable because of a context. These are conceptual connections whereby knowledge functions as a map to a territory. The expert is an observer system: they observe the needs of an audience and the state of knowledge, and they apply that observation in the act of making recommendations. Crucially, the expert can engage in self-observation, deliberately shaping their expertise to ensure it stays suitable and relevant.
AI is not an observer. It generates responses from prompts. It holds no observation of a state: not of knowledge, not of the user, not of an audience, not of any other context. Since Should we call LLM errors hallucinations or fabrications?, this absence of observation is precisely what makes AI output fabrication: it produces text that has the form of observation without the epistemic process of observing.
The practical consequences are significant. Many users, including experts, do not have a mental model appropriate for LLMs. When experts make observations, they are being subjective in the productive sense — applying reason and judgment to information in order to choose what is important and relevant. Since Why do people trust AI outputs they shouldn't?, users interpret AI outputs through the same cognitive frameworks they use for human expert observations. But the outputs were produced by a different process entirely — one that mimics the form of observation without performing the selection that gives observation its value.
This connects to a deeper theoretical point about what LLMs can and cannot do with internal evaluation. Since Can LLMs generate more novel ideas than human experts?, the generative capacity of LLMs is not matched by evaluative capacity. They can produce more options than any human expert — but they cannot determine which options matter. The "differences that make a difference" are invisible to a system that operates on statistical association rather than qualitative judgment.
Even when LLMs apply internal judges, rubrics, or meta-reflections, these are simulations of selection: they have no means to assess the relevance of their generations against the actual state of the domain, the needs of the audience, or the significance of the moment. The rubric can score surface features. It cannot judge importance.
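To see the limitation in the mechanics, consider a minimal sketch of the LLM-as-judge pattern (plain Python; `surface_scorer` and `judge` are hypothetical stand-ins, not any real library's API). The judge receives only the generated text and some criterion strings; nothing in its inputs carries the state of the domain or the needs of an audience, so relevance to either cannot enter the score.

```python
from dataclasses import dataclass

@dataclass
class RubricScore:
    criterion: str
    score: int  # 1-5, derived from the text alone

def surface_scorer(text: str, criterion: str) -> int:
    """Stand-in for a judge model: it can only latch onto surface features."""
    score = 1
    if len(text) > 200:
        score += 2  # length as a fluency proxy
    if "therefore" in text.lower():
        score += 1  # a reasoning-shaped token
    if criterion.lower() in text.lower():
        score += 1  # the rubric word itself appearing in the text
    return min(score, 5)

def judge(generation: str, rubric: list[str]) -> list[RubricScore]:
    # Note the signature: text in, numbers out. There is no parameter
    # for the state of the domain, the audience, or the moment, so the
    # "selection" performed here cannot reflect any of them.
    return [RubricScore(c, surface_scorer(generation, c)) for c in rubric]

print(judge("Therefore the significance of this finding is clear.",
            ["clarity", "significance"]))
```

Whatever scoring model replaces the toy `surface_scorer`, the structure is the same: text goes in, a number comes out, and importance to the world outside the text never enters the loop.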
Source: inbox/Knowledge Custodians.md
Related concepts in this collection
- **Do foundation models learn world models or task-specific shortcuts?**
  When transformer models predict sequences accurately, are they building genuine world models that capture underlying physics and logic? Or are they exploiting narrow patterns that fail under distribution shift?
  *heuristics are quantitative pattern-matching, not qualitative selection of relevance*
- **Should we call LLM errors hallucinations or fabrications?**
  Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening.
  *fabrication as the consequence of generating without observing*
- **Why do people trust AI outputs they shouldn't?**
  When do human cognitive shortcuts fail in AI interaction? Three compounding traps (treating statistical patterns as facts, mistaking fluency for understanding, and avoiding disagreement) may explain systematic overreliance across languages and contexts.
  *users apply observation frameworks to non-observational outputs*
- **Can LLMs generate more novel ideas than human experts?**
  Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
  *generation without evaluative selection: the ideation version of the observation problem*
- **Why does AI writing sound generic despite being grammatically correct?**
  Explores whether the robotic quality of AI text stems from grammatical failures or rhetorical ones. Understanding this distinction matters for diagnosing what AI systems actually struggle with in human-like writing.
  *mastering structure without evaluation is mastering form without observation*
Original note title: AI cannot distinguish differences that make a difference — observation requires qualitative selection of relevance that quantitative pattern-matching cannot replicate