Can large language models develop genuine world models without direct environmental contact?
Do LLMs extract meaningful world structures from human-generated text despite lacking direct sensory access to reality? This matters for understanding what kind of grounding and knowledge these systems actually possess.
Current LLMs lack direct causal grounding: no unmediated contact with the physical world, setting aside early multimodal and robotic systems. But an indirect path is available.
Training data is produced by causally grounded beings: humans who perceive, act in, and interact with the world. The totality of text and language data functions as a vast, human-made mirror of the world. Modern LLMs can extract lawlike world structures and regularities from this data, forming representations that are structurally similar to parts of the world.
The argument from "Understanding AI" (Schneider 2024): LLM empirical successes would be "downright mysterious" without the assumption that these systems form grounded world models. The successes in world knowledge, physical reasoning, and factual recall point toward structured world representations, not just statistical fluency.
This is indirect causal grounding: functionally established through world-model formation from causally grounded data rather than through direct environmental interaction. It is grounding by proxy, and the chain runs: world → human perception and action → human text → LLM training → LLM internal representation.
The limitation: the chain has gaps. LLMs cannot update their world models through their own action and perception, they cannot verify claims against the world in real time, and they are frozen at the training cutoff. But they are not worldless: the world is present in their representations, in mediated form.
This connects directly to "Do language models actually use their encoded knowledge?", where even encoded world knowledge may fail to influence outputs. Indirect causal grounding does not guarantee that world knowledge is actually used.
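To make the encoded-vs-used distinction concrete, here is a minimal probing sketch, not part of the original argument: it checks whether a simple world property is linearly decodable from a model's hidden states, which is the "encoded" side of the gap. The model (gpt2), the toy country/city labels, the last-token hidden state, the layer choice, and the logistic-regression probe are all illustrative assumptions.

```python
# Minimal probing sketch (illustrative assumptions throughout):
# can a linear probe recover a world property from hidden states?
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

# Hypothetical toy dataset: is the named place a country (1) or a city (0)?
examples = [("France", 1), ("Japan", 1), ("Brazil", 1), ("Canada", 1),
            ("Paris", 0), ("Tokyo", 0), ("Berlin", 0), ("Madrid", 0)]

def last_token_state(text, layer=-1):
    """Hidden state of the final token at the chosen layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[layer][0, -1].numpy()

X = [last_token_state(name) for name, _ in examples]
y = [label for _, label in examples]

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe accuracy on training set:", probe.score(X, y))
# High probe accuracy shows the distinction is linearly encoded; whether the
# model actually *uses* it during generation is the separate question raised
# in the linked note.
```

A positive probe result supports the "encoded world model" half of the claim; it says nothing yet about whether that representation causally drives generation.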
Source: Linguistics, NLP, NLU
Related concepts in this collection
- Does semantic grounding in language models come in degrees? Rather than asking whether LLMs truly understand meaning, this explores whether grounding is actually a multi-dimensional spectrum. The question matters because it reframes the sterile understand/don't-understand debate into measurable, distinct capacities. (Relation: this is the causal dimension.)
- Do language models actually use their encoded knowledge? Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing. (Relation: the gap between encoded world model and generative use.)
- Do classical knowledge definitions apply to AI systems? Classical definitions of knowledge assume truth-correspondence and a human knower. Do these assumptions hold for LLMs and distributed neural knowledge systems, or do they need fundamental revision? (Relation: a different framing of what LLM knowledge is.)
- Can AI systems learn social norms without embodied experience? Large language models exceed individual human accuracy at predicting collective social appropriateness judgments. Does this reveal that embodied experience is unnecessary for cultural competence, or do systematic AI failures point to limits of statistical learning? (Relation: social norms as evidence for indirect causal grounding; text encodes cultural norms produced by causally grounded humans, and LLMs extract these regularities well enough to outperform individual humans at predicting collective consensus.)
Original note title: LLMs develop world models that constitute indirect causal grounding despite lacking direct environmental contact