Can statistical learning from text replace embodied cultural experience?

This inquiry asks whether a model that only reads text, and has never lived in a body or a culture, can match the kind of understanding that comes from embodied, lived experience. The corpus splits sharply on where the ceiling sits.

The collection is genuinely divided, and that division is the interesting part. Start with the provocative evidence: on at least one benchmark, AI models don't just approximate human social judgment, they outperform individual human raters. GPT-4.5 scored at the 100th percentile against human raters on the appropriateness of 555 social scenarios, with Claude and Gemini close behind (see "Can AI systems learn social norms without embodied experience?" and "Can AI learn social norms better than humans?"). On its face this dents the assumption that you must *live* a culture to read it. The theoretical companion to that result is the idea that language models pull off a kind of meaning-from-structure-alone. By compressing the relational patterns of text (Saussure's *langue*, where a word's meaning is just its position relative to other words), they generate fluent, culturally situated discourse with no external referents at all (see "Can language models learn meaning without engaging the world?").

But notice the crack running through even the optimistic result: all the models share *identical systematic errors*, especially on unwritten norms (see "Can AI systems learn social norms without embodied experience?"). That's the tell. They're not failing randomly the way humans do; they're all missing the same things, the things that never got written down because embodied experience transmits them instead. The skeptical wing of the corpus names why. Bender and Koller argue that meaning lives in the relation between words and communicative intent, and that a system trained purely on form-to-form prediction never sees intent or shared attention, so it can't reconstruct grounded meaning (see "Can language models learn meaning from text patterns alone?"). A complementary framing casts text-only models as prisoners in Plato's cave: text strips out the physics, geometry, and causality of the world, leaving the model to shuffle symbols whose source dynamics it never touched. That framing predicts exactly where such models break: physical, spatial, and causal reasoning (see "Are text-only language models fundamentally limited by abstraction?").

There's a subtler failure the corpus surfaces that pure benchmark scores hide: text isn't a neutral mirror of all cultures. Mechanistic analysis shows that low-resource cultures such as those of Ethiopia and Algeria are *internally* represented through high-resource cultural proxies: the model routes them through dominant-culture pathways even when it can produce a correct surface answer (see "Do LLMs represent low-resource cultures through dominant cultural proxies?"). So statistical learning from text doesn't just lose embodiment; it inherits the lopsidedness of what got written down and by whom. And what makes culture *work* between people (the implicit repair, topic hand-off, and relational maintenance of live conversation) isn't information to be predicted at all. It's social action, which is precisely why training signals that reward next-token prediction don't produce it (see "Why don't language models develop conversation maintenance skills?").

The sharpest reframe in the collection is that the question itself may be slightly wrong. One line of work argues that AI produces not utterances but *event-residue*: text that carries the communicative markers of its training data but lacks the event structure of a real exchange. The human reader then animates it into a pseudo-conversation by supplying the orientation only they possess (see "Does AI generate genuine utterances or just text patterns?"). By that account text-learning never *replaces* embodied experience; it offloads the embodied half onto you. The unexpected takeaway: the corpus suggests statistical learning can reproduce the legible, written *surface* of culture astonishingly well, well enough to out-predict any single human, while systematically missing the unwritten, relational, and physically grounded layer that embodiment carries. The danger is that the fluent surface tempts us to stop noticing the missing layer (see "How do we learn to read AI-generated text critically?").

Sources (9 notes)

Can AI systems learn social norms without embodied experience?

GPT-4.5 predicted the appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing limits of pattern-based social understanding that may still require embodied experience to overcome.

Can AI learn social norms better than humans?

GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Are text-only language models fundamentally limited by abstraction?

Text strips the physics, geometry, and causality present in reality, forcing language models to manipulate symbols without grounding in their source dynamics. This creates predictable failure modes in physical, geometric, and causal reasoning that multimodal training could address.

Do LLMs represent low-resource cultures through dominant cultural proxies?

Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

How do we learn to read AI-generated text critically?

Every established discourse source carries an interpretive posture that filters how publics receive it. AI-generated text arrived too recently and shifts too quickly to anchor such a posture, allowing it to spread without the protective skepticism we automatically apply to interested speech.
