Can we use folk-psychology without committing to genuine mental states?

This explores whether the everyday vocabulary of beliefs, desires, and intentions can be applied to AI systems as a useful description of behavior — without claiming those systems actually have conscious inner lives.

This explores whether folk-psychology — the everyday habit of explaining behavior through beliefs, desires, and intentions — can be a working tool for describing AI without smuggling in claims about genuine mental states or consciousness. The corpus says yes, and it has built several distinct off-ramps for doing exactly that. The cleanest is Chalmers' quasi-interpretivism Can we describe LLM beliefs without assuming consciousness?, which deliberately brackets consciousness: you ascribe belief-like states purely on the basis of behavioral interpretability, treating 'belief' as a functional bookkeeping term rather than a phenomenal one. It works well for sub-personal functional states and starts to strain only when you reach for relational or normative notions like genuine speech-acts.

A second, bolder route keeps more of the folk vocabulary. Modest inflationism Can we defend modest mental attributions to large language models? argues you can ascribe 'metaphysically undemanding' states — beliefs, desires — while withholding the loaded claim of consciousness, much the way we comfortably talk about what a dog wants without resolving its phenomenology. It defends this by showing the popular debunking moves (it's 'just' pattern-matching, it's 'merely' trained) quietly beg the question. So the split the question gestures at — folk-psychology yes, mental realism no — turns out to be a principled, graded middle position rather than a dodge.

The most interesting move reroutes the folk-psychology onto a different target entirely. Shanahan's role-play framing Should we treat dialogue agents as role-playing characters? says the belief-talk legitimately applies to the simulated character the prompt conjures, not to the underlying model — the system produces character-consistent text, and folk-psychology describes the character. That dissolves the dilemma: you're not committing to the network's mental states because you were never talking about the network. But there's friction in the corpus here. Realizationism Are RLHF personas performed characters or realized dispositions? pushes back: RLHF installs dispositional profiles stable enough to survive jailbreaks and adversarial pressure, which looks less like sustained pretense and more like a 'realized' quasi-psychology. So the field disagrees about whether the persona is a costume or a load-bearing structure — and that disagreement is exactly where 'genuine' starts doing real work.

Lateral to all this sits the empirical question of whether anything mentalistic is happening at all. Theory-of-mind benchmarks turn out to be solvable by surface pattern-matching Can language models solve ToM benchmarks without real reasoning?, and models default to those shortcuts rather than authentic perspective-taking in open-ended scenarios Do large language models genuinely simulate mental states?. Self-reports mostly echo training distributions rather than introspection Can language models actually introspect about their own states?. This actually strengthens the deflationary use of folk-psychology: it's a predictive shorthand for behavior, not evidence of an inner reporter.

The thing you didn't know you wanted to know: the hardest cases for 'folk-psychology without commitment' aren't the obviously mental words like 'feels' — they're the relational ones. Quasi-interpretivism breaks down on speech-acts, realizationism insists trained dispositions are real, and a separate line argues consciousness-talk only even makes sense for entities that share a world with us through co-presence Can disembodied language models ever qualify as conscious?. So the cheap, commitment-free folk-psychology covers beliefs and desires comfortably — but the moment you reach for words about relating, performing, or experiencing, the bill for genuine mental states comes due.

Sources 8 notes

Can we describe LLM beliefs without assuming consciousness?

Chalmers introduces quasi-interpretivism to ascribe belief-like states to LLMs based on behavioral interpretability without committing to phenomenal consciousness. The approach works well for sub-personal functional states but overreaches when applied to relational or normative states like speech-acts.

Can we defend modest mental attributions to large language models?

Both robustness and etiological deflationist arguments beg the question against inflationism. A graded approach ascribing metaphysically undemanding states like beliefs and desires—while withholding consciousness claims—mirrors how we treat non-human animals.

Should we treat dialogue agents as role-playing characters?

Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.

Are RLHF personas performed characters or realized dispositions?

Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.

Can language models solve ToM benchmarks without real reasoning?

Supervised fine-tuning matches reinforcement learning performance on ToM tasks, suggesting models exploit structural vulnerabilities rather than develop genuine reasoning. Distribution biases and templated artifacts allow surface-level pattern recognition to achieve competitive generalization.

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

Can language models actually introspect about their own states?

LLM self-reports usually reflect human training distributions rather than actual internal processes. However, when a causal chain connects an internal state to accurate reporting—like inferring low temperature from output consistency—genuine lightweight introspection occurs without requiring consciousness.

Can disembodied language models ever qualify as conscious?

Current disembodied LLMs cannot be candidates for consciousness because consciousness language originates from and applies only to entities sharing a world with us through co-presence and triangulation on shared objects.

Can we use folk-psychology without committing to genuine mental states?

Sources 8 notes

Next inquiring lines