Can a text-only chatbot feel socially present without visual embodiment?

This explores whether a chatbot that only exchanges text — no face, voice, or body — can still produce the feeling of being *with* someone, and what in the corpus suggests where that sense of presence comes from (and where it breaks down).

The corpus's most interesting answer is that much of the presence isn't in the bot at all. One thread argues that AI output is really *event-residue*: it carries the surface markers of conversation inherited from training data but lacks the event structure of an actual utterance, so the user quietly supplies the missing orientation and animates it into a felt exchange [Does AI generate genuine utterances or just text patterns?]. If that's right, 'social presence' is a collaboration in which the human does most of the work: the text is a prompt for presence rather than a source of it.

But that one-sided animation is also doing real work, and the corpus shows the levers that strengthen it. People reciprocate disclosure with chatbots the same way they do with humans, and they go *deeper* when the bot shares emotion consistently rather than mirroring them adaptively [Do chatbots trigger human reciprocity norms around self-disclosure?]. The very absence of a judging human turns out to be a feature: with no social face to perform for, people disclose more intimately, and the benefit flows from their own processing rather than from the bot's understanding [Do chatbots help people disclose more intimate secrets?]. So a text-only system isn't merely overcoming the lack of embodiment; in some intimate registers the missing body is exactly what makes it work. You can even train toward this directly: RLVER uses a simulated user's emotional trajectory as a reward signal, nudging models from solution-giving toward something that reads as genuine empathy [Can emotion rewards make language models genuinely empathic?].
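To make the RLVER idea concrete, here is a minimal sketch of how an emotion trajectory could become a training signal. It rests on assumptions the notes do not confirm: that the reward is the simulated user's final emotion score on a 0 to 100 scale, and that advantages are standardized within a sampling group in the usual GRPO manner. The function names and the scores are hypothetical and purely illustrative.

```python
from statistics import mean, pstdev

def trajectory_reward(emotion_scores, max_score=100.0):
    """Hypothetical reward: the simulated user's final emotion score, scaled to [0, 1].

    Assumption: scores run from 0 to max_score, one per dialogue turn.
    """
    if not emotion_scores:
        raise ValueError("need at least one emotion score")
    return emotion_scores[-1] / max_score

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward standardized against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Three sampled dialogues with the same simulated user (scores are made up).
group = [
    [40, 35, 30],  # solution-giving replies: the user's mood sinks
    [40, 50, 60],  # some acknowledgement: mood recovers
    [40, 60, 90],  # sustained empathy: mood climbs
]
rewards = [trajectory_reward(t) for t in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.2f}")
```

Under these assumptions, the dialogue that leaves the simulated user feeling better than its siblings gets a positive advantage and the one that leaves them worse gets a negative one, which is the sense in which the emotional trajectory "nudges" the model.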

The deeper question is whether that felt presence rests on anything, and here the corpus splits 'feeling present' from 'being grounded.' Language models achieve strong *functional* grounding through relational language patterns, but stay weak on *social* grounding (participatory agency) and *causal* grounding (embodied contact with a world). Social grounding can rise through human integration, whereas linguistic agency would take architectural changes beyond further training [What grounds language understanding in systems without embodiment?]. The Plato's-cave framing pushes the same point: text strips out the physics, geometry, and causality of reality, so a text-only model manipulates symbols without their source dynamics [Are text-only language models fundamentally limited by abstraction?]. Strikingly, this lossiness doesn't block social fluency: GPT-4.5 predicted the appropriateness of social scenarios at the 100th percentile relative to human raters, yet all the models tested make the same systematic errors, which hints at a boundary that embodied experience may be needed to cross [Can AI systems learn social norms without embodied experience?].

Where the illusion frays is in timing and motivational reading, the stuff an embodied partner picks up tacitly. Chatbots handle users with established goals but miss ambivalence, resistance, and the early stirrings of change [Why can't chatbots detect when users are ambivalent about change?]. That gap is precisely what richer behavioral signals (gaze, hesitation, speed) are being instrumented to close, which quietly concedes that text alone leaves cognitive state under-read, and the same sensing enables manipulation as easily as care [Can AI systems read cognitive state from interaction patterns alone?]. And whatever presence does emerge has a half-life: longitudinal work with Mitsuku shows the social pull of chatbot relationships declining as novelty wears off, so a single-session feeling of presence doesn't forecast the long haul [Do chatbot relationships lose their appeal as novelty wears off?].

The thing you didn't know you wanted to know: presence here isn't a property the bot has or lacks — it's a *loan the reader extends*, repayable in interpretive labor, amplified by emotional consistency and the freedom of not being judged, and slowly called back in by novelty decay and the bot's blindness to the unspoken. Embodiment isn't the prerequisite for feeling present; it's the prerequisite for the presence being *grounded* in anything beyond the user's own animation of the text.

Sources (10 notes)

Does AI generate genuine utterances or just text patterns?

AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.

Do chatbots trigger human reciprocity norms around self-disclosure?

In a 372-participant study, users reciprocated with deeper self-disclosure when chatbots displayed consistent emotional sharing, outperforming adaptive matching. This follows human interpersonal norms where emotional vulnerability produces emotional response.

Do chatbots help people disclose more intimate secrets?

The absence of social judgment in chatbot interactions removes barriers to self-disclosure that normally constrain conversation with humans. The therapeutic benefit derives from the user's own cognitive processing during disclosure, not from the chatbot's understanding.

Can emotion rewards make language models genuinely empathic?

RLVER uses a simulated user's emotion trajectory as an RL reward signal, enabling GRPO to deliver stable empathy improvements while maintaining dialogue quality—countering the typical trade-off between preference optimization and conversational grounding.

What grounds language understanding in systems without embodiment?

Language models achieve functional grounding through relational language patterns but lack social grounding through participatory agency and causal grounding through embodied environmental contact. Social grounding can increase through human integration, but linguistic agency requires architectural changes beyond training.

Are text-only language models fundamentally limited by abstraction?

Text strips the physics, geometry, and causality present in reality, forcing language models to manipulate symbols without grounding in their source dynamics. This creates predictable failure modes in physical, geometric, and causal reasoning that multimodal training could address.

Can AI systems learn social norms without embodied experience?

GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.

Why can't chatbots detect when users are ambivalent about change?

Testing three major LLMs across 25 health scenarios showed they succeed only when users have established goals but cannot detect resistance or ambivalence. Models miss relapse-prevention strategies even for users in action stages.

Can AI systems read cognitive state from interaction patterns alone?

Research shows AI systems can instrument multimodal behavioral signals (gaze, hesitation, speed) to read cognitive state during interaction, preserving flow by avoiding disruptive explicit probes. However, the same substrate enables both helpful timing and manipulative profiling.

Do chatbot relationships lose their appeal as novelty wears off?

Longitudinal studies with Mitsuku show that social processes driving relationship formation decline as novelty wears off. Single-session study findings cannot be reliably extrapolated to medium- or long-term chatbot design.
