INQUIRING LINE

How do humans and AI develop accurate models of each other?

This explores whether humans and AI can build accurate working models of each other's minds and goals — and where that mutual modeling breaks down even when AI looks fluent.


This explores the two-way street of mutual modeling: not just whether AI can read us, but whether the loop of each side updating its picture of the other actually holds together. The corpus's sharpest claim is that it usually doesn't, and that the failure isn't merely awkward conversation. Research on mutual theory of mind ("What breaks when humans and AI models misunderstand each other?") argues that three layers of modeling have to align at once, and that when they drift apart the AI doesn't just misspeak: it takes the wrong autonomous action. A Bayesian IRT study (n=667) found that theory of mind predicts collaborative performance, and that moment-to-moment shifts in how well a human models the AI influence the quality of the AI's responses. So accuracy here is bidirectional and fragile: it has to be re-earned turn by turn, not established once.

What would it take to do this well? One line of work says scaling data isn't the answer: you need explicit cognitive machinery. Effective "thought partners" ("What makes an AI a true thought partner, not just a tool?") are described as needing three reciprocal ingredients: mutual understanding, legibility (each side being readable to the other), and a shared model of the world. These are to be built from Bayesian theory of mind, resource-rationality, and goal planning rather than from scaling foundation models on human feedback alone. That theme of shared grounding recurs in the semiotics argument ("Can AI systems achieve real alignment without world contact?"), which warns that an AI manipulating symbols with no contact with the world can have its stated goals quietly diverge from real values. Accurate mutual models, on this view, require something to point at in common, not just matching vocabulary.
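
As a rough illustration of what "Bayesian theory of mind" means in practice, here is a minimal sketch, not taken from any of the cited papers. It assumes a toy one-dimensional world, two hypothetical goals (`left_door`, `right_door`), and a softmax-rational agent whose rationality parameter `beta` is an arbitrary choice. The observer starts undecided and updates its belief about the agent's goal after each observed step, which is the basic move behind inferring someone's goal from their behavior.

```python
import math

ACTIONS = (-1, 0, 1)  # step left, stay, step right (toy assumption)


def goal_posterior(position, action, goals, prior, beta=2.0):
    """Update a belief over goals after observing one action.

    Assumes a softmax-rational agent: an action is exponentially more
    likely the closer it brings the agent to its goal.
    """
    unnormalized = {}
    for goal, prior_prob in prior.items():
        # Utility of each action = negative distance to the goal after moving.
        utilities = {a: -abs(goals[goal] - (position + a)) for a in ACTIONS}
        normalizer = sum(math.exp(beta * u) for u in utilities.values())
        likelihood = math.exp(beta * utilities[action]) / normalizer
        # Bayes' rule: posterior is proportional to prior times likelihood.
        unnormalized[goal] = prior_prob * likelihood
    total = sum(unnormalized.values())
    return {goal: value / total for goal, value in unnormalized.items()}


# Hypothetical scenario: two doors on a line, agent starts in the middle.
goals = {"left_door": 0, "right_door": 10}
belief = {"left_door": 0.5, "right_door": 0.5}
position = 5

# The observer watches three steps to the right and updates after each one.
for action in (1, 1, 1):
    belief = goal_posterior(position, action, goals, belief)
    position += action

print(belief)
```

After three steps to the right, nearly all of the belief sits on `right_door`. A mutual version would have each party run this kind of inference about the other, which is where the alignment problems described above can enter.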

Here's the twist you might not expect: AI is already superhuman at *predicting* us in some domains, yet that prediction isn't the same as understanding. GPT-4.5 out-judged every individual human on social appropriateness across 555 scenarios ("Can AI learn social norms better than humans?"), but from the *outside*, as a savant that never participated in making those norms ("Can AI predict social norms better than humans?"). The same split shows up as statistical mastery sitting right next to social blindness ("Why do AI systems fail at social and cultural interpretation?"): top-percentile norm prediction alongside regressions on theory-of-mind tasks. So an AI can hold an eerily accurate model of your behavior while lacking the participatory understanding that would let it model your *meaning*. This is why expert judgment is called irreducibly communicative ("Can AI replicate the communicative work experts do?"): experts anticipate what their audience will accept, and an AI's fluent confidence can mimic that work without performing it, which makes its output epistemically misleading.

The deepest reframe in the corpus is about what kind of difference we're even measuring. Seen through Habermas's observer/participant split ("Do humans and LLMs differ fundamentally or just superficially?"), humans and LLMs look utterly different as systems viewed from outside, yet inside a shared conversation both draw on the same symbolic substrate, so the gap is structural, not absolute. That matters for accuracy, because it suggests mutual modeling can sometimes work at the level of discourse even when the underlying systems have nothing in common. And there's a hint of how shared models form from scratch: agents under cooperative pressure spontaneously invent compact shared abstractions ("Can communication pressure drive agents to learn shared abstractions?"), so accurate mutual models may be less something you install and more something that emerges from the need to coordinate. The cautionary note is responsibility: when AI seems human-like, designed mimicry and user-projected qualities create separate accountability paths ("Who bears responsibility when AI seems human-like?"), which means an inaccurate human model of the AI (over-trusting its apparent understanding) is partly engineered and partly projected, and fixing it means choosing which one you're targeting.


Sources (10 notes)

What breaks when humans and AI models misunderstand each other?

Research shows three layers of mutual modeling must align simultaneously in human-AI interaction, and misalignment causes incorrect autonomous action, not just miscommunication. A Bayesian IRT study (n=667) confirms that theory of mind predicts collaborative performance and that moment-to-moment ToM fluctuations influence AI response quality.

What makes an AI a true thought partner, not just a tool?

Collins et al. show that thought partners require three reciprocal desiderata grounded in behavioral science: mutual understanding, legibility, and shared world models. This demands explicit cognitive architectures—Bayesian theory of mind, resource-rationality, goal planning—rather than scaling foundation models on human feedback alone.

Can AI systems achieve real alignment without world contact?

Peircean semiotics reveals that symbolic goal encoding without world contact and social mediation cannot guarantee correspondence to actual values. LLMs operating in pure symbol manipulation risk divergence between stated goals and real-world outcomes.

Can AI learn social norms better than humans?

GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Why do AI systems fail at social and cultural interpretation?

LLMs achieve 100th-percentile performance on norm prediction yet regress on theory-of-mind tasks and cannot generate culturally-resonant interpretations. The pattern shows that statistical competence coexists with absence of actual social understanding and participation.

Can AI replicate the communicative work experts do?

Expertise requires anticipating audience acceptability and social validity, not just retrieving information. AI lacks the mechanism to perform this communicative work, making its fluent output epistemically misleading despite its confident form.

Do humans and LLMs differ fundamentally or just superficially?

Applying Habermas's observer/participant distinction to AI shows that, from outside, humans and LLMs are utterly different, while from within shared discourse both draw on the same symbolic substrate, making the difference structural rather than absolute.

Can communication pressure drive agents to learn shared abstractions?

ACE agents under cooperative task pressure develop shorter utterances and higher-level abstractions through neurosymbolic library learning combined with bandit-based exploration-exploitation. This demonstrates that communication efficiency emerges naturally from the need to coordinate about shared tasks.

Who bears responsibility when AI seems human-like?

Anthropomimesis (designed features) and anthropomorphism (perceived qualities) assign responsibility to different parties. This distinction matters because interventions must target either system redesign or user education depending on which mechanism operates.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a researcher testing whether human-AI mutual modeling—the two-way loop where each side updates its model of the other—remains constrained by the same limits a curated library identified between mid-2024 and early 2026.

What a curated library found — and when (dated claims, not current truth):

• Mutual theory of mind requires three layers of modeling to align at once; misalignment causes wrong autonomous action, not just misstatement. Separately, effective thought partners need mutual understanding, legibility, and a shared world model (2024–2025).
• AI achieves superhuman accuracy at *predicting* social norms (GPT-4.5 out-judged every individual human on appropriateness; ~2025) yet lacks participatory understanding: statistical mastery without social cognition.
• Accurate mutual models need shared grounding (something both parties reference in the world); symbol manipulation alone risks silent goal-value divergence (2024–2025).
• Agents spontaneously develop compact shared abstractions under cooperative pressure; mutual modeling may emerge from coordination need rather than installation (2024).
• Anthropomorphism + anthropomimesis create separate accountability paths; inaccurate human models of AI partly stem from engineered mimicry + user projection (2026).

Anchor papers (verify; mind their dates):
• arXiv:2406.09264 (2024-06): Bidirectional Human-AI Alignment
• arXiv:2408.03943 (2024-07): Learning and Thinking with People
• arXiv:2507.14088 (2025-07): DPMT Dual Process Theory of Mind
• arXiv:2602.09287 (2026-02): Anthropomorphism vs. Anthropomimesis

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding, judge whether newer model scales, technical advances (e.g., scaling law shifts, multimodal grounding, real-time interactive training, structured reasoning), or evaluation methods have since relaxed it. Separate the durable question (likely still open) from the perishable limitation (possibly resolved). If a limitation has been resolved, name what resolved it.
(2) Surface the strongest work from the last ~6 months that contradicts or supersedes the "prediction ≠ understanding" split or the "need for shared grounding" claim.
(3) Propose 2 research questions assuming the regime may have shifted: e.g., do emerging agentic systems with persistent world models now *participate* in norm-making rather than predict from outside? Can symbolic grounding be bootstrapped from multi-modal interaction?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
