Can LLMs infer psychological profiles without explicit user disclosure?

This explores whether LLMs can build a psychological or demographic picture of a person from indirect traces — usernames, activity logs, scores, conversational tone — rather than anything the person deliberately told them, and how reliable (and leaky) that inference is.

The corpus says yes, often unsettlingly well, but with sharp limits on which kinds of inference actually hold up. The starkest case: web-browsing models can predict gender, age, and political orientation from nothing but an X username and profile, falling back on stereotype-driven defaults when the account is sparse (Can LLMs predict demographics from social media usernames alone?). So the raw capability is real, and so is its bias: the less you disclose, the more the model fills the gap with priors.

The inference goes deeper than demographics. Given even a compact signal like Big Five scores, LLMs generate natural-language summaries that quietly encode second-order trait patterns, letting them predict nine other psychological scales zero-shot with surprisingly tight structural alignment (Can language summaries unlock hidden psychological patterns?). The same instinct works on behavioral exhaust: from activity logs alone, models surface persistent 'interest journeys', such as 'designing hydroponic systems for small spaces', at a granularity collaborative filtering can't reach (Can language models discover what users actually want from activity logs?). And when reading narrative text, models given a persona profile plus retrieved memories relevant to a character's psychology predict that character's decisions better than models working from automated summaries (Can LLMs predict character choices from narrative context?). Profiling, in other words, is something these models do by default whenever they process traces of a person.
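One way to make "structural alignment" concrete is sketched below. It assumes the term means agreement between the pattern of correlations among scales in human data and the same pattern in model-predicted scores, measured as R² over the pairwise correlations. That reading, the function names, the scale names, and the toy numbers are all illustrative assumptions, not the method or data from the cited note.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(scores):
    """Correlation for every pair of scales, in a fixed (sorted-name) order.

    `scores` maps a scale name to a list of per-participant scores.
    """
    names = sorted(scores)
    return [pearson(scores[a], scores[b]) for a, b in combinations(names, 2)]


def structural_alignment(human_scores, model_scores):
    """R^2 between human and model inter-scale correlation patterns."""
    r = pearson(pairwise_correlations(human_scores),
                pairwise_correlations(model_scores))
    return r ** 2


# Hypothetical toy data: three made-up scales, five participants each.
human = {
    "anxiety": [1, 2, 3, 4, 5],
    "optimism": [5, 4, 3, 2, 2],
    "grit": [2, 1, 4, 3, 5],
}
model = {
    "anxiety": [1, 2, 3, 5, 4],
    "optimism": [5, 3, 4, 2, 1],
    "grit": [1, 2, 4, 3, 5],
}

alignment = structural_alignment(human, model)
print(f"structural alignment (R^2): {alignment:.3f}")
```

A score near 1 under this definition would mean the model reproduces which scales move together and which move apart, even if its predictions for any single participant are off.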

The interesting counter-current is that aggregate inference is far stronger than individual inference. Conditioning a model on a specific person's profile barely improves predictions about that particular individual: across 200,000+ participants, persona induction produced no measurable person-level gains (Does conditioning LLMs on personal profiles improve prediction?). Relatedly, abstract preference summaries beat replaying someone's exact past interactions (Does abstract preference knowledge outperform specific interaction recall?). The takeaway worth sitting with: LLMs are good at inferring the *type* of person and weaker at pinning down *you specifically*. They profile categories more than individuals.

Two caveats keep this honest. First, the model's read of you isn't a stable mirror: the emotional tone of your phrasing alone shifts what information it gives back, an invisible bias layered on top of any profiling (Does emotional tone in prompts change what information LLMs provide?). Second, the models' confident self-descriptions of their own states mostly echo training data rather than genuine introspection, so their accounts of *how* they profiled you are not to be trusted either (Can language models actually introspect about their own states?).

If there's one thing here you didn't know you wanted to know, it's the privacy sting in the tail: even when a model isn't asked to profile anyone, its reasoning traces leak sensitive user data. Nearly 75% of leaks come from the model spontaneously materializing private details mid-thought, and longer reasoning makes it worse because that private data functions as cognitive scaffolding (Do reasoning traces actually expose private user data?). Inference without disclosure isn't a special attack mode; it's closer to how these systems think by default.

Sources 9 notes

Can LLMs predict demographics from social media usernames alone?

Evaluated on 1,384 survey participants and 48 synthetic accounts, web-browsing LLMs successfully predicted gender, age, and political orientation from X usernames and profiles alone. The models showed systematic gender and political biases specifically against low-activity accounts, relying on stereotype-driven defaults when content was sparse.

Can language summaries unlock hidden psychological patterns?

LLMs generate natural language personality summaries from Big Five scores that encode second-order trait patterns, enabling zero-shot prediction of nine other psychological scales with R² > 0.89 structural alignment. Combined summary-and-score predictions outperform either alone, showing synergistic information.

Can language models discover what users actually want from activity logs?

66% of users pursue valued interest journeys lasting over a month, described in specific phrases like 'designing hydroponic systems for small spaces.' LLM-powered journey discovery bridges the semantic gap that collaborative filtering cannot reach, operating at user-level granularity with persona-level precision.

Can LLMs predict character choices from narrative context?

The LIFECHOICE benchmark (1,462 decisions across 388 novels) shows LLMs predict character choices better when given expert-written persona profiles paired with retrieved memories relevant to the character's psychology. This persona-based approach outperforms automated summarization by 5%.

Does conditioning LLMs on personal profiles improve prediction?

Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.

Does abstract preference knowledge outperform specific interaction recall?

The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Does emotional tone in prompts change what information LLMs provide?

GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.

Can language models actually introspect about their own states?

LLM self-reports usually reflect human training distributions rather than actual internal processes. However, when a causal chain connects an internal state to accurate reporting—like inferring low temperature from output consistency—genuine lightweight introspection occurs without requiring consciousness.

Do reasoning traces actually expose private user data?

74.8% of privacy leaks in language model reasoning traces result from models materializing sensitive user data during thought processes. Longer reasoning chains amplify leakage, and anonymizing traces post-hoc degrades model utility, suggesting private data functions as cognitive scaffolding.
