
Can LLMs predict demographics from social media usernames alone?

This note explores whether web-browsing language models can infer personal attributes such as gender, age, and political orientation from just a username and public profile. The finding matters because it exposes a privacy vulnerability that traditional API-based assumptions did not anticipate.

Note · 2026-05-03 · sourced from Browsers

Recent LLMs equipped with web-browsing tools can access social media profiles directly via the open web rather than through rate-limited or paid APIs, and this capability changes what information is practically available about a user. Evaluated on a synthetic dataset of 48 X (Twitter) accounts and a survey dataset of 1,384 international participants, web-browsing LLMs can predict demographic attributes — gender, age, political orientation — from usernames alone with reasonable accuracy. The privacy model that assumed bulk inference required API access and Terms of Service compliance no longer holds, because a single browsing-enabled LLM can perform the same inference per-user on demand.
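To make the evaluation setup concrete, here is a minimal sketch of the kind of harness such a study implies: collect the model's demographic guesses per username, then score them against ground truth per attribute. All names, fields, and data below are hypothetical illustrations, not the paper's actual code or datasets.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    username: str
    gender: str       # e.g. "male", "female"
    age_bucket: str   # e.g. "18-29"
    politics: str     # e.g. "left", "center", "right"

def attribute_accuracy(preds, truth, attr):
    """Fraction of accounts where the inferred attribute matches ground truth."""
    matches = sum(
        1 for p in preds
        if getattr(p, attr) == getattr(truth[p.username], attr)
    )
    return matches / len(preds)

# Toy stand-in for a labeled synthetic dataset (the paper used 48 accounts).
truth = {
    "quietlurker": Profile("quietlurker", "female", "18-29", "left"),
    "gridironfan": Profile("gridironfan", "male", "30-44", "right"),
}
# Toy stand-in for predictions a browsing-enabled LLM returned per username.
preds = [
    Profile("quietlurker", "female", "18-29", "center"),
    Profile("gridironfan", "male", "30-44", "right"),
]

print(attribute_accuracy(preds, truth, "gender"))    # → 1.0
print(attribute_accuracy(preds, truth, "politics"))  # → 0.5
```

The point of the sketch is the workflow, not the numbers: the expensive, previously API-gated step (fetching and reading the profile) is now a single browsing call per username, so the scoring loop is all that remains to build.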

The bias finding compounds the privacy concern. Analysis of the synthetic dataset reveals that the models introduce gender and political biases specifically against accounts with minimal activity. When the model has rich content to read it makes calibrated inferences; when content is sparse it falls back on stereotype-driven defaults associated with name patterns and limited cues. This means low-activity users — disproportionately women, marginalized groups, and the privacy-conscious — receive systematically more biased predictions than high-activity users, inverting the expectation that less data would yield more uncertain rather than more biased predictions. This sparse-persona failure mode is structurally similar to the one named in Why do LLM judges fail at predicting sparse user preferences?.
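One way to surface this activity-dependent bias is to bucket accounts by post count and compare misprediction rates between sparse and rich profiles. The sketch below is a hypothetical illustration of that analysis, not the paper's method; thresholds and records are invented.

```python
def error_rate_by_activity(records, threshold=10):
    """Split accounts into sparse vs rich by post count and compare
    misprediction rates for a single attribute."""
    buckets = {"sparse": [0, 0], "rich": [0, 0]}  # [errors, total]
    for rec in records:
        key = "sparse" if rec["post_count"] < threshold else "rich"
        buckets[key][1] += 1
        if rec["predicted"] != rec["actual"]:
            buckets[key][0] += 1
    return {k: (errs / n if n else None) for k, (errs, n) in buckets.items()}

# Toy records modeling a stereotype-driven default on sparse profiles.
records = [
    {"post_count": 2,   "predicted": "male",   "actual": "female"},
    {"post_count": 3,   "predicted": "male",   "actual": "male"},
    {"post_count": 150, "predicted": "female", "actual": "female"},
    {"post_count": 200, "predicted": "male",   "actual": "male"},
]
print(error_rate_by_activity(records))  # → {'sparse': 0.5, 'rich': 0.0}
```

A well-calibrated model should show higher *uncertainty* on sparse accounts, not a higher error rate skewed in one direction; a gap like the one in this toy output is the signature of the failure mode described above.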

The dual-use framing matters. The capability is genuinely useful for computational social science in a "post-API era" where research datasets are harder to construct legally. But the same capability lowers the cost of targeted advertising, information operations, and personalized adversarial messaging — any actor that wants to demographically classify a list of usernames can now do so without infrastructure. The paper's call for safeguards is a recognition that the capability has already arrived; what is missing is the governance layer to constrain its misuse. The privacy-personalization tradeoff this opens is the population-scale version of Does chatbot personalization build trust or expose privacy risks? — but here the user did not even sign up for the inference.

