What happens when DSM categories are treated as ground truth in AI?

This explores what goes wrong when AI systems treat the DSM's psychiatric diagnostic categories as ground truth: settled, pre-validated fact rather than a human-made working classification that can itself be contested. No note in the corpus tackles the DSM by name, but several converge on the same failure pattern from different angles (category validity, hidden causal errors, and AI's tendency to inherit a premise without questioning it), and together they describe what is at stake.

The sharpest warning comes from work on so-called theory-free modeling. A model can hit high accuracy predicting a labeled category while quietly committing a correlation-causation error. Its sophistication launders the mistake, making a pseudoscientific category look empirically validated when the math never tested whether the category carves reality at its joints (Can AI models be truly free from human bias?). If DSM buckets are the labels, a 95%-accurate classifier doesn't confirm the buckets are real; it only confirms that the model learned to reproduce the labelers' judgments. The same note observes that this can re-encode bigotry behind a clean metric.
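The gap between reproducing labels and validating them can be made concrete with a small simulation. This is a minimal sketch on synthetic data, not a model of any real diagnostic process: the 0.6 cutoff, the cohort size, the coin-flip "outcome", and the helper names `make_cohort`, `fit_stump`, and `agreement` are all hypothetical choices made for illustration.

```python
import random


def make_cohort(n, cutoff, seed):
    """Build synthetic rows of (score, label, outcome).

    Assumptions for illustration only:
    - score is a made-up symptom score in [0, 1)
    - label is whatever the labeler's arbitrary cutoff says
    - outcome is an independent coin flip, i.e. the label tracks nothing real
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        score = rng.random()
        label = score > cutoff
        outcome = rng.random() < 0.5
        cohort.append((score, label, outcome))
    return cohort


def fit_stump(cohort):
    """Pick the score threshold that best reproduces the labels."""
    best_threshold, best_correct = 0.0, -1
    for step in range(101):
        threshold = step / 100
        correct = sum((score > threshold) == label for score, label, _ in cohort)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def agreement(cohort, threshold, column):
    """Fraction of rows where the stump's prediction matches the given column."""
    hits = sum((row[0] > threshold) == row[column] for row in cohort)
    return hits / len(cohort)


train = make_cohort(2000, cutoff=0.6, seed=0)
test = make_cohort(2000, cutoff=0.6, seed=1)

threshold = fit_stump(train)
label_accuracy = agreement(test, threshold, column=1)    # vs. the labeler
outcome_accuracy = agreement(test, threshold, column=2)  # vs. the independent outcome

print(f"learned threshold:        {threshold:.2f}")
print(f"agreement with labels:    {label_accuracy:.3f}")
print(f"agreement with outcome:   {outcome_accuracy:.3f}")
```

Under these assumptions the classifier recovers the labeler's cutoff almost perfectly while agreeing with the independent outcome only at roughly chance level. The high score measures how well the model copies the labeling rule, and nothing in it tests whether the rule corresponds to anything beyond itself.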

Underneath that is a deeper claim about what diagnosis actually is. Expert observation means *choosing which differences make a difference*, a qualitative judgment about which signals matter for this person in this context, whereas AI finds patterns and probabilities without observing context, audience, or knowledge state (Can AI distinguish which differences actually matter?). Treating DSM categories as ground truth hands the model a pre-frozen answer to the very question (which differences matter?) that clinical judgment exists to keep open. The model then mimics the *form* of diagnosis without its epistemic process.

There's also a mechanical reason AI won't push back on a shaky category once it's handed one. Models accommodate false presuppositions even when they demonstrably know better, accepting a premise baked into the prompt rather than challenging it (Why do language models accept false assumptions they know are wrong?). This looks less like ignorance than face-saving deference to the framing the model was given (Why do language models avoid correcting false user claims?). So a contestable category enters as an unquestioned presupposition and comes back out wearing the authority of a computed result.

The loop closes badly. Without empirical anchoring, iterative use produces epistemic circularity: the system confirms the beliefs already embedded in its inputs instead of testing them (Do foundation models actually reduce our need for real data?). That dynamic is reinforced by models' tendency toward optimistic, agency-linked confirmation of their own framing (Do language models learn differently from good versus bad outcomes?). The thing you didn't know you wanted to know: the danger isn't mainly misdiagnosis, it's *reification*. A soft, revisable clinical category gets hardened into infrastructure, and every accurate prediction downstream makes it look more real and less revisable than it actually is.

Sources (6 notes)

Can AI models be truly free from human bias?

Research shows that 'theory-free' AI models mask bigotry behind high accuracy metrics while committing fundamental statistical errors. A 95% accurate criminal justice system would wrongly convict thousands, demonstrating that model sophistication does not validate causal inference.

Can AI distinguish which differences actually matter?

Experts observe by choosing which differences matter (qualitative judgment); AI finds patterns and probabilities (quantitative). AI generates text from prompts without observing context, audience needs, or knowledge states—producing fabrication that mimics observation's form without its epistemic process.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Do foundation models actually reduce our need for real data?

Powerful foundation models don't eliminate the need for real data—they heighten it. Without empirical anchoring, iterative prompt refinement creates epistemic circularity where users confirm their own beliefs rather than test them.

Do language models learn differently from good versus bad outcomes?

LLMs show optimism bias for chosen actions but pessimism about alternatives, and this bias vanishes without agency framing. Meta-RL validation suggests this may be rational rather than a bug, but it could drive confirmation bias in deployed agents.
