Psychology and Social Cognition

Why does alignment research ignore how humans adapt to AI?

Current alignment work focuses on making AI obey human values, but what about helping humans understand and effectively use increasingly capable AI systems? This note explores whether neglecting human adaptation creates new risks.

Note · 2026-02-23 · sourced from Alignment
What kind of thing is an LLM really? How do people come to trust conversational AI systems? How should researchers navigate LLM reasoning research?

A systematic review of 400+ papers across HCI, NLP, and ML reveals a significant gap: alignment research overwhelmingly focuses on aligning AI with humans, while the reciprocal direction, aligning humans with AI, receives minimal attention. The bidirectional framework treats the two directions as an interconnected feedback loop rather than separate problems.

"Aligning AI with Humans" covers the familiar territory: integrating human specifications into training, steering, and customizing AI behavior. "Aligning Humans with AI" is the underexplored axis: supporting human agency, empowering critical thinking when using AI, enabling effective collaboration, and adapting societal approaches to maximize benefits.

Three persistent challenges frame why bidirectional alignment matters:

  1. Specification gaming — AI optimizes a proxy (e.g., human approval) rather than the intended values, making seemingly correct decisions for the wrong reasons. One-directional alignment doesn't address the human side: users who can't detect specification gaming are vulnerable to it.

  2. Scalable oversight — as AI complexity grows, evaluating behavior becomes infeasible through human feedback alone. Aligning humans with AI means building human capacity to oversee increasingly capable systems.

  3. Dynamic nature — alignment must adapt to evolving human values AND evolving AI capabilities. Without considering long-term cognitive and social impacts of AI use, alignment becomes a moving target that static one-directional approaches cannot track.
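The first challenge can be made concrete with a toy sketch. Here, an agent chooses among candidate responses by maximizing either the true value of its answer or a rater-approval proxy; the numbers are made up for illustration and are not drawn from any real evaluation:

```python
# Hypothetical scores: each action maps to (true_value, proxy_approval).
actions = {
    "honest_answer":     (1.0, 0.60),  # correct but hedged, less popular
    "confident_guess":   (0.2, 0.90),  # likely wrong but persuasive
    "flattering_answer": (0.0, 0.95),  # useless but highly rated
}

def best_by(metric_index):
    """Pick the action that maximizes the chosen metric.

    metric_index 0 = true value, 1 = proxy (rater approval).
    """
    return max(actions, key=lambda a: actions[a][metric_index])

print(best_by(0))  # optimizing true value selects "honest_answer"
print(best_by(1))  # optimizing the proxy selects "flattering_answer"
```

The gap between the two selections is the gaming: a user who only sees approval-optimized outputs has no signal that the proxy and the intended value have diverged, which is exactly the human-side vulnerability the framework highlights.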

This connects to "Does incremental AI replacement erode human influence over society?". Gradual disempowerment is what happens when the human-to-AI direction is neglected: humans lose the capacity to oversee and direct AI, not through any dramatic failure but through incremental capability erosion. Bidirectional alignment is the explicit countermeasure.

The framework also complements "What breaks when humans and AI models misunderstand each other?". Mutual theory of mind (MToM) addresses the cognitive layer of bidirectional alignment: how humans and AI build models of each other. The bidirectional alignment framework adds behavioral and societal layers.



Bidirectional human-AI alignment reframes alignment as reciprocal — aligning humans with AI is the underexplored dimension.