When should human values enter the LLM development pipeline?
Explores whether human-centered concerns like safety and fairness work better as early design principles throughout development, or as post-training alignment patches. Matters because pipeline placement determines whether human priorities shape the foundation or fight against it.
The dominant industry pattern treats human-centered concerns — safety, fairness, steerability, user values — as alignment problems handled in a "cursory post-training stage," downstream of the real work of capability scaling. The Human-Centered Large Language Models framework rejects this sequencing. It argues that human priorities must be embedded with rigor at every stage of the pipeline: data sourcing and filtering, model training, evaluation, deployment, and long-term maintenance. The distinction it draws is between post-hoc human factors design, which accounts for user needs in only a thin slice of the process, and genuine human-centered design, where stakeholders are central to ideating, building, evaluating, and deploying the system.
Placement matters because a value introduced only at post-training inherits whatever the pretraining data and objective have already baked in, so the patch is forever fighting the foundation. If the data sourcing stage ignored privacy or representational harm, post-training alignment cannot fully repair the damage; if evaluation optimizes leaderboard metrics, human flourishing is invisible to the gradient. Embedding objectives upstream means treating the LLM as a sociotechnical system with global influence rather than an isolated tool measured by static benchmarks. The counterpoint, that pipeline-wide human-centering is slower and harder to operationalize than a final-stage fix, is real, and the framework concedes that the optimal path resists universal solutions. But the framework's analysis is that treating alignment as a patch is precisely what subordinates human concerns to the capability race. The architecture of the pipeline encodes the priority.
— "Reflections and New Directions for Human-Centered Large Language Models", https://arxiv.org/abs/2605.06901
Related concepts in this collection
- Should AI alignment target preferences or social role norms?
Current AI alignment approaches optimize for individual or aggregate human preferences. But do preferences actually capture what matters morally, or should alignment instead target the normative standards appropriate to an AI system's specific social role?
supplies the normative target that pipeline-wide human-centering should aim at
- Can human-centered LLM design ever achieve universal solutions?
If harm and benefit depend on who you ask and how you measure them, can we design LLM systems that satisfy all stakeholders? This explores why broad values like safety and justice resist one-size-fits-all implementation.
the companion finding from the same framework: even granting pipeline-wide embedding, the objectives underdetermine the gradient because harm has no stakeholder-neutral operationalization
- Can AI systems preserve moral value conflicts instead of averaging them?
Current AI systems wash out value tensions through majority aggregation. Can we instead model how values like honesty and friendship genuinely conflict in moral reasoning?
concretizes what an upstream, non-patch human-centering stage must do: carry conflicting values forward through the pipeline instead of collapsing them late
Original note title
human-centered objectives must be embedded across the entire llm pipeline not bolted on as a post-training patch