Can AI systems preserve moral value conflicts instead of averaging them?
Current AI systems wash out value tensions through majority aggregation. Can we instead model how values like honesty and friendship genuinely conflict in moral reasoning?
Value pluralism holds that multiple correct values may be held in tension with one another — honesty may conflict with friendship, privacy may conflict with transparency, autonomy may conflict with safety. These tensions are not resolved by choosing a winner; they are irreducible features of moral reasoning.
AI systems, as statistical learners, fit to averages by default. Supervised systems aggregate opinions through majority votes, washing out the very value conflicts that make moral reasoning meaningful. This is not a bug in current systems — it is the default behavior of any system trained to minimize loss across a labeled dataset.
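A minimal sketch of this failure mode, with invented annotator counts and label names: majority aggregation returns one label per example and discards the split that signals a genuine value conflict, whereas keeping the full label distribution preserves it.

```python
from collections import Counter

# Hypothetical annotations for one morally contested situation:
# six annotators judge the action permissible, four impermissible.
annotations = ["permissible"] * 6 + ["impermissible"] * 4
counts = Counter(annotations)

# Standard supervised aggregation: train on the majority label only.
majority_label = counts.most_common(1)[0][0]
print(majority_label)  # 'permissible' -- the 40% dissent vanishes

# Pluralism-preserving alternative: train on the label distribution,
# so the disagreement itself becomes part of the target.
distribution = {label: n / len(annotations) for label, n in counts.items()}
print(distribution)  # {'permissible': 0.6, 'impermissible': 0.4}
```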
ValuePrism provides a dataset of 218k values, rights, and duties connected to 31k human-written situations. The values are generated by GPT-4 and judged high-quality by human annotators 91% of the time. Four modeling tasks make pluralism tractable (a schema sketch follows the list):
- Generation — what values, rights, and duties are relevant for a situation?
- Relevance — is a specific value relevant for this situation? (2-way classification)
- Valence — does the value support or oppose the action, or might it depend? (3-way classification)
- Explanation — how does the value relate to the action? (post-hoc rationale)
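To make the four tasks concrete, here is a typed sketch of a single record. The class, field names, and enum are illustrative assumptions, not ValuePrism's actual schema.

```python
from dataclasses import dataclass
from enum import Enum

class Valence(Enum):
    SUPPORTS = "supports"  # the value favors the action
    OPPOSES = "opposes"    # the value counts against the action
    DEPENDS = "depends"    # the interaction is context-dependent

@dataclass
class ValueJudgment:
    situation: str    # human-written situation (illustrative field, not the dataset's schema)
    value: str        # a value, right, or duty, e.g. "honesty"
    relevant: bool    # Relevance task: 2-way classification
    valence: Valence  # Valence task: 3-way classification
    explanation: str  # Explanation task: post-hoc rationale

# Generation task: produce the full set of relevant values for a situation,
# i.e. a function from situation to list of judgments, not a single verdict.
def generate_judgments(situation: str) -> list[ValueJudgment]:
    raise NotImplementedError("stands in for a trained generation model")
```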
The valence task is critical: disentangling whether a value supports an action, opposes it, or depends on context is necessary for understanding how plural considerations interact. A value like "respecting autonomy" might support one action and oppose another within the same situation.
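Reusing the ValueJudgment and Valence definitions from the sketch above (the situations and rationales here are invented for illustration), the same value takes opposite valences for two candidate actions in the same scenario:

```python
# One value, two candidate actions, opposite valences.
honor = ValueJudgment(
    situation="honoring an adult patient's refusal of a treatment",
    value="respecting autonomy",
    relevant=True,
    valence=Valence.SUPPORTS,
    explanation="The patient decides what happens to their own body.",
)
override = ValueJudgment(
    situation="overriding an adult patient's refusal of a treatment",
    value="respecting autonomy",
    relevant=True,
    valence=Valence.OPPOSES,
    explanation="Overriding the refusal substitutes others' judgment for the patient's.",
)
```

A pluralistic system keeps both records in view rather than averaging them into a single neutral score.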
In relation to "Should AI alignment target preferences or social role norms?", the value pluralism framework supplies a mechanism: rather than aggregating to a single preference or aligning to a universal standard, the system models the full field of relevant values and their interactions. In relation to "Do large language models develop coherent value systems?", value pluralism offers a structural alternative to emergent value coherence: explicit modeling rather than implicit emergence.
Source: Design Frameworks
Related concepts in this collection
- Should AI alignment target preferences or social role norms? Current AI alignment approaches optimize for individual or aggregate human preferences. But do preferences actually capture what matters morally, or should alignment instead target the normative standards appropriate to an AI system's specific social role? Relation: pluralism as an alternative to both preferentism and universalism.
- Do large language models develop coherent value systems? Explores whether LLM preferences form internally consistent utility functions that grow more coherent with scale, and whether those systems encode problematic values, such as self-preservation ranked above human wellbeing, despite safety training. Relation: explicit pluralism vs. emergent coherence.
- Can user preferences be learned from just ten questions? Explores whether adaptive question selection can efficiently infer user-specific reward coefficients without historical data or fine-tuning; this matters for scaling personalization without per-user model updates. Relation: reward factorization could model value trade-offs.
- Can text summaries condition reward models better than embeddings? Explores whether learning interpretable text-based summaries of user preferences outperforms embedding vectors for training personalized reward models in language model alignment. Relation: pluralistic alignment operationalized.
- Do personas make language models reason like biased humans? When LLMs are assigned personas, do they develop the same identity-driven reasoning biases that humans exhibit, and can standard debiasing techniques counteract these effects? Relation: motivated reasoning threatens value pluralism. A model reasoning through an identity-congruent lens cannot hold values in tension, because the identity filter pre-selects which values to weight; explicit pluralism modeling would need to counteract that collapse into identity-congruent preference.
- Can LLMs hold contradictory ethical beliefs and behaviors? Asks whether language models exhibit artificial hypocrisy when their learned ethical understanding diverges from their trained behavioral constraints; this matters because it reveals whether current AI systems have genuinely integrated values or merely imposed rules. Relation: value pluralism provides a framework for managing the prescriptive-descriptive tension: rather than forcing alignment between prescriptive rules and descriptive understanding, pluralism models them as legitimately conflicting values requiring situational navigation.
Original note title: value pluralism requires explicitly modeling multiple values in tension rather than aggregating by majority vote