Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Personalising LLMs through micro-level preference learning processes may result in models that are better aligned with each user. However, there are several normative challenges in defining the bounds of a societally-acceptable and safe degree of personalisation. In this paper, we ask how, and in what ways, LLMs should be personalised. First, we review literature on current paradigms for aligning LLMs with human feedback, and identify issues including (i) a lack of clarity regarding what alignment means; (ii) a tendency of technology providers to prescribe definitions of inherently subjective preferences and values; and (iii) a “tyranny of the crowdworker”, exacerbated by a lack of documentation of whom we are really aligning to. Second, we present a taxonomy of benefits and risks associated with personalised LLMs, for individuals and society at large. Finally, we propose a three-tiered policy framework that allows users to experience the benefits of personalised alignment, while restraining unsafe and undesirable LLM behaviours within (supra-)national and organisational bounds.
We condition an LLM on a persona profile, which can include information ranging from demographic data and socio-behavioural indicators to free-form self-descriptions and any other pertinent details. Using this conditioned persona, we task the LLM with selecting the preferred response to a subjective question in a binary-choice setting, aiming to reflect the preferences that the persona would likely hold. Following Zheng et al. (2023), we also consider a setting where a “tie” option is allowed. As highlighted in Section 2, persona sparsity can lead to instances where an LLM struggles to assess certain questions accurately. However, we hypothesize that the LLM possesses some notion of its uncertainty in these instances. Therefore, we additionally instruct the LLM to report its certainty in its answer.
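The persona-conditioned judging setup described above can be sketched as prompt construction plus response parsing. This is a minimal illustration, not the authors' actual implementation: the profile fields, the prompt wording, and the "choice/certainty" answer format are all assumptions, and the model call itself is left as a stub.

```python
import re

def build_persona_prompt(persona, question, response_a, response_b, allow_tie=True):
    """Build a prompt that conditions the model on a persona profile and asks
    for the persona's preferred response plus a self-reported certainty.
    The exact wording and answer format here are hypothetical."""
    profile = "\n".join(f"- {k}: {v}" for k, v in persona.items())
    options = "A, B, or tie" if allow_tie else "A or B"
    return (
        "You are role-playing the following person:\n"
        f"{profile}\n\n"
        f"Question: {question}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n\n"
        f"Which response would this person prefer ({options})? "
        "Answer in the form 'choice: <option>, certainty: <0-100>'."
    )

def parse_judgement(text):
    """Extract (choice, certainty) from a reply in the assumed answer format;
    returns None if the reply does not match."""
    m = re.search(r"choice:\s*(A|B|tie)\s*,\s*certainty:\s*(\d+)", text, re.I)
    if m is None:
        return None
    return m.group(1).lower(), int(m.group(2))
```

A driver would send `build_persona_prompt(...)` to the LLM and pass the raw reply through `parse_judgement`; keeping the certainty as a separate field makes it easy to filter or down-weight judgements on sparse personas, per the hypothesis above.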