A Survey of Reinforcement Learning from Human Feedback
This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input. While much recent attention has focused on RLHF for large language models (LLMs), our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shed light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field.
In the literature focusing on theoretical results, a distinction is made (analogous to that in standard RL) between an offline and an online setting. In the former, learning is based on a fixed data set, usually collected through prior interaction with the environment. In the online setting, by contrast, the agent interacts directly with the environment, learns from real-time feedback, and continuously updates its strategy as it engages with the environment. Accordingly, an important component of the online variant is the sampling procedure, i.e., how the queries to be labeled are selected. This is usually accomplished using an acquisition function based on uncertainty (see Section 4.1.1).
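To make the idea of an uncertainty-based acquisition function concrete, the following is a minimal sketch of one common instantiation: an ensemble of reward models scores candidate trajectory pairs, a Bradley-Terry model converts reward differences into preference probabilities, and the pairs on which the ensemble disagrees most are queried first. The linear reward models, function names, and the variance-based disagreement measure here are illustrative assumptions, not a method prescribed by any specific paper surveyed.

```python
import numpy as np

def preference_prob(r1, r2):
    """Bradley-Terry probability that the first trajectory is preferred."""
    return 1.0 / (1.0 + np.exp(-(r1 - r2)))

def select_queries(ensemble, pairs, k):
    """Pick the k candidate pairs on which the reward-model ensemble
    disagrees most, using the variance of predicted preference
    probabilities as a simple proxy for epistemic uncertainty."""
    scores = []
    for feats1, feats2 in pairs:
        # Each ensemble member (a toy linear reward model here) predicts
        # a preference probability for the pair.
        probs = [preference_prob(w @ feats1, w @ feats2) for w in ensemble]
        scores.append(np.var(probs))
    # Indices of the most uncertain pairs, highest disagreement first.
    return np.argsort(scores)[-k:][::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    # Hypothetical setup: 5 toy reward models, 100 candidate pairs of
    # trajectory feature vectors.
    ensemble = [rng.normal(size=dim) for _ in range(5)]
    pairs = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(100)]
    print("query these pairs first:", select_queries(ensemble, pairs, k=3))
```

In an online loop, the selected pairs would be sent to human labelers, the resulting preference labels used to update the reward models, and the acquisition function re-evaluated on fresh candidates, so that labeling effort concentrates where the current models are least certain.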