OpenAssistant Conversations - Democratizing Large Language Model Alignment

Paper · arXiv 2304.07327 · Published April 14, 2023
Alignment

In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations, a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 complete and fully annotated conversation trees. The corpus is a product of a worldwide crowd-sourcing effort involving over 13,500 volunteers.

The proposed dataset consists of a list of conversations. The basic data structure is a Conversation Tree (CT), whose nodes represent written messages in a conversation. A CT’s root node represents an initial prompt, given by the prompter. To avoid confusion, we call the two conversation roles prompter and assistant, which allows us to reserve the term user for the human contributors. Although our efforts focus largely on human contributions, both the prompter and assistant roles can, in principle, be fulfilled by either a human user or a machine. Every tree node is labelled by its role and can have multiple children of the opposite role, each of which represents a separate next step in the conversation. A path from the root to any node in the tree (including the root itself) is called a thread, and it represents a valid conversation with the prompter and the assistant taking turns. Tree nodes are annotated with additional data: user-provided labels and metadata such as collection timestamp and indicated language. Each assistant node further has an associated rank, which orders it relative to the other replies to the same parent prompt according to user preferences.
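To make the tree and thread definitions concrete, the following is a minimal Python sketch of a Conversation Tree. It is not the released data format: the class name `Node`, its field names, the helper `all_threads`, and the convention that a lower rank means a more preferred reply are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)  # compare nodes by identity; avoids recursing through parent/children
class Node:
    """One message in a conversation tree (hypothetical layout, not the released schema)."""

    text: str
    role: str                    # "prompter" or "assistant"
    lang: str = "en"             # indicated language (metadata)
    rank: Optional[int] = None   # assistant nodes only; lower = preferred is an assumption
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list)

    def reply(self, text: str, rank: Optional[int] = None) -> Node:
        """Attach a child of the opposite role, so roles alternate along every path."""
        child_role = "assistant" if self.role == "prompter" else "prompter"
        child = Node(text=text, role=child_role, lang=self.lang, rank=rank, parent=self)
        self.children.append(child)
        return child

    def thread(self) -> list[Node]:
        """Return the path from the root to this node, inclusive."""
        path = []
        node: Optional[Node] = self
        while node is not None:
            path.append(node)
            node = node.parent
        path.reverse()
        return path


def all_threads(root: Node) -> Iterator[list[Node]]:
    """Yield one thread per node in the tree, the root's own thread included."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node.thread()
        stack.extend(node.children)


# Usage: one initial prompt, two competing assistant replies, one follow-up.
root = Node(text="How do I sort a list?", role="prompter")
first = root.reply("Use sorted(xs).", rank=0)
second = root.reply("Call xs.sort().", rank=1)
follow_up = first.reply("And in descending order?")

# Sibling assistant replies ordered by rank (under the lower-is-better assumption).
by_preference = sorted(root.children, key=lambda n: n.rank)
```

Under this layout every node defines exactly one thread, so a tree with four messages contains four threads, and any of them can be read off as a turn-taking conversation.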

Reply as prompter. The task of replying as a prompter, on the other hand, does not impose strict quality requirements but instead emphasizes the importance of diversity to accommodate various use cases. Examples of prompter replies include asking for clarification, modifying the original intent, posing a follow-up question, or changing the direction of the conversation altogether.

Label a prompt or reply. Users are presented with a message from the database along with the preceding conversation thread (if available) and are asked to categorize the message according to three dimensions: spam detection, guideline adherence, and quality.