100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models

Paper · arXiv 2505.00551 · Published May 1, 2025

Therefore, several replication studies have explored strategies for efficiently building training datasets from open-source data and powerful models. In this subsection, we introduce the datasets used in RLVR. These datasets cover a range of tasks whose answers can be verified during RL training; we focus mainly on datasets for math and coding problem solving. For each dataset, we describe its curation, including the selection of data sources, the construction of verified questions and answers, and the detailed pre-processing procedures. Table 3 gives an overview of the statistics of these datasets.
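To make "verifiable" concrete, the following is a minimal sketch of a rule-based check for a math item, assuming each item stores a single ground-truth answer string and that the model writes its final answer inside `\boxed{...}`. Both the boxed-answer convention and the function names `extract_boxed_answer` and `math_reward` are illustrative assumptions, not taken from any dataset surveyed here.

```python
import re
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    # Assumption: the final answer is written as \boxed{...} with no nested braces.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    # Use the last boxed expression, since earlier ones may be intermediate results.
    return matches[-1].strip() if matches else None


def math_reward(response: str, ground_truth: str) -> float:
    # Binary reward: 1.0 on an exact string match with the reference answer, else 0.0.
    answer = extract_boxed_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if answer == ground_truth.strip() else 0.0
```

Exact string matching is the simplest possible verifier and would reject equivalent forms such as `1/2` versus `0.5`; a practical pipeline would likely normalize or symbolically compare answers, and coding tasks would instead be checked by executing test cases.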
