NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Paper · arXiv 2502.13124 · Published February 18, 2025
RL with Verifiable Rewards (RLVR)

Scaling reasoning capabilities beyond traditional domains such as math and coding is hindered by the lack of diverse and high-quality questions. To overcome this limitation, we introduce a scalable approach for generating diverse and challenging reasoning questions, accompanied by reference answers. We present NATURALREASONING, a comprehensive dataset comprising 2.8 million questions that span multiple domains, including STEM fields (e.g., Physics, Computer Science), Economics, Social Sciences, and more. We demonstrate the utility of the questions in NATURALREASONING through knowledge distillation experiments which show that NATURALREASONING can effectively elicit and transfer reasoning capabilities from a strong teacher model. Furthermore, we demonstrate that NATURALREASONING is also effective for unsupervised self-training using external reward models or self-rewarding. To foster future work, we publicly release NATURALREASONING at https://huggingface.co/datasets/facebook/natural_reasoning.

Introduction. Large language models (LLMs) have demonstrated increased reasoning capabilities [OpenAI et al., 2024, Guo et al., 2025]. These models are designed to devote more time to deliberation before responding, enabling them to tackle intricate tasks and solve more complex problems in science, coding, and mathematics. Such reasoning models are trained via large-scale reinforcement learning on tasks where the reward can be derived using rule-based verification. Existing reasoning datasets are often limited to narrow domains where the solutions are short and easy to verify, whereas the majority of reasoning problems across broader domains are open-ended. To bridge this gap, we introduce NATURALREASONING, a comprehensive dataset curated from pretraining corpora, comprising 2.8 million reasoning questions spanning various topics, including Mathematics, Physics, Computer Science, and Economics & Business. We compare NATURALREASONING with a wide range of existing reasoning datasets and highlight its advantageous properties, in particular its diversity and difficulty.
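To make "rule-based verification" concrete, the following is a minimal sketch of a binary reward that compares a model's final answer with a reference answer. It is an illustration only, not the verification procedure of any cited system: the `Answer:` marker, the normalization rules, and all function names are assumptions introduced here.

```python
import re
from typing import Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text following the last 'Answer:' marker, or None if absent.

    The 'Answer:' convention is an assumption made for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing periods."""
    return " ".join(text.lower().split()).rstrip(".")


def rule_based_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted answer matches the reference exactly, else 0.0."""
    answer = extract_final_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if normalize(answer) == normalize(reference) else 0.0


if __name__ == "__main__":
    # A short answer is easy to check by rule.
    print(rule_based_reward("2 * 21 = 42\nAnswer: 42", "42"))
    # An open-ended answer that paraphrases the reference gets no credit.
    print(rule_based_reward(
        "Answer: Demand falls because the price rises",
        "Higher prices reduce demand",
    ))
```

An exact-match rule of this kind works when solutions are short, but it gives no credit to a correct paraphrase of an open-ended reference answer, which is the limitation the paragraph above describes.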

Discussion / Conclusion. We present NATURALREASONING, a dataset of 2.8 million questions for enhancing LLM reasoning capabilities. Our questions are challenging, requiring more deliberate thinking than those in existing datasets. The dataset covers diverse reasoning problems across multiple domains, including math, physics, computer science, economics, and social sciences. Using questions from NATURALREASONING in distillation experiments, we observe consistent improvement on reasoning benchmarks when scaling the data size. We also demonstrate that NATURALREASONING is effective for unsupervised self-training of LLMs using external reward models or self-rewarding.
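One common way to use a reward signal for self-training is to sample several responses per question, score them, and keep the highest-scoring one as a training target. The sketch below illustrates that general pattern under stated assumptions; it is not presented as the procedure used in this work. The function names, the `min_reward` threshold, and the toy reward are hypothetical. In practice `reward_fn` would wrap an external reward model or the policy model judging its own outputs.

```python
from typing import Callable, List, Optional, Tuple

# A reward function maps (question, response) to a scalar score.
RewardFn = Callable[[str, str], float]


def select_for_self_training(
    question: str,
    candidates: List[str],
    reward_fn: RewardFn,
    min_reward: float = 0.0,
) -> Optional[Tuple[str, float]]:
    """Return the best-scoring candidate and its score.

    Returns None if there are no candidates or the best score is
    below min_reward (a hypothetical quality threshold).
    """
    if not candidates:
        return None
    scored = [(reward_fn(question, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    if best_score < min_reward:
        return None
    return best, best_score


def toy_reward(question: str, response: str) -> float:
    """Placeholder reward: number of distinct words in the response.

    This stands in for a real reward model and has no quality meaning.
    """
    return float(len(set(response.split())))


if __name__ == "__main__":
    samples = ["yes", "it depends on the slope of the curve", "no no no"]
    print(select_for_self_training("Does demand fall?", samples, toy_reward))
```

Swapping `toy_reward` for a call to an external scorer, or for the model grading its own responses, changes only the `reward_fn` argument; the selection step stays the same.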