Can chain-of-thought reasoning emerge during pretraining itself?
Does treating reasoning as an exploratory action during pretraining, rather than deferring it to post-training, allow models to develop stronger reasoning capabilities earlier? This matters because it could reshape when and how we train reasoning into language models.
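To make the framing concrete, here is a minimal sketch of what "reasoning as an exploratory action" during pretraining could look like, assuming a REINFORCE-style formulation: before each next-token prediction, the model samples a short chain-of-thought, and the sampled trace is rewarded when the prediction that follows matches the corpus token. Every name below (`ToyLM`, `pretrain_step`, the toy hyperparameters) is illustrative, not taken from the source.

```python
# Sketch: chain-of-thought as an exploratory action inside pretraining.
# Assumption: REINFORCE over sampled thought tokens, with the ordinary
# next-token target supplying the reward. All names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 32    # toy vocabulary (shared by "thought" and corpus tokens)
COT_LEN = 4   # fixed-length exploratory reasoning trace, for simplicity


class ToyLM(nn.Module):
    """A tiny recurrent LM standing in for the pretrained policy."""

    def __init__(self, vocab: int = VOCAB, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h[:, -1])  # logits for the next token


def pretrain_step(model, optimizer, context, target):
    """One step: sample a CoT (exploration), then score the final prediction."""
    tokens = context.clone()
    log_probs = []
    for _ in range(COT_LEN):  # sample exploratory reasoning tokens
        dist = torch.distributions.Categorical(logits=model(tokens))
        t = dist.sample()
        log_probs.append(dist.log_prob(t))
        tokens = torch.cat([tokens, t.unsqueeze(1)], dim=1)
    final_logits = model(tokens)              # prediction after "thinking"
    pred = final_logits.argmax(dim=-1)
    reward = (pred == target).float()         # 1 if the corpus token is recovered
    # REINFORCE on the sampled trace + cross-entropy on the final prediction
    pg_loss = -(reward * torch.stack(log_probs, dim=1).sum(1)).mean()
    ce_loss = F.cross_entropy(final_logits, target)
    loss = pg_loss + ce_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.mean().item()


model = ToyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ctx = torch.randint(0, VOCAB, (8, 16))        # fake "corpus" contexts
tgt = torch.randint(0, VOCAB, (8,))           # fake next tokens
print("mean reward:", pretrain_step(model, opt, ctx, tgt))
```

The design choice worth noting in this sketch is that the reward comes from ordinary next-token prediction, so no human labels are needed: the pretraining corpus itself scores the exploratory reasoning traces.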