Competitive Programming with Large Reasoning Models
Unlike o1-ioi or AlphaCode, o3 does not rely on human-defined, coding-specific test-time strategies. Instead, we found that sophisticated test-time reasoning strategies emerged naturally from end-to-end reinforcement learning, leading to unprecedented performance on competitive programming benchmarks.
6 Conclusion
Through the o-series large reasoning models, we demonstrate that chain-of-thought reasoning is a powerful strategy for improving performance on coding tasks, from competitive programming benchmarks such as CodeForces and the IOI to complex software engineering challenges like SWE-bench and HackerRank Astra. Our findings show that scaling up reinforcement learning training compute, coupled with increased test-time compute, consistently boosts model performance to nearly match the best humans in the world. Given these results, we believe o-series large reasoning models will unlock many new use cases for AI in science, coding, math, and many other fields.