Reasoning Model Architectures
Related topics:
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model – We select two powerful closed-source LLMs for evaluation. The o1 model is designed to spend more time reasoning before responding, so it can reason through complex tasks and solve harder problems t…
- A Decomposition Perspective to Long-context Reasoning for LLMs – Long-context reasoning is essential for complex real-world applications, yet remains a significant challenge for Large Language Models (LLMs). Despite the rapid evolution in long-context reasoning, cu…
- Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs – Large language models (LLMs) are increasingly optimized for long reasoning, under the assumption that more reasoning leads to better performance. However, emerging evidence suggests that longer respon…
- Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing – Balancing performance and efficiency is a central challenge in large language model (LLM) advancement. GPT-5 addresses this with test-time routing, dynamically assigning queries to either an efficient…
- Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens – In this paper, we critically examine that interpretation by investigating how the semantics of intermediate tokens, often anthropomorphized as “thoughts” or reasoning traces and which are claimed to di…
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL – Recent advancements in LLM-based agents have demonstrated remarkable capabilities in handling complex, knowledge-intensive tasks by integrating external tools. Among diverse choices of tools, search t…
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think – Large Language Models (LLMs) leverage step-by-step reasoning to solve complex problems. Standard evaluation practice involves generating a complete reasoning trace and assessing the correctness of the…
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective – Large Language Models (LLMs) have made notable progress in mathematical reasoning, yet they often rely on single-paradigm reasoning that limits their effectiveness across diverse tasks. In this paper,…
- Characterizing Deep Research: A Benchmark and Formal Definition – Information tasks such as writing surveys or analytical reports require complex search and reasoning, and have recently been grouped under the umbrella of deep research, a term also adopted by recent…
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought – Recently, O1-like models have emerged as representative examples, illustrating the effectiveness of long chain-of-thought (CoT) in reasoning tasks such as math and coding tasks. In this paper, we intr…
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments – Large Language Models (LLMs) equipped with web search capabilities have demonstrated impressive potential for deep research tasks. However, current approaches predominantly rely on either manually eng…
- Do Cognitively Interpretable Reasoning Traces Improve LLM Performance? – Recent progress in reasoning-oriented Large Language Models (LLMs) has been driven by introducing Chain-of-Thought (CoT) traces, where models generate intermediate reasoning traces before producing an…
- Do LLMs Truly Understand When a Precedent Is Overruled? – Large language models (LLMs) with extended context windows show promise for complex legal reasoning tasks, yet their ability to understand long legal documents remains insufficiently evaluated. Develo…
- Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations – Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different input…
- Efficient Reasoning with Balanced Thinking – Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, faili…
- Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation – Large Language Models (LLMs) have demonstrated significant performance improvements across various cognitive tasks. An emerging application is using LLMs to enhance retrieval-augmented generation (RAG…
- FlowReasoner: Reinforcing Query-Level Meta-Agents – This paper proposes a query-level meta-agent named FLOWREASONER to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-…
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents – However, traditional keyword-based search engines are increasingly inadequate for handling complex, multi-step information needs. Our position is that Large Language Models (LLMs), endowed with reason…
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities – Sparse MoE models activate a subset of model parameters per input token by learning to dynamically route tokens to a subset of parameters (experts); this allows them to decouple total model capacity f…
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches – Recently, large reasoning models have demonstrated strong mathematical and coding abilities, and deep search leverages their reasoning capabilities in challenging information retrieval tasks. Existing…
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs – The breakthrough of OpenAI o1 highlights the potential of enhancing reasoning to improve LLMs. Yet, most research in reasoning has focused on mathematical tasks, leaving domains like medicine underexpl…
- Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation – Since smaller language models (SLMs) are computationally more efficient but often under-perform compared to larger models, Knowledge Distillation (KD) methods allow for finetuning these smaller models…
- J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning – The progress of AI is bottlenecked by the quality of evaluation, and powerful LLM-as-a-Judge models have proved to be a core solution. Improved judgment ability is enabled by stronger chain-of-thought…
- Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities – However, LLM-based QA struggles with complex QA tasks due to poor reasoning capacity, outdated knowledge, and hallucinations. Several recent works synthesize LLMs and knowledge graphs (KGs) for QA to …
- Large Language Models Think Too Fast To Explore Effectively – This work investigates whether LLMs can surpass humans in exploration during an open-ended task, using Little Alchemy 2 as a paradigm, where agents combine elements to discover new ones. Results show most LLMs underperform …
- Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs – However, existing methods overlook the trade-off between reasoning effectiveness and computational efficiency, often encouraging unnecessarily long reasoning chains and wasting tokens. To address this…
- Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers – Large language models (LLMs) (Brown et al., 2020) are known to acquire substantial factual knowledge during pretraining, storing it in their parameters (Geva et al., 2023). However, how effectively th…
- Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors – Large language models (LLMs) now solve multi-step problems by emitting extended chains of thought. During the process, they often re-derive the same intermediate steps across problems, inflating token…
- Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse – Chain-of-thought (Wei et al., 2022; Nye et al., 2021) is a widely used prompting technique for large language and multimodal models (LLMs and LMMs), instructing models to “think step-by-step” or provi…
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? – We find that the response length of reasoning LLMs, whether trained by reinforcement learning or supervised learning, drastically increases for ill-posed questions with missing premises (MiP), ending …
- Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation – Mitigating reward hacking, where AI systems misbehave due to flaws or misspecifications in their learning objectives, remains a key challenge in constructing capable and aligned models. We show that we …
- OpenThoughts: Data Recipes for Reasoning Models – The posttraining process equips these models with the ability to output long chains of thought, or "thinking tokens," during inference time, which can guide the model toward the correct answer. Yet, the c…
- Performative Thinking? The Brittle Correlation Between CoT Length and Problem Complexity – Intermediate token generation (ITG), where a model produces output before the solution, has been proposed as a method to improve the performance of language models on reasoning tasks. While these reas…
- Query Rewriting for Retrieval-Augmented Large Language Models – Large Language Models (LLMs) play powerful, black-box readers in the retrieve-then-read pipeline, making remarkable progress in knowledge-intensive tasks. This work introduces a new framework, Rewrite-…
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning – Existing Large Reasoning Models (LRMs) have shown the potential of reinforcement learning (RL) to enhance the complex reasoning capabilities of Large Language Models (LLMs). While they achieve remarka…
- RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner – The reasoning abilities of large language models (LLMs) have improved with chain-of-thought (CoT) prompting, allowing models to solve complex tasks in a stepwise manner. However, training CoT capabili…
- Reasoning LLMs are Wandering Solution Explorers – However, we argue that current reasoning LLMs (RLLMs) lack the ability to systematically explore the solution space. This paper formalizes what constitutes systematic problem solving and identifies co…
- Reasoning Models Don't Always Say What They Think – Chain-of-thought (CoT) offers a potential boon for AI safety as it allows monitoring a model’s CoT to try to understand its intentions and reasoning processes. However, the effectiveness of such monit…
- Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning – Test-time scaling, which is also often referred to as slow-thinking, has been demonstrated to enhance multi-step reasoning in large language models (LLMs). However, despite its widespread utilization,…
- SSRL: Self-Search Reinforcement Learning – We investigate the potential of large language models (LLMs) to serve as efficient simulators for agentic search tasks in reinforcement learning (RL), thereby reducing dependence on costly interaction…
- Search Arena: Analyzing Search-Augmented LLMs – Search-augmented language models combine web search with Large Language Models (LLMs) to improve response groundedness and freshness. However, analyzing these systems remains challenging: existing dat…
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning – Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Prompting advanced LLMs with reasoning cap…
- Search-o1: Agentic Search-Enhanced Large Reasoning Models – Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes o…
- Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces! – Intermediate token generation (ITG), where a model produces output before the solution, has been proposed as a method to improve the performance of language models on reasoning tasks. These intermedia…
- The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs – Does continued scaling of large language models (LLMs) yield diminishing returns? Real-world value often stems from the length of task an agent can complete. We start this work by observing the simple…
- Thinkless: LLM Learns When to Think – Reasoning Language Models, capable of extended chain-of-thought reasoning, have demonstrated remarkable performance on tasks requiring complex logical inference. However, applying elaborate reasoning …
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs – Large language models (LLMs) such as OpenAI’s o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting humanlike deep thinking. However, we iden…
- Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models – While large language models demonstrate impressive performance on static benchmarks, the true potential of large language models as self-learning and reasoning agents in dynamic environments…
- TreeRL: LLM Reinforcement Learning with On-Policy Tree Search – Compared to conventional independent chain sampling strategies with outcome supervision, tree search enables better exploration of the reasoning space and provides dense, on-policy process rewards dur…
- When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs – Reasoning-enhanced large language models (RLLMs), whether explicitly trained for reasoning or prompted via chain-of-thought (CoT), have achieved state-of-the-art performance on many complex reasoning…
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching – Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to …
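One architectural mechanism mentioned above (in the Gemini 2.5 entry) is sparse mixture-of-experts routing, where each token activates only a small top-k subset of experts rather than the full parameter set. The sketch below is a minimal, generic illustration of that routing step, not the implementation used by any of the listed models; all names, shapes, and the use of a simple linear gate are assumptions for the example.

```python
import numpy as np

def moe_route(tokens, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    tokens:  (n, d) array of token representations
    gate_w:  (d, e) learned gating matrix (hypothetical linear router)
    experts: list of e callables, each mapping a (d,) vector to a (d,) vector
    """
    logits = tokens @ gate_w                       # (n, e) router scores per token
    topk = np.argsort(logits, axis=1)[:, -k:]      # indices of the k best experts per token
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        scores = logits[i, topk[i]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                   # softmax over only the selected experts
        for w, e in zip(weights, topk[i]):
            out[i] += w * experts[e](tok)          # only k of e experts run per token
    return out
```

The point the Gemini 2.5 snippet makes is visible here: total capacity grows with the number of experts `e`, while per-token compute depends only on `k`.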