In recent years, Confidence Estimation and Calibration have frequently been discussed together, as the estimation of confidence is often influenced by the uncertainty in the model or data, and calibra…
Abstract—Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterati…
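As a rough illustration of the parallel, iterative decoding style this abstract describes (not the specific DLM it studies), the sketch below runs a mask-predict style loop: every masked position is predicted at once and only the most confident guesses are committed each round. The model, vocabulary, and confidence scores are toy placeholders.

```python
import random

MASK = None          # placeholder for a masked position (hypothetical)
VOCAB = ["2", "+", "2", "=", "4", "the", "answer", "is"]

def toy_model(tokens):
    """Stub standing in for a DLM denoiser: for every masked position,
    return a (token, confidence) guess. Purely illustrative."""
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(tokens) if t is MASK}

def parallel_iterative_decode(length=8, steps=4):
    # Start from a fully masked sequence and unmask it over a few rounds,
    # committing the most confident predictions at each round.
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    while any(t is MASK for t in tokens):
        guesses = toy_model(tokens)  # predict all masked positions in parallel
        best = sorted(guesses.items(), key=lambda kv: -kv[1][1])[:per_step]
        for pos, (tok, _conf) in best:  # keep only the most confident guesses
            tokens[pos] = tok
    return tokens

print(parallel_iterative_decode())
```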
Large language models (LLMs) excel at complex reasoning when they include intermediate steps, known as chains of thought (CoTs). However, these rationales are often overly verbose, even for simple pro…
Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar, Lu Wang, Ruoxi Jia and Ming Jin (Virginia Tech, Microsoft): “Current literature, aiming to surpass the “Chain-of-Thought” approach, often resorts to an ex…
Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple cha…
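The excerpt cuts off, but it appears to concern working with multiple chains of thought. As a generic illustration only (not this paper's method), the sketch below majority-votes over the final answers of several sampled chains; sample_chain is a hypothetical stand-in for whatever produces one chain and its answer.

```python
import random
from collections import Counter

def sample_chain(question: str) -> tuple[list[str], str]:
    # Hypothetical stand-in for an LLM call that returns one sampled
    # chain of thought plus its final answer.
    answer = random.choice(["Paris", "Paris", "Lyon"])
    return ([f"Reasoning step about: {question}"], answer)

def vote_over_chains(question: str, n: int = 5) -> str:
    # Sample several chains and keep the most common final answer.
    answers = [sample_chain(question)[1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(vote_over_chains("Which city is the capital of France?"))
```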
“The recent success in large language models (LLMs) has shown that properly prompted LLMs demonstrate emergent capabilities on complex understanding and question-answering tasks (Wei et al., 2022a). E…
Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasonin…
Existing research indicates that the output of Chain-of-Thought (CoT) is significantly affected by input perturbations. Although many methods aim to mitigate such impact by optimizing prompts, a theor…
Despite its benefits, CoT is also critiqued for its substantial token usage, as it explores numerous reasoning pathways before arriving at a conclusive answer. This characteristic is particularly prom…
In this work, we propose Chain of Draft (CoD), a novel paradigm inspired by human cognitive processes, where LLMs generate minimalistic yet informative intermediate reasoning outputs while solving tas…
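To make the contrast concrete, here is a hedged sketch of how a Chain-of-Draft-style instruction might differ from a standard CoT instruction; the templates are illustrative, not the paper's exact prompts.

```python
# Illustrative prompt templates only; the paper's actual wording may differ.
COT_INSTRUCTION = (
    "Think step by step to answer the question. "
    "Write out your full reasoning, then give the final answer after '####'."
)

COD_INSTRUCTION = (
    "Think step by step, but keep each step to a minimal draft of at most "
    "a few words. Give the final answer after '####'."
)

def build_prompt(instruction: str, question: str) -> str:
    return f"{instruction}\n\nQ: {question}\nA:"

print(build_prompt(COD_INSTRUCTION, "A bat and a ball cost $1.10 in total..."))
```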
Large Language Models (LLMs) have made notable progress in mathematical reasoning, yet they often rely on single-paradigm reasoning that limits their effectiveness across diverse tasks. In this paper,…
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Conventional RAG methods usually p…
We argue that CoT rationales can be misleading and are neither necessary nor sufficient for trustworthy interpretability. By analysing faithfulness in terms of whether CoTs are not only human-interpret…
In enhancing the reasoning capabilities of large language models (LLMs), prior research primarily focuses on specific prompting techniques such as few-shot or zero-shot chain-of-thought (CoT) promptin…
“A major challenge that has prevented past efforts of self-learning in language models from succeeding, especially in arithmetic, is a phenomenon that we call error avalanching. During self-training, …
Chain-of-Thought (CoT) does not elicit genuine, abstract reasoning. Instead, we argue that Chain-of-Thought (CoT) functions as a powerful structural constraint that guides Large Language Models (LLMs…
Narrative comprehension on long stories and novels has been a challenging domain attributed to their intricate plotlines and entangled, often evolving relations among characters and entities. Given th…
Unlike o1-ioi or AlphaCode, o3 does not depend on coding-specific test-time strategies defined by humans. Instead, we found that complex test-time reasoning strategies emerged naturally from end-to-en…
Large language models [Touvron et al., 2023, Anil et al., 2023, Achiam et al., 2023] are increasingly used to perform logical reasoning and other problems that require algorithmic thinking. To underst…
Despite the recent advancements in language models (LMs), their ability to solve complex problems remains limited. This paper introduces Cumulative Reasoning (CR), a novel approach that utilizes LMs c…
While large language models (LLMs) leverage both knowledge and reasoning during inference, the capacity to distinguish between them plays a pivotal role in model analysis, interpretability, and develo…
The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models’ (LLM) performance through innovative prompti…
Large reasoning models (LRMs) have demonstrated impressive capabilities in complex problem-solving, yet their internal reasoning mechanisms remain poorly understood. In this paper, we investigate the …
“Clinical reasoning is a set of problem-solving processes specifically designed for diagnosis and management of a patient’s medical condition. Commonly used diagnostic techniques include differential …
Indeed, a growing number of researchers have proposed that current LLMs are unable to generalize causal ideas beyond their training distribution and/or without strong user-induced guidance (e.g., chai…
Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, faili…
Large language models (LLMs) have demonstrated remarkable proficiency in generating detailed and coherent explanations of complex concepts. However, the extent to which these models truly comprehend t…
Large Language Models (LLMs) have shown remarkable abilities across various language tasks, but solving complex reasoning problems remains a challenge. While existing methods like Chain-of-Thought (Co…
When leveraging language models for reasoning tasks, generating explicit chain-of-thought (CoT) steps often proves essential for achieving high accuracy in final outputs. In this paper, we investigate…
While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information n…
The emergence of reasoning models and their integration into practical AI chatbots has led to breakthroughs in solving advanced math, deep search, and extractive question answering problems that requ…
To augment language models with the ability to reason, researchers usually prompt or finetune them to produce chain of thought reasoning steps before producing the final answer. However, although people…
Large Language Models (LLMs) were shown to struggle with long-term planning, which may be caused by the limited way in which they explore the space of possible solutions. We propose an architecture wh…
Since smaller language models (SLMs) are computationally more efficient but often under-perform compared to larger models, Knowledge Distillation (KD) methods allow for finetuning these smaller models…
We present Klear-Reasoner, a model with long reasoning capabilities that demonstrates careful deliberation during problem solving, achieving outstanding performance across multiple benchmarks. Althoug…
This position paper argues that large language model (LLM) reasoning should be studied as latent-state trajectory formation rather than as faithful surface chain-of-thought (CoT). This matters because…
Recent progress in Large Reasoning Models (LRMs) has significantly enhanced the reasoning abilities of Large Language Models (LLMs), empowering them to tackle increasingly complex tasks through reflec…
Abstract reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human …
“Fields Medal winner Terence Tao once shared his experiences solving hard math problems: “When I was a kid, I had a romanticized notion of mathematics, that hard problems were solved in Eureka moment…
Recent advances in Large Language Models (LLMs) have led to an emergent ability of chain-of-thought (CoT) prompting, a prompt reasoning strategy that adds intermediate rationale steps between question…
Answering questions that require multi-hop reasoning at web-scale necessitates retrieving multiple evidence documents, one of which often has little lexical or semantic relationship to the question. T…
“However, chain-of-thought prompting has a key limitation—it often performs poorly on tasks that require generalization of solving problems harder than the demonstration examples, such as compositiona…
Inference-time computation has emerged as a promising scaling axis for improving large language model reasoning. However, despite yielding impressive performance, the optimal allocation of inference-t…
To address this issue, some studies employ propositional logic to further enhance the logical reasoning abilities of LLMs. However, the potential omissions in the extraction of logical exp…
Large Language Models (LLMs) have demonstrated significant improvements in reasoning capabilities through supervised fine-tuning and reinforcement learning. However, when training reasoning models, th…
Multi-hop question answering requires models to gather information from different parts of a text to answer a question. Most current approaches learn to address this task in an end-to-end way with neu…
Recently, there has been significant progress in teaching language models to perform step-by-step reasoning to solve complex numerical reasoning tasks. Chain-of-thoughts prompting (CoT) is the state-o…
Process Reward Models (PRMs) have recently emerged as a powerful framework for supervising intermediate reasoning steps in large language models (LLMs). Previous PRMs are primarily trained on model fi…
Chain-of-Thought (CoT) prompting has improved the reasoning performance of large language models (LLMs), but it remains unclear why it works and whether it is the unique mechanism for triggering reaso…
We provide evidence of performative chain-of-thought (CoT) in reasoning models, where a model becomes strongly confident in its final answer, but continues generating tokens without revealing its inter…
Human-written text is the culmination of an underlying thought process—when we write, there is often an internal dialogue that clarifies or even determines the written word. However, modern language m…
“Although language models have demonstrated remarkable success across a range of NLP tasks, their ability to demonstrate reasoning is often seen as a limitation, which cannot be overcome solely by inc…
Human cognition typically involves thinking through abstract, fluid concepts rather than strictly using discrete linguistic tokens. Current reasoning models, however, are constrained to reasoning with…
Large Language Models (LLMs) have achieved impressive success across a wide range of reasoning tasks, particularly when enhanced with Chain-of-Thought (CoT) prompting, where models generate intermedia…
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks by generating intermediate reasoning steps. However, most existing approaches focus on hard token…
Intermediate token generation (ITG), where a model produces output before the solution, has been proposed as a method to improve the performance of language models on reasoning tasks. These intermedia…
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have further improved …
Long chain-of-thought (CoT) is an essential ingredient in effective usage of modern large language models, but our understanding of the reasoning strategies underlying these capabilities remains limit…
Large language models (LLMs) have shown strong performance across natural language reasoning tasks, yet their reasoning processes remain brittle and difficult to interpret. Prompting techniques like C…
Self-detection for Large Language Models (LLMs) seeks to evaluate the trustworthiness of the LLM’s output by leveraging its own capabilities, thereby alleviating the issue of output hallucination. How…
We argue that analyzing reasoning traces at the sentence level is a promising approach to understanding reasoning processes. We present three complementary attribution methods: (1) a black-box method …
How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high cost-effi…
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs). But for what kinds of tasks is this extra “thinking” really helpful?…
Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability of large language models by augmenting the query using multiple examples with multiple intermediate steps. De…
“Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inferen…
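As one hedged illustration of moving beyond left-to-right decoding, the sketch below runs a small breadth-first search over partial "thoughts", scoring candidates and keeping the best few at each depth; the proposer and scorer are hypothetical stand-ins for LLM calls, not this paper's implementation.

```python
import random

def propose_thoughts(state: str, k: int = 3) -> list[str]:
    # Hypothetical stand-in for an LLM that extends a partial solution
    # with k candidate next thoughts.
    return [f"{state} -> thought{random.randint(0, 99)}" for _ in range(k)]

def score_thought(state: str) -> float:
    # Hypothetical stand-in for an LLM-based value/heuristic evaluation.
    return random.random()

def breadth_first_thought_search(problem: str, depth: int = 3, beam: int = 2) -> str:
    # Keep the `beam` highest-scoring partial solutions at every depth
    # instead of committing to a single left-to-right continuation.
    frontier = [problem]
    for _ in range(depth):
        candidates = [t for s in frontier for t in propose_thoughts(s)]
        frontier = sorted(candidates, key=score_thought, reverse=True)[:beam]
    return max(frontier, key=score_thought)

print(breadth_first_thought_search("24 game: 4 9 10 13"))
```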
(P1) Base: Base represents the baseline zero-shot prompting style for question answering that we use. Given a question, such as “What is 2+2?", the prompt to elicit an answer would look like: “User: Q…
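The excerpt cuts off mid-prompt; without reproducing the paper's exact template, the minimal sketch below shows how such a zero-shot baseline prompt might be assembled. The role tags and format are assumed for illustration only.

```python
def base_prompt(question: str) -> str:
    # Assumed zero-shot format: a single user turn containing the question,
    # with no examples and no reasoning instructions.
    return f"User: Q: {question}\nAssistant: A:"

print(base_prompt("What is 2+2?"))
```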