We select two powerful closed-source LLMs for evaluation. The o1 model is designed to spend more time reasoning before it responds, enabling it to reason through complex tasks and solve harder problems t…
By Remo Pareschi, Stake Lab, University of Molise, Campobasso, Italy [https://arxiv.org/abs/2307.10250](https://arxiv.org/abs/2307.10250) “We favor a dialogical approach for several reasons. Firstly…
However, conversational agents built upon even the most recent large language models (LLMs) face challenges in processing ambiguous inputs, primarily due to the following two hurdles: (1) LLMs are not…
“Temporal commonsense reasoning refers to the ability to understand the typical temporal context of phrases, actions, and events, and use it to reason over problems requiring such knowledge. This trai…
Large Language Models (LLMs) have recently achieved impressive results in complex reasoning tasks through Chain of Thought (CoT) prompting. However, most existing CoT methods rely on using the same pr…
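To make the limitation concrete, here is a minimal sketch of standard few-shot CoT prompting, where one fixed exemplar is reused for every question regardless of its difficulty; the exemplar is the familiar tennis-ball problem and the incoming question is made up for illustration.

```python
# Minimal sketch of fixed-prompt few-shot CoT: the same exemplar is
# prepended to every question, easy or hard. Exemplar and test question
# are illustrative, not taken from any specific paper's prompt set.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the same fixed CoT exemplar to every incoming question."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

print(build_cot_prompt("A farm has 3 pens with 4 pigs each. How many pigs are there?"))
```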
[**https://arxiv.org/abs/2311.04235**](https://arxiv.org/abs/2311.04235) Unlike the robots in Isaac Asimov’s fictional universe, which stumble into strange, paradoxical situations by following rules t…
Understanding context is key to understanding human language, an ability that Large Language Models (LLMs) have increasingly been shown to demonstrate to an impressive extent. However, though the eval…
which reasoning examples make the most effective prompts. In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning. We show that pr…
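A minimal sketch of complexity-based example selection, assuming reasoning complexity is proxied by the number of newline-separated steps in each annotated chain; the pool contents and the step-counting heuristic are illustrative, not the paper's exact procedure.

```python
# Complexity-based exemplar selection: from a pool of annotated
# (question, chain-of-thought) pairs, keep the ones whose chains have
# the most reasoning steps. A "step" is approximated here as one
# non-empty line of the chain, one plausible proxy for complexity.

def count_steps(chain: str) -> int:
    """Approximate reasoning-step count by non-empty lines in the chain."""
    return sum(1 for line in chain.splitlines() if line.strip())

def select_complex_exemplars(pool, k=3):
    """Return the k exemplars with the longest reasoning chains."""
    return sorted(pool, key=lambda ex: count_steps(ex["chain"]), reverse=True)[:k]

pool = [
    {"question": "q1", "chain": "step 1\nstep 2"},
    {"question": "q2", "chain": "step 1\nstep 2\nstep 3\nstep 4"},
    {"question": "q3", "chain": "step 1"},
]
print([ex["question"] for ex in select_complex_exemplars(pool, k=2)])  # ['q2', 'q1']
```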
Chain-of-Thought (CoT) prompting has been shown to enhance the multi-step reasoning capabilities of Large Language Models (LLMs). However, debates persist about whether LLMs exhibit abstract generaliz…
We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as “The mother of the singer of ‘Superstition’ is”. We look for evidence of a latent reason…
We evaluate how well Large Language Models (LLMs) latently recall and compose facts to answer multi-hop queries like “In the year Scarlett Johansson was born, the Summer Olympics were hosted in the co…
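A toy illustration of the two-hop composition such queries demand: resolve a bridge entity with one fact lookup, then query a second relation on it. The tiny fact table below is hand-written for demonstration and stands in for what the model would have to recall latently.

```python
# Two-hop factual composition over a toy fact table. The model under
# study must do the equivalent of this chaining internally, without
# verbalizing the bridge entity.

FACTS = {
    ("Superstition", "performed_by"): "Stevie Wonder",
    ("Stevie Wonder", "mother"): "Lula Mae Hardaway",
}

def one_hop(entity: str, relation: str) -> str:
    return FACTS[(entity, relation)]

def two_hop(entity: str, rel1: str, rel2: str) -> str:
    bridge = one_hop(entity, rel1)   # hop 1: resolve the bridge entity
    return one_hop(bridge, rel2)     # hop 2: query a relation on it

print(two_hop("Superstition", "performed_by", "mother"))  # Lula Mae Hardaway
```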
regularities in language range from phonology to pragmatics. For example, people associate different sounds with different referents (e.g., Köhler, 1929), automatically reinterpret implausible sentenc…
This paper presents FinCoT, a structured chain-of-thought (CoT) prompting approach that incorporates insights from domain-specific expert financial reasoning to guide the reasoning traces of large la…
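As a hedged sketch of what blueprint-guided prompting of this kind might look like, the template below embeds an expert-style reasoning outline that the model is instructed to follow; the blueprint text and prompt wording are hypothetical, not taken from FinCoT itself.

```python
# Hypothetical structured-CoT prompt in the spirit of FinCoT: the prompt
# carries an expert-written reasoning blueprint that the model should
# trace step by step. Blueprint content is an illustrative assumption.

EXPERT_BLUEPRINT = """\
1. Identify the financial quantities and their units.
2. State the relevant formula or accounting identity.
3. Substitute values and compute step by step.
4. Sanity-check the sign and magnitude of the result."""

def build_fincot_prompt(question: str) -> str:
    return (
        "Follow this expert reasoning blueprint exactly:\n"
        f"{EXPERT_BLUEPRINT}\n\n"
        f"Question: {question}\nReasoning:"
    )

print(build_fincot_prompt(
    "A bond pays a 5% annual coupon on a $1,000 face value. "
    "What is the yearly coupon payment?"
))
```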
Large Language Models (LLMs) have shown remarkable abilities across various language tasks, but solving complex reasoning problems remains a challenge. While existing methods like Chain-of-Thought (Co…
Recent work suggests that large language models (LLMs) can perform multi-hop reasoning implicitly—producing correct answers without explicitly verbalizing intermediate steps—but the underlying mechani…
However, explanations generated via CoT are susceptible to content biases that negatively affect their robustness and faithfulness. To mitigate existing limitations, recent work has proposed the use o…
In LLM reasoning, which poses the greater challenge: deductive or inductive reasoning? While the deductive reasoning capabilities of LLMs (i.e., their capacity to follow instructions in reasoning tasks…
“On the diverse and challenging BIG-Bench Hard tasks, we find that Chain-of-Thought prompting performs best on average, but logically invalid Chain-of-Thought prompting is close behind and outperforms…
However, the quality and transparency of their internal reasoning processes remain underexplored. This work moves beyond the final-answer accuracy and investigates step-by-step reasoning in the medica…
What does it truly mean for a language model to “reason” strategically, and can scaling up alone guarantee intelligent, context-aware decisions? Strategic decision-making requires adaptive reasoning, …
The emergent few-shot reasoning capabilities of Large Language Models (LLMs) have excited the natural language and machine learning community over recent years. Despite numerous successful applicat…
To address this issue, some studies employ propositional logic to further enhance the logical reasoning abilities of LLMs. However, the potential omissions in the extraction of logical exp…
We conduct detailed analysis with a range of LLMs such as GPT-4, ChatGPT, Gemini, Llama-2, and Mistral using chain-of-thought prompting. Experimental results show that existing LLMs do not fare well o…
With the emergence of advanced reasoning models like OpenAI o3 and DeepSeek-R1, large language models (LLMs) have demonstrated remarkable reasoning capabilities. However, their ability to perform rigo…
Large language models (LLMs) have accomplished remarkable reasoning performance in various domains. However, in the domain of reasoning tasks, we discover a frailty: LLMs are surprisingly brittle to t…
these models to be surprisingly strong at performing deductive reasoning over formal logical theories expressed in natural language. A shortcoming of these studies, however, is that they do not take i…
Large language models (LLMs) have shown impressive performance on reasoning benchmarks like math and logic. While many works have largely assumed well-defined tasks, real-world queries are often under…
The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills. Are these skills general and transferable, or specia…
in addition to these associative “System 1” tasks, recent advances in Chain-of-thought prompt learning have demonstrated strong “System 2” reasoning abilities, answering a question in the field of art…
To that end, we introduce the Flexible LENgth Question Answering dataset (FLenQA), a QA dataset for text-based reasoning (§3). For each sample, composed of a True/False question over two pieces of infor…
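A rough sketch, under stated assumptions, of how a FLenQA-style sample could be assembled: two key facts are embedded in irrelevant filler so the same True/False question can be posed at several total lengths. The filler source and random placement policy here are assumptions, not the dataset's actual construction.

```python
# Assemble a variable-length QA context: two key facts scattered inside
# filler text, with the total sentence count controlled by target_len.
import random

def build_sample(key_fact_1, key_fact_2, filler_sentences, target_len):
    """Embed the two key facts at random positions among ~target_len sentences."""
    body = list(filler_sentences[: max(0, target_len - 2)])
    for fact in (key_fact_1, key_fact_2):
        body.insert(random.randrange(len(body) + 1), fact)
    return " ".join(body)

filler = [f"Filler sentence {i}." for i in range(50)]
text = build_sample("Alice is in Paris.", "Paris is in France.",
                    filler, target_len=10)
print(text)  # same question askable over short or long variants of `text`
```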
Injecting a collection of symbolic data directly into the training of LLMs can be problematic, as it disregards the synergies among different symbolic families and overlooks the need for a balanced mi…
Reasoning based on Large Language Models (LLMs) has garnered increasing attention due to the outstanding performance of these models in mathematical and complex logical tasks. Beginning with the Chain-of-…