Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners
The emergent few-shot reasoning capabilities of Large Language Models (LLMs) have excited the natural language and machine learning community over recent years. Despite numerous successful applications, the underlying mechanism of these in-context capabilities remains unclear. In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process. Unlike humans' symbolic reasoning process, the semantic representations of LLMs can create strong connections among tokens, thus composing a superficial logical chain. To test our hypothesis, we decouple semantics from the language reasoning process and evaluate three kinds of reasoning abilities, i.e., deduction, induction and abduction. Our findings reveal that semantics play a vital role in LLMs' in-context reasoning: LLMs perform significantly better when semantics are consistent with commonsense, but struggle to solve symbolic or counter-commonsense reasoning tasks by leveraging in-context new knowledge. These surprising observations question whether modern LLMs have mastered the inductive, deductive and abductive reasoning abilities of human intelligence, and motivate research on unveiling the magic inside black-box LLMs. On the whole, our analysis provides a novel perspective on the role of semantics in developing and evaluating language models' reasoning abilities.
Despite the powerful and versatile in-context learning ability of LLMs, the underlying mechanisms by which they operate within a given context remain unclear. Previous works investigate which aspects of the given examples contribute to the final task performance, including ground-truth labels and example ordering [7–9]. Another line of recent work has focused on explaining and leveraging the in-context learning (ICL) mechanism [10–13]. However, these studies share a common limitation: the in-context prompts they use to probe the reasoning abilities of LLMs are expressed as natural language queries. According to the Dual Process Theory [14, 15], humans usually employ symbolic reasoning with System II to solve complex logical reasoning problems. To fill this research gap, we systematically study the in-context reasoning ability of LLMs by decoupling semantics from the language reasoning process. With extensive experiments, we aim to answer the following research question: Are LLMs good in-context reasoners without semantics?
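To make the decoupling operation concrete, the sketch below illustrates one way it can be implemented: every entity and relation name in a deduction prompt is consistently replaced by an abstract symbol, so only the logical structure survives. The statement format and helper names are illustrative assumptions, not the paper's released code.

```python
# A minimal sketch (not the paper's released code) of how semantics can be
# decoupled from a deduction prompt: every entity and relation name is mapped
# to an abstract symbol so that only the logical structure remains.
import re

def decouple_semantics(statements):
    """Rewrite statements like 'motherOf(Alice, Bob)' into 'r1(e1, e2)'."""
    entities, relations = {}, {}

    def anonymize(token, prefix, table):
        if token not in table:
            table[token] = f"{prefix}{len(table) + 1}"
        return table[token]

    def rewrite(statement):
        def repl(match):
            rel = anonymize(match.group(1), "r", relations)
            args = ", ".join(
                anonymize(a.strip(), "e", entities)
                for a in match.group(2).split(","))
            return f"{rel}({args})"
        return re.sub(r"(\w+)\(([^)]*)\)", repl, statement)

    return [rewrite(s) for s in statements]

prompt = ["motherOf(Alice, Bob)",               # fact
          "motherOf(X, Y) -> parentOf(X, Y)",   # rule
          "parentOf(Alice, Bob)?"]              # query
print(decouple_semantics(prompt))
# ['r1(e1, e2)', 'r1(e3, e4) -> r2(e3, e4)', 'r2(e1, e2)?']
```

The key design choice is that the mapping is applied consistently across facts, rules, and queries, so the symbolic prompt preserves exactly the same logical entailments as the original while removing any commonsense cues carried by the token names.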
Our paper presents the first comprehensive investigation of the role of semantics in LLMs' in-context reasoning abilities, conducted by decoupling semantics from in-context prompts. Experimental results suggest that when semantics are consistent with commonsense, LLMs perform fairly well; when semantics are decoupled or counter-commonsense, LLMs struggle to solve the reasoning tasks by leveraging in-context new knowledge. These findings reveal the importance of semantics in LLMs' reasoning abilities and inspire further research on unveiling the magic inside black-box LLMs. In light of the findings of our analysis, we point out several potential future directions for the development of large foundation models:
More complex symbolic reasoning benchmarks: To improve LLMs' in-context symbolic reasoning abilities, it is necessary to develop new datasets with decoupled semantics and more complex reasoning tasks. These benchmarks should challenge LLMs with diverse and intricate symbolic knowledge.
Combination with external non-parametric knowledge bases: As our experimental results show, the memorization abilities of LLMs are not comparable to those of existing graph-based methods. This motivates integrating LLMs with external non-parametric knowledge bases, such as graph databases, to enhance knowledge insertion and updating (see the sketch after this list). This hybrid approach can combine LLMs' language understanding with the comprehensive, accurate and up-to-date knowledge stored in non-parametric sources.
Improving the ability to process in-context knowledge: More robust abilities to process and memorize in-context knowledge are crucial for performing complex in-context reasoning tasks. Further research is needed to improve LLMs' capabilities in processing and leveraging in-context knowledge. This includes developing mechanisms to better encode and retrieve relevant information from the in-context knowledge, in order to enable more effective reasoning.
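As a concrete illustration of the second direction above, the sketch below shows a hypothetical retrieval-augmented setup: facts live in an external, non-parametric store (here a plain Python dict standing in for a graph database) and are fetched at query time, so the LLM only has to reason over the retrieved facts rather than memorize them. The store contents, the retrieval function, and `call_llm` are all assumed for illustration.

```python
# Hypothetical sketch of combining an LLM with an external knowledge store.
KNOWLEDGE_GRAPH = {
    ("Alice", "motherOf"): ["Bob"],
    ("Bob", "brotherOf"): ["Carol"],
}

def retrieve_facts(entity):
    """Fetch all stored triples mentioning the given entity."""
    return [f"{h} {r} {t}"
            for (h, r), tails in KNOWLEDGE_GRAPH.items()
            for t in tails
            if entity in (h, *tails)]

def build_prompt(question, entity):
    facts = retrieve_facts(entity)
    context = "\n".join(f"- {fact}" for fact in facts)
    return (f"Known facts (retrieved from an external knowledge base):\n"
            f"{context}\n\nQuestion: {question}\nAnswer:")

def call_llm(prompt):
    # Placeholder for an actual LLM API call.
    raise NotImplementedError

print(build_prompt("Who is Alice the mother of?", "Alice"))
```

Because the facts are stored outside the model's parameters, inserting or updating knowledge reduces to editing the store, while the LLM is used only for language understanding and reasoning over the retrieved context.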
First, we investigate the impact of removing the given logical rules (in deduction) and facts (in induction), where LLMs have to rely solely on the prior commonsense knowledge stored within their parameters to infer the answers. This analysis allows us to assess the extent to which LLMs can leverage their internal knowledge to reason effectively without explicit in-context knowledge. Second, we retain the semantics of the datasets but introduce counter-commonsense logical rules. This requires LLMs to leverage in-context new knowledge and navigate the reasoning process by strictly adhering to new information that conflicts with their old knowledge. We implement this by shuffling relation labels to construct a new counter-commonsense dataset; for instance, we replace "motherOf" with "sisterOf", "parentOf" with "brotherOf", and "female" with "male".
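The following sketch shows one way the relation-shuffling step could be realized; the label set, random seed, and derangement strategy are illustrative assumptions rather than the exact mapping or dataset used in the paper.

```python
# A minimal sketch of the relation-shuffling step described above.
import random

def shuffle_relation_labels(triples, seed=0):
    """Build a counter-commonsense dataset by permuting relation labels.

    Each original relation is consistently renamed to a different relation
    from the same label set (e.g. 'motherOf' -> 'sisterOf'), so every fact
    and rule stays well-formed but conflicts with commonsense.
    """
    relations = sorted({r for _, r, _ in triples})
    shuffled = relations[:]
    rng = random.Random(seed)
    # Re-shuffle until no relation maps to itself (a derangement).
    while any(a == b for a, b in zip(relations, shuffled)):
        rng.shuffle(shuffled)
    mapping = dict(zip(relations, shuffled))
    return [(h, mapping[r], t) for h, r, t in triples], mapping

triples = [("Alice", "motherOf", "Bob"),
           ("Carol", "parentOf", "Dave"),
           ("Alice", "sisterOf", "Eve")]
new_triples, mapping = shuffle_relation_labels(triples)
print(mapping)      # e.g. {'motherOf': 'sisterOf', 'parentOf': 'motherOf', ...}
print(new_triples)
```

Applying the same permutation consistently across the whole dataset keeps the logical structure intact while guaranteeing that the in-context rules contradict the commonsense knowledge stored in the model's parameters.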