Teaching Probabilistic Logical Reasoning to Transformers

Paper · arXiv 2305.13179 · Published May 22, 2023
Argumentation, Linguistics, NLP, NLU

We propose a novel end-to-end fine-tuning approach, Probabilistic Constraint Training (PCT), that utilizes probabilistic logical rules as constraints in the fine-tuning phase without relying on these rules at inference time. To assess the effectiveness of PCT, we use the related corpora and, additionally, create a new and more challenging benchmark that, unlike previous ones, uses instance-specific rules. Our study demonstrates that PCT improves transformer-based language models’ intrinsic reasoning and makes their probabilistic logical reasoning process more explicit and explainable.

Transformers achieve poor results in arithmetic reasoning (Mishra et al., 2022), which is required for probabilistic logical reasoning. Additionally, probabilistic logical inference requires coherent step-by-step reasoning. However, evaluations of PLMs on various question-answering (QA) benchmarks show that they produce contradictory results that violate the expected steps of reasoning, such as following transitivity or symmetry rules (Asai and Hajishirzi, 2020). This has led to the development of hybrid approaches, where reasoning tasks are outsourced to Neuro-Symbolic engines, bypassing the need for reasoning by transformers (Zhang et al., 2023). To overcome these limitations, we embed probabilistic reasoning into transformers by imposing the rules of probabilistic logical reasoning as constraints during their training phase.

While incorporating hard logical rules is undoubtedly important and is still being investigated, in the real world, most of the external knowledge and rules involve uncertainty. For example, only a small fraction of the logical rules in DBpedia can be deemed certain (Saeed et al., 2021). Inference over text that includes uncertainty concerning facts, relations, and rules is required in many natural language comprehension tasks.

Without the capability of providing the underlying components and steps necessary to answer a question, a language model’s reasoning remains inexplicable even when it answers accurately (Clark et al., 2019). In this paper, we propose a method that forces the transformer to follow coherent reasoning steps to answer the final question, as shown in Table 1, yielding a more explainable model.

  1. We propose a new approach, Probabilistic Constraint Training (PCT), that explicitly imposes probabilistic reasoning rules during PLM fine-tuning. This approach provides an effective level of abstraction, enabling models to generalize and transfer reasoning under uncertainty to new domains and to more complex depths of reasoning.
  2. We develop a novel evaluation benchmark for probabilistic reasoning over text with context-specific uncertain rules whose probabilities cannot be captured from the training data and must instead be extracted from the text.
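The section does not spell out PCT's constraint term here; as a minimal sketch, assuming the constraint penalizes deviation of the model's conclusion probability from a rule-implied target (the function name, the independence assumption, and the squared-error form are all illustrative, not the paper's actual loss):

```python
def pct_constraint_penalty(p_fact, p_rule, p_conclusion, lam=1.0):
    """Hypothetical penalty for one probabilistic inference step.

    p_fact: model's probability for the premise fact
    p_rule: stated probability of the uncertain rule
    p_conclusion: model's probability for the conclusion
    lam: weight of the constraint term added to the usual task loss
    """
    # Illustrative assumption: under independence, the rule implies the
    # conclusion with probability p_fact * p_rule.
    target = p_fact * p_rule
    # Penalize squared deviation of the model's belief from that target.
    return lam * (p_conclusion - target) ** 2
```

During fine-tuning, such a term would be added to the standard QA loss so that gradients push the model toward probabilistically coherent answers, while inference remains a plain forward pass with no symbolic engine.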

Neuro-Symbolic Methods. Central to our approach is the implementation of an end-to-end model, ensuring the transferability of our model to various domains without the need to modify the model’s architecture or decision processes. In contrast, numerous studies in this field rely on a pipeline approach, often incorporating a Neuro-Symbolic engine. Zhang et al. (2023) propose a framework in which transformers extract the factual knowledge from the text; a symbolic engine then conducts the reasoning inference.

3.1 Problem Definition

We focus on the challenge of performing probabilistic logical reasoning within a QA task where a set of facts F, a set of rules R, and a hypothesis h are provided in a textual context. While these rules, facts, and hypothesis are provided only in textual form as part of the input to the task, their formal representations are available as part of the metadata. For example, the fact Big(Dave) and the rule Spouse(A,B) & Child(C,B) → Child(C,A) would be given as input in the forms “Dave is big.” and “If A is a spouse of B and C is a child of B, then C is a child of A.”, respectively. The facts and hypothesis consist of factoids that define properties of an entity, “Has_Property(Entity)”, or relations between two entities, “Relation(Entity1, Entity2)”.
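The formal metadata described above can be sketched with simple data structures; the class and field names below are illustrative, not the paper's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    """A factoid: a property of one entity or a relation between two."""
    predicate: str
    args: tuple  # ("Dave",) for Big(Dave); ("A", "B") for Spouse(A, B)

@dataclass
class Rule:
    """A (possibly uncertain) rule with premise and conclusion templates."""
    premises: list    # list of (predicate, args) templates
    conclusion: tuple # (predicate, args) template
    prob: float       # 1.0 for a hard rule, < 1.0 for an uncertain one

# Fact "Dave is big." -> Big(Dave)
big_dave = Fact("Big", ("Dave",))

# Rule "If A is a spouse of B and C is a child of B, then C is a child of A."
spouse_rule = Rule(
    premises=[("Spouse", ("A", "B")), ("Child", ("C", "B"))],
    conclusion=("Child", ("C", "A")),
    prob=1.0,
)
```

Only the textual forms are given to the model as input; the structured forms exist in the metadata and are what PCT's constraints are derived from.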

For example, the rule “If A is a cousin of B, then A is a spouse of B.” from RuleBERT will always have a probability of 0.15 in all the examples. However, in Ruletaker-pro, the same rule may hold different probabilities depending on the adverb assigned to it in different instances. A rule such as “Usually, if someone is big, then they are green.” carries a probability of 0.90 in one context, while “Seldom, if someone is big then they are green.” carries a probability of 0.15 in some other context. Given this difference, the model has to extract the rules from each context and cannot rely on information about the rules memorized from the training data.
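The adverb-conditioned probabilities above can be illustrated with a small lookup; only the two adverb/probability pairs quoted in the text (usually → 0.90, seldom → 0.15) come from the source, and the function name is hypothetical:

```python
# Adverb-to-probability mapping in the style of Ruletaker-pro rules.
# Only "usually" and "seldom" values are taken from the text above.
ADVERB_PROB = {
    "usually": 0.90,
    "seldom": 0.15,
}

def rule_probability(rule_text):
    """Read the leading adverb of a rule sentence and map it to a probability.

    Returns None if the leading word is not a known adverb.
    """
    adverb = rule_text.split(",")[0].strip().lower()
    return ADVERB_PROB.get(adverb)
```

Because the probability is carried by the wording of each instance rather than fixed per rule, a model must perform this extraction from the context itself instead of memorizing a rule-to-probability table during training.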