Neutralizing Bias in LLM Reasoning using Entailment Graphs
Recent works show that LLMs still suffer from hallucinations in NLI due to attestation bias, whereby they over-rely on propositional memory and build reasoning shortcuts. To address this issue, we design an unsupervised framework that constructs counterfactual reasoning data and fine-tunes LLMs to reduce attestation bias. To measure bias reduction, we build bias-adversarial variants of NLI datasets in which premise predicates are randomly replaced while hypotheses are kept unchanged. Extensive evaluations show that our framework significantly reduces hallucinations arising from attestation bias.
In this study, we explore fine-tuning LLMs to improve their robustness against attestation bias. We propose an unsupervised approach for constructing counterfactual but logically consistent datasets for training LLMs. Our approach begins with the unsupervised extraction of textual entailment relations between predicates from large-scale open-domain corpora using semantic parsing. The extracted relations are then formatted into Entailment Graphs (EGs) (Hosseini et al., 2018, 2021), which consist of typed predicate pairs. Finally, we generate counterfactual samples by randomly selecting named entities and other arguments to instantiate these types, as sketched below.
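As a rough illustration of the instantiation step, the following sketch (argument pools and helper names are hypothetical, not the paper's actual pipeline) turns a typed EG edge into a counterfactual, entailment-labeled NLI pair by sampling concrete arguments for each type.

```python
import random

# Hypothetical pools of typed arguments; the real pipeline draws named
# entities and other arguments from large-scale open-domain corpora.
TYPED_ARGUMENTS = {
    "Person": ["Marie Curie", "Alan Turing", "Frida Kahlo"],
    "Location": ["Lisbon", "Nairobi", "Osaka"],
}

def instantiate_eg_edge(premise_template, hypothesis_template, arg_types):
    """Instantiate one typed EG edge (premise |= hypothesis) into a
    counterfactual but logically consistent training pair."""
    args = [random.choice(TYPED_ARGUMENTS[t]) for t in arg_types]
    return {
        "premise": premise_template.format(*args),
        "hypothesis": hypothesis_template.format(*args),
        "label": "entail",
    }

# EG edge: (Person.X, visited, Location.Y) |= (Person.X, went to, Location.Y)
sample = instantiate_eg_edge("{0} visited {1}", "{0} went to {1}",
                             ["Person", "Location"])
```

Because both sides of the edge are instantiated with the same sampled arguments, the resulting pair remains entailing even when the facts themselves are counterfactual.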
We evaluate the effectiveness of our method along two dimensions: bias reduction and improvement in general inferential performance. First, to measure how well our training reduces attestation bias, we compare LLMs before and after training on bias-adversarial variants of NLI datasets. Specifically, we randomly alter the predicates in the premises while keeping the hypotheses fixed. The newly generated premises are non-entailing, so any positive judgment by an LLM is a false positive arising from attestation bias toward the hypothesis. The results demonstrate that training LLMs with our method significantly reduces attestation bias, enabling a more reliable evaluation of their reasoning capabilities.
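A minimal sketch of this evaluation protocol, assuming a simple templated representation of each example and a hypothetical `model_predict` callable: every "Entail" prediction on a perturbed pair counts as a false positive.

```python
import random

def perturb_premise(example, predicate_pool):
    """Swap the premise predicate for a random unrelated one, making the
    pair non-entailing; the hypothesis stays unchanged."""
    candidates = [p for p in predicate_pool if p != example["premise_predicate"]]
    new_predicate = random.choice(candidates)
    return {
        "premise": example["premise_template"].format(pred=new_predicate),
        "hypothesis": example["hypothesis"],
    }

def attestation_false_positive_rate(model_predict, adversarial_examples):
    """Fraction of perturbed pairs the model still labels 'Entail'; these
    are false positives attributable to attestation of the hypothesis."""
    false_positives = sum(
        model_predict(ex["premise"], ex["hypothesis"]) == "Entail"
        for ex in adversarial_examples
    )
    return false_positives / len(adversarial_examples)
```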
Attestation Bias: Attestation bias occurs when an LLM assigns a significantly higher probability to Entail whenever the hypothesis is attested, indicating that its inference is heavily influenced by memorization of the hypothesis. As a result, LLMs are prone to disregarding the premise and answering incorrectly by relying on memorized information about the hypothesis from their training corpus, as illustrated in Figure 1.
McKenna et al. (2023) conducted a hypothesis-only test on LLMs, showing that when gold labels contradict attestation bias, LLMs can be poor or even near-random classifiers.
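As an illustrative diagnostic in the spirit of that test (the field names and the `entail_probability` callable are assumptions, not the original experimental setup), one can compare the model's Entail probability on pairs whose hypotheses it attests as true against those it does not; a large gap signals attestation bias.

```python
from statistics import mean

def attestation_gap(entail_probability, examples):
    """Mean P(Entail) on examples with attested hypotheses minus the mean
    on examples with unattested hypotheses."""
    attested = [entail_probability(ex["premise"], ex["hypothesis"])
                for ex in examples if ex["hypothesis_attested"]]
    unattested = [entail_probability(ex["premise"], ex["hypothesis"])
                  for ex in examples if not ex["hypothesis_attested"]]
    return mean(attested) - mean(unattested)
```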
Entailment Graphs: EGs are symbolic graphs that preserve entailment relations between predicates (Berant et al., 2010, 2011; Hosseini et al., 2018, 2021). Unlike sentence-level inference data, EGs are formatted as sets of triples, each consisting of a predicate pair and typed arguments, e.g., “(Person.X, visited, Location.Y) |= (Person.X, went to, Location.Y)”. EGs have been utilized in open-domain question answering and knowledge inference (Cheng et al., 2023; Wang et al., 2024).
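A possible in-memory representation of one EG edge, shown purely for illustration (the concrete serialization used for EGs is not specified here):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EGEdge:
    """One entailment-graph edge: a typed premise predicate that entails
    a typed hypothesis predicate over the same argument slots."""
    premise_pred: str           # e.g. "(Person.X, visited, Location.Y)"
    hypothesis_pred: str        # e.g. "(Person.X, went to, Location.Y)"
    arg_types: tuple[str, ...]  # e.g. ("Person", "Location")

edge = EGEdge("(Person.X, visited, Location.Y)",
              "(Person.X, went to, Location.Y)",
              ("Person", "Location"))
```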