Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations
Natural language explanations play a fundamental role in Natural Language Inference (NLI) by revealing how premises logically entail hypotheses. Recent work has shown that the interaction of large language models (LLMs) with theorem provers (TPs) can help verify and improve the validity of NLI explanations. However, TPs require translating natural language into machine-verifiable formal representations, a process that introduces the risk of semantic information loss and unfaithful interpretation, an issue compounded by LLMs’ challenges in capturing critical logical structures with sufficient precision. Moreover, LLMs are still limited in their capacity for rigorous and robust proof construction within formal verification frameworks. To mitigate issues related to faithfulness and robustness, this paper investigates strategies to (1) alleviate semantic loss during autoformalisation, (2) efficiently identify and correct syntactic errors in logical representations, (3) explicitly use logical expressions to guide LLMs in generating structured proof sketches, and (4) increase LLMs’ capacity to interpret a TP’s feedback for iterative refinement. Our empirical results on e-SNLI, QASC and WorldTree using different LLMs demonstrate that the proposed strategies yield significant improvements in autoformalisation (+18.46%, +34.2%, +39.77%) and explanation refinement (+29.5%, +51.5%, +41.25%) over the state-of-the-art model. Moreover, we show that specific interventions on the hybrid LLM-TP architecture can substantially improve efficiency, drastically reducing the number of iterations required for successful verification.
However, these integrated neuro-symbolic approaches still face notable challenges. First, automated theorem provers (ATPs) require a machine-verifiable formal language, yet LLMs often fail to produce precise autoformalisations, underscoring their limited capacity to faithfully convert complex natural language inputs into rigorous formal representations (Wu et al., 2022; Jiang et al., 2024; Quan et al., 2024b). Second, syntactic errors are frequently introduced during the autoformalisation process, leading to reduced theorem-proving success rates when dealing with more complex material inferences (Pan et al., 2023; Olausson et al., 2023; Zhang et al., 2024). Third, when provided with external feedback on complex explanations, LLMs often struggle to combine axioms (explanations) into cohesive proofs and effectively self-correct, limiting their effectiveness in more complex NLI settings (Quan et al., 2024a,b).
In general, we implement a neuro-symbolic framework to address the following research questions: RQ1: "To what extent can we deliver faithful autoformalisation that preserves semantic information?" RQ2: "What types of syntactic errors commonly appear in formal representations, and how effectively can state-of-the-art LLMs refine these errors?" RQ3: "Can state-of-the-art LLMs generate structured proof steps that can effectively provide feedback to refine explanations with complex sentences and logical relations?"
To answer these questions, we investigate how to systematically leverage syntactic parsing during autoformalisation to guide LLMs in generating logical representations of explanations. In addition, we define the general autoformalisation error types and use LLMs to explicitly refine these errors based on the output messages of a TP. Furthermore, we propose a method to extract logical propositions, relations and implications to guide LLMs in generating proof sketches for automated theorem proving and explanation refinement.
The number of refined explanations produced by our framework exceeds that of Explanation-Refiner across all LLMs, raising refinement rates from 41% to 95%, 17% to 90%, and 7% to 73% across the three datasets.
We leverage an external theorem prover (TP) to systematically verify these entailments in an automated manner. Specifically, given the set of input sentences S = {p_i} ∪ {h_i} ∪ E_i, we aim to build a set of logical forms ϕ = {Φ(s) | s ∈ S}, where Φ is the autoformalisation process that converts natural language sentences into symbolic representations. From these logical forms, we construct a theory Θ = (A, τ), where A = {a_1, a_2, ..., a_n} is the set of axioms derived from formalising E_i, and τ is the theorem to be proven, composed of p_i and h_i. If an automated theorem prover (ATP) can derive a valid proof for Θ, we conclude that E_i is sound and logically valid. Otherwise, we refine E_i by using the failed proof steps as feedback, iteratively generating a refined explanation E′_i that ultimately leads to a valid justification.
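For illustration, this verify-and-refine loop can be sketched in Python as follows; the callables formalise, prove and refine are hypothetical stand-ins for the LLM autoformalisation step (Φ), the theorem prover invocation and the LLM-based explanation refinement, not part of any existing library.

def verify_entailment(premise, hypothesis, explanation,
                      formalise, prove, refine, max_iters=10):
    """Iteratively verify E_i against Theta = (A, tau), refining on failure."""
    for _ in range(max_iters):
        # phi = {Phi(s) | s in S}, with S = {p_i} ∪ {h_i} ∪ E_i
        axioms = [formalise(e) for e in explanation]           # A, derived from E_i
        theorem = (formalise(premise), formalise(hypothesis))  # tau
        result = prove(axioms, theorem)                        # ATP attempt on Theta
        if result.proved:
            return explanation                                 # E_i is logically valid
        # use the failed proof steps as feedback to produce E'_i
        explanation = refine(explanation, result.failed_steps)
    return None  # no valid justification found within the iteration budget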
3 Methodology
To effectively enhance the joint inference capabilities and robustness of LLMs and theorem provers for explanation-based NLI, we propose a novel framework that strengthens three key components: autoformalisation; logical and syntactic error checking and refinement; and LLM-guided proof construction. As illustrated in Figure 1, the pipeline begins with the automated formalisation of natural language into logical representations.
Unlike the previous state-of-the-art approach (i.e., Explanation-Refiner), we begin with a syntactic parsing step that guides LLMs in translating natural language elements into a formal specification compatible with theorem provers. The LLM is prompted to automatically formalise the explanatory sentences into axioms and to construct a theorem composed of assumption clauses (drawn from the premise) and a proof goal (derived from the hypothesis). After formalising the input sentences, we apply quantifier and logical consistency checks, followed by a refinement process.
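As a concrete illustration of the kind of test a quantifier consistency check performs, the following simplified sketch flags bound variables that never occur in a formula body; the string-based formula representation is an assumption made for illustration, not the internal format of our framework.

import re

def check_quantifiers(formula: str) -> list[str]:
    """Report bound variables that never occur in the formula body."""
    errors = []
    match = re.match(r"^[∃∀]([\w\s]+)\.(.*)$", formula)
    if match is None:
        return errors  # no top-level quantifier prefix to check
    bound_vars = match.group(1).split()
    body_tokens = set(re.findall(r"\w+", match.group(2)))
    for var in bound_vars:
        if var not in body_tokens:
            errors.append(f"bound variable '{var}' never occurs in the body")
    return errors

# check_quantifiers("∃x y z. Man x ∧ Woman y ∧ At x z") returns an empty
# list, since all of x, y, z occur in the body; dropping "At x z" would
# flag z as unused.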
Similar to Jiang et al. (2022b) and Quan et al. (2024b), we adopt Isabelle/HOL (Nipkow et al., 2002) to formally verify the constructed theory. Specifically, we invoke the Sledgehammer tool (Paulson and Blanchette, 2012) within Isabelle/HOL to call upon multiple automated theorem provers (e.g., CVC4, Vampire), which attempt to prove the theorem derived from the translated NLI tasks. If any prover succeeds, we conclude that the explanation is logically sound, thereby confirming that the premise entails the hypothesis. If no proof is found, we use an LLM to extract logical propositions and relations from the natural language explanations. We then employ an intermediate propositional representation to derive further implications among these propositions, prompting the LLM to generate a step-by-step proof sketch, rather than having the LLM serve directly as a proof planner as in Explanation-Refiner.
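A minimal sketch of this prompting step is given below; the prompt wording and the llm callable are illustrative assumptions rather than the exact prompt used in our framework.

PROOF_SKETCH_PROMPT = """Given the propositions:
{propositions}
and the implications between them:
{implications}
write a step-by-step Isabelle/HOL proof sketch for the theorem:
{theorem}
Mark every step to be discharged by an automated prover with <ATP>."""

def generate_proof_sketch(llm, propositions, implications, theorem):
    """Prompt an LLM with extracted propositions and implications."""
    prompt = PROOF_SKETCH_PROMPT.format(
        propositions="\n".join(propositions),
        implications="\n".join(f"{a} -> {b}" for a, b in implications),
        theorem=theorem,
    )
    return llm(prompt)  # the returned sketch is then checked by Isabelle/HOL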
We begin by performing syntactic parsing via the LLM on all provided sentences to extract their grammatical structure. These elements are subsequently mapped onto the agent, event action, and patient roles within a Neo-Davidsonian event semantics framework. For example, consider the sentence "The father and son kicked the ball".
We can parse it as:

(S (NP-SBJ The father and son)
   (VP (V kicked)
       (NP-OBJ the ball)))
indicating that "The father and son" is the subject while "the ball" is the object. Thus we can apply Neo-Davidsonian event semantics to formalise it as:
∃x y z e. (Father(x) ∧ Son(y) ∧ Ball(z) ∧ Kicked(e) ∧ Agent(e, x) ∧ Agent(e, y) ∧ Patient(e, z))

By leveraging such a process, we construct a clear representation indicating that the father and the son are the agents performing the event (kick), while the ball is the patient receiving the action, thus capturing all relevant semantic information in the transition from natural language to formal language.
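The assembly of such a logical form from the extracted roles can be sketched as follows; the function name and the role arguments are hypothetical simplifications of the actual mapping step.

def neo_davidsonian(event: str, agents: list[str], patients: list[str]) -> str:
    """Assemble a Neo-Davidsonian formula from event, agent and patient roles."""
    # allocate one variable per participant: x, y, z, ... (sketch assumes few roles)
    variables = [chr(ord("x") + i) for i in range(len(agents) + len(patients))]
    predicates = [f"{noun}({v})" for v, noun in zip(variables, agents + patients)]
    roles = [f"Agent(e, {v})" for v in variables[:len(agents)]]
    roles += [f"Patient(e, {v})" for v in variables[len(agents):]]
    body = " ∧ ".join(predicates + [f"{event}(e)"] + roles)
    return f"∃{' '.join(variables)} e. ({body})"

# neo_davidsonian("Kicked", ["Father", "Son"], ["Ball"]) returns the formula above:
# ∃x y z e. (Father(x) ∧ Son(y) ∧ Ball(z) ∧ Kicked(e) ∧ Agent(e, x) ∧ Agent(e, y) ∧ Patient(e, z))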
We then construct the Isabelle/HOL theory with axioms (explanatory sentences) and the theorem (premise and hypothesis sentences):

(* Explanation 1: A man and woman are at the park. *)
axiomatization where
  explanation_1: "∃x y z. Man x ∧ Woman y ∧ Park z ∧ At x z ∧ At y z"

theorem hypothesis:
  (* Premise: A man and woman sit on a park bench with a set of newlyweds behind. *)
  assumes asm: "Man x ∧ Woman y ∧ ParkBench z ∧ Newlyweds w ∧ Sit e ∧ Agent e x ∧ Agent e y ∧ Patient e z ∧ Behind w z"
  (* Hypothesis: People outside. *)
  shows "∃x. People x ∧ Outside x"
proof -
  (* From the premise, we have information about a man and a woman sitting on a park bench. *)
  from asm have "Man x ∧ Woman y" by blast
  (* Explanation 1 states that a man and a woman are at the park. *)
  (* This implies that they are outside, as parks are typically outdoor locations. *)
  from explanation_1 have "∃x y z. Man x ∧ Woman y ∧ Park z ∧ At x z ∧ At y z" by blast
  then have "People x ∧ Outside x" <ATP>
  then show ?thesis <ATP>
qed