Can large language models translate natural language to logic faithfully?
This note explores whether LLMs can convert natural language statements into formal logical representations without losing meaning. The question matters because faithful translation is a prerequisite for any AI system that reasons formally or verifies specifications.
"Faithful Autoformalisation with LLMs" evaluates whether LLMs can translate natural language statements into formal logical representations (first-order logic, modal logic, higher-order logic). The central finding: LLMs can produce syntactically well-formed logical expressions but fail to produce semantically faithful ones. Form and content come apart.
The failure modes are not random; they cluster at precisely the points where natural language semantics are most structurally complex (formalised in the sketch after this list):
- Scope ambiguity resolution: "Every student passed some exam" requires committing to a reading (∀∃ or ∃∀). LLMs tend to produce the surface-order reading rather than the contextually correct one, or produce an unambiguous formula that does not capture the original ambiguity.
- Quantifier precision: Natural language quantifiers ("most", "few", "many") have no standard first-order equivalents. LLMs substitute familiar quantifiers (∀, ∃) that alter the truth conditions.
- Predicate granularity: Natural language predicates compress relational structure. "John gave Mary the book" has multiple relational decompositions in FOL, and LLMs often choose flat single-predicate representations that lose the event structure.
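To make these concrete, here is an illustrative rendering of the three failure modes in logical notation. The formulas are illustrative, not drawn from the paper's benchmark:

```latex
% Scope ambiguity: two non-equivalent readings of "Every student passed some exam".
\forall x\,(\mathrm{Student}(x) \rightarrow \exists y\,(\mathrm{Exam}(y) \wedge \mathrm{Passed}(x,y)))
\exists y\,(\mathrm{Exam}(y) \wedge \forall x\,(\mathrm{Student}(x) \rightarrow \mathrm{Passed}(x,y)))

% Quantifier precision: "Most students passed" needs a generalized quantifier,
% which is not definable in first-order logic; substituting \forall or \exists changes the truth conditions.
\mathrm{Most}\,x\,(\mathrm{Student}(x),\ \mathrm{Passed}(x))

% Predicate granularity: flat vs. (neo-)Davidsonian event decomposition of "John gave Mary the book".
\mathrm{Gave}(\mathrm{john}, \mathrm{mary}, \mathrm{book})
\exists e\,(\mathrm{Giving}(e) \wedge \mathrm{Agent}(e,\mathrm{john}) \wedge \mathrm{Recipient}(e,\mathrm{mary}) \wedge \mathrm{Theme}(e,\mathrm{book}))
```

Note the asymmetry in the first pair: the ∃∀ reading entails the ∀∃ reading but not conversely, so committing to the wrong scope silently strengthens or weakens the claim.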
This is a structurally significant finding because autoformalisation is a direct test of the relationship between linguistic competence and logical competence. If LLMs can handle syntax-level structure (as they demonstrably can for many tasks) but fail when that structure requires semantic commitment, then their linguistic processing operates at a level that stops short of truth-conditional content.
The finding connects to the broader pattern explored in "Can models pass tests while missing the actual grammar?". For autoformalisation, "correct output" would mean syntactically valid logic; "genuine linguistic generalisation" would require semantic faithfulness. LLMs achieve the former but not the latter.
This also extends "Do language models actually use their encoded knowledge?": even when LLMs encode semantic properties, that encoding does not reliably translate into semantic commitment during generation. The representation has the information; the generation does not use it.
Structured semantics: understanding vs. generation asymmetry. "Probing Structured Semantics Understanding and Generation" (2401.05777) confirms the asymmetry for formal languages: LLMs can interpret formal language (translate a logical form into a natural language question) more accurately than they can generate it (translate natural language into a logical form). Formal language complexity also matters: targets with lower formalisation (closer to natural language surface form, such as KoPL) are easier for models, while more formal targets (SPARQL, Lambda DCS) fail at entity grounding. This suggests the autoformalisation failure is graded: the farther the target representation is from the natural language surface form, the worse the semantic fidelity.
The practical implication: LLM-assisted formal verification, specification writing, or logical reasoning pipelines cannot trust LLM-generated logical forms without post-hoc semantic verification. The syntactic plausibility of the output masks the semantic errors.
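One way to make that verification concrete is to check a generated formula against the intended reading with an SMT solver. The sketch below is illustrative, not from the paper: it assumes the z3-solver Python package is installed, reuses the scope-ambiguity example from above, and `provably_equivalent` is a hypothetical helper name.

```python
# Illustrative post-hoc semantic check (assumes the z3-solver package; not from the paper).
# Idea: two formalisations are interchangeable only if no model distinguishes them.
from z3 import (DeclareSort, Function, BoolSort, Const, ForAll, Exists,
                Implies, And, Solver, unsat)

Entity = DeclareSort("Entity")
Student = Function("Student", Entity, BoolSort())
Exam = Function("Exam", Entity, BoolSort())
Passed = Function("Passed", Entity, Entity, BoolSort())
x, y = Const("x", Entity), Const("y", Entity)

# Intended forall-exists reading of "Every student passed some exam".
intended = ForAll(x, Implies(Student(x), Exists(y, And(Exam(y), Passed(x, y)))))
# Hypothetical LLM output with inverted scope: a single exam passed by all students.
generated = Exists(y, And(Exam(y), ForAll(x, Implies(Student(x), Passed(x, y)))))

def provably_equivalent(f, g):
    """True only if Z3 proves that no model makes exactly one of f, g true.
    First-order equivalence is undecidable in general, so 'unknown' counts as unverified."""
    s = Solver()
    s.add(f != g)  # satisfiable iff some model distinguishes the two formulas
    return s.check() == unsat

print(provably_equivalent(intended, generated))  # False: the scope swap is a real semantic change
```

A check like this only catches divergence from a reference formula; a production pipeline would still need bounded model checking or human review for formulas with no trusted reference, since the equivalence check itself is only semi-decidable.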
Source: Natural Language Inference; enriched from LLM Architecture
Related concepts in this collection
- Can models pass tests while missing the actual grammar?
  Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures.
  Relation: the same gap of syntactic correctness without semantic accuracy.
- Do LLMs predict entailment based on what they memorized?
  Explores whether language models make entailment decisions by recognizing memorized facts about the hypothesis rather than reasoning through the logical relationship between premise and hypothesis.
  Relation: both failures trace to LLMs not committing to truth-conditional content.
- Can language models learn meaning from text patterns alone?
  Explores whether training on form alone—predicting the next word from prior words—could ever give language models access to communicative intent and genuine semantic understanding.
  Relation: the autoformalisation failure is a concrete manifestation of this structural claim.
- Why do embedding contexts confuse LLM entailment predictions?
  Can language models distinguish between contexts that preserve versus cancel entailments? The study explores whether LLMs systematically fail to apply the semantic rules governing presupposition triggers and non-factive verbs.
  Relation: logical connectives and quantifiers show the same opacity as presupposition triggers.
Original note title: llms fail at faithful autoformalisation because they cannot translate natural language to logical representations without semantic loss