Why can't language models reverse learned facts?
Language models trained on directional statements like "A is B" often fail to answer the reverse query. This note explores why symmetric relations aren't automatically learned during training, even though both directions of such relations appear throughout the training data.
If a model is trained on "Valentina Tereshkova was the first woman to travel to space," it will not automatically answer "Who was the first woman to travel to space?" with "Tereshkova." Moreover, the model assigns the correct name no higher a likelihood than a random name. Training encodes A→B but not B→A.
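This is straightforward to measure: score candidate completions of the reversed query and compare their log-likelihoods. Below is a minimal sketch, assuming a HuggingFace causal LM ("gpt2" as a stand-in) and a made-up control name; a faithful replication would fine-tune on synthetic facts so the fact is guaranteed to appear only in the forward direction during training.

```python
# Minimal sketch of the likelihood comparison, assuming a HuggingFace
# causal LM. "gpt2" is an illustrative stand-in; the reversal curse is
# measured on models that saw the fact only in the forward direction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to the tokens of
    `completion` when it follows `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    # Logits at position i predict the token at position i + 1, so the
    # completion tokens are predicted by positions prompt_len-1 .. end-2.
    total = 0.0
    for pos in range(prompt_len - 1, full_ids.shape[1] - 1):
        total += log_probs[0, pos, full_ids[0, pos + 1]].item()
    return total

reverse_query = "The first woman to travel to space was"
print(completion_logprob(reverse_query, " Valentina Tereshkova"))
print(completion_logprob(reverse_query, " Margaret Hughes"))  # made-up control
# Under the reversal curse, these two scores are statistically
# indistinguishable for facts trained only as A -> B.
```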
This is not a failure of logical deduction: GPT-4 given "A is B" in context can infer "B is A" perfectly well. The failure is one of meta-learning during training. The model never extracts the general principle that identity is symmetric, even though the training data is full of relations stated in both directions.
The practical implications are significant. Knowledge retrieval from LLMs is directional — the model's ability to recall a fact depends on the query direction matching the training data format. This means coverage of world knowledge is systematically incomplete in a non-obvious way: the model may "know" a fact by one measure (can state A→B) but not by another (cannot retrieve A given B).
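The same asymmetry shows up when one fact is queried from both ends. A sketch under the same assumptions as above (illustrative prompts, "gpt2" as a stand-in for a model that saw the fact only in the forward direction):

```python
# Contrast forward recall with reverse retrieval for the same fact.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def greedy_continuation(prompt: str, max_new_tokens: int = 8) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0, ids.shape[1]:])

# Forward: matches the direction the fact was trained in.
print(greedy_continuation("Valentina Tereshkova was the first woman to"))

# Reverse: the same fact queried from the other end. For facts seen only
# forward during training, the completion is no better than chance.
print(greedy_continuation("The first woman to travel to space was"))
```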
This connects to "Does training data format shape reasoning strategy more than domain?": the format in which information was presented during training determines which retrieval patterns are available. The reversal curse is a specific instance: the sequential format of autoregressive training creates directional associations that don't generalize to their logical inverses.
The reversal curse also challenges the assumption that LLMs develop internal representations that abstract away from surface form. If a symmetric relation were truly represented internally, both directions would be accessible. The directional failure suggests the representation is closer to an associative pattern than to a relational structure.
Source: Flaws
Related concepts in this collection
- Does training data format shape reasoning strategy more than domain?
  What explains why models trained on multiple-choice data reason differently than those trained on free-form text? The research isolates format and domain effects to measure which one matters more.
  Connection: training format determines retrieval patterns; the reversal curse is a specific directional failure of format-bound learning.
- Why do LLMs handle causal reasoning better than temporal reasoning?
  Exploring whether language models perform asymmetrically on different discourse relations, and what training data patterns might explain the gap between causal and temporal reasoning abilities.
  Connection: another case where the training data distribution shapes which reasoning directions succeed.
- Do large language models reason symbolically or semantically?
  Can LLMs follow explicit logical rules when those rules contradict their training knowledge? Testing whether reasoning operates independently of semantic associations reveals what computational mechanisms actually drive LLM multi-step inference.
  Connection: consistent with the reversal curse: the symbolic rule that identity is symmetric is not learned, only the one-directional semantic association.
Original note title: the reversal curse — LLMs trained on A is B fail to learn B is A