LLM Reasoning and Architecture · Language Understanding and Pragmatics

Why can't language models reverse learned facts?

Language models trained on directional statements like "A is B" often fail to answer the reverse query. This note explores why symmetric relations aren't automatically learned during training, even though both directions appear throughout the data.

Note · 2026-02-23 · sourced from Flaws
What do language models actually know?

If a model is trained on "Valentina Tereshkova was the first woman to travel to space," it will not automatically answer "Who was the first woman to travel to space?" Moreover, the likelihood it assigns to the correct name is no higher than for a random name. Training encodes A→B but not B→A.
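The directionality falls out of next-token prediction itself. A minimal cartoon, not the paper's actual setup: a toy next-token count model trained only on the forward statement stores the association entity→description and nothing in the other direction. (The `BigramLM` class and the underscore-joined tokens are illustrative inventions.)

```python
from collections import defaultdict, Counter

class BigramLM:
    """Toy next-token model: counts which token follows each token.
    A cartoon of autoregressive training, nothing more."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, sentences):
        for s in sentences:
            toks = s.split()
            for i in range(len(toks) - 1):
                self.counts[toks[i]][toks[i + 1]] += 1

    def predict(self, context):
        # Most likely next token after `context`, or None if never seen.
        nxt = self.counts.get(context)
        return nxt.most_common(1)[0][0] if nxt else None

lm = BigramLM()
# Train only on the forward direction, A → B.
lm.train(["tereshkova was first_woman_in_space"])

print(lm.predict("tereshkova"))            # forward chain starts: "was"
print(lm.predict("first_woman_in_space"))  # reverse query: None, B→A was never stored
```

The forward chain "tereshkova" → "was" → "first_woman_in_space" reproduces the training sentence, but conditioning on the description yields nothing: the reverse association simply does not exist in the learned statistics.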

This is not a failure of logical deduction. GPT-4, given "A is B" in context, can infer "B is A" perfectly well. The failure is in generalization during training: the model does not extract the general principle that identity is symmetric, even though the training data is full of examples where both directions occur.

The practical implications are significant. Knowledge retrieval from LLMs is directional — the model's ability to recall a fact depends on the query direction matching the training data format. This means coverage of world knowledge is systematically incomplete in a non-obvious way: the model may "know" a fact by one measure (can state A→B) but not by another (cannot retrieve A given B).
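One practical consequence: auditing what a model "knows" requires probing both query directions. A minimal sketch of such a probe harness; the `probe_both_directions` helper, the `query_fn` interface (any callable mapping a prompt string to an answer string), and the stub model are all hypothetical, standing in for a real LM call.

```python
def probe_both_directions(query_fn, forward_q, reverse_q, entity, description):
    """Check a fact in both directions via substring match.
    `query_fn` is any prompt -> answer callable (hypothetical interface)."""
    fwd_ok = description.lower() in query_fn(forward_q).lower()
    rev_ok = entity.lower() in query_fn(reverse_q).lower()
    return {"forward": fwd_ok, "reverse": rev_ok}

def stub_model(prompt):
    # Stub that only "knows" the A→B direction, mimicking the reversal curse.
    if prompt.startswith("What was Tereshkova"):
        return "She was the first woman to travel to space."
    return "I don't know."

result = probe_both_directions(
    stub_model,
    "What was Tereshkova the first to do?",
    "Who was the first woman to travel to space?",
    "Tereshkova",
    "first woman to travel to space",
)
print(result)  # → {'forward': True, 'reverse': False}
```

The asymmetric result is exactly the signature the note describes: the fact is retrievable by one measure and invisible by the other, so single-direction benchmarks overstate coverage.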

This connects to "Does training data format shape reasoning strategy more than domain?" — the format in which information was presented during training determines what retrieval patterns are available. The reversal curse is a specific instance: the sequential format of autoregressive training creates directional associations that don't generalize to their logical inverses.

The reversal curse also challenges the assumption that LLMs develop internal representations that abstract away from surface form. If a symmetric relation were truly represented internally, both directions would be accessible. The directional failure suggests the representation is closer to associative pattern than relational structure.



Related concepts in this collection


the reversal curse — LLMs trained on "A is B" fail to learn "B is A"