Do language models use the hierarchical geometry they inherit?

Word2vec and Gemma share the same hierarchical spectral signature despite vastly different architectures and purposes. This suggests shared statistical origins, but leaves open whether the LLM actually recruits this structure for reasoning or simply inherits unused geometry.

Note · 2026-05-28 · sourced from MechInterp

The decisive move in the co-occurrence account of concept geometry is a cross-architecture comparison. The hierarchical splitting geometry is first derived and confirmed for word2vec embeddings across many WordNet subtrees. Then the same coarse-to-fine spectral signature is shown to extend "strikingly well" to Gemma 2B unembeddings. Two systems with entirely different objectives and training regimes — a shallow predict-context embedding and a large autoregressive transformer's output matrix — carry the same hierarchical fingerprint. If the structure were a functional artifact of how an LLM reasons, it should not appear, in the same form, in a model that does not reason at all.

This is the strongest available argument that the geometry is statistical, not functional: a shared signature across architectures points to a shared cause upstream of both — the co-occurrence statistics of the training text — rather than convergent functional design. Each word is characterized by discrete, continuous, and hierarchical attributes; words with similar attributes co-occur more often; and that alone gives rise to the geometric organization. Both models inherit it because both are, in different ways, fitting the same pairwise statistics.
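The coarse-to-fine claim can be illustrated with a toy calculation. The sketch below is not the paper's derivation: it assumes a balanced binary tree of eight "words" and a hypothetical similarity matrix in which two words score higher the more ancestors they share, as a stand-in for co-occurrence statistics. The function names `toy_cooccurrence` and `spectrum`, the weights, and the tree shape are all illustrative choices.

```python
import numpy as np

def toy_cooccurrence(depth=3, weights=(1.0, 1.0, 1.0), root=1.0):
    """Toy similarity matrix for the leaves of a balanced binary tree.

    Entry (i, j) holds `root` for every pair, plus weights[k] for each tree
    level k (ordered coarse to fine) at which leaves i and j share an ancestor.
    This is an assumed stand-in for co-occurrence statistics, not real data.
    """
    n = 2 ** depth
    idx = np.arange(n)
    m = np.full((n, n), root, dtype=float)
    for level, w in enumerate(weights, start=1):
        group = idx // (2 ** (depth - level))  # ancestor id at this level
        m += w * (group[:, None] == group[None, :])
    return m

def spectrum(m):
    """Eigenvalues in descending order with matching eigenvector columns."""
    vals, vecs = np.linalg.eigh(m)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

M = toy_cooccurrence()
vals, vecs = spectrum(M)

# Column 0 is the uniform direction; column 1 separates the two top-level
# branches; columns 2 and 3 span the splits inside each branch.
coarse = vecs[:, 1]
print(np.round(vals, 6))
print(np.sign(coarse))
```

With the default weights the eigenvalues are 15, 7, 3 (twice), and 1 (four times): after the uniform direction, the largest eigenvalue belongs to the coarsest split and each finer level of the tree gets a smaller one. Any model that fits this matrix would pick up the same ordering, whatever its architecture.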

Why it leaves a question open: the authors are explicit that such organization may be useful for function but is not driven by it, which leaves unresolved whether and where the LLM actually uses the hierarchical geometry it inherits. Shared structure is strong evidence of a common statistical origin; it does not show that the structure is inert in the transformer. Disentangling inherited-but-unused geometry from inherited-and-recruited geometry is the open problem this result sharpens rather than settles.
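One generic way to separate the two cases is a causal ablation: remove a direction from the representations and check whether downstream outputs change. The sketch below is an assumption about what such a test could look like, not something the source reports. It uses random toy data, a hypothetical `direction` standing in for a hierarchy axis, and two hypothetical linear readouts, one that ignores the direction and one that depends on it.

```python
import numpy as np

def project_out(x, direction):
    """Remove the component of each row of x along `direction`."""
    u = direction / np.linalg.norm(direction)
    return x - np.outer(x @ u, u)

def ablation_effect(hidden, readout, direction):
    """Mean absolute change in readout outputs when `direction` is ablated."""
    before = hidden @ readout
    after = project_out(hidden, direction) @ readout
    return float(np.mean(np.abs(before - after)))

rng = np.random.default_rng(0)
d = 16
hidden = rng.normal(size=(200, d))   # toy hidden states, not real activations

direction = np.zeros(d)
direction[0] = 1.0                   # hypothetical hierarchy direction

# Readout with zero weight on the direction: inherited but unused.
unused = rng.normal(size=(d, 5))
unused[0, :] = 0.0

# Readout with nonzero weight on the direction: inherited and recruited.
recruited = rng.normal(size=(d, 5))

effect_unused = ablation_effect(hidden, unused, direction)
effect_recruited = ablation_effect(hidden, recruited, direction)
print(effect_unused, effect_recruited)
```

In this toy setting the geometry is identical in both cases and only the ablation distinguishes them: the first effect is zero up to floating-point error, the second is not. A real transformer is nonlinear and multi-layer, so an actual test would need to intervene on real activations and measure task behavior.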


— "Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence", https://arxiv.org/abs/2605.23821

Original note title

word2vec and gemma unembeddings share the same hierarchical signature so structure is statistical not functional