INQUIRING LINE

Does Gemma's transformer explicitly exploit the inherited hierarchical geometry?

This explores whether Gemma actually *uses* the nested, taxonomy-like geometry found in its representations as a working mechanism — or whether that structure is just an inherited fingerprint of the text it was trained on that the model never deliberately puts to work.


The corpus leans hard toward the second answer: the hierarchy is a residue of training text, not a designed feature the model reaches for. The cleanest evidence is that Gemma 2B's unembeddings and word2vec embeddings share an *identical* coarse-to-fine spectral signature across WordNet taxonomies (see "Do language models use the hierarchical geometry they inherit?"). Those two models have completely different objectives and architectures, so the shared nested structure can't come from anything either one does functionally; it has to originate in the co-occurrence statistics of language itself. A companion result makes the mechanism explicit: spectral analysis of raw word co-occurrence matrices predicts and reproduces the same geometry, meaning no hierarchy-specific circuitry is required for it to appear (see "Where does hierarchical structure in language models come from?").
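
To make the co-occurrence argument concrete, here is a minimal sketch, assuming a made-up eight-word taxonomy and invented co-occurrence scores. The word list, the `toy_cooccurrence` helper, and its weights are all hypothetical; this is not the cited paper's procedure, only a toy showing how a plain eigendecomposition of a tree-structured co-occurrence matrix orders splits from coarse to fine.

```python
import numpy as np

# Hypothetical toy taxonomy: 2 super-categories, each with 2 categories of 2 words.
words = ["dog", "cat", "eagle", "sparrow", "oak", "pine", "rose", "tulip"]
category = np.array([0, 0, 1, 1, 2, 2, 3, 3])
supercategory = np.array([0, 0, 0, 0, 1, 1, 1, 1])


def toy_cooccurrence(base=1.0, same_super=2.0, same_cat=4.0, same_word=8.0):
    """Invented symmetric scores: words closer in the tree co-occur more."""
    n = len(words)
    counts = np.full((n, n), base)
    counts += same_super * (supercategory[:, None] == supercategory[None, :])
    counts += same_cat * (category[:, None] == category[None, :])
    counts += same_word * np.eye(n)
    return counts


counts = toy_cooccurrence()

# eigh returns eigenvalues in ascending order; re-sort so the largest come first.
eigvals, eigvecs = np.linalg.eigh(counts)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.round(eigvals, 6))
for rank in range(4):
    # Leading eigenvectors: constant, then super-category split, then category splits.
    print(rank, np.round(eigvecs[:, rank], 2))
```

In this toy the eigenvalues are 32, 24, 16, 16, 8, 8, 8, 8. The second eigenvector separates animals from plants, the next two separate categories within each super-category, and the rest separate individual words. No part of the code knows about the tree except through the scores themselves, which is the point of the argument above.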

So "inherited" is well supported. "Explicitly exploited" is where the corpus gets interesting, because the presence of structure and the *use* of structure are not the same thing. One striking finding is that a model can carry all the linearly decodable features a task needs while its internal organization is fractured and brittle: perfect accuracy can mask representations that fall apart under perturbation (see "Can models be smart without organized internal structure?"). That's a direct warning against assuming that geometry you can read is geometry the model uses. Inherited structure can sit in the weights as a statistical shadow without being load-bearing for computation.
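
The gap between readable and used structure can be shown with a small sketch, assuming a toy "model" whose output reads only one direction of its hidden state. Every name here (`task_feature`, `passenger_feature`, `model_output`) is hypothetical, and the setup is an illustration of the probe-versus-ablation distinction, not a reproduction of any cited study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500

# Two latent features: one the toy model uses, one it merely carries along.
task_feature = rng.normal(size=n_samples)
passenger_feature = rng.normal(size=n_samples)  # stands in for inherited structure

# Embed both features along orthonormal directions of a 4-d hidden state.
basis, _ = np.linalg.qr(rng.normal(size=(4, 4)))
task_dir, passenger_dir = basis[:, 0], basis[:, 1]
hidden = np.outer(task_feature, task_dir) + np.outer(passenger_feature, passenger_dir)


def model_output(states):
    """Toy readout: the output depends on the task direction only."""
    return states @ task_dir


def ablate(states, direction):
    """Project a unit direction out of every hidden state."""
    return states - np.outer(states @ direction, direction)


# Linear probe: least-squares fit of the passenger feature from hidden states.
probe, *_ = np.linalg.lstsq(hidden, passenger_feature, rcond=None)
residual = hidden @ probe - passenger_feature
probe_r2 = 1.0 - residual.var() / passenger_feature.var()

# Causal test: how much does the output move when each direction is removed?
baseline = model_output(hidden)
change_passenger = np.max(np.abs(model_output(ablate(hidden, passenger_dir)) - baseline))
change_task = np.max(np.abs(model_output(ablate(hidden, task_dir)) - baseline))

print(f"probe R^2 for passenger feature: {probe_r2:.3f}")
print(f"max output change, passenger ablated: {change_passenger:.2e}")
print(f"max output change, task ablated: {change_task:.2e}")
```

The probe recovers the passenger feature essentially perfectly, yet removing it leaves the output unchanged, while removing the task direction does not. A probe result alone therefore cannot settle whether a model exploits a structure; an intervention of this kind is the sort of evidence that could.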

There's a counterweight worth knowing about, though. Other work shows transformers do sometimes *recruit* geometric structure as a live mechanism. LLMs encode syntactic type and direction in a polar coordinate system: angle and distance both carry information, and using both nearly doubles probing accuracy over distance alone (see "How do language models encode syntactic relations geometrically?"). And multi-hop reasoning success correlates with entity representations clustering by cosine similarity, a geometric signature that tracks an actual capability rather than just a statistical leftover (see "How do transformers learn to reason across multiple steps?"). So geometry *can* be exploited; the question is whether the specific WordNet-style hierarchy is.
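
As a rough intuition for why angle can add information that distance lacks, here is a toy contrast, assuming two invented relation types encoded as offsets with the same length range but different directions. The data, the `nearest_centroid_accuracy` helper, and the evaluation are hypothetical and far simpler than the Polar Probe itself; the accuracies it produces say nothing about the published numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_type = 200

# Hypothetical encoding: each relation type has its own direction, same length range.
directions = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.repeat([0, 1], n_per_type)
lengths = rng.uniform(0.5, 1.5, size=labels.size)
noise = rng.normal(scale=0.05, size=(labels.size, 2))
offsets = lengths[:, None] * directions[labels] + noise

# Polar view of each offset between a pair of embeddings.
distance = np.linalg.norm(offsets, axis=1)
angle = np.arctan2(offsets[:, 1], offsets[:, 0])


def nearest_centroid_accuracy(features, targets):
    """Fit one centroid per class and score on the same data (toy evaluation)."""
    features = features.reshape(len(targets), -1)
    centroids = np.stack([features[targets == k].mean(axis=0) for k in (0, 1)])
    gaps = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(gaps.argmin(axis=1) == targets))


acc_distance = nearest_centroid_accuracy(distance, labels)
acc_polar = nearest_centroid_accuracy(np.column_stack([distance, angle]), labels)

print(f"distance only: {acc_distance:.2f}")
print(f"distance + angle: {acc_polar:.2f}")
```

Because both relation types share the same length distribution by construction, distance alone hovers near chance here, while adding the angle separates them cleanly. Real embeddings are not built this way, so the sketch only illustrates the kind of signal a distance-only probe would miss.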

The honest synthesis: the hierarchical taxonomy geometry in Gemma looks far more like inheritance than exploitation. It emerges as a mathematical consequence of corpus statistics, shows up identically in models that share nothing functionally, and the field has explicit cautions that decodable structure ≠ used structure. This reframes a tempting assumption: that because we can find clean ontological trees inside an LLM, the model must be reasoning over them. More likely the trees are sediment left by language, and whatever reasoning happens flows through the residual stream as continuous activation rather than lookups against a stored hierarchy (see "Do transformer models store knowledge or generate it continuously?"). The thing you didn't know you wanted to know: a model can be shaped *by* a structure it never actually consults.


Sources (6 notes)

Do language models use the hierarchical geometry they inherit?

Word2vec embeddings and Gemma 2B unembeddings share identical coarse-to-fine spectral signatures across WordNet taxonomies. Since these models have entirely different objectives, the shared structure must originate from training text statistics rather than convergent functional needs.

Where does hierarchical structure in language models come from?

LLM hierarchical representations arise as a direct mathematical consequence of corpus statistics, not from hierarchy-specific mechanisms. Spectral analysis of word co-occurrence matrices predicts and reproduces the same nested geometry found in trained embeddings and word2vec models.

Can models be smart without organized internal structure?

Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This leaves them vulnerable to perturbation and distribution shift, a weakness that standard evaluation metrics do not reveal.

How do language models encode syntactic relations geometrically?

The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbolic-compatible geometry.

How do transformers learn to reason across multiple steps?

Controlled training reveals transformers learn multi-hop reasoning in three phases: memorization, in-distribution generalization, and cross-distribution reasoning. Successful reasoning correlates with cosine clustering of entity representations, and second-hop generalization requires explicit compositional exposure during training.

Do transformer models store knowledge or generate it continuously?

Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. Question: Does Gemma's transformer *functionally* exploit its inherited hierarchical geometry, or merely carry it as statistical sediment from training text?

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–2026. Core claims:
- Gemma 2B's unembeddings and word2vec embeddings share identical coarse-to-fine spectral signatures across WordNet taxonomies, yet have completely different architectures and objectives — pointing to corpus statistics, not designed circuitry (2605.23821, ~2026).
- Spectral analysis of raw word co-occurrence matrices alone reproduces the same hierarchical geometry, suggesting no hierarchy-specific mechanism is required (2605.23821, ~2026).
- Perfect task accuracy can mask internally fragmented representations that collapse under perturbation — a direct caution that decodable structure ≠ used structure (2405.08366, ~2024).
- Transformers do recruit geometric structure as live mechanisms in some contexts: polar coordinate systems encode syntax (type + direction), nearly doubling probing accuracy over distance alone (2412.05571, ~2024); entity clustering by cosine similarity correlates with multi-hop reasoning success (2505.23653, ~2025).
- Residual streams transmit knowledge as continuous flow, not stored lookups (inferred from 2405.00208, ~2024).

Anchor papers (verify; mind their dates):
- arXiv:2605.23821 Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence (2026)
- arXiv:2412.05571 A polar coordinate system represents syntax in large language models (2024)
- arXiv:2505.23653 How do Transformers Learn Implicit Reasoning? (2025)
- arXiv:2405.08366 Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control (2024)

Your task:
(1) RE-TEST EACH CONSTRAINT. For each finding above: has model scaling, mechanistic-interpretability tooling (SAE maturity, activation steering), or causal intervention methods since 2024 changed the evidence? Specifically: can we now FORCE Gemma to *use* inherited geometry via in-context prompting or soft prompting, or does intervention show the hierarchy plays no causal role? Separate the durable question (does the model consult the hierarchy?) from perishable claims (detection methods are too weak to find it).
(2) Surface the strongest work from the last 6 months that CONTRADICTS the "inheritance not exploitation" thesis — e.g., any paper showing geometry *causally* drives outputs, or that training objectives *intentionally* build hierarchies.
(3) Propose 2 research questions assuming the regime may have shifted: (a) If newer interpretability methods (e.g., causal scrubbing on SAEs, or multi-agent decomposition) can isolate hierarchy-specific subcomputations, what would that tell us about specialization in the residual stream? (b) Does the hierarchy appear *exploited* only in models trained on structured data (code, knowledge graphs), and remain sediment in language-only models?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
