Does information stored in neural networks necessarily influence generation decisions?

This explores whether knowledge encoded in a model's weights and activations is always causally wired to what it outputs — or whether some stored information sits inert, gets suppressed, or routes around the final decision.

This reads the question as: if a fact or computation lives inside the network, does it necessarily show up in the generation? The corpus answers with a surprisingly firm *no*, and the most direct evidence is striking. When models are trained with hidden chain-of-thought tokens, logit-lens analysis shows them computing the correct answer in their earliest layers (layers 1 to 3) and then actively suppressing it in the final layers, emitting format-compliant filler instead. The real reasoning is still recoverable from lower-ranked token predictions but never reaches the output (Do transformers hide reasoning before producing filler tokens?). The information is present, computed, and then *gated out* of the decision. So stored information clearly does not necessarily influence what you see.
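The logit-lens measurement behind this claim can be sketched in a few lines. This is a toy under stated assumptions: the vocabulary, the unembedding matrix `W_U`, and the per-layer hidden states are all hypothetical numbers chosen to mimic the reported pattern. Real logit-lens tooling usually applies the model's final layer norm before unembedding, which this sketch omits. Each intermediate state is projected onto the vocabulary, and the sketch tracks where the answer token ranks.

```python
# Toy logit-lens sketch. Every number below is hypothetical, picked only to
# mimic the reported pattern: the answer wins early, then filler overwrites it.

VOCAB = ["42", "...", "the", "is"]
ANSWER = "42"

# Hypothetical unembedding matrix W_U: one row of weights per vocabulary token.
W_U = [
    [1.0, 0.0, 0.0],  # "42"   (the correct answer)
    [0.0, 1.0, 0.0],  # "..."  (format-compliant filler)
    [0.0, 0.0, 1.0],  # "the"
    [0.5, 0.0, 0.5],  # "is"
]

# Hypothetical residual-stream state after each layer.
HIDDEN_BY_LAYER = [
    [2.0, 0.1, 0.2],  # layer 1: answer direction dominates
    [2.5, 0.3, 0.1],  # layer 2
    [2.2, 0.5, 0.3],  # layer 3
    [1.0, 3.0, 0.2],  # final layer: filler direction pushed above the answer
]


def logits(hidden):
    """Project a hidden state onto the vocabulary (one dot product per token)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in W_U]


def ranking(hidden):
    """Vocabulary tokens sorted from highest to lowest logit."""
    scores = logits(hidden)
    order = sorted(range(len(VOCAB)), key=lambda i: scores[i], reverse=True)
    return [VOCAB[i] for i in order]


def rank_of(token, hidden):
    """1-based rank of `token` in the projected distribution."""
    return ranking(hidden).index(token) + 1


for layer, hidden in enumerate(HIDDEN_BY_LAYER, start=1):
    print(f"layer {layer}: top={ranking(hidden)[0]!r}, "
          f"rank of {ANSWER!r} = {rank_of(ANSWER, hidden)}")
```

In this toy, "42" ranks first in layers 1 to 3 and drops to second at the final layer, where "..." wins. That is the "recoverable from lower-ranked predictions" pattern in miniature: the answer is suppressed, not erased.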

Part of the confusion comes from the word "stored." One line of work argues that transformers don't really archive knowledge at all; they transmit it as flowing activations, closer to how an oral culture holds knowledge only in the act of performance than to a database you query (Do transformer models store knowledge or generate it continuously?). On that view, knowledge and generation aren't two separable things where one "influences" the other: the knowledge exists only *as* the generation, which is also why it is so contextual and hard to edit. A related strand pushes the locus of influence below the surface entirely. Reasoning is driven by latent-state trajectories, and the visible chain-of-thought is only a partial, sometimes unfaithful interface onto what actually moved the decision (Where does LLM reasoning actually happen during generation?).
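A toy model makes the "partial interface" point concrete. Assume, hypothetically, a 3-dimensional latent state whose visible token is chosen by a readout `READOUT` that looks only at the first two dimensions, while a downstream `decide` step reads the third. Two latent trajectories then emit the same visible trace yet reach different decisions. The names, dimensions, and numbers are illustrative only and are not taken from the cited work.

```python
# Toy: the visible token is an argmax over a readout that ignores part of the
# latent state, so identical visible traces can hide different latent states.

# Hypothetical readout: scores each token from latent dims 0 and 1 only.
READOUT = {
    "step": [1.0, 0.0, 0.0],
    "done": [0.0, 1.0, 0.0],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def emit(state):
    """Visible token: the highest-scoring readout entry (a lossy projection)."""
    return max(READOUT, key=lambda tok: dot(READOUT[tok], state))


def decide(state):
    """Hypothetical downstream decision driven by latent dim 2, which emit() never sees."""
    return "yes" if state[2] > 0.5 else "no"


# Two hypothetical latent trajectories over three reasoning steps.
traj_a = [[2.0, 0.0, 0.1], [2.0, 0.1, 0.2], [2.0, 0.2, 0.3]]
traj_b = [[2.0, 0.0, 0.9], [2.0, 0.1, 0.8], [2.0, 0.2, 0.9]]

visible_a = [emit(s) for s in traj_a]
visible_b = [emit(s) for s in traj_b]

print(visible_a, visible_b)                    # the same visible trace
print(decide(traj_a[-1]), decide(traj_b[-1]))  # different final decisions
```

Reading only the emitted tokens, the two runs are indistinguishable; the quantity that actually drives `decide` never appears in the trace.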

The inverse failure is just as revealing. Two networks can produce *identical* outputs while carrying radically different internal structure. In the cited comparison, SGD-trained networks reproduce their target outputs perfectly, yet weight perturbations reveal fractured, entangled representations, unlike those of evolved networks, that prevent transfer to novel contexts (Can identical outputs hide broken internal representations?). Identical generations, different stored information. So the mapping runs cleanly in neither direction: stored information needn't shape the output, and the output needn't reflect the stored information.
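The "identical outputs, different internals" idea can be reproduced in miniature. The sketch below is not the cited comparison of SGD-trained and evolved networks. It swaps in a hand-built pair of linear "networks" (`net_a` and `net_b`, both hypothetical) that compute the same function y = 2x, one with a single weight and one with two large weights that nearly cancel. It then applies the same small relative weight noise to both; relative noise is an assumption of this sketch.

```python
import random

# Two hypothetical linear "networks" that both compute y = 2x.


def net_a(x, w):
    """Direct representation: one weight carries the whole function."""
    return w[0] * x


def net_b(x, w):
    """Entangled representation: two large weights that almost cancel."""
    return w[0] * x + w[1] * x


WEIGHTS_A = [2.0]
WEIGHTS_B = [1000.0, -998.0]
INPUTS = [-2.0, -1.0, 0.5, 1.0, 3.0]


def perturbed(weights, rng, scale=0.01):
    """Scale each weight by (1 + Gaussian noise); relative noise is an assumption."""
    return [w * (1.0 + rng.gauss(0.0, scale)) for w in weights]


def mean_output_error(net, weights, trials=1000, seed=0):
    """Average |output - 2x| over the inputs and random weight perturbations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        w = perturbed(weights, rng)
        total += sum(abs(net(x, w) - 2.0 * x) for x in INPUTS) / len(INPUTS)
    return total / trials


# Unperturbed, the outputs are identical on every input.
same_outputs = all(net_a(x, WEIGHTS_A) == net_b(x, WEIGHTS_B) for x in INPUTS)
err_a = mean_output_error(net_a, WEIGHTS_A)
err_b = mean_output_error(net_b, WEIGHTS_B)
print(same_outputs, round(err_a, 4), round(err_b, 2))
```

Nothing in the unperturbed outputs separates the two networks; only the perturbation does. With 1% relative noise, the entangled network's average error comes out hundreds of times larger than the direct one's.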

There's a structural reason some information stays causally isolated. Networks tend to decompose tasks into modular subnetworks, and ablating one subnetwork affects only its corresponding function, meaning that chunks of stored capability are wired to specific behaviors and have no effect on others (Do neural networks naturally learn modular compositional structure?). Whether stored information fires also depends on familiarity: models develop dense activations for data they've seen often and fall back to sparse representations for unfamiliar inputs, so the *same* network engages its knowledge very differently depending on what it's asked (Is representational sparsity learned or intrinsic to neural networks?).
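A hand-built example shows what "ablating a subnetwork affects only its corresponding function" means operationally. The weights below are set by hand rather than found by pruning, and the task names `add` and `diff` are hypothetical: hidden units 0 and 1 serve `add`, units 2 and 3 serve `diff`, and ablation zeroes a chosen set of units before per-task accuracy is measured.

```python
# Hand-built two-task linear network (hypothetical; real pruning studies learn
# the subnetwork masks). Units 0-1 implement "add", units 2-3 implement "diff".
W_IN = [
    [1.0, 0.0],  # unit 0 reads x
    [0.0, 1.0],  # unit 1 reads y
    [1.0, 0.0],  # unit 2 reads x
    [0.0, 1.0],  # unit 3 reads y
]
W_OUT = {
    "add": [1.0, 1.0, 0.0, 0.0],    # x + y, from units 0 and 1
    "diff": [0.0, 0.0, 1.0, -1.0],  # x - y, from units 2 and 3
}
TARGETS = {"add": lambda x, y: x + y, "diff": lambda x, y: x - y}
CASES = [(1.0, 2.0), (3.0, -1.0), (2.0, 0.5)]  # no case has a zero target


def forward(x, y, ablated=frozenset()):
    """Run the network with the hidden units in `ablated` forced to zero."""
    hidden = [0.0 if i in ablated else wx * x + wy * y
              for i, (wx, wy) in enumerate(W_IN)]
    return {task: sum(w * h for w, h in zip(ws, hidden))
            for task, ws in W_OUT.items()}


def task_accuracy(ablated=frozenset()):
    """Fraction of CASES each task still gets right after ablation."""
    acc = {}
    for task, target in TARGETS.items():
        hits = sum(abs(forward(x, y, ablated)[task] - target(x, y)) < 1e-9
                   for x, y in CASES)
        acc[task] = hits / len(CASES)
    return acc


print(task_accuracy())                   # both tasks intact
print(task_accuracy(frozenset({0, 1})))  # "add" breaks, "diff" untouched
print(task_accuracy(frozenset({2, 3})))  # "diff" breaks, "add" untouched
```

Each ablation knocks out exactly one capability while the other keeps full accuracy, which is the signature the pruning experiments look for in learned networks.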

The thing you might not have known you wanted to know: a model can know the answer and decide not to say it. This is not a metaphor but a measurable, layer-by-layer suppression. "What's in there" and "what comes out" are linked by an active, trainable gate, not a pipe. That's why interpretability is hard, why these models are hard to edit, and why a fluent output is weak evidence about what the network actually contains.

Sources (6 notes)

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Do transformer models store knowledge or generate it continuously?

Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.

Where does LLM reasoning actually happen during generation?

Evidence from CoT faithfulness tests, feature steering, and layer analysis suggests latent-state dynamics drive reasoning, while surface chain-of-thought serves as a partial interface. Hidden reasoning processes should be the default focus of study.

Can identical outputs hide broken internal representations?

Networks trained with SGD reproduce outputs perfectly while having radically different internal structure than evolved networks, with weight perturbations revealing fractured, entangled representations that prevent transfer to novel contexts or creative recombination.

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.
