Can fractured representations explain why models fail at systematic generalization?

This note explores whether 'fractured representations' (networks that produce correct outputs while their internal structure is tangled and brittle) are the real reason models stumble when asked to recombine what they know in novel situations. The corpus suggests this is a strong candidate, but it is one of several overlapping explanations, and reading them together is more illuminating than reading any one alone.

The core claim comes from work showing that two networks with *identical* outputs can have radically different internals: networks trained with ordinary gradient descent develop fractured, entangled representations that look fine on the test set but shatter under small weight perturbations and fail to transfer to new contexts or recombine creatively (see "Can identical outputs hide broken internal representations?"). That is a direct mechanism for generalization failure: if the parts aren't cleanly separable, you can't reassemble them into something new. The deeper theoretical version is the *binding problem*: networks struggle to segregate distinct entities from input, keep their representations separate, and reuse learned structure in novel combinations. This is offered explicitly as *the* explanation for why neural nets fail at compositional generalization (see "Why do neural networks fail at compositional generalization?").
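The perturbation claim can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the two hand-built linear networks `CLEAN` and `TANGLED`, the probe grid, and the choice of a perturbation proportional to each weight's magnitude are hypothetical and are not taken from the cited work. The sketch only shows that two networks can agree on every probed input while differing sharply in how much a small relative weight change moves the output.

```python
# Toy illustration only: hand-built linear networks, not the cited experiments.

def forward(net, x):
    """Two-layer linear net: output = v . (W x)."""
    W, v = net
    hidden = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return sum(vi * h for vi, h in zip(v, hidden))

# Both networks compute x[0] + x[1].
# CLEAN keeps each input in its own hidden unit.
# TANGLED mixes both inputs with large weights that cancel at the output.
CLEAN = ([[1, 0], [0, 1]], [1, 1])
TANGLED = ([[100, 99], [-99, -98]], [1, 1])

# Hypothetical probe set: a small grid of integer inputs.
PROBES = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]

def same_outputs(net_a, net_b, probes=PROBES):
    """True if the two networks agree exactly on every probe input."""
    return all(forward(net_a, x) == forward(net_b, x) for x in probes)

def sensitivity(net, rel=0.01, probes=PROBES):
    """Largest output change when one first-layer weight is scaled by (1 + rel)."""
    W, v = net
    worst = 0.0
    for i in range(len(W)):
        for j in range(len(W[i])):
            W_pert = [row[:] for row in W]          # copy, then perturb one weight
            W_pert[i][j] = W[i][j] * (1 + rel)
            for x in probes:
                delta = abs(forward((W_pert, v), x) - forward(net, x))
                worst = max(worst, delta)
    return worst

if __name__ == "__main__":
    print("identical outputs:", same_outputs(CLEAN, TANGLED))
    print("clean sensitivity:", sensitivity(CLEAN))
    print("tangled sensitivity:", sensitivity(TANGLED))
```

Under these assumptions the two networks are indistinguishable on the probes, yet a 1% change to a single weight moves the tangled network's output about a hundred times further than the clean one's. With an absolute rather than relative perturbation the two would look alike, so the result depends on the perturbation model chosen here.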

But here's where the corpus complicates the story. A competing line of evidence says transformers don't fail because their representations are fractured; they fail because they were never doing systematic reasoning in the first place. They succeed in-distribution by memorizing and matching computation subgraphs from training, then collapse on novel compositions (see "Do transformers actually learn systematic compositional reasoning?"). A related finding reframes the failure boundary itself: models don't break at some complexity threshold, they break at *instance novelty*. Any reasoning chain works if the model saw similar instances, because it is fitting instance-level patterns rather than learning a generalizable algorithm (see "Do language models fail at reasoning due to complexity or novelty?"). So 'fractured representation' and 'pattern-matching instead of reasoning' may be two descriptions of the same underlying gap, seen from the inside (broken structure) versus the outside (novelty-bounded behavior).
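The difference between breaking at complexity and breaking at novelty can be sketched with a deliberately extreme caricature. The `Memorizer` class, the `OPS` table, and the example chains are all hypothetical names invented for this illustration; a lookup table is a stand-in for instance-level fitting, not a model of how a transformer works.

```python
# Caricature of instance-level fitting versus a general algorithm.
# All names here are hypothetical and chosen only for illustration.

OPS = {
    "inc": lambda n: n + 1,
    "dbl": lambda n: n * 2,
    "neg": lambda n: -n,
}

def run_algorithm(chain, n):
    """The generalizable procedure: apply each named operation in order."""
    for name in chain:
        n = OPS[name](n)
    return n

class Memorizer:
    """Stores exact (chain, input) -> output pairs and nothing else."""

    def __init__(self):
        self.table = {}

    def fit(self, instances):
        for chain, n in instances:
            self.table[(tuple(chain), n)] = run_algorithm(chain, n)

    def predict(self, chain, n):
        # Returns None for any instance that was not seen during fit.
        return self.table.get((tuple(chain), n))

LONG_SEEN = (["inc", "dbl", "neg", "inc", "dbl"], 3)   # long but seen in training
SHORT_NOVEL = (["dbl", "inc"], 3)                      # short but never seen

model = Memorizer()
model.fit([LONG_SEEN])

if __name__ == "__main__":
    print("long, seen:", model.predict(*LONG_SEEN), run_algorithm(*LONG_SEEN))
    print("short, novel:", model.predict(*SHORT_NOVEL), run_algorithm(*SHORT_NOVEL))
```

The memorizer answers the five-step chain correctly because it saw that exact instance, and fails on the two-step chain because it did not. Chain length plays no role in which case succeeds, which is the shape of the instance-novelty claim.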

The most interesting twist is that fracturing isn't inevitable, and might even be partly fixable. Pruning experiments show networks *do* sometimes decompose compositional tasks into clean, isolated subnetworks, and that pretraining substantially increases how reliable and modular that structure is (see "Do neural networks naturally learn modular compositional structure?"). Scaling can partly overcome the binding problem by letting compositional representations emerge (see "Why do neural networks fail at compositional generalization?"). And the way a model *encodes* unfamiliarity matters too: under out-of-distribution shift, models sparsify their activations in a localized way that acts as a stabilizing filter rather than a breakdown (see "Do language models sparsify their activations under difficult tasks?"), and this density-vs-sparsity pattern is itself *learned* through how familiar the training data was (see "Is representational sparsity learned or intrinsic to neural networks?"). That suggests fracturing is a property of the training regime, not a fixed law of architecture.
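The ablation logic behind the pruning result can be sketched on hand-built toy networks. This is not the pruning method from the cited experiments: the `MODULAR` and `SHARED` networks, the probe grid, and the `affected_outputs` helper are hypothetical and exist only to show what "an ablation affects only its corresponding function" means operationally.

```python
# Toy ablation check on hand-built linear networks (illustrative assumptions only).

def forward(net, x, ablated=frozenset()):
    """Linear net with two outputs; hidden units listed in `ablated` are zeroed."""
    W, V = net
    hidden = [
        0 if i in ablated else sum(w * xi for w, xi in zip(row, x))
        for i, row in enumerate(W)
    ]
    return [sum(v * h for v, h in zip(row, hidden)) for row in V]

# Output 0 is x[0] + x[1]; output 1 is x[0] - x[1].
# MODULAR gives each output its own pair of hidden units.
MODULAR = (
    [[1, 0], [0, 1], [1, 0], [0, 1]],
    [[1, 1, 0, 0], [0, 0, 1, -1]],
)
# SHARED computes the same two functions from the same two hidden units.
SHARED = (
    [[1, 0], [0, 1]],
    [[1, 1], [1, -1]],
)

# Hypothetical probe set: a small grid of integer inputs.
PROBES = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]

def affected_outputs(net, ablated, probes=PROBES):
    """Indices of outputs that change on at least one probe when units are ablated."""
    changed = set()
    for x in probes:
        base = forward(net, x)
        cut = forward(net, x, frozenset(ablated))
        changed.update(k for k, (b, c) in enumerate(zip(base, cut)) if b != c)
    return changed

if __name__ == "__main__":
    print("modular, ablate units 0-1:", affected_outputs(MODULAR, {0, 1}))
    print("modular, ablate units 2-3:", affected_outputs(MODULAR, {2, 3}))
    print("shared, ablate unit 0:", affected_outputs(SHARED, {0}))
```

In the modular network each ablation disturbs exactly one output, while in the shared network removing a single unit disturbs both. Both networks compute the same two functions on the unablated probes, so the difference is visible only through intervention.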

The thing you might not have known you wanted to know: whether representations end up fractured or modular seems to be decided largely by *exposure and pretraining* rather than by the architecture alone. That means systematic generalization failure may be less an unfixable flaw of transformers and more a symptom of how, and on what, we train them.

Sources (7 notes)

Can identical outputs hide broken internal representations?

Networks trained with SGD reproduce outputs perfectly while having radically different internal structure than evolved networks, with weight perturbations revealing fractured, entangled representations that prevent transfer to novel contexts or creative recombination.

Why do neural networks fail at compositional generalization?

Greff et al. argue that neural networks cannot dynamically bind distributed information into compositional structures due to three failures: segregating entities from inputs, maintaining representational separation, and reusing learned structure in novel combinations. Scaling can partially overcome this by enabling compositional representations to emerge.

Do transformers actually learn systematic compositional reasoning?

Research shows transformers succeed on in-distribution tasks by memorizing computation subgraphs from training data, not by learning systematic rules. They fail drastically on novel compositions, with errors compounding across reasoning steps.

Do language models fail at reasoning due to complexity or novelty?

Large reasoning models (LRMs) don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so any reasoning chain succeeds if trained on similar instances, regardless of length.

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.
