Do autoencoders learn hidden attractors in latent space?

When you repeatedly apply an autoencoder's encode-decode cycle, do the trajectories in latent space converge to specific points? If so, what creates these attractors and what do they reveal about what the network learned?

Note · 2026-05-18 · sourced from Cognitive Models Latent

A trained autoencoder is usually treated as a one-shot map: input goes in, latent code comes out, reconstruction goes back. "Navigating the Latent Space Dynamics of Neural Models" reframes the same architecture as a dynamical system. Iterating the encode-decode map traces trajectories in latent space. The endpoints of those trajectories are attractor points, locations where further iteration no longer moves the latent code, and they emerge without any additional training, purely from the geometry the autoencoder learned.
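The iteration described above can be sketched in a few lines. This is only an illustration under stated assumptions: `encode` and `decode` are hypothetical toy functions standing in for a trained autoencoder, chosen so that the composed latent map is z -> tanh(2z), and `iterate_to_attractor` is a name invented for this note.

```python
import math

def decode(z):
    # Hypothetical toy decoder: lifts a 1-D latent code to a 2-D "input".
    return [z[0], z[0]]

def encode(x):
    # Hypothetical toy encoder: squashes the 2-D input back to a 1-D code.
    return [math.tanh(x[0] + x[1])]

def latent_step(z):
    # One application of the latent map: decode the code, then re-encode it.
    return encode(decode(z))

def iterate_to_attractor(z0, tol=1e-10, max_steps=1000):
    # Apply the latent map until the code moves less than `tol` in one step.
    z = list(z0)
    for step in range(max_steps):
        z_next = latent_step(z)
        move = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_next, z)))
        z = z_next
        if move < tol:
            return z, step + 1
    return z, max_steps

# Two starting codes on opposite sides of zero settle at different endpoints.
z_pos, steps_pos = iterate_to_attractor([0.3])
z_neg, steps_neg = iterate_to_attractor([-0.3])
print(z_pos, steps_pos)
print(z_neg, steps_neg)
```

With a real model, `latent_step` would wrap the trained encoder and decoder; the stopping rule (step size below a tolerance) is one reasonable choice among several.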

The mechanism is locally contractive behavior near training examples: small perturbations of a latent code shrink under the encode-decode map. Three inductive biases combine to produce it. Initialization bias: standard schemes preserve activation variance and exhibit a global tendency toward contractive maps. Explicit regularization: weight decay penalizes parameter norms and encourages contraction. Implicit regularization: data augmentation introduces local perturbations around training examples, effectively defining neighborhoods the encoder learns to contract toward. None of these was designed to create attractors; the attractors are a side effect.
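One way to make "locally contractive" concrete is to measure how much the latent map stretches small perturbations around a point: a worst-case stretch below 1 means nearby codes are pulled together. The sketch below is an assumption-laden illustration, not the source's method. The weight matrix `W`, the map `latent_step`, and the helper `local_stretch` are all hypothetical, and the finite-difference estimate only scans directions in a 2-D latent space.

```python
import math

# Hypothetical weights for a toy 2-D latent map z -> tanh(W z).
W = [[1.5, 0.0], [0.0, 0.5]]

def latent_step(z):
    # Stand-in for encode(decode(z)) of a trained autoencoder.
    return [math.tanh(sum(w * v for w, v in zip(row, z))) for row in W]

def local_stretch(f, z, eps=1e-6, n_dirs=360):
    # Finite-difference estimate of the largest factor by which f
    # stretches a perturbation of size eps around z (2-D latent only).
    fz = f(z)
    worst = 0.0
    for k in range(n_dirs):
        theta = 2.0 * math.pi * k / n_dirs
        d = [eps * math.cos(theta), eps * math.sin(theta)]
        fzd = f([a + b for a, b in zip(z, d)])
        out = math.sqrt(sum((a - b) ** 2 for a, b in zip(fzd, fz)))
        worst = max(worst, out / eps)
    return worst

# Find an attractor by plain iteration from an arbitrary start.
z = [0.5, 0.5]
for _ in range(200):
    z = latent_step(z)

stretch_at_attractor = local_stretch(latent_step, z)
stretch_at_origin = local_stretch(latent_step, [0.0, 0.0])
print("stretch at attractor:", stretch_at_attractor)  # below 1: contractive
print("stretch at origin:", stretch_at_origin)        # above 1: not contractive
```

In this toy map the origin is a fixed point but not an attractor, because perturbations grow there; the point reached by iteration is an attractor because perturbations shrink.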

What attractors represent depends on training regime. Heavy overparameterization with limited data produces attractors that correspond to memorized examples — the autoencoder behaves like an associative memory akin to a Hopfield network. With more data and less overparameterization, attractors become more abstract — they represent learned distribution modes rather than individual training points. The position on this memorization-vs-generalization spectrum is itself a property of the inductive-bias regime.

This reframing turns the network into an object with intrinsic dynamics that can be analyzed without input data. You can sample noise, iterate the encode-decode map, and discover what the network has actually learned by tracing where the dynamics settle. For foundation models, this enables a class of probing methods that do not require access to the original training data — particularly useful when that data is proprietary or distributed.
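The data-free probing loop described above (sample noise, iterate, see where the dynamics settle) might look like the following. It is a minimal sketch under assumptions: `latent_step` is a hypothetical 1-D stand-in for a trained model's encode-decode map, and `probe_attractors` with its parameters is a name invented here. Endpoints closer than `tol` are treated as the same attractor.

```python
import math
import random

def latent_step(z):
    # Hypothetical stand-in for encode(decode(z)) of a trained autoencoder.
    return math.tanh(2.0 * z)

def probe_attractors(f, n_samples=200, n_steps=200, tol=1e-6, seed=0):
    # Start from Gaussian noise, iterate the latent map a fixed number of
    # times, and group the endpoints that coincide within `tol`.
    rng = random.Random(seed)
    attractors = []  # entries are [location, number of samples that ended there]
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        for _ in range(n_steps):
            z = f(z)
        for entry in attractors:
            if abs(entry[0] - z) < tol:
                entry[1] += 1
                break
        else:
            attractors.append([z, 1])
    return attractors

found = probe_attractors(latent_step)
for location, count in sorted(found):
    print(f"attractor near {location:+.4f}: reached from {count} noise samples")
```

The per-attractor counts give a rough picture of basin sizes. For a real model the fixed iteration budget would need checking, since this sketch does not verify that each endpoint has actually stopped moving.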

Original note title

autoencoders implicitly define a latent vector field via iterated encode-decode maps with attractors emerging from training-induced contractive bias