Navigating the Latent Space Dynamics of Neural Models
Neural networks transform high-dimensional data into compact, structured representations, often modeled as elements of a lower-dimensional latent space. In this paper, we present an alternative interpretation of neural models as dynamical systems acting on the latent manifold. Specifically, we show that autoencoder (AE) models implicitly define a vector field on the latent manifold, obtained by iteratively applying the encoding-decoding map and requiring no additional training. We observe that standard training procedures introduce inductive biases that lead to the emergence of attractor points within this vector field. Building on this insight, we propose to use the vector field as a representation of the network, providing a novel tool for analyzing the properties of both the model and the data.
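To make the construction concrete, the following Python sketch iterates a latent-to-latent map and follows the induced vector field to a fixed point. It assumes that the encoding-decoding map sends a latent code z to encode(decode(z)) and that the vector field is the displacement v(z) = encode(decode(z)) - z. The toy linear `toy_encode` and `toy_decode` functions are hypothetical stand-ins for a trained autoencoder, chosen so that the attractor can be computed by hand.

```python
from typing import Callable, List, Tuple

Vector = List[float]
LatentMap = Callable[[Vector], Vector]


def latent_map(encode: Callable[[Vector], Vector],
               decode: Callable[[Vector], Vector]) -> LatentMap:
    """One encode-decode round trip seen from latent space: z -> encode(decode(z))."""
    return lambda z: encode(decode(z))


def vector_field(step: LatentMap, z: Vector) -> Vector:
    """Displacement v(z) = step(z) - z (assumed definition of the latent vector field)."""
    return [a - b for a, b in zip(step(z), z)]


def follow_field(step: LatentMap, z0: Vector,
                 tol: float = 1e-8, max_iter: int = 1000) -> Tuple[Vector, int]:
    """Iterate z_{t+1} = z_t + v(z_t) until the displacement falls below tol."""
    z = list(z0)
    for t in range(max_iter):
        v = vector_field(step, z)
        z = [zi + vi for zi, vi in zip(z, v)]
        if max(abs(vi) for vi in v) < tol:
            return z, t + 1
    return z, max_iter


# Hypothetical toy autoencoder: 2-D latent space, 3-D data space.
def toy_decode(z: Vector) -> Vector:
    return [z[0], z[1], 0.5 * (z[0] + z[1])]


def toy_encode(x: Vector) -> Vector:
    return [0.5 * x[0] + 0.2, 0.5 * x[1] - 0.1]


step = latent_map(toy_encode, toy_decode)
attractor, n_steps = follow_field(step, [3.0, -5.0])
print(attractor, n_steps)  # attractor is approximately [0.4, -0.2]
```

In this toy example every trajectory contracts by a factor of 0.5 per step toward (0.4, -0.2), the unique attractor of the field; a trained AE would generally have many attractors with their own basins.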
We posit that minimizing the standard autoencoder objective reduces the spectral norm of the Jacobian of the autoencoding map, which in turn yields locally contractive behavior around training examples. This behavior arises from several explicit and implicit inductive biases present in modern training pipelines: initialization bias (standard initialization schemes preserve activation variance and exhibit a global bias toward contractive mappings); explicit regularization (common methods such as weight decay penalize the norm of the model parameters, encouraging contraction); and implicit regularization (data augmentations introduce local perturbations around training examples, effectively defining a neighborhood structure that implicitly regularizes the Jacobian).
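One way to probe the local contraction described above is to estimate the spectral norm of the Jacobian of the latent map at a given point: a value below 1 means that nearby latent codes are pulled together by one application of the map. The following Python sketch is an illustration under stated assumptions, not the procedure used in the paper: it approximates the Jacobian with central finite differences and computes its largest singular value by power iteration on J^T J. The toy maps `contractive` and `expansive` are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def jacobian_fd(f: Callable[[Vector], Vector], z: Vector, eps: float = 1e-6) -> Matrix:
    """Central-difference estimate of J[i][j] = d f_i / d z_j at z."""
    columns = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        columns.append([(a - b) / (2 * eps) for a, b in zip(f(zp), f(zm))])
    # Transpose the list of columns into a row-major matrix.
    return [list(row) for row in zip(*columns)]


def spectral_norm(J: Matrix, iters: int = 200, seed: int = 0) -> float:
    """Largest singular value of J, via power iteration on J^T J."""
    rows, cols = len(J), len(J[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    for _ in range(iters):
        u = [sum(J[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = J v
        w = [sum(J[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = J^T u
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
    u = [sum(J[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u))  # ||J v|| for the (unit) top singular vector


def is_locally_contractive(f: Callable[[Vector], Vector], z: Vector,
                           eps: float = 1e-6) -> bool:
    """True if the estimated Jacobian spectral norm at z is below 1."""
    return spectral_norm(jacobian_fd(f, z, eps)) < 1.0


# Hypothetical toy latent maps.
contractive = lambda z: [0.5 * math.tanh(z[0]), 0.3 * z[1]]
expansive = lambda z: [2.0 * z[0], z[1]]

print(spectral_norm(jacobian_fd(contractive, [0.0, 0.0])))  # close to 0.5
```

For a real network, automatic differentiation (Jacobian-vector and vector-Jacobian products) would replace the finite differences; the power-iteration loop only needs the products J v and J^T u, so it carries over unchanged.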
Memorization and generalization in neural networks exhibit a rich spectrum of behaviors depending on model capacity, regularization, and data availability. In the extremely overparameterized regime, i.e., networks trained on only a few data points, it has been shown both experimentally and theoretically that AEs can memorize training examples and implement associative memory mechanisms. Non-gradient-based approaches such as Hopfield networks and their modern variants extend classical attractor dynamics to neural systems that interpolate between memory-based and generalizing regimes. In this work, we show that AEs generally lie on the spectrum between memorization and generalization, with their position determined by the inductive biases that enforce contraction.
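As background for the attractor-based memories mentioned above, the following Python sketch implements the classical binary Hopfield network with Hebbian weights (not its modern variants, and not the AE vector field studied here). Stored patterns become fixed points of the asynchronous update rule, so an input with a few corrupted bits typically flows back to a stored pattern. The two patterns and the single-bit corruption are illustrative choices.

```python
from typing import List

Pattern = List[int]  # entries are -1 or +1


def hebbian_weights(patterns: List[Pattern]) -> List[List[float]]:
    """W[i][j] = (1/n) * sum over patterns of p[i] * p[j], with a zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W


def recall(W: List[List[float]], state: Pattern, max_sweeps: int = 100) -> Pattern:
    """Asynchronous sign updates until no unit changes (a fixed point is reached)."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(W[i][j] * s[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hebbian_weights([p1, p2])
noisy = [-1] + p1[1:]          # flip the first bit of p1
print(recall(W, noisy) == p1)  # True
```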
In this work, we proposed to represent neural AEs as vector fields, implicitly defined by iterating the autoencoding map in the latent space. We showed that (i) attractors of the latent vector field exist in practice, owing to inductive biases in the training regime that enforce local contraction; (ii) they retain key properties of the model and the data, linking to the memorization and generalization regimes of the model; (iii) knowledge stored in the weights of vision foundation models can be retrieved without access to input data; and (iv) paths in the vector field provide information about the learned distribution and its shifts. Our experiments on vision foundation models further show that attractors computed from noise can serve as a dictionary of signals for representing diverse datasets, demonstrating that the information stored in the weights of foundation models can be probed in a black-box manner, without requiring any input data.
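The idea of computing attractors from noise and using them as a dictionary can be sketched in Python under strong simplifying assumptions: `toy_step` is a hypothetical latent-to-latent map standing in for a foundation model's encode-decode round trip, noise is drawn uniformly, attractors are deduplicated with a fixed tolerance, and a data point is represented by its nearest attractor (a simple vector-quantization reading of "dictionary", which may differ from the representation used in the experiments). The toy map applies z -> z - 0.25 (z^3 - z) coordinate-wise, which has stable fixed points at +1 and -1 and an unstable one at 0, so in two dimensions it has four attractors.

```python
import random
from typing import Callable, List

Vector = List[float]


def iterate_to_attractor(step: Callable[[Vector], Vector], z0: Vector,
                         tol: float = 1e-9, max_iter: int = 1000) -> Vector:
    """Apply step repeatedly until successive iterates differ by less than tol."""
    z = list(z0)
    for _ in range(max_iter):
        nxt = step(z)
        if max(abs(a - b) for a, b in zip(nxt, z)) < tol:
            return nxt
        z = nxt
    return z


def attractors_from_noise(step: Callable[[Vector], Vector], dim: int,
                          n_samples: int = 64, scale: float = 2.0,
                          merge_tol: float = 1e-4, seed: int = 0) -> List[Vector]:
    """Run the map from random initial codes and keep the distinct end points."""
    rng = random.Random(seed)
    dictionary: List[Vector] = []
    for _ in range(n_samples):
        z0 = [rng.uniform(-scale, scale) for _ in range(dim)]
        z = iterate_to_attractor(step, z0)
        is_new = all(max(abs(a - b) for a, b in zip(z, d)) >= merge_tol
                     for d in dictionary)
        if is_new:
            dictionary.append(z)
    return dictionary


def nearest_atom(dictionary: List[Vector], z: Vector) -> int:
    """Index of the dictionary element closest to z in Euclidean distance."""
    return min(range(len(dictionary)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(dictionary[k], z)))


def toy_step(z: Vector) -> Vector:
    # Hypothetical map: stable fixed points at +/-1 per coordinate, unstable at 0.
    return [zi - 0.25 * (zi ** 3 - zi) for zi in z]


atoms = attractors_from_noise(toy_step, dim=2)
print(sorted((round(a[0]), round(a[1])) for a in atoms))
```

No input data is used to build `atoms`; only the map itself is queried, which mirrors the black-box setting described above.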
To exploit continuous-space reasoning, we use the last-layer hidden states of the small assistant model as the “soft” thought tokens, rather than the discrete tokens obtained after vocabulary mapping. Staying in the latent space avoids the information loss inherent in autoregressive decoding. However, a representational gap between the assistant model and the LLM may hinder effective knowledge transfer. To bridge this gap, we train a projection module that maps the soft thought tokens generated by the assistant model to the LLM’s representation space. Training the projection module for each task can be seen as soft prompt tuning for the LLM. The overall SoftCoT (soft thoughts for chain-of-thought) reasoning framework is illustrated in Figure 1.
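The projection step can be illustrated with a minimal Python sketch. The text does not specify the architecture of the projection module, so this assumes a single linear layer, hypothetical hidden sizes, and that the projected soft thought tokens are prepended to the LLM's input embeddings as in soft prompt tuning; during training, presumably only the projection parameters would be updated. `LinearProjection` and `build_llm_input` are illustrative names, not part of any published API.

```python
import random
from typing import List

Matrix = List[List[float]]  # one row per token


class LinearProjection:
    """Hypothetical linear map from assistant hidden size d_in to LLM embedding size d_out."""

    def __init__(self, d_in: int, d_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        std = d_in ** -0.5  # keeps output scale roughly independent of d_in at init
        self.W = [[rng.gauss(0.0, std) for _ in range(d_in)] for _ in range(d_out)]
        self.b = [0.0] * d_out

    def __call__(self, soft_thoughts: Matrix) -> Matrix:
        # y = W h + b for each soft thought token h (a last-layer hidden state).
        return [[sum(w * h for w, h in zip(row, token)) + bias
                 for row, bias in zip(self.W, self.b)]
                for token in soft_thoughts]


def build_llm_input(projected: Matrix, prompt_embeddings: Matrix) -> Matrix:
    """Prepend projected soft thoughts to the prompt embeddings (assumed placement)."""
    return projected + prompt_embeddings


d_assistant, d_llm = 4, 6                     # hypothetical hidden sizes
soft_thoughts = [[1.0, 0.0, 0.0, 0.0],        # stand-ins for assistant hidden states
                 [0.0, 1.0, 0.0, 0.0]]
proj = LinearProjection(d_assistant, d_llm)
prompt = [[0.0] * d_llm for _ in range(3)]    # stand-in for 3 embedded prompt tokens
llm_input = build_llm_input(proj(soft_thoughts), prompt)
print(len(llm_input), len(llm_input[0]))      # 5 6
```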