Can latent thought vectors scale language models beyond parameters?
Explores whether explicit latent thought vectors with dual-rate learning create new scaling dimensions independent of model size. This matters because it suggests alternatives to simply building larger models.
Latent-Thought Language Models (LTMs) propose a scaling strategy different from adding parameters or lengthening contexts: explicit latent thought vectors that follow a prior model in latent space and guide autoregressive token generation. This opens additional scaling dimensions: higher sample efficiency from increasing training compute per token, and further gains from trading model size for more inference steps.
Architecture. Latent thought vectors form an abstract representation of the entire sequence and condition the decoder's generation of every token. Training uses variational Bayes with a dual-rate process: fast learning of the local variational parameters of each sequence's latent posterior (adapting quickly to specific inputs), coupled with slow learning of the global decoder parameters (gradually accumulating general knowledge).
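The fast/slow split can be sketched with a toy linear model. This is a minimal illustration, not the paper's actual Transformer-based implementation: the dimensions, learning rates, step counts, and the L2 penalty standing in for the KL term are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
D_OBS, D_LAT = 8, 4                            # observation and latent dimensions (assumed)
FAST_LR, SLOW_LR, BETA, N_FAST = 0.05, 0.01, 0.01, 16

def loss(x, z, W):
    # Reconstruction error plus an L2 penalty on z, standing in for the
    # KL divergence to a standard-normal prior over the latent vector.
    r = x - W @ z
    return r @ r + BETA * (z @ z)

def train_step(x, W):
    # Fast loop: fit the local variational parameters (here just the
    # posterior mean z) to this specific sequence.
    z = np.zeros(D_LAT)
    for _ in range(N_FAST):
        grad_z = -2 * W.T @ (x - W @ z) + 2 * BETA * z
        z -= FAST_LR * grad_z
    # Slow step: one small update to the global decoder parameters,
    # gradually accumulating knowledge shared across sequences.
    grad_W = -2 * np.outer(x - W @ z, z)
    return W - SLOW_LR * grad_W, z

# Synthetic data generated by a hidden ground-truth decoder.
W_true = rng.normal(scale=0.5, size=(D_OBS, D_LAT))
W = rng.normal(scale=0.1, size=(D_OBS, D_LAT))  # slow global parameters
losses = []
for _ in range(500):
    x = W_true @ rng.normal(size=D_LAT) + 0.05 * rng.normal(size=D_OBS)
    W, z = train_step(x, W)
    losses.append(loss(x, z, W))

print(np.mean(losses[:50]), np.mean(losses[-50:]))
```

Because z is re-fit from scratch for every input while W moves slowly, the average post-fit loss falls over training only as the global decoder improves, which is the dual-rate division of labor the note describes.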
Cognitive inspiration. The dual-rate scheme parallels established cognitive models:
- Declarative-procedural model (Ullman 2004): latent vectors and local parameters parallel declarative/episodic memory; global decoder parameters parallel procedural memory
- Fast-slow learning (Kumaran et al. 2016): fast episodic learning and slow schematic learning interplay
- Language of thought (Fodor 1975): latent thought vectors as "words" of an internal thought language
Scaling properties. LTMs demonstrate superior sample and parameter efficiency compared to conventional autoregressive models and discrete diffusion models, significantly outperforming both on validation perplexity and zero-shot language modeling. Emergent few-shot in-context reasoning capabilities scale with both model size and latent size, providing two independent scaling dimensions.
The connection to existing latent reasoning approaches is close, but the mechanisms differ. Can models reason without generating visible thinking tokens? describes depth-recurrent architectures that iterate in latent space at inference time. LTMs use latent vectors differently: as sequence-level abstractions that guide token generation rather than per-token iterative computation. Dual-rate learning also gives LTMs a training-time mechanism that depth recurrence lacks.
The Titans parallel is also notable: Can neural memory modules scale language models beyond attention limits? separates fast attention (short-term) from slow memory (long-term). LTMs separate fast local adaptation from slow global learning. Both architectures implement the fast-slow cognitive distinction, but at different levels: Titans for memory, LTMs for generation.
Source: Cognitive Models Latent
Related concepts in this collection
- Can models reason without generating visible thinking tokens?
  Explores whether intermediate reasoning must be verbalized as text tokens, or if models can think in hidden continuous space. Challenges a foundational assumption about how language models scale their reasoning capabilities.
  Relation: a different latent approach; per-token iterative computation vs. sequence-level latent vectors.
- Can neural memory modules scale language models beyond attention limits?
  Can separating short-term attention from adaptive long-term memory allow models to efficiently handle context windows exceeding 2M tokens while maintaining competitive performance?
  Relation: Titans implements fast-slow at the memory level; LTMs implement it at the generation level.
- Can inference compute replace scaling up model size?
  Explores whether smaller models given more thinking time during inference can match larger models. Matters because it reshapes deployment economics and compute allocation strategies.
  Relation: LTMs demonstrate this; model size can be traded for inference steps.
- Can computational power accelerate scientific discovery itself?
  Does the pace of research breakthroughs scale with computing resources, as model performance does? ASI-ARCH tested this by running thousands of autonomous experiments to discover neural architectures.
  Relation: LTMs provide new dimensions for architecture search.
Original note title
latent-thought language models introduce additional scaling dimensions beyond parameters by incorporating explicit latent thought vectors with dual-rate learning