Agentic and Multi-Agent Systems · LLM Reasoning and Architecture

Can agents share thoughts without converting them to text?

Can multi-agent systems exchange information through continuous hidden representations instead of language? This matters because text serialization loses information and slows inference.

Note · 2026-02-23 · sourced from Agents Multi Architecture

Text-based multi-agent systems force rich internal representations through a lossy bottleneck: language. Every inter-agent message requires decoding continuous thoughts into discrete tokens and re-encoding them on the receiving end. LatentMAS eliminates this bottleneck entirely by enabling pure latent collaboration — agents think and communicate in continuous representation space without ever decoding to text.

The framework integrates two mechanisms:

Intra-agent latent reasoning: Each agent reasons auto-regressively in latent space: at every step, the model's last-layer hidden embedding is fed back as the next input, with no explicit decoding to tokens. This preserves the full information content of the model's ongoing reasoning at each step.
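A minimal sketch of that loop, with a toy stand-in for the model (the matrix, dimensions, and step count below are illustrative assumptions, not details from the paper): the hidden state produced at each step is fed straight back in as the next input embedding, so no decode/re-encode round trip ever occurs.

```python
import numpy as np

def latent_reasoning(h0, W, steps=4):
    """Auto-regressive latent reasoning (illustrative sketch): feed each
    last-layer hidden state back in as the next input embedding, never
    decoding to tokens. W stands in for a full transformer forward pass."""
    h = h0
    thoughts = []
    for _ in range(steps):
        h = np.tanh(W @ h)   # one "forward pass" yielding a new hidden state
        thoughts.append(h)   # the continuous thought, kept as-is
    return thoughts          # a list of hidden vectors; no text involved

rng = np.random.default_rng(0)
d = 8                        # toy hidden size
thoughts = latent_reasoning(rng.standard_normal(d),
                            rng.standard_normal((d, d)) / np.sqrt(d))
print(len(thoughts), thoughts[0].shape)
```

In a real LLM the feedback would inject the hidden state in place of the next token embedding; the structure of the loop is the same.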

Cross-agent latent working memory: Information is exchanged via shared layer-wise KV caches that capture both the input context and newly generated latent thoughts. Each agent's internal representations are preserved and made available to other agents without any text serialization.
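The cross-agent transfer can be sketched as cache concatenation. The per-layer (keys, values) layout and the sequence-axis merge below are my assumptions about how such a transfer would look in practice, not the paper's exact implementation:

```python
import numpy as np

def share_working_memory(sender_cache, receiver_cache):
    """Cross-agent latent working memory (illustrative sketch): append the
    sender's layer-wise KV entries onto the receiver's cache along the
    sequence axis, so the receiver attends over the sender's context and
    latent thoughts with no text serialization in between."""
    merged = []
    for (k_s, v_s), (k_r, v_r) in zip(sender_cache, receiver_cache):
        merged.append((np.concatenate([k_r, k_s], axis=0),   # keys:   [seq, d]
                       np.concatenate([v_r, v_s], axis=0)))  # values: [seq, d]
    return merged

rng = np.random.default_rng(1)
layers, d = 2, 4
sender   = [(rng.standard_normal((3, d)), rng.standard_normal((3, d)))
            for _ in range(layers)]
receiver = [(rng.standard_normal((5, d)), rng.standard_normal((5, d)))
            for _ in range(layers)]
merged = share_working_memory(sender, receiver)
print(merged[0][0].shape)  # keys now span both agents' sequences
```

Because the sender's K/V tensors are copied verbatim rather than summarized into tokens, the transfer is lossless by construction, which is exactly the fidelity claim below.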

Three foundational principles are theoretically and empirically verified:

  1. Reasoning expressiveness — hidden representations naturally encode continuous thoughts, allowing each latent step to convey far richer information than discrete tokens.
  2. Communication fidelity — latent working memory preserves input representations and latent thoughts losslessly, enabling perfect cross-agent information transfer.
  3. Collaboration complexity — LatentMAS attains higher expressiveness than text-based MAS at significantly lower inference complexity.
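A back-of-the-envelope check on principle 1 (the vocabulary size, hidden dimension, and precision below are typical values I am assuming, not figures from the paper): a discrete token can convey at most log2|V| bits, while a latent step transmits an entire hidden vector.

```python
import math

vocab_size = 128_000        # assumed, typical of modern LLM tokenizers
hidden_dim = 4096           # assumed hidden size
bits_per_float = 16         # fp16 activations

bits_per_token  = math.log2(vocab_size)        # information in one discrete token
bits_per_latent = hidden_dim * bits_per_float  # raw capacity of one hidden state

print(f"token:  ~{bits_per_token:.1f} bits")
print(f"latent: {bits_per_latent} bits "
      f"({bits_per_latent / bits_per_token:.0f}x)")
```

This compares channel capacity, not realized information; the point is only that a latent step is not capped at log2|V| bits the way a token is.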

Empirical results across 9 benchmarks (math, science, commonsense, code): up to 14.6% higher accuracy, 70.8-83.7% token reduction, and 4-4.3× faster end-to-end inference. All without any additional training.

This extends Can agents share thoughts directly without using language? with a fundamentally different mechanism. Thought Communication uses a trained sparse autoencoder to extract shared and private latent thoughts with theoretical identifiability guarantees. LatentMAS is entirely training-free, using raw hidden embeddings and KV-cache transfer. The approaches are complementary: Thought Communication for explicit, controlled sharing with theoretical guarantees; LatentMAS for efficient, training-free implicit sharing with better practical performance.




latent multi-agent collaboration achieves training-free lossless information exchange through shared KV-cache working memory — reducing tokens by 70-84 percent while improving accuracy