Agentic and Multi-Agent Systems · LLM Reasoning and Architecture

Can agents share thoughts without converting them to text?

Can multi-agent systems exchange information through continuous hidden representations instead of language? This matters because text serialization loses information and slows inference.

Note · 2026-02-23 · sourced from Agents Multi Architecture

Text-based multi-agent systems force rich internal representations through a lossy bottleneck: language. Every inter-agent message requires decoding continuous thoughts into discrete tokens and re-encoding them on the receiving end. LatentMAS eliminates this bottleneck entirely by enabling pure latent collaboration — agents think and communicate in continuous representation space without ever decoding to text.

The framework integrates two mechanisms:

Intra-agent latent reasoning: Each agent reasons auto-regressively in latent space: at every step, the model's last-layer hidden embedding is fed back as the next input, with no explicit decoding to tokens. This preserves the full information content of the model's ongoing reasoning at each step.
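A minimal sketch of that loop, with a toy stand-in for the model (the matrix, dimensions, and step count below are illustrative assumptions, not details from the paper): the hidden state produced at each step is fed straight back in as the next input embedding, so no decode/re-encode round trip ever occurs.

```python
import numpy as np

def latent_reasoning(h0, W, steps=4):
    """Auto-regressive latent reasoning (illustrative sketch): feed each
    last-layer hidden state back in as the next input embedding, never
    decoding to tokens. W stands in for a full transformer forward pass."""
    h = h0
    thoughts = []
    for _ in range(steps):
        h = np.tanh(W @ h)   # one "forward pass" yielding a new hidden state
        thoughts.append(h)   # the continuous thought, kept as-is
    return thoughts          # a list of hidden vectors; no text involved

rng = np.random.default_rng(0)
d = 8                        # toy hidden size
thoughts = latent_reasoning(rng.standard_normal(d),
                            rng.standard_normal((d, d)) / np.sqrt(d))
print(len(thoughts), thoughts[0].shape)
```

In a real LLM the feedback would inject the hidden state in place of the next token embedding; the structure of the loop is the same.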

Cross-agent latent working memory: Information is exchanged via shared layer-wise KV caches that capture both the input context and newly generated latent thoughts. Each agent's internal representations are preserved and made available to other agents without any text serialization.
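The cross-agent transfer can be sketched as cache concatenation. The per-layer (keys, values) layout and the sequence-axis merge below are my assumptions about how such a transfer would look in practice, not the paper's exact implementation:

```python
import numpy as np

def share_working_memory(sender_cache, receiver_cache):
    """Cross-agent latent working memory (illustrative sketch): append the
    sender's layer-wise KV entries onto the receiver's cache along the
    sequence axis, so the receiver attends over the sender's context and
    latent thoughts with no text serialization in between."""
    merged = []
    for (k_s, v_s), (k_r, v_r) in zip(sender_cache, receiver_cache):
        merged.append((np.concatenate([k_r, k_s], axis=0),   # keys:   [seq, d]
                       np.concatenate([v_r, v_s], axis=0)))  # values: [seq, d]
    return merged

rng = np.random.default_rng(1)
layers, d = 2, 4
sender   = [(rng.standard_normal((3, d)), rng.standard_normal((3, d)))
            for _ in range(layers)]
receiver = [(rng.standard_normal((5, d)), rng.standard_normal((5, d)))
            for _ in range(layers)]
merged = share_working_memory(sender, receiver)
print(merged[0][0].shape)  # keys now span both agents' sequences
```

Because the sender's K/V tensors are copied verbatim rather than summarized into tokens, the transfer is lossless by construction, which is exactly the fidelity claim below.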

Three foundational principles are theoretically and empirically verified:

  1. Reasoning expressiveness — hidden representations naturally encode continuous thoughts, allowing each latent step to convey far richer information than discrete tokens.
  2. Communication fidelity — latent working memory preserves input representations and latent thoughts losslessly, enabling perfect cross-agent information transfer.
  3. Collaboration complexity — LatentMAS attains higher expressiveness than text-based MAS at significantly lower inference complexity.
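A back-of-the-envelope check on principle 1 (the vocabulary size, hidden dimension, and precision below are typical values I am assuming, not figures from the paper): a discrete token can convey at most log2|V| bits, while a latent step transmits an entire hidden vector.

```python
import math

vocab_size = 128_000        # assumed, typical of modern LLM tokenizers
hidden_dim = 4096           # assumed hidden size
bits_per_float = 16         # fp16 activations

bits_per_token  = math.log2(vocab_size)        # information in one discrete token
bits_per_latent = hidden_dim * bits_per_float  # raw capacity of one hidden state

print(f"token:  ~{bits_per_token:.1f} bits")
print(f"latent: {bits_per_latent} bits "
      f"({bits_per_latent / bits_per_token:.0f}x)")
```

This compares channel capacity, not realized information; the point is only that a latent step is not capped at log2|V| bits the way a token is.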

Empirical results across 9 benchmarks (math, science, commonsense, code): up to 14.6% higher accuracy, 70.8-83.7% token reduction, and 4-4.3× faster end-to-end inference. All without any additional training.

This extends Can agents share thoughts directly without using language? with a fundamentally different mechanism. Thought Communication uses a trained sparse autoencoder to extract shared and private latent thoughts with theoretical identifiability guarantees. LatentMAS is entirely training-free, using raw hidden embeddings and KV-cache transfer. The approaches are complementary: Thought Communication for explicit, controlled sharing with theoretical guarantees; LatentMAS for efficient, training-free implicit sharing with better practical performance.




latent multi-agent collaboration achieves training-free lossless information exchange through shared KV-cache working memory — reducing tokens by 70-84 percent while improving accuracy