Latent Collaboration in Multi-Agent Systems

Paper · arXiv 2511.20639 · Published November 25, 2025

Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinated system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly within the continuous latent space. We introduce LatentMAS, an end-to-end training-free framework that enables pure latent collaboration among LLM agents. In LatentMAS, each agent first performs auto-regressive latent thought generation through last-layer hidden embeddings. A shared latent working memory then preserves and transfers each agent’s internal representations, ensuring lossless information exchange. We provide theoretical analyses establishing that LatentMAS attains higher expressiveness and lossless information preservation with substantially lower complexity than vanilla text-based MAS. In addition, empirical evaluations across 9 comprehensive benchmarks spanning math and science reasoning, commonsense understanding, and code generation show that LatentMAS consistently outperforms strong single-model and text-based MAS baselines, achieving up to 14.6% higher accuracy, reducing output token usage by 70.8%-83.7%, and providing 4×-4.3× faster end-to-end inference. These results demonstrate that our new latent collaboration framework enhances system-level reasoning quality while offering substantial efficiency gains without any additional training.

Beyond explicit text, several studies have explored the use of LLMs’ continuous latent space as a new form of “model language,” (Chen et al., 2025b) by either (i) leveraging hidden representations within transformers to enable single model’s internal latent chain-of-thought (CoT) reasoning (Hao et al., 2024; Zhang et al., 2025; Zheng et al., 2025), or (ii) employing KV caches or layer embeddings for information exchange across two models (Fu et al., 2025; Liu et al., 2024). However, a comprehensive model collaboration framework unifying both latent reasoning and latent communication remains unexplored. Moving one step forward, we investigate:

Can multi-agent systems achieve pure latent collaboration?

To address this question, we introduce LatentMAS, an end-to-end collaborative framework that operates entirely within the continuous latent space. Our core design integrates both internal latent thought generation and cross-agent latent working memory transfer. Inside each agent, reasoning unfolds through auto-regressive generation of last-layer hidden representations, capturing the model’s ongoing internal thoughts without explicit decoding. Across agents, information is exchanged via a shared latent working memory stored in layer-wise KV caches, capturing both the input context and newly generated latent thoughts. Overall, LatentMAS is completely training-free, enabling all agents to think and interact purely through their internal latent representations.
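The two mechanisms above can be sketched with a toy model. The snippet below is a minimal illustration, not the paper's implementation: `forward`, `W_layer`, and `W_kv` are random stand-ins for a transformer's forward pass and key/value projection. It shows the core loop (feed the last-layer hidden state back as the next input embedding, with no decode-to-token step) and KV-cache handoff between two agents.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, STEPS = 8, 4

# Toy stand-ins for a transformer (random weights): `W_layer` plays the
# role of the full forward pass, `W_kv` of the key/value projection
# that populates the KV cache. Both are illustrative assumptions.
W_layer = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1
W_kv = rng.standard_normal((HIDDEN, 2 * HIDDEN)) * 0.1

def forward(h):
    """One 'transformer' step: next hidden state plus its KV entry."""
    h_next = np.tanh(h @ W_layer)
    kv = h @ W_kv
    return h_next, kv

def latent_thoughts(h0, steps):
    """Auto-regressive latent generation: the last-layer hidden state is
    fed straight back as the next input embedding, skipping the
    decode-to-token / re-embed round trip of text-based reasoning."""
    h, kv_cache = h0, []
    for _ in range(steps):
        h, kv = forward(h)
        kv_cache.append(kv)
    return h, kv_cache

# Agent A reasons in latent space ...
h0 = rng.standard_normal(HIDDEN)
h_final, memory_a = latent_thoughts(h0, STEPS)

# ... and agent B receives A's layer-wise KV cache verbatim as shared
# latent working memory: no lossy text serialization in between.
shared_memory = list(memory_a)
h_b, memory_b = latent_thoughts(h_final, STEPS)
shared_memory.extend(memory_b)

print(len(shared_memory))  # 8 KV entries visible to downstream agents
```

Because the handoff copies the KV entries verbatim rather than summarizing them as text, agent B conditions on exactly the representations agent A produced, which is the sense in which the transfer is lossless.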

Building on this design, LatentMAS is grounded in three foundational principles, verified by comprehensive theoretical and empirical analyses:

• Reasoning Expressiveness: Hidden representations naturally encode models’ continuous thoughts, allowing each latent step to convey far richer information than a discrete token.

• Communication Fidelity: Latent working memory preserves each model’s input representations and latent thoughts, enabling lossless cross-agent information transfer.

• Collaboration Complexity: LatentMAS matches or exceeds the collaborative expressiveness of text-based MAS while incurring significantly lower inference complexity.

The first two principles jointly underscore the advantage of LatentMAS: richer latent reasoning and lossless latent communication. The third provides an overall complexity analysis, showing that LatentMAS achieves substantially lower computational complexity than text-based MAS while maintaining a higher level of model expressiveness.
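A back-of-envelope calculation makes the complexity claim concrete. The sketch below is a hedged illustration under made-up assumptions (hidden size, vocabulary size, per-step forward cost, and step counts are all generic choices, not figures from the paper): each text-based step pays an extra hidden-to-vocabulary projection that a latent step skips, and text-based MAS also needs more steps to convey the same information.

```python
# Hedged back-of-envelope cost comparison; all constants below are
# illustrative assumptions, not measurements from the paper.
d = 4096              # hidden size (assumed)
V = 128_000           # vocabulary size (assumed)
flops_fwd = 2 * 7e9   # forward pass per step, ~7B-param model (assumed)

def cost(steps, decode_to_text):
    # A text-based step adds the lm_head matmul (2*d*V multiply-adds)
    # needed to project hidden states onto the vocabulary; a latent
    # step reuses the hidden state directly and skips it.
    per_step = flops_fwd + (2 * d * V if decode_to_text else 0)
    return steps * per_step

# Assume text-based MAS needs ~4x more generation steps to convey the
# same content as latent steps (step counts here are invented).
text = cost(steps=400, decode_to_text=True)
latent = cost(steps=100, decode_to_text=False)
print(round(text / latent, 2))  # → 4.3
```

Under these assumed numbers the speedup comes almost entirely from the reduced step count, with the skipped vocabulary projection contributing a smaller additional saving; the paper's reported 4×-4.3× end-to-end speedup is measured empirically, not derived from this arithmetic.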