Can multiple LLMs coordinate without explicit collaboration rules?
When multiple language models share a concurrent key-value cache, do they spontaneously develop coordination strategies? This matters because it could reveal how reasoning models naturally collaborate and inform more efficient parallel inference.
Existing approaches to parallel LLM inference impose a fixed collaboration strategy: independent sampling with voting, explicit subtask decomposition, or cross-referencing between agents. Each strategy has failure modes — voting wastes compute on stragglers, subtask splitting can't re-plan when the original decomposition is wrong, and cross-referencing requires turn-based exchange that limits interaction speed.
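For contrast, the independent-sampling baseline reduces to a few lines. The sketch below is illustrative rather than any particular system's implementation; `majority_vote` is a hypothetical name.

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Self-consistency baseline: sample N chains independently and keep the
    modal final answer. Chains never see each other, so compute spent on
    redundant or doomed chains cannot be reallocated mid-run."""
    return Counter(answers).most_common(1)[0][0]

print(majority_vote(["42", "42", "17", "42"]))  # -> 42
```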
Hogwild! Inference takes a different approach: run multiple LLM instances with the same weights and a shared KV cache. Each worker generates tokens in parallel, and all workers can attend to each other's tokens immediately as they're generated — "instant" cross-attention through a concurrent cache with RoPE-adjusted positional embeddings. No collaboration framework is specified; workers are simply prompted to decide their course of action given what others are doing.
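To make the mechanism concrete, here is a minimal sketch of one plausible realization, not the authors' implementation. All names (`ConcurrentKVCache`, `rope`, `HEAD_DIM`) are hypothetical, and the assumption that keys are cached unrotated and re-positioned per view is one way to read "RoPE-adjusted positional embeddings":

```python
import math
import numpy as np

HEAD_DIM = 4  # toy head dimension; must be even for RoPE's pairwise rotation

def rope(vec: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate `vec` to RoPE position `pos` (standard pairwise rotation)."""
    out = vec.astype(float)  # astype returns a fresh array, safe to mutate
    for i in range(len(vec) // 2):
        theta = pos / (base ** (2 * i / len(vec)))
        c, s = math.cos(theta), math.sin(theta)
        x, y = out[2 * i], out[2 * i + 1]
        out[2 * i], out[2 * i + 1] = x * c - y * s, x * s + y * c
    return out

class ConcurrentKVCache:
    """Shared cache: workers append *unrotated* keys, so the same entries can
    occupy different positions in different workers' attention views."""
    def __init__(self):
        self.entries = []  # (worker_id, key, value) in global generation order

    def append(self, worker_id, key, value):
        self.entries.append((worker_id, key, value))

    def view(self, worker_id):
        """Build one worker's view: other workers' tokens first, its own
        last, with contiguous RoPE positions assigned at read time."""
        others = [(k, v) for w, k, v in self.entries if w != worker_id]
        mine = [(k, v) for w, k, v in self.entries if w == worker_id]
        ordered = others + mine
        keys = np.array([rope(k, pos) for pos, (k, _) in enumerate(ordered)])
        values = np.array([v for _, v in ordered])
        return keys, values

# Two workers emit three tokens each; worker 0 then attends over a view that
# already contains everything worker 1 has produced so far.
rng = np.random.default_rng(0)
cache = ConcurrentKVCache()
for _ in range(3):
    for w in (0, 1):
        cache.append(w, rng.normal(size=HEAD_DIM), rng.normal(size=HEAD_DIM))

keys, values = cache.view(worker_id=0)
query = rope(rng.normal(size=HEAD_DIM), pos=len(keys))  # next position in view
scores = (keys @ query) / math.sqrt(HEAD_DIM)           # scaled dot product
weights = np.exp(scores - scores.max())
weights /= weights.sum()
print("attention over the shared cache:", weights.round(3))
```

Storing keys without rotation is what would let the cache stay shared: nothing is copied when a worker's view re-orders the entries, they are only re-rotated to new positions.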
The surprising finding: existing reasoning-capable models (QwQ, DeepSeek-R1) can "reason to coordinate" out of the box, without any fine-tuning for multi-agent collaboration. Workers formulate and follow plans, adapt when plans fail, point out each other's errors, use each other's key observations, and — when prompted to check — can often detect when they're doing redundant work and change strategy.
This is a third mode of parallel inference, distinct from both independent sampling (no interaction) and structured multi-agent debate (turn-based interaction). Shared-memory parallelism enables continuous, real-time coordination rather than discrete message-passing. The human collaboration analogy is apt: humans working together dynamically re-plan, abandon approaches, and build on each other's partial progress — behaviors that fixed strategies cannot accommodate.
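The difference from turn-based exchange shows up in the shape of the decoding loop itself. Here is a toy, runnable sketch (with a stub standing in for the model, and `build_view` / `hogwild_decode` as hypothetical names) in which every worker advances one token per step and each step's input already includes every other worker's latest tokens:

```python
from typing import List

def build_view(streams: List[List[str]], worker: int) -> List[str]:
    """Shared context for one worker: others' tokens first, its own last,
    mirroring the cache layout sketched above."""
    others = [t for w, s in enumerate(streams) if w != worker for t in s]
    return others + streams[worker]

class ToyModel:
    """Stand-in for an LLM: emits a token recording how much shared context
    the worker saw, just to make the interleaving visible."""
    def step(self, view: List[str]) -> str:
        return f"tok{len(view)}"

def hogwild_decode(model, prompts, num_steps):
    streams = [list(p) for p in prompts]  # one token stream per worker
    for _ in range(num_steps):
        # All workers read the same snapshot, then commit their new tokens,
        # so every token conditions on everything generated before it.
        new = [model.step(build_view(streams, w)) for w in range(len(streams))]
        for stream, token in zip(streams, new):
            stream.append(token)  # visible to all workers next step
    return streams

print(hogwild_decode(ToyModel(), [["A"], ["B"]], num_steps=2))
# [['A', 'tok2', 'tok4'], ['B', 'tok2', 'tok4']]
```

In a debate-style system, exchange happens only at turn boundaries; here the granularity of visibility is a single token.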
The limitation is "often but not always" — workers don't always detect redundancy or coordinate optimally. But the baseline capability exists without training, suggesting that reasoning-capable models already possess the coordination skills needed for shared-memory collaboration.
Source: Inference time scaling
Related concepts in this collection
- Why does parallel reasoning outperform single-chain thinking?
  Does dividing a fixed token budget across multiple independent reasoning paths beat spending it all on one long chain? This explores how breadth and diversity in reasoning compare to depth.
  Extends: shared-KV-cache parallelism is a third mode beyond independent sampling and sequential extension; it enables coordination, not just diversity.
- Does a model improve by arguing with itself?
  When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenges from other models?
  Relation: Hogwild! enables real-time multi-instance interaction through shared memory rather than turn-based exchange.
- Why do multi-agent LLM systems converge without real debate?
  When multiple AI agents reason together, do they genuinely deliberate or just accommodate each other's views? Research into clinical reasoning systems reveals how often agents reach agreement without substantive disagreement.
  Relation: Hogwild! workers can detect redundancy and pivot, potentially addressing premature convergence through continuous visibility into each other's reasoning.
- Can extreme task decomposition enable reliable execution at million-step scale?
  Can breaking tasks into maximally atomic subtasks with voting-based error correction solve the fundamental reliability problem in long-horizon tasks? This challenges whether better models or better decomposition is the path to high-reliability AI systems.
  Contrasts: MAKER uses fixed decomposition with voting; Hogwild! uses emergent coordination without predefined decomposition.
- When does debate actually improve reasoning accuracy?
  Multi-agent debate shows promise for reasoning tasks, but under what conditions does it help versus hurt? The research explores whether debate amplifies errors when evidence verification is missing.
  Relation: Hogwild!'s shared-KV-cache coordination sidesteps the turn-based debate structure that enables persuasion over truth: continuous real-time visibility into all workers' reasoning may prevent the rhetorical framing that debate without evidence verification produces.
Original note title: parallel LLM workers sharing a concurrent KV cache can emergently coordinate without a predefined collaboration framework