Agentic and Multi-Agent Systems · LLM Reasoning and Architecture

Can multiple LLMs coordinate without explicit collaboration rules?

When multiple language models share a concurrent key-value cache, do they spontaneously develop coordination strategies? This matters because it could reveal how reasoning models naturally collaborate and inform more efficient parallel inference.

Note · 2026-02-23 · sourced from Inference time scaling
How should we allocate compute budget at inference time? How should researchers navigate LLM reasoning research?

Existing approaches to parallel LLM inference impose a fixed collaboration strategy: independent sampling with voting, explicit subtask decomposition, or cross-referencing between agents. Each strategy has failure modes — voting wastes compute on stragglers, subtask splitting can't re-plan when the original decomposition is wrong, and cross-referencing requires turn-based exchange that limits interaction speed.
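For contrast with the shared-cache approach described next, here is a minimal sketch of the first fixed strategy, independent sampling with voting. The names (`model.generate`, `extract_final_answer`) are illustrative placeholders, not APIs from the source.

```python
from collections import Counter

def sample_with_voting(model, prompt: str, n_samples: int = 8) -> str:
    """Independent-sampling baseline: each rollout reasons in isolation,
    then a plurality vote over final answers decides. No rollout ever sees
    another's partial progress, so compute spent on redundant or doomed
    reasoning paths is never recovered."""
    answers = []
    for _ in range(n_samples):
        completion = model.generate(prompt, temperature=0.8)  # independent rollout
        answers.append(extract_final_answer(completion))      # hypothetical parser for "Answer: ..."
    return Counter(answers).most_common(1)[0][0]              # plurality vote
```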

Hogwild! Inference takes a different approach: run multiple LLM instances with the same weights and a shared KV cache. Each worker generates tokens in parallel, and all workers can attend to each other's tokens immediately as they're generated — "instant" cross-attention through a concurrent cache with RoPE-adjusted positional embeddings. No collaboration framework is specified; workers are simply prompted to decide their course of action given what others are doing.
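A schematic sketch of one decoding step under a shared cache may make the mechanism concrete. This is not the authors' implementation; `remap_positions`, `forward_one_token`, `sample`, and the cache interface are hypothetical stand-ins.

```python
def shared_cache_decode_step(workers, shared_cache):
    """One decoding step with a concurrent KV cache (schematic).
    Every worker attends over the keys/values that all workers have already
    published, then appends its own new token, which becomes visible to the
    others on the very next step."""
    proposed = {}
    for w in workers:                                  # conceptually parallel across workers
        positions = remap_positions(shared_cache, w)   # RoPE-adjusted view: each worker sees a
                                                       # coherent ordering of its own and others' tokens
        logits = w.model.forward_one_token(
            w.last_token, kv_cache=shared_cache, positions=positions
        )
        proposed[w.id] = sample(logits)
    for w in workers:                                  # publish: instantly attendable by everyone
        shared_cache.append(worker_id=w.id, token=proposed[w.id])
        w.last_token = proposed[w.id]
    return proposed
```

The contrast with turn-based schemes is in the publish step: tokens enter the shared cache as they are generated, so coordination happens at token granularity rather than per exchanged message.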

The surprising finding: existing reasoning-capable models (QwQ, DeepSeek-R1) can "reason to coordinate" out of the box, without any fine-tuning for multi-agent collaboration. Workers formulate and follow plans, adapt when plans fail, point out each other's errors, use each other's key observations, and — when prompted to check — can often detect when they're doing redundant work and change strategy.
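The "when prompted to check" behavior can be pictured as a periodic nudge appended to a worker's stream; the wording and interval below are illustrative assumptions, not the source's actual prompt or mechanism.

```python
# Illustrative wording only; the source's actual prompt is not reproduced here.
COORDINATION_NUDGE = (
    "Pause and skim what the other workers have written so far. "
    "Are you duplicating work that is already in progress? If so, say what "
    "overlaps and switch to a complementary subtask; otherwise continue."
)

def maybe_inject_nudge(worker, shared_cache, every_n_tokens: int = 256):
    """Append the nudge into the worker's stream at a fixed interval, so the
    redundancy check happens through ordinary next-token reasoning rather
    than through any external controller."""
    if worker.tokens_generated > 0 and worker.tokens_generated % every_n_tokens == 0:
        shared_cache.append(worker_id=worker.id,
                            token_ids=worker.tokenize(COORDINATION_NUDGE))
```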

This is a third mode of parallel inference, distinct from both independent sampling (no interaction) and structured multi-agent debate (turn-based interaction). Shared-memory parallelism enables continuous, real-time coordination rather than discrete message-passing. The human collaboration analogy is apt: humans working together dynamically re-plan, abandon approaches, and build on each other's partial progress — behaviors that fixed strategies cannot accommodate.

The limitation is "often but not always" — workers don't always detect redundancy or coordinate optimally. But the baseline capability exists without training, suggesting that reasoning-capable models already possess the coordination skills needed for shared-memory collaboration.


Source: Inference time scaling

Original note: parallel LLM workers sharing a concurrent KV cache can emergently coordinate without a predefined collaboration framework