INQUIRING LINE

How do capability vectors enable discovery in multi-agent systems?

This explores how representing each agent's skills as a searchable vector turns 'finding the right agent for a job' into a semantic lookup rather than hand-wired routing — and what that buys (and costs) in a system of many heterogeneous agents.


The core idea in the corpus is that capability vectors make *discovery a first-class operation*: instead of an engineer manually wiring which agent calls which, you embed each agent's capabilities into a vector index (HNSW) and match incoming needs by semantic similarity, while folding in policy and budget constraints so the match respects more than just 'who's closest in meaning' (see: Can semantic capability vectors replace manual agent routing?). The payoff is that discovery scales sub-linearly as agents get more numerous and more specialized, which is exactly the regime where hand-maintained routing tables collapse under their own combinatorics.
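A minimal sketch of constraint-aware capability matching, under loud assumptions: the `AgentCard` fields, the toy three-dimensional vectors, the cost numbers, and the policy tags are all hypothetical, and a brute-force cosine scan stands in for the HNSW index the corpus describes. It is meant only to show the order of operations (filter on policy and budget, then rank by similarity), not how any particular system implements it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCard:
    """Hypothetical registry entry for one agent."""
    name: str
    vector: tuple      # capability embedding (toy hand-written values here)
    cost: float        # assumed cost per call, arbitrary units
    tags: frozenset    # assumed policy tags the agent is cleared for


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def discover(need, agents, budget, required_tags, k=1):
    """Return up to k agents that satisfy budget and policy, ranked by similarity.

    Brute-force scan: a real deployment would query an approximate
    nearest-neighbour index (such as HNSW) instead of sorting every agent.
    """
    eligible = [
        a for a in agents
        if a.cost <= budget and required_tags <= a.tags
    ]
    ranked = sorted(eligible, key=lambda a: cosine(need, a.vector), reverse=True)
    return ranked[:k]


AGENTS = [
    AgentCard("sql-analyst", (0.9, 0.1, 0.0), cost=5.0, tags=frozenset({"internal"})),
    AgentCard("sql-lite", (0.7, 0.3, 0.0), cost=1.0, tags=frozenset({"internal", "pii"})),
    AgentCard("copywriter", (0.0, 0.2, 0.9), cost=1.0, tags=frozenset({"internal", "pii"})),
]

need = (1.0, 0.0, 0.0)  # toy embedding of "answer a question with SQL"
best = discover(need, AGENTS, budget=2.0, required_tags=frozenset({"pii"}))
print([a.name for a in best])
```

In this toy registry the semantically closest agent (`sql-analyst`) is excluded by both the budget and the policy tag, so the match falls to `sql-lite`. That is the sense in which the constraints make the lookup respect more than similarity alone.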

What makes this interesting is *why* discovery is the bottleneck worth solving. Multi-agent coordination degrades predictably as the network grows: agents agree too late or adopt strategies without telling their neighbors, and they tend to accept information without verifying it, which lets errors propagate (see: Why do multi-agent systems fail to coordinate at scale?). Capability vectors attack the front end of that problem: if the system can reliably find the *right* agent semantically, it shrinks the coordination surface where these failures compound. But discovery alone doesn't fix coordination; it just routes the work better.

Discovery also pairs naturally with its mirror image: pruning. Where capability vectors decide who to bring *in*, contribution-scoring methods like DyLAN decide who to push *out*, deactivating low-performing agents at inference time without task-specific tuning (see: Can multi-agent teams automatically remove their weakest members?). Read together, these are two halves of dynamic team composition: semantic match to assemble, contribution score to trim. The 'versioned' part of versioned capability vectors connects to how capabilities themselves *change*: systems like SkillClaw aggregate interaction trajectories across users and evolve shared skills, which means the vectors being searched aren't static. They are the index over a living, improving skill set (see: How can agent systems share learned skills across users?).
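The trimming half can be illustrated with a deliberately simplified stand-in. This is not DyLAN's propagation, aggregation, and selection procedure. It assumes per-round contribution scores already exist (the numbers and agent names below are invented) and simply averages them and keeps the top scorers.

```python
from statistics import mean


def prune_team(round_scores, keep):
    """Keep the `keep` agents with the highest mean contribution score.

    round_scores maps agent name -> list of per-round scores (assumed given;
    how those scores are produced is outside this sketch). Ties are broken
    alphabetically so the result is deterministic.
    """
    averages = {agent: mean(scores) for agent, scores in round_scores.items()}
    ranked = sorted(averages, key=lambda agent: (-averages[agent], agent))
    return ranked[:keep]


# Hypothetical scores from two rounds of collaboration.
scores = {
    "planner": [0.8, 0.9],
    "coder": [0.7, 0.6],
    "critic": [0.1, 0.2],
}
team = prune_team(scores, keep=2)
print(team)
```

Paired with a discovery step, the loop would be: match semantically to assemble a candidate team, run it, then drop the members whose measured contribution stays low.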

Here's the part a curious reader might not expect: discovery is necessary but nowhere near sufficient. Capability matching presumes capability is the thing that decides success, and the corpus pushes back hard. Capable agents stall in real deployments not from capability gaps but from missing ecosystem conditions like trustworthiness, standardization, and value generation (see: Why do capable AI agents still fail in real deployments?). Multi-agent teams only beat solo agents when members carry genuine domain expertise; diversity without competence makes things *worse* (see: Does cognitive diversity alone improve multi-agent ideation quality?). So a vector that matches an agent's *claimed* capability tells you nothing about whether that capability is real or trustworthy. The vector is a router, not a referee.

If you want to follow the thread further, the corpus suggests capability vectors are most powerful inside a deliberately *heterogeneous* architecture, with small models handling most well-defined subtasks cheaply and larger ones invoked selectively (see: Can small language models handle most agent tasks?), because that's precisely the setting where 'which agent should handle this?' becomes a real, recurring question. Reliability research argues the durable answer isn't smarter routing at all but externalizing memory, skills, and protocols into a harness layer (see: Where does agent reliability actually come from?), with structured shared artifacts beating free-form conversation for coordination (see: Does structured artifact sharing outperform conversational coordination?). Capability vectors, in that light, are the discovery layer of that harness: the searchable index that lets the rest of the structure find what it needs.
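To make the economics of the heterogeneous setup concrete, here is a small arithmetic sketch. The 10x and 30x cost ratios come from the small-model note in the source list. The 80% share of tasks routed to small models is purely an illustrative assumption, and `blended_cost` is a hypothetical helper name.

```python
def blended_cost(small_share, cost_ratio, large_cost=1.0):
    """Average cost per task when a share of tasks goes to a cheaper small model.

    small_share: fraction of tasks handled by the small model (0.0 to 1.0)
    cost_ratio:  how many times cheaper the small model is than the large one
    large_cost:  cost of one large-model call (1.0 = all-large baseline)
    """
    if not 0.0 <= small_share <= 1.0:
        raise ValueError("small_share must be between 0 and 1")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    small_cost = large_cost / cost_ratio
    return small_share * small_cost + (1.0 - small_share) * large_cost


# Assumed 80% of tasks are well-defined enough for the small model.
for ratio in (10, 30):
    print(ratio, round(blended_cost(0.8, ratio), 3))
```

Under that assumed split, the blended cost comes out at roughly 0.28 and 0.23 of the all-large baseline, and the remaining large-model calls dominate the bill. That is why the routing decision, and hence discovery, recurs on every task.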


Sources (9 notes)

Can semantic capability vectors replace manual agent routing?

Versioned Capability Vectors embedded in HNSW indices couple semantic matching with policy and budget constraints, making capability discovery a first-class operation that scales sub-linearly as agent heterogeneity increases.

Why do multi-agent systems fail to coordinate at scale?

AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.

Can multi-agent teams automatically remove their weakest members?

DyLAN's three-step importance scoring mechanism (propagation, aggregation, selection) quantifies individual agent contributions and automatically removes uninformative agents during inference, optimizing team composition without task-specific tuning.

How can agent systems share learned skills across users?

SkillClaw aggregates interaction trajectories across users, processes them through an autonomous evolver that identifies patterns and refines skills, then synchronizes updates system-wide. This converts siloed individual learning into shared capability improvement without manual curation.

Why do capable AI agents still fail in real deployments?

Historical analysis from GPS to modern AI shows agent failures consistently result from absent ecosystem conditions—value generation, personalization, trustworthiness, social acceptability, and standardization—rather than capability gaps. Even highly capable systems stall without these five conditions.

Does cognitive diversity alone improve multi-agent ideation quality?

Multi-agent teams substantially outperform solo ideation, but only when members possess genuine senior knowledge. Diverse teams without expertise underperform even a single competent agent, because cognitive stimulation without expertise triggers process losses instead of insight.

Can small language models handle most agent tasks?

SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Does structured artifact sharing outperform conversational coordination?

MetaGPT demonstrates that agents producing standardized engineering documents achieve superior coordination compared to conversational exchange. Active information pulling from shared environments eliminates noise and mirrors efficient human workplace infrastructure.
