LLM Reasoning and Architecture · Agentic and Multi-Agent Systems · Reinforcement Learning for LLMs

Can routing beat building one better model?

Does directing queries to specialized models via semantic clustering outperform investing in a single frontier model? The question tests whether performance gains come more from model improvement or from model selection.

Note · 2026-02-23 · sourced from Routers

Avengers-Pro demonstrates that routing queries to different models based on semantic clustering can exceed the performance of any individual model in the pool — including frontier models. The mechanism: embed incoming queries, cluster by semantic similarity, evaluate per-cluster model performance-efficiency scores, and route each query to the highest-scoring model for its cluster.
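The routing step can be sketched in a few lines. This is an illustrative toy, not the Avengers-Pro implementation: the embeddings, centroids, and scores are made-up assumptions, and the real system fits clusters and scores offline on a calibration set.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Offline artifacts (hypothetical values): cluster centroids in embedding
# space and, per cluster, a performance-efficiency score per candidate model.
centroids = {"code": [1.0, 0.0], "math": [0.0, 1.0]}
cluster_scores = {
    "code": {"model-a": 0.82, "model-b": 0.64},
    "math": {"model-a": 0.55, "model-b": 0.91},
}

def route(query_embedding):
    # 1) nearest-cluster lookup for the embedded query
    cluster = max(centroids, key=lambda c: cosine(query_embedding, centroids[c]))
    # 2) route to the highest-scoring model for that cluster
    scores = cluster_scores[cluster]
    return max(scores, key=scores.get)

print(route([0.9, 0.1]))  # nearest to "code" cluster → model-a
print(route([0.2, 0.8]))  # nearest to "math" cluster → model-b
```

In a real deployment the query embedding would come from an embedding model, and the centroids and per-cluster scores would be the offline calibration artifacts described below.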

Three results establish the claim:

The earlier Avengers work made an even more striking claim: ten models of ~7B parameters each, with routing, surpassed GPT-4.1 and 4.5 across 15 datasets. This suggests the performance gain from optimal model selection can be comparable to the gap between model generations.

The architecture is lightweight: three operations at inference time (embedding, nearest-cluster lookup, score aggregation). The heavy work — fitting the clustering model and estimating per-cluster performance statistics — happens offline on a calibration set (70% for fitting, 30% for evaluation). This makes the approach deployable as a thin routing layer atop any model API ecosystem.
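The offline side reduces to scoring each model per cluster on the held-out evaluation split. A minimal sketch, assuming a simple alpha-weighted trade-off between accuracy and cost (the 70/30 split and the "performance-efficiency" framing are from the note; the exact formula and all numbers here are assumptions):

```python
def performance_efficiency(accuracy, cost, alpha=0.5):
    """Blend accuracy with an efficiency term derived from per-query cost.

    alpha trades off quality vs. cost; the functional form is illustrative.
    """
    efficiency = 1.0 / (1.0 + cost)  # cheaper model → higher efficiency
    return alpha * accuracy + (1 - alpha) * efficiency

# Hypothetical results for one cluster on the 30% evaluation split:
# model -> (accuracy, relative cost per query)
cluster_eval = {"model-a": (0.82, 1.0), "model-b": (0.78, 0.2)}

scores = {m: performance_efficiency(acc, cost)
          for m, (acc, cost) in cluster_eval.items()}
best = max(scores, key=scores.get)
print(best)  # the slightly less accurate but much cheaper model can win
```

Repeating this per cluster yields the lookup table the router consults at inference time; tuning alpha traces out the performance-cost Pareto frontier.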

Relative to Can we allocate inference compute based on prompt difficulty?, Avengers-Pro adds a complementary optimization axis. Compute-optimal scaling asks "how much inference budget per query?" Routing asks "which model per query?" These are independent — a routing layer could be composed with per-query compute allocation for a two-dimensional Pareto optimization. Relative to Can inference compute replace scaling up model size?, routing extends the substitution argument: you don't need a bigger model or more compute — you need the right model for this specific query type.

The implication challenges the frontier model race: rather than building one model that dominates on everything, assembling a diverse pool of specialized-ish models with good routing may be both cheaper and more effective. This aligns with the heterogeneous architecture thesis in Can small language models handle most agent tasks? — routing makes the heterogeneous approach practical.



test-time model ensembling via embedding-cluster routing surpasses any individual frontier model — model selection is a stronger lever than model improvement