Should model routing decisions account for prompt-tier dependencies?
This explores whether the choice of which model handles a query should be made jointly with the prompting strategy — because a prompt that helps a cheap model can actively hurt a strong one, so routing and prompt design may not be separable decisions.
This reads the question as asking whether routing (picking which model answers) can be decided independently of how the prompt is written, or whether the two are entangled. The corpus suggests they're entangled. The sharpest evidence: a 23-prompt benchmark across 12 models found that rephrasing and background-knowledge prompts boost cheap models, while step-by-step reasoning actually *reduces* accuracy in high-performance ones (see "Do prompt techniques work the same across all LLM tiers?"). So the same prompt is not tier-neutral: its effect has a sign that flips with model capability. A router that sends a query to a budget model and then applies a generic 'best practice' prompt could be sabotaging the very model it just chose.
This matters because most routing research treats selection as a clean pre-generation decision. RouteLLM and Hybrid-LLM estimate query difficulty and pick a single model before any token is generated, banking 40-50% cost savings on the assumption that the model is the lever (see "Can routers select the right model before generation happens?"). Cluster-based routing (Avengers-Pro) goes further, beating frontier models by sending each semantic cluster to its optimal model (see "Can routing beat building one better model?"). But both approaches optimize the model axis alone. The tier-dependency finding implies a second axis, the prompt, that good routing should co-optimize, especially when routing *down* to cheaper models is the whole point.
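The gap between optimizing the model axis alone and co-optimizing model and prompt can be sketched as follows. This is a minimal illustration, not how RouteLLM, Hybrid-LLM, or Avengers-Pro work: the tier names, prompt names, accuracy estimates, relative costs, and the `cost_weight` trade-off are all hypothetical values chosen only to mirror the tier-dependent sign flip.

```python
from itertools import product

# Hypothetical estimated accuracy for each (model tier, prompt strategy) pair.
# The numbers are invented: rephrasing helps "budget", step-by-step hurts "frontier".
EST_ACCURACY = {
    ("budget", "plain"): 0.62,
    ("budget", "rephrase"): 0.71,
    ("budget", "step_by_step"): 0.66,
    ("frontier", "plain"): 0.78,
    ("frontier", "rephrase"): 0.77,
    ("frontier", "step_by_step"): 0.73,
}

# Hypothetical relative cost per query for each tier.
COST = {"budget": 1.0, "frontier": 20.0}

MODELS = sorted(COST)
PROMPTS = sorted({prompt for _, prompt in EST_ACCURACY})


def utility(model, prompt, cost_weight=0.01):
    """Estimated accuracy minus a linear cost penalty (an assumed objective)."""
    return EST_ACCURACY[(model, prompt)] - cost_weight * COST[model]


def route_model_only(fixed_prompt="step_by_step"):
    """Pick the model while one generic prompt is applied to every tier."""
    model = max(MODELS, key=lambda m: utility(m, fixed_prompt))
    return model, fixed_prompt


def route_jointly():
    """Pick the (model, prompt) pair with the highest utility."""
    return max(product(MODELS, PROMPTS), key=lambda pair: utility(*pair))
```

With these made-up numbers both routers choose the budget tier, but the model-only router pairs it with the generic step-by-step prompt while the joint router pairs it with rephrasing. The model choice is identical and the estimated accuracy still differs by five points, which is the kind of loss a model-only cost report would not surface.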
The corpus already hints that sophisticated routing means optimizing several coupled choices at once rather than one. MasRouter shows multi-agent routing must jointly decide collaboration topology, agent count, role allocation, *and* per-agent model assignment through a cascaded controller; treating these as separable underperforms (see "What decisions must multi-agent routing systems optimize simultaneously?"). Prompt-tier dependency is a natural fifth dimension: which prompt template you attach is conditional on which model you routed to. The economic case for heterogeneous architectures (small models by default, large ones selectively) makes this concrete, since most agent work runs on the cheap tier where prompt phrasing has the largest leverage (see "Can small language models handle most agent tasks?").
There's a deeper structural reason to bundle prompt with route. LLM Programs decompose a task and hand each model call only its step-specific context, treating prompt construction as part of the control flow rather than a fixed wrapper (see "Can algorithms control LLM reasoning better than LLMs alone?"). If prompts are already being built per-step, conditioning them on the routed model's tier is a small extension, not a new architecture. The takeaway you might not have expected: routing's measured cost savings could be leaving accuracy on the table, not because the model choice was wrong, but because the prompt that rode along with it was tuned for a different tier than the one that answered.
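A per-step prompt builder that also conditions on the routed tier might look like the sketch below. It only illustrates the idea and is not the LLM Programs implementation: `Step`, `build_prompt`, `run_program`, the tier labels, the template wording, and the `call_model` callback are all hypothetical names introduced here, and the router that assigns each step's tier is assumed to exist elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    instruction: str
    needs: List[str]  # keys of earlier results this step is allowed to see
    tier: str         # tier a router picked for this step (assumed given)


# Hypothetical tier-specific wrappers: a rephrasing cue for the cheap tier,
# a direct prompt for the strong tier.
TEMPLATES: Dict[str, str] = {
    "budget": "Restate the task in your own words, then answer.\n{context}Task: {instruction}",
    "frontier": "{context}Task: {instruction}",
}


def build_prompt(step: Step, state: Dict[str, str]) -> str:
    """Build a prompt from step-specific context only, wrapped per tier."""
    context = "".join(f"{key}: {state[key]}\n" for key in step.needs)
    return TEMPLATES[step.tier].format(context=context, instruction=step.instruction)


def run_program(steps: List[Step], call_model: Callable[[str, str], str]) -> Dict[str, str]:
    """Run steps in order; control flow and state live outside the model."""
    state: Dict[str, str] = {}
    for step in steps:
        prompt = build_prompt(step, state)
        state[step.name] = call_model(step.tier, prompt)
    return state
```

Because the template lookup sits inside the same function that already assembles step-specific context, switching a step from one tier to another changes its prompt wrapper without touching the rest of the program.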
Sources (6 notes)
A 23-prompt benchmark across 12 LLMs shows rephrasing and background-knowledge prompts boost cheap models, while step-by-step reasoning reduces accuracy in high-performance models. Task structure, not generic best practices, determines which prompts help.
RouteLLM and Hybrid-LLM both achieve 40-50% cost reduction by routing to a single model based on query difficulty prediction, not response evaluation. Single-model routing minimizes latency compared to ensemble or cascade alternatives.
Avengers-Pro achieves 7% higher accuracy than GPT-5-medium by routing queries to optimal models per semantic cluster, or matches its performance at 27% lower cost. Ten 7B models with routing previously surpassed GPT-4.1 and 4.5, suggesting selection is a stronger lever than scaling.
MasRouter shows that routing in multi-agent systems must jointly optimize collaboration topology, agent count, role allocation, and per-agent LLM assignment through a cascaded controller. This unified approach surpasses single-model routing by 3.51% accuracy while cutting HumanEval costs by 49%.
SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.
LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.