Why One Model Is Never Enough
Author: ToRun Team
The single-model trap
Every major AI provider wants you to believe their flagship model is the right answer to every question. That belief is commercially convenient for them and technically incorrect for you.
A model that excels at long-context legal reasoning is not the right model for generating a ten-word image caption. A model with cutting-edge vision is overkill — and potentially over-budget — for a batch job that reformats structured JSON. A model whose pricing is optimized for throughput may be the worst possible choice when you need a two-second interactive response.
When you build on a single model, you are not choosing the best tool. You are choosing the best available tool within an artificial constraint. Over time, that constraint compounds: you over-spend on simple tasks, you under-deliver on complex ones, and when that provider has an outage or a price increase, your entire product stalls.
Capabilities, not names
The useful abstraction is not "which model" but "which capabilities does this request need." That question has a precise answer.
A chat message that includes an uploaded screenshot needs vision. A workflow step that calls an external API needs tools. A coding assistant that should check its own output needs code execution. A nightly summarization pipeline that doesn't need to be fast can use batch-tier pricing; an in-product assistant that the user is waiting on cannot.
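To make the mapping from request contents to requirements concrete, here is a minimal sketch that derives a capability set and a pricing tier from a request. The `Request` fields, the capability names (`"vision"`, `"tools"`, `"code_execution"`), and the `"realtime"`/`"batch"` tiers are hypothetical labels chosen for illustration. They are not ToRun's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """Hypothetical request shape; the field names are illustrative, not a real API."""
    text: str
    attachment_types: list = field(default_factory=list)  # MIME types, e.g. "image/png"
    calls_external_api: bool = False
    check_own_code: bool = False
    user_is_waiting: bool = True

def required_capabilities(req: Request) -> set:
    """Derive the capability set a request needs from what it actually contains."""
    caps = {"text"}
    if any(t.startswith("image/") for t in req.attachment_types):
        caps.add("vision")          # an uploaded screenshot
    if req.calls_external_api:
        caps.add("tools")           # a workflow step that hits an external API
    if req.check_own_code:
        caps.add("code_execution")  # a coding assistant verifying its own output
    return caps

def pricing_tier(req: Request) -> str:
    """Batch pricing is acceptable only when nobody is waiting on the response."""
    return "realtime" if req.user_is_waiting else "batch"

screenshot_question = Request("What does this error mean?", attachment_types=["image/png"])
print(sorted(required_capabilities(screenshot_question)))  # ['text', 'vision']
nightly_summary = Request("Summarize today's tickets", user_is_waiting=False)
print(pricing_tier(nightly_summary))  # batch
```

Keeping latency tolerance separate from capabilities is a design choice in this sketch: "can wait" changes which price tier is acceptable, not which models are able to do the work.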
When you enumerate the capability requirements first, the model selection problem becomes a filter-and-rank problem. Given a required set (say, text, vision, and structured output), find every model that supports all three, then rank by quality and price for the current request context. That ranking can change from one request to the next based on what the user is doing and how much budget they have left.
This is what ToRun's routing pipeline does. Mode (Chat, Code, Image, Research, and others) maps to a capability set. The capability set filters the model catalog. Price and quality ranking picks the winner. The result is that you can use DeepSeek for high-volume structured extraction, Claude or GPT-4-class models for nuanced reasoning, Google Gemini for large-context document work, and a specialized image model for generation — all within the same product surface, without writing dispatch logic yourself.
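The mode, filter, and rank steps can be sketched in a few lines. Everything concrete here is an assumption for illustration: the `MODE_CAPABILITIES` table, the model names `model-a` through `model-c`, and their quality scores and prices are invented placeholders, not ToRun's catalog or any provider's real numbers. The sketch ranks by quality and breaks ties on price, which is only one of many plausible ranking rules.

```python
from dataclasses import dataclass

# Hypothetical mode table; the real mode-to-capability mapping is not documented here.
MODE_CAPABILITIES = {
    "chat": frozenset({"text"}),
    "code": frozenset({"text", "tools", "code_execution"}),
    "research": frozenset({"text", "long_context", "tools"}),
}

@dataclass(frozen=True)
class Model:
    name: str
    capabilities: frozenset
    quality: float          # hypothetical task-quality score, higher is better
    price_per_mtok: float   # hypothetical blended USD per million tokens

def route(catalog, mode, extra=(), max_price=None):
    """Mode -> capability set -> filter the catalog -> rank by quality, then price."""
    required = MODE_CAPABILITIES[mode.lower()] | frozenset(extra)  # KeyError on unknown mode
    qualified = [m for m in catalog if required <= m.capabilities]
    if max_price is not None:  # request context, e.g. a budget-constrained user
        qualified = [m for m in qualified if m.price_per_mtok <= max_price]
    # Best quality first; among equal quality, the cheaper model wins.
    return sorted(qualified, key=lambda m: (-m.quality, m.price_per_mtok))

CATALOG = [
    Model("model-a", frozenset({"text", "structured_output"}), 0.70, 0.5),
    Model("model-b", frozenset({"text", "vision", "structured_output"}), 0.85, 3.0),
    Model("model-c", frozenset({"text", "vision", "structured_output", "tools"}), 0.90, 10.0),
]

# A chat message with a screenshot that must come back as structured output.
ranked = route(CATALOG, "Chat", extra={"vision", "structured_output"})
print([m.name for m in ranked])  # ['model-c', 'model-b']
cheap = route(CATALOG, "Chat", {"vision", "structured_output"}, max_price=5.0)
print([m.name for m in cheap])   # ['model-b']
```

The returned list is ordered, not a single pick. The remaining entries are exactly the qualified alternatives a fallback step would need.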
Fallback is not a nice-to-have
Provider outages are not rare edge cases. They happen several times a month across the industry. If your product depends on a single provider and that provider's API degrades, your product degrades with it.
A fallback chain is the operational minimum: when the primary model for a request returns a transient error, the platform retries on the next best qualified model. This requires that the routing layer already knows which models are qualified — which is exactly what capability-first routing gives you.
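A fallback chain over an already-ranked list of qualified models might look like the sketch below. `TransientError` and `fake_call` are hypothetical stand-ins: a real client would map timeouts, rate limits, and 5xx responses to the retryable case. Only transient failures trigger fallback. Other errors, such as a malformed request, propagate, because sending the same bad request to another model will not help.

```python
class TransientError(Exception):
    """Stand-in for retryable failures such as timeouts, rate limits, or 5xx responses."""

def call_with_fallback(ranked_models, prompt, call):
    """Try qualified models in rank order; move on only when a failure is transient."""
    failures = []
    for model in ranked_models:
        try:
            return model, call(model, prompt)
        except TransientError as exc:
            failures.append(f"{model}: {exc}")  # remember why, then try the next model
    raise RuntimeError("all qualified models failed: " + "; ".join(failures))

# Fake provider client for illustration: the primary model is having an outage.
def fake_call(model, prompt):
    if model == "primary-model":
        raise TransientError("503 Service Unavailable")
    return f"{model} answered {prompt!r}"

used, answer = call_with_fallback(["primary-model", "backup-model"], "hello", fake_call)
print(used)    # backup-model
print(answer)  # backup-model answered 'hello'
```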
Fallback also has a less obvious benefit: it keeps your billing behavior predictable. When the first-choice model is temporarily priced above a usage ceiling, a fallback to the next qualified model keeps costs bounded without surfacing an error to the user.
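The cost-ceiling case can use the same ranked list. This sketch assumes a per-request ceiling and a simple cost estimate of price times expected tokens. The model names and the `PRICES` table are hypothetical placeholders, not real provider pricing.

```python
# Hypothetical prices in USD per million tokens; not real provider pricing.
PRICES = {"model-x": 15.0, "model-y": 3.0, "model-z": 0.6}

def estimate_cost(model, tokens):
    """Rough per-request estimate: price per million tokens times expected tokens."""
    return PRICES[model] * tokens / 1_000_000

def pick_within_ceiling(ranked_models, tokens, ceiling_usd):
    """Return the best-ranked model whose estimated cost fits under the ceiling."""
    for model in ranked_models:
        if estimate_cost(model, tokens) <= ceiling_usd:
            return model
    return None  # nothing fits; the caller decides whether to queue, trim, or ask

# The first choice would cost $0.30 for 20k tokens, but the ceiling is $0.10.
print(pick_within_ceiling(["model-x", "model-y", "model-z"], 20_000, 0.10))  # model-y
```

Returning `None` instead of raising keeps the decision with the caller, which fits the goal of not surfacing an error to the user when some qualified model still fits the budget.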
Where this goes: multi-model deliberation
Single routing covers most use cases. But some questions are genuinely hard enough that the right answer is to ask multiple models and reconcile their outputs.
Consider a research task where factual accuracy matters and the cost of a confident wrong answer is high. You could run the same question through a reasoning-focused model and a retrieval-augmented model, then have a third model adjudicate the outputs — weighing agreement, confidence, and sourced evidence. This is not theoretical; it is a natural extension of the capability-routing substrate.
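The paragraph above describes a third *model* acting as adjudicator. The sketch below swaps in a deterministic scoring rule purely to show how agreement, confidence, and sourced evidence can be weighed against each other. The weights, the `Answer` fields, and the assumption that models report a 0 to 1 confidence are all hypothetical, and this is not how Council adjudicates.

```python
from dataclasses import dataclass, field

@dataclass
class Answer:
    model: str
    claim: str                # normalized answer text
    confidence: float         # self-reported, 0..1 (an assumption)
    sources: list = field(default_factory=list)

def adjudicate(answers, w_agree=1.0, w_conf=0.5, w_src=0.25):
    """Pick the claim with the best weighted mix of agreement, confidence, and sourcing."""
    by_claim = {}
    for a in answers:
        entry = by_claim.setdefault(a.claim, {"votes": 0, "conf": 0.0, "sources": [], "models": []})
        entry["votes"] += 1
        entry["conf"] = max(entry["conf"], a.confidence)
        entry["sources"].extend(a.sources)
        entry["models"].append(a.model)

    def score(e):
        return w_agree * e["votes"] + w_conf * e["conf"] + w_src * len(e["sources"])

    claim, best = max(by_claim.items(), key=lambda item: score(item[1]))
    # Keep provenance so the winning claim traces back to its models and sources.
    return claim, best["models"], best["sources"]

answers = [
    Answer("reasoning-model", "answer A", 0.9),
    Answer("retrieval-model", "answer B", 0.7, ["doc-17", "doc-42"]),
]
print(adjudicate(answers))  # ('answer B', ['retrieval-model'], ['doc-17', 'doc-42'])
```

With these weights, the sourced answer wins (score 1.85 against 1.45) even though the unsourced one reported higher confidence. That is the kind of trade-off a confident wrong answer should lose.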
ToRun's Council mode, currently in active development, implements exactly this pattern: multiple models deliberate in parallel, an adjudicator reconciles their outputs, and each claim can be traced back to the model and source that produced it. The billing model works the same way — every participating model call writes its own billing record, so the full cost of a council run is transparent and auditable.
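The parallel run and per-call billing can be sketched with the standard library. `call` is a hypothetical provider client that returns `(text, cost_usd)`, and `BillingRecord` is an illustrative shape rather than ToRun's schema. The point is structural: every model call, including the adjudicator's, yields its own record, so the run's total is simply their sum.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class BillingRecord:
    run_id: str
    model: str
    role: str         # "participant" or "adjudicator"
    cost_usd: float

def run_council(run_id, question, participants, adjudicator, call):
    """call(model, prompt) -> (text, cost_usd) is a hypothetical provider client."""
    # Participants deliberate in parallel; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(participants))) as pool:
        results = list(pool.map(lambda m: (m, *call(m, question)), participants))
    records = [BillingRecord(run_id, m, "participant", cost) for m, _, cost in results]

    # Tag each answer with its model so the verdict stays traceable.
    transcript = "\n".join(f"[{m}] {text}" for m, text, _ in results)
    verdict, cost = call(adjudicator, f"Reconcile these answers:\n{transcript}")
    records.append(BillingRecord(run_id, adjudicator, "adjudicator", cost))

    # One record per model call, so the run's total is just their sum.
    return verdict, records, sum(r.cost_usd for r in records)

def fake_call(model, prompt):
    return f"{model} replied", 0.25

verdict, records, total = run_council("run-1", "Is X true?", ["model-a", "model-b"], "judge-model", fake_call)
print(verdict)  # judge-model replied
print(total)    # 0.75
```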
The practical takeaway
If you are evaluating AI platforms, ask two questions: how does it select a model for each request, and what happens when that model is unavailable or too expensive?
If the answer to either question is "it always uses model X," that is an architecture smell. The models available today are not the best models that will exist in six months. New providers enter the market. Pricing shifts. Capability boundaries change. An architecture that treats model selection as a routing problem — not a configuration choice — ages better, costs less over time, and degrades more gracefully under pressure.
One model is a starting point. A routing layer is the actual foundation.