engineering 20 May 2026 · 5 min read

How ToRun picks the right model for every message


ToRun Team

The problem with "just use GPT-4"

Most AI platforms make a quiet choice on your behalf: they pick a model they like, wire everything to it, and move on. That works until it doesn't — until the model is down, until a competitor releases something better for a specific task, until you want to bring your own API key and keep costs in your own account, until you need image generation and your default model cannot produce images.

The usual fix is a dropdown. Pick a model, use it everywhere, figure out the trade-offs yourself. That is not routing; it is delegation. ToRun does something structurally different.

The capability-first pipeline

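Every request in ToRun passes through a four-stage pipeline. The first three stages run before a model is touched and decide which model to call; the fourth handles provider failures.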

1. Mode identifies the product surface. When you open a Chat session, switch to Code mode, or kick off a Research task, you are selecting a Mode. The mode is not just a UI label — it carries a defined set of required and optional capabilities. A Code session needs structured output and, optionally, code execution. A Research task needs tools (web search), long context, and structured output. A basic chat session needs text generation and nothing else by default.

2. The capability set becomes a filter. ToRun maintains a catalog of models across many providers (OpenAI, Anthropic/Claude, Google/Gemini, DeepSeek, xAI/Grok, Mistral, and others). Each model entry lists the capabilities it actually supports: vision, tools, code execution, image generation, audio in, audio out, video generation, embeddings, reranking, real-time streaming, reasoning, and more. The capability set from step one is matched against this catalog, and any model that cannot satisfy all of the required capabilities is dropped.

3. Price/quality ranking selects the winner. Among the models that pass the capability filter, ToRun ranks by your configured preference. Interactive chat defaults to cheapest-first: you get a fast, inexpensive answer unless you explicitly ask for a premium model. Workflow runners flip to quality-first, because an automated pipeline that runs without you watching should pick the best result, not the cheapest shortcut. For BYOK (bring-your-own-key) users, pricing is factored in differently, because the request runs through their own provider account.

4. Fallback chains handle provider failures. If the selected model returns a service error, the same pipeline re-runs with that model excluded. The second-best capable model answers instead. No error surfaces to you unless every viable candidate is unavailable.
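The four stages can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not ToRun's implementation: the capability strings, the `price` and `quality` fields, `ProviderError`, and `route` are hypothetical names chosen for the example. Optional capabilities are left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical catalog entry. Capability names, `price` (USD per 1M output
# tokens) and `quality` (0-1 score) are illustrative, not ToRun's schema.
@dataclass(frozen=True)
class Model:
    name: str
    capabilities: frozenset
    price: float
    quality: float

# Stage 1: each mode carries the capabilities it requires (taken from the prose).
MODE_REQUIREMENTS = {
    "chat": frozenset({"text"}),
    "code": frozenset({"structured_output"}),
    "research": frozenset({"tools", "long_context", "structured_output"}),
}

class ProviderError(Exception):
    """Stand-in for a provider service error (outage, 5xx, rate limit)."""

def capable(catalog, required, excluded):
    # Stage 2: keep models that support every required capability.
    return [m for m in catalog if required <= m.capabilities and m.name not in excluded]

def rank(models, preference):
    # Stage 3: "cheapest" for interactive chat, "quality" for workflow runners.
    if preference == "cheapest":
        return sorted(models, key=lambda m: (m.price, -m.quality))
    return sorted(models, key=lambda m: (-m.quality, m.price))

def route(catalog, mode, preference, call):
    # Stage 4: on a service error, re-run stages 2-3 with the failed model excluded.
    excluded = set()
    while True:
        ranked = rank(capable(catalog, MODE_REQUIREMENTS[mode], excluded), preference)
        if not ranked:
            raise RuntimeError("every capable model is unavailable")
        model = ranked[0]
        try:
            return model.name, call(model)
        except ProviderError:
            excluded.add(model.name)

CATALOG = [
    Model("small-chat", frozenset({"text"}), 0.2, 0.5),
    Model("mid-general", frozenset({"text", "tools", "long_context", "structured_output"}), 1.0, 0.8),
    Model("top-general", frozenset({"text", "tools", "long_context", "structured_output"}), 5.0, 0.95),
]

def flaky_call(model):
    # Simulate an outage at the top-ranked model's provider.
    if model.name == "top-general":
        raise ProviderError("503 Service Unavailable")
    return f"answered by {model.name}"

print(route(CATALOG, "research", "quality", flaky_call))
# ('mid-general', 'answered by mid-general')
```

Because the fallback is just the same filter-and-rank pass with one more model excluded, it can never land on a model that lacks a required capability, and it still honors the active cheapest-first or quality-first preference.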

A concrete example: vision plus tools

Say you paste a screenshot and ask ToRun to search the web for context on what it shows. That request touches two capabilities simultaneously: vision (the model must accept an image input) and tools (the model must be able to invoke a web search function).

Under a single-model approach, you are stuck: your platform either supports that combination or it does not. Under capability-first routing, ToRun intersects both requirements. Only models that have confirmed vision support AND confirmed tool-call support survive the filter. From that shortlist, price/quality ranking picks the best match for your current mode. If the top candidate is rate-limited or unavailable, the next candidate on the list handles the request without you doing anything.
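A sketch of how such a request might be turned into a capability set and intersected with the catalog. The `Request` shape, the capability strings, and the model names are invented for illustration; only the rule itself (a model survives only if every required capability is confirmed) comes from the description above.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """Hypothetical request shape, for illustration only."""
    prompt: str
    images: list = field(default_factory=list)
    tools: list = field(default_factory=list)

def required_capabilities(req):
    caps = {"text"}
    if req.images:
        caps.add("vision")  # the model must accept image input
    if req.tools:
        caps.add("tools")   # the model must be able to call functions such as web search
    return caps

# Hypothetical catalog: name -> (price, confirmed capabilities).
CATALOG = {
    "model-a": (0.5, {"text", "vision"}),
    "model-b": (1.0, {"text", "tools"}),
    "model-c": (2.0, {"text", "vision", "tools"}),
    "model-d": (3.0, {"text", "vision", "tools", "reasoning"}),
}

def shortlist(required):
    # Only models whose confirmed capabilities include every requirement survive.
    survivors = [name for name, (_, caps) in CATALOG.items() if required <= caps]
    return sorted(survivors, key=lambda name: CATALOG[name][0])  # cheapest-first

req = Request("What is this? Find context on the web.",
              images=[b"<png bytes>"], tools=["web_search"])
print(shortlist(required_capabilities(req)))  # ['model-c', 'model-d']
```

Neither `model-a` (no tools) nor `model-b` (no vision) makes the shortlist, even though each is cheaper. If `model-c` is rate-limited, `model-d` is next in line.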

This matters in practice: the set of models that support vision has changed significantly over the past year, and it will keep changing. With capability routing, ToRun automatically considers newly added models as soon as they are catalogued, without any product update on your end.

Why this beats lock-in

A single-model platform has one fallback strategy: wait for the vendor to fix their outage. A capability-first platform has a catalog. When OpenAI has an incident, requests that can be satisfied by Anthropic or Gemini are served there. When a new Mistral model offers better price/quality on structured output, it enters the ranking for relevant modes without a forced migration.

The same logic applies to BYOK. When you add your own API key for a provider, that credential is scoped into the routing layer. Requests that match the provider's capable models can draw from your key instead of ToRun's account, shifting the cost to your provider relationship at the appropriate billing tier. The routing pipeline does not change — only which credential is used at execution time.
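A minimal sketch of the "only the credential changes" idea, assuming a simple precedence rule: if the user has a key for the selected model's provider, use it; otherwise fall back to the platform account. `Credential`, `resolve_credential`, and the key values are hypothetical placeholders; ToRun's actual precedence and billing-tier logic are not described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    owner: str      # "user" (BYOK) or "platform" (ToRun's account)
    provider: str
    api_key: str

def resolve_credential(provider, user_keys, platform_keys):
    """Choose whose key pays for a call to an already-selected model.

    Routing has already picked the model; this step only swaps the credential.
    """
    if provider in user_keys:
        return Credential("user", provider, user_keys[provider])
    return Credential("platform", provider, platform_keys[provider])

user_keys = {"anthropic": "sk-user-example"}  # placeholder values
platform_keys = {"anthropic": "sk-platform-example", "openai": "sk-platform-example-2"}

print(resolve_credential("anthropic", user_keys, platform_keys).owner)  # user
print(resolve_credential("openai", user_keys, platform_keys).owner)     # platform
```

Keeping this lookup after model selection is what lets the routing pipeline stay identical for BYOK and non-BYOK users.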

The tradeoff: ToRun cannot guarantee that the cheapest model is the smartest one for your task. What it can do is make the ranking explicit. You can see which model was selected and why, and you can override it. That transparency is the point. Every request writes a billing record with the model used, the pricing unit active at execution time, and the cost breakdown. You can audit any call months later, even after provider prices have changed, because the snapshot is frozen at the moment the request ran.
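A sketch of why a frozen snapshot survives later price changes: the record copies the pricing unit and unit price at execution time, and the cost is computed only from that copy. `BillingRecord`, `bill`, and the price-table shape are hypothetical; a real cost breakdown would carry more than the single line item shown here. `Decimal` is used instead of `float` to avoid binary rounding in money arithmetic.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

@dataclass(frozen=True)
class BillingRecord:
    model: str
    unit: str             # pricing unit active at execution time
    unit_price: Decimal   # copied from the price table when the request ran
    units_used: Decimal
    created_at: datetime

    @property
    def cost(self) -> Decimal:
        # Computed only from the frozen snapshot, never from the live price table.
        return self.unit_price * self.units_used

def bill(model, price_table, units_used):
    unit, unit_price = price_table[model]
    return BillingRecord(model, unit, unit_price, units_used, datetime.now(timezone.utc))

prices = {"model-x": ("1M output tokens", Decimal("10.00"))}
record = bill("model-x", prices, Decimal("0.25"))
prices["model-x"] = ("1M output tokens", Decimal("15.00"))  # provider raises the price later
print(record.cost)  # 2.5000
```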

One model that does everything is a convenient fiction. What you actually want is a platform that finds the right model for each task and makes that choice legible.
