engineering 18 May 2026 · 2 min read

Understanding multimodal AI pricing — tokens, images, and seconds


ToRun Team

One BillingRecord per call, full snapshot

Every AI call on ToRun produces exactly one BillingRecord that captures the price at the moment of execution. Provider rates change every few weeks; storing the rate inline means an invoice from three months ago is still verifiable today.
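To make the snapshot idea concrete, here is a minimal sketch in Python. It is not our production schema: the field names (call_id, unit_rate_usd, and so on) and the helpers record_call and verify are hypothetical, chosen only to show how copying the rate into the record makes an old charge checkable without consulting today's price list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class BillingRecord:
    """Illustrative record: one per call, immutable once written."""
    call_id: str
    unit: str                 # e.g. "PerImage"
    quantity: Decimal         # how many units the call consumed
    unit_rate_usd: Decimal    # rate copied from the price list at execution time
    amount_usd: Decimal       # quantity * unit_rate_usd, stored alongside the rate
    executed_at: datetime


def record_call(call_id: str, unit: str, quantity: Decimal,
                current_rate_usd: Decimal, now: datetime) -> BillingRecord:
    """Snapshot the rate that is in force right now into the record."""
    return BillingRecord(call_id, unit, quantity, current_rate_usd,
                         quantity * current_rate_usd, now)


def verify(record: BillingRecord) -> bool:
    """Recompute the charge from the record alone, ignoring current prices."""
    return record.quantity * record.unit_rate_usd == record.amount_usd
```

Because verify reads only the record, a later change to the provider's price list cannot alter the result for a historical call.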

The twelve pricing units

We track twelve canonical pricing units that cover every multimodal capability we route:

PerInputToken (text input)
PerOutputToken (text output)
PerCachedInputToken (prompt cache hit)
PerImage (image generation)
PerSecondAudio (TTS / STT)
PerSecondVideo (video generation)
PerCharacterTts (legacy text-to-speech billing)
PerSearchCall (web search tool)
PerComputeSecond (sandbox execution)
PerEmbedding (vector embedding)
PerMinuteRealtime (realtime voice / video)
PerRerankPair (reranking)
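
As an illustration, the twelve units above could be modeled as an enum along these lines. This is a sketch in Python rather than our actual definition; the member values mirror the names in the list, while the Python member names and the parse_unit helper are assumptions made for the example.

```python
from enum import Enum


class PricingUnit(Enum):
    """The twelve units listed above; values mirror the names in the post."""
    PER_INPUT_TOKEN = "PerInputToken"
    PER_OUTPUT_TOKEN = "PerOutputToken"
    PER_CACHED_INPUT_TOKEN = "PerCachedInputToken"
    PER_IMAGE = "PerImage"
    PER_SECOND_AUDIO = "PerSecondAudio"
    PER_SECOND_VIDEO = "PerSecondVideo"
    PER_CHARACTER_TTS = "PerCharacterTts"
    PER_SEARCH_CALL = "PerSearchCall"
    PER_COMPUTE_SECOND = "PerComputeSecond"
    PER_EMBEDDING = "PerEmbedding"
    PER_MINUTE_REALTIME = "PerMinuteRealtime"
    PER_RERANK_PAIR = "PerRerankPair"


def parse_unit(name: str) -> PricingUnit:
    """Look up a unit by its stored name; raises ValueError for unknown names."""
    return PricingUnit(name)
```

Rejecting unknown names at parse time keeps a misspelled or retired unit from silently entering a billing row.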

Why per-million instead of per-1k

Providers of token-priced models now publish their rates per million tokens. We follow that convention so our figures line up with provider price sheets and stay readable at the volumes users run today. The legacy "Per1k" field was removed from AiModel and replaced by AiModelPricing rows that carry a PricingUnit enum.
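
The arithmetic behind a per-million rate is short enough to sketch. The function name token_cost_usd and the rate used in the usage note are hypothetical; the only thing taken from the text above is that the rate is quoted per one million tokens. Decimal is used so that small amounts are not distorted by binary floating point.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def token_cost_usd(tokens: int, rate_per_million_usd: Decimal) -> Decimal:
    """Cost of `tokens` tokens at a rate quoted per one million tokens."""
    return Decimal(tokens) * rate_per_million_usd / TOKENS_PER_MILLION


def per_1k_equivalent(rate_per_million_usd: Decimal) -> Decimal:
    """The same rate expressed per 1,000 tokens, for comparison with old quotes."""
    return rate_per_million_usd / Decimal(1_000)
```

For example, at an illustrative rate of 3.00 USD per million tokens, token_cost_usd(12_500, Decimal("3.00")) works out to 0.0375 USD, and the per-1k equivalent of that rate is 0.003 USD.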

Multi-currency snapshot

Every BillingRecord carries five currency fields: CurrencyCode, AmountUsd (canonical), AmountLocal, ExchangeRate, and ExchangeRateAt. If exchange rates move, your historical invoices keep the rate recorded at the time of the call, and a finance report can reconcile back to the original USD basis at any time.
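
A rough sketch of how these fields fit together, in Python. Several details here are assumptions for the sake of the example, not statements about our implementation: ExchangeRate is taken to mean local units per 1 USD, AmountLocal is rounded to two decimal places (which does not suit every currency), and the helpers snapshot and reconciles are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class FxSnapshot:
    """Currency fields of a billing record, frozen at execution time."""
    currency_code: str          # CurrencyCode, e.g. "EUR"
    amount_usd: Decimal         # AmountUsd, the canonical amount
    amount_local: Decimal       # AmountLocal, what the customer sees
    exchange_rate: Decimal      # ExchangeRate; assumed: local units per 1 USD
    exchange_rate_at: datetime  # ExchangeRateAt, when the rate was observed


def snapshot(amount_usd: Decimal, currency_code: str,
             exchange_rate: Decimal, at: datetime) -> FxSnapshot:
    """Convert once, at call time, and store the rate that was used."""
    local = (amount_usd * exchange_rate).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)  # assumed 2-decimal currency
    return FxSnapshot(currency_code, amount_usd, local, exchange_rate, at)


def reconciles(s: FxSnapshot, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check the local amount against the USD basis using the stored rate."""
    return abs(s.amount_usd * s.exchange_rate - s.amount_local) <= tolerance
```

The reconciliation uses only the stored rate, so the check gives the same answer no matter where the market rate stands when the report is run.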

What this means for you

You can audit every cent. You can export billing data and recompute it offline. You can switch providers without losing historical pricing context. That is the foundation we are building everything else on.
