Understanding multimodal AI pricing — tokens, images, and seconds
Author: ToRun Team
One BillingRecord per call, full snapshot
Every AI call on ToRun produces exactly one BillingRecord that captures the price at the moment of execution. Provider rates change every few weeks; storing the rate inline means an invoice from three months ago is still verifiable today.
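To make the snapshot idea concrete, here is a minimal Python sketch. It is not ToRun's actual schema: the field names, the `snapshot` and `verify` helpers, and the $0.04 per-image rate are assumptions chosen for illustration. The point it shows is that the rate is copied into the record, so the amount can be checked later without consulting a current price list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)  # frozen: a written record cannot be mutated afterwards
class BillingRecord:
    # Hypothetical field names; the real record may differ.
    call_id: str
    pricing_unit: str        # e.g. "PerImage"
    quantity: Decimal        # units consumed by the call
    unit_rate_usd: Decimal   # rate copied from the price list at execution time
    amount_usd: Decimal      # amount charged, stored next to the rate
    executed_at: datetime


def snapshot(call_id, pricing_unit, quantity, current_rate_usd):
    """Build one record that copies the current rate instead of referencing it."""
    quantity = Decimal(quantity)
    return BillingRecord(
        call_id=call_id,
        pricing_unit=pricing_unit,
        quantity=quantity,
        unit_rate_usd=current_rate_usd,
        amount_usd=quantity * current_rate_usd,
        executed_at=datetime.now(timezone.utc),
    )


def verify(record):
    """Recompute the amount using only fields stored on the record."""
    return record.quantity * record.unit_rate_usd == record.amount_usd


# Illustrative call: three images at an assumed rate of $0.04 each.
record = snapshot("call-1", "PerImage", 3, Decimal("0.04"))
```

Because `verify` reads nothing but the record itself, a later change to the provider's rate has no effect on whether an old record checks out.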
The twelve pricing units
We track twelve canonical pricing units that cover every multimodal capability we route:
- PerInputToken (text input)
- PerOutputToken (text output)
- PerCachedInputToken (prompt cache hit)
- PerImage (image generation)
- PerSecondAudio (TTS / STT)
- PerSecondVideo (video generation)
- PerCharacterTts (legacy text-to-speech billing)
- PerSearchCall (web search tool)
- PerComputeSecond (sandbox execution)
- PerEmbedding (vector embedding)
- PerMinuteRealtime (realtime voice / video)
- PerRerankPair (reranking)
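The twelve units above could be modeled as an enum. The following Python sketch is one way to do it; the member values mirror the names in the list, while the Python-style member names and the `parse_unit` helper are assumptions, not ToRun's actual code.

```python
from enum import Enum


class PricingUnit(Enum):
    PER_INPUT_TOKEN = "PerInputToken"                # text input
    PER_OUTPUT_TOKEN = "PerOutputToken"              # text output
    PER_CACHED_INPUT_TOKEN = "PerCachedInputToken"   # prompt cache hit
    PER_IMAGE = "PerImage"                           # image generation
    PER_SECOND_AUDIO = "PerSecondAudio"              # TTS / STT
    PER_SECOND_VIDEO = "PerSecondVideo"              # video generation
    PER_CHARACTER_TTS = "PerCharacterTts"            # legacy text-to-speech billing
    PER_SEARCH_CALL = "PerSearchCall"                # web search tool
    PER_COMPUTE_SECOND = "PerComputeSecond"          # sandbox execution
    PER_EMBEDDING = "PerEmbedding"                   # vector embedding
    PER_MINUTE_REALTIME = "PerMinuteRealtime"        # realtime voice / video
    PER_RERANK_PAIR = "PerRerankPair"                # reranking


def parse_unit(name: str) -> PricingUnit:
    """Map a stored unit name to its enum member; raises ValueError if unknown."""
    return PricingUnit(name)
```

Parsing through the enum means an unknown or misspelled unit name fails loudly instead of silently producing a record with no valid price.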
Why per-million instead of per-1k
All token-priced models now publish their rates per million tokens. We follow that convention so that our rates line up directly with provider price lists and stay readable as ordinary dollar amounts, rather than shrinking to small fractions of a cent per 1,000 tokens. The legacy "Per1k" field was removed from AiModel and replaced by AiModelPricing rows, each of which carries a PricingUnit enum value.
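As a worked example of the per-million arithmetic, the Python sketch below computes the cost of a call from a token count and a per-million rate. The function name and the $3.00 rate are assumptions for illustration; `Decimal` is used so the result is exact rather than subject to floating-point rounding.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def token_cost_usd(tokens: int, rate_per_million_usd: Decimal) -> Decimal:
    """Cost of `tokens` tokens at a rate quoted per one million tokens."""
    return Decimal(tokens) * rate_per_million_usd / TOKENS_PER_MILLION


# Illustrative call: 2,500 input tokens at an assumed $3.00 per million tokens.
cost = token_cost_usd(2_500, Decimal("3.00"))

# The same assumed rate expressed per 1,000 tokens, for comparison.
rate_per_1k = Decimal("3.00") / Decimal(1_000)
```

Here `cost` is 0.0075 USD, and the equivalent per-1k rate is 0.003 USD, which shows how quickly per-1k figures turn into hard-to-read fractions.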
Multi-currency snapshot
Every BillingRecord carries five currency fields: CurrencyCode, AmountUsd (canonical), AmountLocal, ExchangeRate, and ExchangeRateAt. If exchange rates move later, your historical invoices keep the FX rate that applied when they were issued, and a finance report can reconcile back to the original USD basis at any time.
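The currency fields can be sketched in Python as follows. Several details are assumptions rather than documented behavior: the snake_case names stand in for the fields listed above, ExchangeRate is taken to mean local units per 1 USD, local amounts are rounded to two decimals, and the 0.92 EUR rate is made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class FxSnapshot:
    currency_code: str           # CurrencyCode
    amount_usd: Decimal          # AmountUsd (canonical)
    amount_local: Decimal        # AmountLocal
    exchange_rate: Decimal       # ExchangeRate: local units per 1 USD (assumed direction)
    exchange_rate_at: datetime   # ExchangeRateAt: when the rate was observed


def to_local(amount_usd, currency_code, exchange_rate, at):
    """Convert once at billing time and store the rate used alongside the result."""
    amount_local = (amount_usd * exchange_rate).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP  # two-decimal rounding is an assumption
    )
    return FxSnapshot(currency_code, amount_usd, amount_local, exchange_rate, at)


def reconciles(snap, tolerance=Decimal("0.01")):
    """Check that the local amount maps back to the USD basis via the stored rate."""
    recomputed_usd = snap.amount_local / snap.exchange_rate
    return abs(recomputed_usd - snap.amount_usd) <= tolerance


# Illustrative call: $12.50 billed in EUR at an assumed rate of 0.92 EUR per USD.
snap = to_local(Decimal("12.50"), "EUR", Decimal("0.92"), datetime.now(timezone.utc))
```

The reconciliation uses only the rate stored on the record, so it gives the same answer no matter what the market rate is on the day the report runs. The tolerance absorbs the rounding applied to the local amount.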
What this means for you
You can audit every cent. You can export billing data and recompute it offline. You can switch providers without losing historical pricing context. That is the foundation we are building everything else on.