Give your AI a voice - and an ear
Text is only half of how people communicate. ToRun's voice stack covers the other half: it speaks, it listens, and it can translate a live conversation as it happens - all on the same platform, with the same transparent, metered billing as everything else.
Narrate a script in a natural voice. Turn an hour of recorded audio into clean, labeled text. Hold a conversation with someone who doesn't share your language and let ToRun interpret in real time. No separate vendor for each piece, no opaque "audio plan" - just speech that works, billed by the second.
Open the Live Translator · Read the voice & transcription guide
Text-to-Speech: any script, a natural voice, 29 languages
Hand the Generation Suite a block of text and get back spoken audio that sounds like a person, not a robot. It works across all 29 languages ToRun supports - so a course, a campaign, an audiobook, or a chatbot can sound native everywhere you operate, not just in English.
- Streams as it speaks - playback can begin before the full clip finishes rendering, so long passages feel instant instead of making you wait on a spinner.
- Emotion and tone control - on supporting voices, steer delivery (warm, urgent, calm) rather than getting one flat read.
- Voice cloning - on models that allow it, train a custom voice from a short reference sample so your brand or persona always sounds the same.
- Format on the way out - ask for wav, mp3, or another container and the suite transcodes for you, with any conversion cost passed straight through.
Finished audio uploads to a fast global CDN and comes back as a ready-to-use link - drop it into a post, a workflow, or a download with no extra step.
Speech-to-Text: recordings become searchable text
Point ToRun at a meeting, an interview, a voice note, or a podcast and get back accurate, structured text - billed by the minute, not by a flat subscription.
- Speaker diarization - the transcript labels who said what, so a multi-person recording reads like a script instead of a wall of text.
- Clean output - punctuation, paragraphs, and timestamps, ready to skim, quote, or feed into the next step.
- Pipes into everything - a transcript can flow straight into a translation, a summary, or a knowledge base, because it all lives on one platform.
Use it to caption a video, mine a customer call for insights, or make hours of audio actually searchable.
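As a rough illustration of what "reads like a script" can mean in practice, the sketch below merges consecutive segments from the same speaker and renders one timestamped line per turn. The `Segment` shape and the function names are hypothetical; they are assumptions made for this example, not ToRun's actual transcript format or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape of one diarized transcript segment (an assumption).
    speaker: str
    start: float  # offset from the start of the recording, in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a second offset as MM:SS."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def to_script(segments: list[Segment]) -> str:
    """Merge consecutive segments by the same speaker into one line per turn."""
    turns: list[Segment] = []
    for seg in segments:
        if turns and turns[-1].speaker == seg.speaker:
            # Same speaker keeps talking: extend the current turn.
            prev = turns[-1]
            turns[-1] = Segment(prev.speaker, prev.start, prev.text + " " + seg.text)
        else:
            turns.append(seg)
    return "\n".join(
        f"[{format_timestamp(t.start)}] {t.speaker}: {t.text}" for t in turns
    )


sample = [
    Segment("Speaker 1", 0.0, "Hi."),
    Segment("Speaker 1", 2.5, "Thanks for joining."),
    Segment("Speaker 2", 65.2, "Glad to be here."),
]
script = to_script(sample)
print(script)
```

The same per-turn structure is a convenient input for a summary or translation step, since each line carries its own speaker and timestamp.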
Translation: text and speech, batch or live
ToRun moves meaning between languages two ways, depending on what you need.
- Text translation - drop in a document or a paragraph and get a faithful translation, in real time or in bulk, at the same per-use price as the rest of the platform. The dedicated Translator mode is built for exactly this.
- Realtime speech translation - the headline act. Speak into your device and ToRun translates your speech into another language as you talk, with the translated audio playing back over a low-latency live connection.
It's the difference between copy-pasting into a translation box and simply having a conversation.
The Live Translator: a conversation across languages
ToRun's Live Translator turns your browser into a two-way interpreter. Pick the language you want to be heard in, press start, and speak - the platform streams your voice to the model and plays the translation back in seconds, while showing both the original and translated transcript on screen so nothing is lost.
- Walkie-talkie mode - a simple A ⇄ B swap lets two people pass the conversation back and forth. Speak in one language, swap, and reply in the other.
- Live transcript, both sides - your words and the translation appear as text in real time, so you can read along and confirm the meaning.
- You watch the cost tick up - a running estimate shows what the session has cost so far, so a live conversation never becomes a surprise on your bill. Stop whenever you like; billing settles the moment the connection closes.
- No setup, no app - it runs in the browser over a secure same-origin connection. Your microphone audio streams to the platform and back; nothing is installed.
Sessions are time-bounded and metered per minute, so the economics are always clear before you start.
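A running estimate for a metered, time-bounded session can be sketched as follows. The rate, the session limit, and the linear proration of the per-minute price are all assumptions made for illustration; the platform's real rates and rounding rules are whatever it publishes.

```python
from decimal import Decimal


def running_estimate(
    elapsed_seconds: float,
    rate_per_minute: Decimal,
    max_session_seconds: int,
) -> Decimal:
    """Estimate the cost so far, assuming a per-minute rate prorated linearly.

    Time beyond the session limit is not counted, because the session
    would already have ended (an assumption about time-bounded sessions).
    """
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    billable = min(elapsed_seconds, max_session_seconds)
    minutes = Decimal(str(billable)) / Decimal(60)
    return (minutes * rate_per_minute).quantize(Decimal("0.0001"))


# Hypothetical numbers: 0.06 per minute, 30-minute session limit.
estimate_90s = running_estimate(90, Decimal("0.06"), 1800)
estimate_capped = running_estimate(4000, Decimal("0.06"), 1800)
print(estimate_90s, estimate_capped)
```

Using `Decimal` rather than floats keeps the displayed estimate free of binary rounding noise, which matters when the figure is shown to the user as money.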
Realtime transcription: live captions as they're spoken
Beyond translation, ToRun can transcribe a live stream of speech into text as it happens - the foundation for live captions, real-time meeting notes, or a voice command surface. The same low-latency streaming connection that powers the Live Translator carries a transcription channel, so words appear on screen moments after they're said.
Honest pricing, every second of it
Voice is billed the way the rest of ToRun is: by the unit that actually matters. Text-to-speech is priced per character of script, transcription per minute of audio, realtime sessions per minute of connection. Each job writes its own clear billing record - a long voiceover with a format conversion shows the synthesis and the transcode as separate line items, never buried in a flat "audio fee." Use your own provider keys and pay the provider directly, or use ours at a transparent, published margin. Either way, what you spent is never a mystery.
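To make the separate-line-item idea concrete, here is a minimal sketch of a billing record for a voiceover with a format conversion. The `LineItem` type, the function name, and every rate and amount are hypothetical placeholders, not ToRun's published prices or record format.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical per-character rate, for illustration only.
TTS_RATE_PER_CHAR = Decimal("0.00002")


@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: Decimal
    unit: str
    unit_price: Decimal

    @property
    def total(self) -> Decimal:
        return self.quantity * self.unit_price


def voiceover_line_items(script: str, transcode_cost: Decimal) -> list[LineItem]:
    """Keep synthesis and transcode as separate line items, not one flat fee."""
    return [
        LineItem(
            "Text-to-speech synthesis",
            Decimal(len(script)),
            "character",
            TTS_RATE_PER_CHAR,
        ),
        # The conversion cost is passed through as-is (assumed to be per job).
        LineItem("Format conversion (pass-through)", Decimal(1), "job", transcode_cost),
    ]


items = voiceover_line_items("Hello, world." * 100, Decimal("0.0015"))
grand_total = sum((item.total for item in items), Decimal("0"))
for item in items:
    print(f"{item.description}: {item.quantity} x {item.unit_price} = {item.total}")
print("Total:", grand_total)
```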
What you can do
- Text-to-Speech in 29 languages - turn any script into natural spoken audio that streams as it's generated, with tone control and voice cloning on supporting models.
- Speech-to-Text with speaker labels - transcribe meetings, interviews, and recordings into clean, diarized text, billed by the minute.
- Text translation, batch or instant - move documents and messages between languages in real time or in bulk through the dedicated Translator mode.
- Live speech translation - speak and hear the translation back in seconds over a low-latency connection, with both transcripts on screen and a walkie-talkie swap for two-way conversation.
- Realtime transcription - live captions and meeting notes generated as the words are spoken, over the same streaming channel.
- Metered by the second, never hidden - every voice job writes its own billing record at a published rate - per character, per minute, or per realtime minute - with no flat "audio plan" tax.