Give your AI a voice - and an ear

Text is only half of how people communicate. ToRun's voice stack covers the other half: it speaks, it listens, and it can translate a live conversation as it happens - all on the same platform, with the same transparent, metered billing as everything else.

Narrate a script in a natural voice. Turn an hour of recorded audio into clean, labeled text. Hold a conversation with someone who doesn't share your language and let ToRun interpret in real time. No separate vendor for each piece, no opaque "audio plan" - just speech that works, metered by what you actually use.

Text-to-Speech: any script, a natural voice, 29 languages

Hand the Generation Suite a block of text and get back spoken audio that sounds like a person, not a robot. It works across all 29 languages ToRun supports - so a course, a campaign, an audiobook, or a chatbot can sound native everywhere you operate, not just in English.

Streams as it speaks - playback can begin before the full clip finishes rendering, so long passages feel instant instead of making you wait on a spinner.
Emotion and tone control - on supporting voices, steer delivery (warm, urgent, calm) rather than getting one flat read.
Voice cloning - on models that allow it, train a custom voice from a short reference sample so your brand or persona always sounds the same.
Format on the way out - ask for WAV, MP3, or another format and the suite transcodes for you, with any conversion cost passed straight through.

Finished audio uploads to a fast global CDN and comes back as a ready-to-use link - drop it into a post, a workflow, or a download with no extra step.

Speech-to-Text: recordings become searchable text

Point ToRun at a meeting, an interview, a voice note, or a podcast and get back accurate, structured text - billed by the minute, not by a flat subscription.

Speaker diarization - the transcript labels who said what, so a multi-person recording reads like a script instead of a wall of text.
Clean output - punctuation, paragraphs, and timestamps, ready to skim, quote, or feed into the next step.
Pipes into everything - a transcript can flow straight into a translation, a summary, or a knowledge base, because it all lives on one platform.

Use it to caption a video, mine a customer call for insights, or make hours of audio actually searchable.
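
To make "reads like a script" concrete, here is a minimal sketch of how diarized output could be turned into script-style lines. The `Segment` shape (speaker label, start offset, text) is an assumption made for illustration, not ToRun's actual transcript schema; check the voice & transcription guide for the real field names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized chunk of speech (hypothetical shape, not ToRun's schema)."""
    speaker: str   # label such as "Speaker 1"
    start: float   # offset into the recording, in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def to_script(segments: list[Segment]) -> str:
    """Merge consecutive segments from the same speaker into script-style lines."""
    lines: list[str] = []
    previous_speaker = None
    for seg in segments:
        if seg.speaker == previous_speaker:
            # Same speaker kept talking: extend the last line.
            lines[-1] += " " + seg.text
        else:
            lines.append(f"[{format_timestamp(seg.start)}] {seg.speaker}: {seg.text}")
            previous_speaker = seg.speaker
    return "\n".join(lines)


segments = [
    Segment("Speaker 1", 0.0, "Thanks for joining."),
    Segment("Speaker 1", 2.4, "Shall we start?"),
    Segment("Speaker 2", 5.1, "Yes, go ahead."),
]
print(to_script(segments))
```

Merging consecutive segments per speaker is one design choice among several; keeping every segment on its own line would suit captioning better than reading.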

Translation: text and speech, batch or live

ToRun moves meaning between languages two ways, depending on what you need.

Text translation - drop in a document or a paragraph and get a faithful translation, in real time or in bulk, at the same per-use price as the rest of the platform. The dedicated Translator mode is built for exactly this.
Realtime speech translation - the headline act. Speak into your device and ToRun translates your speech into another language as you talk, with the translated audio playing back over a low-latency live connection.

It's the difference between copy-pasting into a translation box and simply having a conversation.

The Live Translator: a conversation across languages

ToRun's Live Translator turns your browser into a two-way interpreter. Pick the language you want to be heard in, press start, and speak - the platform streams your voice to the model and plays the translation back in seconds, while showing both the original and translated transcript on screen so nothing is lost.

Walkie-talkie mode - a simple A ⇄ B swap lets two people pass the conversation back and forth. Speak in one language, swap, and reply in the other.
Live transcript, both sides - your words and the translation appear as text in real time, so you can read along and confirm the meaning.
You watch the cost tick up - a running estimate shows what the session has cost so far, so a live conversation never becomes a surprise on your bill. Stop whenever you like; billing settles the moment the connection closes.
No setup, no app - it runs in the browser over a secure same-origin connection. Your microphone audio streams to the platform and back; nothing is installed.

Sessions are time-bounded and metered per minute, so the economics are always clear before you start.
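
The running estimate is simple arithmetic, and a rough sketch shows the idea. Everything here is an assumption for illustration: the rate is made up, and prorating the per-minute rate by the second is only one possible policy (partial minutes could just as well be rounded up). The published rates and the billing record remain the source of truth.

```python
from decimal import Decimal

# Hypothetical rate for illustration only; not a published ToRun price.
RATE_PER_MINUTE = Decimal("0.06")


def running_estimate(elapsed_seconds: int,
                     rate_per_minute: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Estimate the session cost so far.

    Assumes the per-minute rate is prorated by the second; the source
    does not say how partial minutes are actually billed.
    """
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    cost = rate_per_minute * Decimal(elapsed_seconds) / Decimal(60)
    return cost.quantize(Decimal("0.0001"))


# Print the estimate at a few points in a session.
for elapsed in (30, 90, 600):
    print(f"{elapsed:>4}s  ~${running_estimate(elapsed)}")
```

`Decimal` is used instead of floats so the estimate does not drift from rounding error over a long session.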

Realtime transcription: live captions as they're spoken

Beyond translation, ToRun can transcribe a live stream of speech into text as it happens - the foundation for live captions, real-time meeting notes, or a voice command surface. The same low-latency streaming connection that powers the Live Translator carries a transcription channel, so words appear on screen moments after they're said.

Honest pricing, every second of it

Voice is billed the way the rest of ToRun is: by the unit that actually matters. Text-to-speech is priced per character of script, transcription per minute of audio, realtime sessions per minute of connection. Each job writes its own clear billing record - a long voiceover with a format conversion shows the synthesis and the transcode as separate line items, never buried in a flat "audio fee." Use your own provider keys and pay the provider directly, or use ours at a transparent, published margin. Either way, what you spent is never a mystery.
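
As an illustration of what "separate line items" could look like for a voiceover with a format conversion, here is a small sketch. The rates, the flat conversion fee, and the `LineItem` and `bill_voiceover` names are all hypothetical; they are not ToRun's prices or API, only a way to show synthesis and transcode appearing as distinct entries.

```python
from dataclasses import dataclass
from decimal import Decimal

# All rates below are made up for illustration; they are not ToRun prices.
TTS_RATE_PER_CHAR = Decimal("0.000015")
TRANSCODE_FLAT_FEE = Decimal("0.0020")  # assumed flat pass-through cost


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit: str
    amount: Decimal


def bill_voiceover(script: str, convert_format: bool) -> list[LineItem]:
    """Build separate line items for synthesis and, optionally, format conversion."""
    chars = Decimal(len(script))
    items = [
        LineItem("Text-to-speech synthesis", chars, "character",
                 chars * TTS_RATE_PER_CHAR),
    ]
    if convert_format:
        items.append(
            LineItem("Format conversion", Decimal(1), "job", TRANSCODE_FLAT_FEE)
        )
    return items


# A 12,000-character script with a format conversion.
items = bill_voiceover("x" * 12000, convert_format=True)
total = sum((item.amount for item in items), Decimal(0))
for item in items:
    print(f"{item.description:<26} {item.quantity} {item.unit}(s)  ${item.amount}")
print(f"{'Total':<26} ${total}")
```

Keeping each charge as its own record is what lets a bill be audited item by item rather than as a single lump sum.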

What you can do

Text-to-Speech in 29 languages - turn any script into natural spoken audio that streams as it's generated, with tone control and voice cloning on supporting models.
Speech-to-Text with speaker labels - transcribe meetings, interviews, and recordings into clean, diarized text, billed by the minute.
Text translation, batch or instant - move documents and messages between languages in real time or in bulk through the dedicated Translator mode.
Live speech translation - speak and hear the translation back in seconds over a low-latency connection, with both transcripts on screen and a walkie-talkie swap for two-way conversation.
Realtime transcription - live captions and meeting notes generated as the words are spoken, over the same streaming channel.
Metered by usage, never hidden - every voice job writes its own billing record at a published rate (per character, per minute, or per realtime minute), with no flat "audio plan" tax.

Voice, Speech & Realtime Translation