
Question 1

How accurate is the transcription?

Accepted Answer

Accuracy typically exceeds 95% for clear audio in supported languages. We use OpenAI's industry-leading speech recognition under the hood, the same model that powers many professional transcription tools.

Real-world accuracy depends on three things: audio clarity (background noise hurts), speaker accents (heavy regional accents may lower accuracy by a few points), and the language itself (English and Spanish tend to score highest). For maximum accuracy, see "What audio quality gives the best results?" for the small things that make a big difference.
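To make the 95% figure more concrete, here is a rough back-of-envelope sketch. It assumes accuracy is measured per word (so 95% means roughly 1 word in 20 may need fixing) and a speaking rate of about 150 words per minute. Both are illustrative assumptions, not published SoundScript.AI figures.

```python
def expected_word_errors(word_count: int, accuracy: float) -> float:
    """Rough count of words that may need fixing.

    Assumes (for illustration only) that accuracy is a per-word rate,
    e.g. 0.95 means roughly 95 of every 100 words come out right.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return word_count * (1.0 - accuracy)


# Assumed speaking rate of ~150 words per minute for a 10-minute recording.
words = 10 * 150
print(round(expected_word_errors(words, 0.95)))  # 75
```

Under those assumptions, a 10-minute recording would leave around 75 words to review, which is why small improvements in audio quality are worth the effort.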

Question 2

Can SoundScript.AI identify different speakers in my audio?

Accepted Answer

Yes — speaker identification is included with your subscription on every plan. On the upload form, set Identify Speakers to Yes and we'll automatically label each speaker in your transcription as Speaker 1, Speaker 2, and so on.

It works great for meetings, interviews, podcasts, and any multi-person conversation. There's no limit on the number of speakers we'll detect. Speaker identification adds a little processing time, so leave it off for solo recordings (lectures, voice memos, single-narrator content) to get faster results.

Question 3

What languages can I transcribe?

Accepted Answer

We support 99 languages for transcription, including English, Spanish, Portuguese, French, German, Italian, Japanese, Chinese, Korean, Russian, Arabic, Hindi, and many more.

You can pick the language explicitly on the upload form for the best accuracy, or leave it on Auto and we'll detect it for you. The language list is the same as OpenAI Whisper's supported set, and the SoundScript.AI interface itself is also available in all 99 languages. To change yours, see "Where do I update my interface language?"

Question 4

What audio quality gives the best results?

Accepted Answer

Clear voices recorded close to a microphone, with minimal background noise. That's the short version. Here's what helps most:

Use a decent microphone — even an entry-level USB mic or modern phone is much better than a laptop's built-in mic.
Record in a quiet room — close windows, turn off fans, and avoid hard surfaces that echo.
Get close to the mic — 6-12 inches is the sweet spot for natural speech.
Avoid background music when possible. For what to expect when music is unavoidable, see "How does SoundScript.AI handle background music or noise?"

Question 5

How long does processing typically take?

Accepted Answer

Most files are done in seconds to a couple of minutes. A typical 10-minute audio file usually finishes in under 30 seconds.

Files larger than 25MB are auto-split into chunks and processed in parallel, so even an hour-long recording is usually ready in 2-3 minutes. Enabling speaker identification adds a little extra time. The progress bar updates in real time — there's nothing to refresh.

Question 6

What happens with files larger than 25MB?

Accepted Answer

We automatically split large files into smaller chunks behind the scenes, transcribe them in parallel, and stitch the results back together. You don't need to do anything — just upload your file as normal.

The maximum upload size is 1GB. Each chunk is processed independently, which is why a one-hour file can be ready in just a few minutes. Whenever possible, chunk boundaries are placed on natural silences so that a word or sentence isn't split across two chunks.
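The split, transcribe-in-parallel, and stitch steps described above can be sketched roughly as follows. This is a minimal illustration, not SoundScript.AI's actual pipeline: it assumes the 25MB limit has already been converted into a maximum chunk length in seconds, that silence timestamps come from some earlier detection step, and `transcribe_chunk` is a hypothetical placeholder for the real speech-to-text call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Chunk = Tuple[float, float]  # (start_seconds, end_seconds)


def plan_chunks(duration: float, silences: List[float], max_len: float) -> List[Chunk]:
    """Split [0, duration] into chunks no longer than max_len seconds,
    cutting at the latest silence inside each window when one exists."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    candidates = sorted(s for s in silences if 0 < s < duration)
    chunks: List[Chunk] = []
    start = 0.0
    while duration - start > max_len:
        limit = start + max_len
        in_window = [s for s in candidates if start < s <= limit]
        # Prefer a natural pause; otherwise fall back to a hard cut at the limit.
        end = in_window[-1] if in_window else limit
        chunks.append((start, end))
        start = end
    chunks.append((start, duration))
    return chunks


def transcribe_chunk(chunk: Chunk) -> str:
    """Hypothetical placeholder for the real speech-to-text call."""
    start, end = chunk
    return f"[{start:.0f}-{end:.0f}s]"


def transcribe_all(chunks: List[Chunk]) -> str:
    # Chunks run concurrently; pool.map still yields results in input order.
    with ThreadPoolExecutor() as pool:
        return " ".join(pool.map(transcribe_chunk, chunks))


# A one-hour file with pauses detected near each 10-minute mark.
plan = plan_chunks(3600, [590, 1175, 1760, 2350, 2950, 3540], max_len=600)
print(len(plan))  # 7
print(transcribe_all(plan[:2]))  # [0-590s] [590-1175s]
```

`ThreadPoolExecutor.map` returns results in the same order as the input chunks, so stitching is a simple join even though the chunks may finish in any order. When no silence falls inside a window, the sketch falls back to a hard cut at the limit, mirroring the "whenever possible" caveat above.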

Question 7

Why does my transcription have errors in proper nouns?

Accepted Answer

Proper nouns — names, brand names, technical terms, acronyms — are the hardest part of transcription because they don't follow normal language patterns. Even great audio can produce misspelled names.

A few things that help:

Choose the language explicitly instead of using auto-detect.
Speak proper nouns clearly when recording, with a small pause around them.
Edit the transcription afterward — you can copy the text into any editor and fix names with find-and-replace. We don't currently support a custom vocabulary list, but it's on our radar.
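If the same names keep coming out wrong, the find-and-replace step can be scripted instead of done by hand. This is a minimal sketch that assumes you've downloaded the TXT version. The corrections dictionary is your own list (the names below are made up for illustration), and matching is whole-word and case-insensitive, so a fix for "Shivon" doesn't also change "Shivonne".

```python
import re
from typing import Dict


def fix_names(text: str, corrections: Dict[str, str]) -> str:
    """Replace misheard spellings with the correct ones.

    Matches whole words only and ignores case, so a key like "sound script"
    also catches "Sound Script" but not text inside longer words.
    """
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in `right` from being
        # treated as regex group references.
        text = pattern.sub(lambda _match: right, text)
    return text


transcript = "Thanks to Shivon and the sound script team."
print(fix_names(transcript, {"Shivon": "Siobhan", "sound script": "SoundScript.AI"}))
# Thanks to Siobhan and the SoundScript.AI team.
```

The whole-word check relies on each key starting and ending with a letter or digit; a term such as "C++" would need a different pattern.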

Question 8

Can I edit the transcription text?

Accepted Answer

You can copy the transcription text from the result page and edit it in any text editor or word processor — Google Docs, Microsoft Word, Notepad, whatever you prefer.

We don't have an in-app editor yet, so changes you make outside SoundScript.AI aren't saved back to our servers. The original transcription stays in your dashboard, so you can always download a fresh copy. Choose the DOC download if you want to keep formatting while you edit, or TXT if you only need the plain words.

Question 9

What are the SRT and TXT download formats for?

Accepted Answer

SRT is the standard subtitle format — it includes timestamps so each line of text appears at the right moment in your video. Use it for YouTube, Vimeo, video editors like Premiere or Final Cut, or any subtitle-aware player.

TXT is plain text with no timestamps, perfect for documents, blog posts, transcribed interviews, or anything where you just want the words. We also offer DOC (formatted Word document) and PDF (printable); see "How do I download as TXT, DOC, or PDF?" for details.
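For reference, an SRT file is a series of numbered cues. Each cue has a start and end time in `HH:MM:SS,mmm` form (note the comma before the milliseconds), then the text, then a blank line. The sketch below builds one from a list of `(start_seconds, end_seconds, text)` segments; that input shape is an assumption for illustration, not SoundScript.AI's own export code.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text: cue number, time range, text, then a blank line."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


print(to_srt([
    (0.0, 2.5, "Welcome to the show."),
    (2.5, 5.25, "Today we're talking about microphones."),
]))
```

The first cue comes out as `1`, then `00:00:00,000 --> 00:00:02,500`, then `Welcome to the show.` A TXT export is the same text with the number and time lines left out.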

Question 10

How does SoundScript.AI handle background music or noise?

Accepted Answer

We do our best, but heavy background music or noise will reduce accuracy. Light ambient noise (a quiet café, a fan running) usually causes no problem. Loud music or competing voices are the hardest cases.

For interview-style content with intro music, the music is usually either transcribed as gibberish or skipped, and accuracy returns once speech starts. If you can record a speech-only version of your audio (or strip the music beforehand with a tool like Audacity), accuracy will be noticeably better.
