Media Tool

Transcribe Audio to Text

Convert speech in MP3, WAV, M4A, OGG, or FLAC audio into accurate text. Auto-detects 90+ languages. Powered by OpenAI Whisper running fully inside your browser.

100% private. Your audio never leaves your browser. No upload. No signup.

1. Quality

Pick how much accuracy you need. The first time you drop a file, the engine downloads and prepares the model automatically. After that, it loads from your browser's cache.

Quality level

2. Language & task

3. Upload audio

Drop audio file here or click to select

MP3, WAV, M4A, OGG, FLAC · up to ~200 MB recommended

How to Transcribe Audio to Text

1. Pick a quality tier

Balanced is the recommended default and works well for most audio. Choose Fast for the quickest results on mobile, or Best quality for the most accurate transcription of non-English, accented, or noisy audio. You can change this any time before your first file drop.

2. Drop your audio file

Drag MP3, WAV, M4A, OGG, or FLAC files onto the drop zone. The first time you do this, the engine prepares automatically — a one-time ~76 MB load that your browser caches so future visits are instant. The audio itself is decoded locally and never leaves your device.

3. Copy or save the transcript

Tokens stream in as Whisper transcribes. When it finishes, copy plain text to your clipboard, save a .txt file, or grab a .srt subtitle file with timestamps for your video editor.

How It Works

Real Whisper, running locally

We use the Hugging Face onnx-community/whisper-base model, an ONNX export of the Whisper architecture that OpenAI released in 2022 and that many commercial transcription services also rely on. The model runs through ONNX Runtime Web, accelerated by WebGPU when available and WebAssembly otherwise.
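The WebGPU-or-WebAssembly choice can be sketched as a small fallback rule. This is a minimal illustration, not this tool's actual code: `chooseDevice` and the `Device` type are hypothetical names, and the `adapter` argument stands for whatever `navigator.gpu.requestAdapter()` resolved to in the browser.

```typescript
// Hypothetical helper; the tool's real detection logic is not shown on this page.
type Device = "webgpu" | "wasm";

// `adapter` is the result of `await navigator.gpu?.requestAdapter()` in a browser:
//   - undefined when the WebGPU API is missing entirely
//   - null when the API exists but no usable GPU adapter was found
//   - an adapter object when WebGPU can be used
function chooseDevice(adapter: object | null | undefined): Device {
  return adapter ? "webgpu" : "wasm";
}
```

Checking the adapter rather than only the presence of `navigator.gpu` matters because a browser can expose the API and still fail to provide a usable GPU.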

Why your audio stays on-device

The audio is decoded with the browser's built-in AudioContext, resampled to 16 kHz mono, and passed as a Float32Array to a Web Worker that feeds it directly into the in-memory model. Nothing is sent over the network after the model file finishes downloading. You can disconnect from the internet and confirm transcription still runs.
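The "16 kHz mono" step can be illustrated with two small functions. This is a sketch under stated assumptions, not the tool's implementation: `mixToMono` and `resampleLinear` are hypothetical names, and plain linear interpolation is swapped in for the browser's own resampler, which applies proper filtering and will sound cleaner.

```typescript
// Average all channels into one mono signal.
// Assumes every channel has the same length, as decoded audio buffers do.
function mixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    const data = channels[c];
    for (let i = 0; i < length; i++) {
      mono[i] += data[i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation between neighbouring input samples.
// No low-pass filter is applied, so this is for illustration only.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (fromRate === toRate || input.length === 0) return new Float32Array(input);
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const ratio = fromRate / toRate;
  const last = input.length - 1;
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.min(Math.floor(pos), last);
    const right = Math.min(left + 1, last);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}
```

For example, one second of 48 kHz stereo audio passed through `mixToMono` and then `resampleLinear(mono, 48000, 16000)` yields a 16,000-sample Float32Array, the shape Whisper expects.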

Real auto-detect (not just defaulting to English)

Most browser transcribers fall back to English when you choose "auto-detect". We run a real Whisper forward pass on the first 30 seconds, mask logits to the 99 language tokens, and pick the argmax. Detection runs in around 200–500 ms on WASM and works for English, Ukrainian, German, Polish, Spanish, and 90+ other languages.
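The mask-then-argmax step can be sketched as follows. This is a simplified illustration rather than the tool's code: `detectLanguage` and `LanguageGuess` are hypothetical names, and the token ids in the usage note are made-up toy values, not Whisper's real vocabulary ids.

```typescript
interface LanguageGuess {
  language: string;
  probability: number;
}

// `logits` holds one score per vocabulary token for the first decoded position.
// `languageTokenIds` maps a language code to the id of its language token.
function detectLanguage(
  logits: ArrayLike<number>,
  languageTokenIds: Record<string, number>
): LanguageGuess {
  const languages = Object.keys(languageTokenIds);
  if (languages.length === 0) throw new Error("no language tokens supplied");

  // Masking: only positions that belong to language tokens are considered,
  // so a high score on any other token cannot win.
  let best = languages[0];
  let bestLogit = -Infinity;
  for (let i = 0; i < languages.length; i++) {
    const logit = logits[languageTokenIds[languages[i]]];
    if (logit > bestLogit) {
      bestLogit = logit;
      best = languages[i];
    }
  }

  // Softmax restricted to the language tokens gives a confidence value.
  let sum = 0;
  for (let i = 0; i < languages.length; i++) {
    sum += Math.exp(logits[languageTokenIds[languages[i]]] - bestLogit);
  }
  return { language: best, probability: 1 / sum };
}
```

With toy logits `[9, 0, 0, 1, 3, 2]` and the toy mapping `{ en: 3, uk: 4, de: 5 }`, the function picks `uk` even though token 0 has the highest raw score, because token 0 is not a language token.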

Long-audio chunking

For files longer than 30 seconds, we configure Whisper with chunk_length_s: 30 and stride_length_s: 5. The pipeline splits the audio into overlapping 30-second windows, transcribes each, and stitches the results together so words at window boundaries are not lost. We have tested files up to several hours.
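The window layout can be sketched with a small planner. This is an illustration under one stated assumption: it follows the Hugging Face pipeline convention in which the stride is applied to both sides of each window, so consecutive windows start `chunk_length_s - 2 * stride_length_s` seconds apart. `planWindows` and `ChunkWindow` are hypothetical names, and the tool's actual stitching logic is not shown here.

```typescript
interface ChunkWindow {
  start: number; // seconds
  end: number;   // seconds
}

function planWindows(
  durationS: number,
  chunkLengthS: number = 30,
  strideLengthS: number = 5
): ChunkWindow[] {
  // Assumed convention: stride on both sides, so each step advances by this much.
  const jump = chunkLengthS - 2 * strideLengthS;
  if (jump <= 0) throw new Error("chunk length must exceed twice the stride");

  const windows: ChunkWindow[] = [];
  for (let start = 0; start < durationS; start += jump) {
    const end = Math.min(start + chunkLengthS, durationS);
    windows.push({ start: start, end: end });
    if (end >= durationS) break; // the last window reached the end of the audio
  }
  return windows;
}
```

Under that assumption, a 70-second file is covered by three windows (0 to 30, 20 to 50, and 40 to 70 seconds), and a one-hour file by 180 windows.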

Who Uses Free Audio Transcription?

Journalists & researchers

Source interviews, off-the-record briefings, and confidential meetings stay on the device — no SaaS dashboard ever sees the audio. The .srt download lets you cite specific timestamps directly from the recording.

Podcasters & content creators

Generate show notes, captions, and full episode transcripts without paying per-minute fees. The .srt output drops straight into Premiere, DaVinci Resolve, CapCut, or YouTube Studio.

Students

Convert lecture recordings into searchable notes. Lectures in non-English languages auto-detect, and the Translate task converts to English for sharing with study partners.

Legal & medical professionals

Designed for compliance-conscious workflows where audio cannot be sent to third-party servers. Because the entire pipeline runs in-browser, no recording is handed to an outside processor, which makes it easier to meet HIPAA, GDPR, and attorney-client confidentiality requirements.

Frequently Asked Questions

Is the audio uploaded to a server?

No. The transcription model runs entirely inside your browser using WebAssembly or WebGPU. Your audio file never leaves your device — there is no upload step, no API call, and no temporary server-side storage. You can verify this by disconnecting from the internet after the model is downloaded; transcription will continue to work.

Which audio formats are supported?

We support every format your browser can decode natively: MP3, WAV, M4A, OGG, FLAC, and AAC. Video files (MP4, MKV, WebM) work if your browser can decode their audio track. If a file fails to decode, the tool will show a clear error with the supported format list.

How accurate is the transcription?

We use OpenAI Whisper, a speech recognition model that many professional transcription services also rely on. The Balanced level (default) is excellent for clear English speech and good for most other languages. The Best quality level noticeably improves accuracy for non-English audio, accents, and noisy recordings.

How long can the audio file be?

There is no hard limit. For audio longer than 30 seconds, the tool automatically chunks the audio into 30-second windows with a 5-second overlap, then stitches the results together. We have tested files up to several hours long. Very large files may use a lot of memory, so 4 GB+ of free RAM is recommended for hour-long recordings.
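A rough estimate of the raw sample buffers shows where the memory goes. This is back-of-the-envelope arithmetic, not a measurement of this tool: `decodedPcmBytes` is a hypothetical helper, and the 48 kHz stereo source in the usage note is an assumed example file. The model weights and intermediate tensors come on top of these figures.

```typescript
// Bytes needed to hold decoded audio as 32-bit float samples.
function decodedPcmBytes(
  durationS: number,
  sampleRate: number = 16000,
  channels: number = 1
): number {
  const bytesPerSample = 4; // one Float32 value
  return Math.ceil(durationS * sampleRate) * channels * bytesPerSample;
}
```

One hour of 16 kHz mono input is `decodedPcmBytes(3600)`, or 230,400,000 bytes (about 230 MB). The same hour decoded from a 48 kHz stereo source before resampling is `decodedPcmBytes(3600, 48000, 2)`, or 1,382,400,000 bytes (about 1.4 GB), which is why free RAM matters for long recordings.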

Why is the first transcription slow?

The first time you drop a file, the engine prepares itself in the background — your browser fetches the Whisper model (around 76 MB at Balanced quality) and warms up the runtime. This is a one-time cost; afterwards the model is cached by your browser, so subsequent visits load instantly. Warm-up adds another 1–7 seconds before the first transcription starts.

What is the difference between Transcribe and Translate?

Transcribe keeps the source language: a Spanish recording becomes Spanish text. Translate converts any source language directly to English text in one pass. Translation works for the languages Whisper supports, including most European languages, Chinese, Japanese, Korean, Arabic, and Hindi.

Can I download the transcript?

Yes. The Copy button copies plain text to your clipboard. Download .txt saves a plain-text file. Download .srt produces a valid SubRip subtitle file with timestamps you can drop directly into video editors like Premiere, DaVinci Resolve, or YouTube Studio.
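For readers curious about what goes into the .srt file, the SubRip layout can be sketched as below. This is a minimal illustration of the format, not the tool's exporter: `SrtCue`, `formatSrtTime`, and `toSrt` are hypothetical names, and the cue shape (start and end in seconds plus text) is an assumption about how timestamped segments are represented.

```typescript
interface SrtCue {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Left-pad a number with zeros to a fixed width.
function pad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = "0" + s;
  return s;
}

// SubRip timestamps use HH:MM:SS,mmm with a comma before the milliseconds.
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalS = Math.floor(totalMs / 1000);
  const s = totalS % 60;
  const m = Math.floor(totalS / 60) % 60;
  const h = Math.floor(totalS / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "," + pad(ms, 3);
}

// Each cue is a 1-based index, a time range line, and the text,
// with a blank line between cues.
function toSrt(cues: SrtCue[]): string {
  if (cues.length === 0) return "";
  const blocks: string[] = [];
  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i];
    blocks.push(
      String(i + 1) + "\n" +
      formatSrtTime(cue.start) + " --> " + formatSrtTime(cue.end) + "\n" +
      cue.text.trim()
    );
  }
  return blocks.join("\n\n") + "\n";
}
```

For instance, `formatSrtTime(3661.5)` gives `01:01:01,500`, and a cue running from 0 to 2.5 seconds is written with the range line `00:00:00,000 --> 00:00:02,500`.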

Does it work on mobile?

Yes, though mobile browsers often lack WebGPU and fall back to the slower WebAssembly path. We recommend the Fast quality level on mobile (about 4× smaller and quicker to prepare). Hour-long files may be too memory-intensive for older phones; for big jobs, use a desktop browser.

Privacy notice

This audio transcriber runs entirely in your web browser using ONNX Runtime Web and OpenAI Whisper. After the one-time model download from Hugging Face's public CDN, no network requests are made — your audio file is decoded, transcribed, and displayed without ever leaving your device. There are no API keys, no upload buckets, no third parties. You can verify this by disconnecting from the internet after the model loads; transcription will continue to work.