Speech to Text

Transcribe your voice to text live, using any microphone, in 30+ languages. A free online speech-to-text tool: edit and copy results instantly, with no download or registration.

Updated June 2026

What it does

Press the start button, speak into your microphone, and watch your words appear as text in real time. It is useful for dictating a quick note, drafting an email when typing is slower than thinking, capturing meeting notes, or turning a spoken memo into shareable text. The output updates word by word as you speak: "interim" recognition (italicized, may change) becomes "final" recognition (committed, won't change) once the model has enough context.
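The interim-to-final behaviour described above can be sketched as a small function that sorts recognition results into a committed part and a still-changing part. This is a minimal sketch, not the tool's actual code: the `ResultListLike` types and the `splitTranscript` name are assumptions declared locally. They mirror the shape the browser hands to an `onresult` handler, where each result has an `isFinal` flag and a top alternative with a `transcript`.

```typescript
// Minimal structural types mirroring the parts of the Web Speech API used here.
// They are declared locally (an assumption of this sketch) so the logic can be
// exercised without a browser.
interface AlternativeLike {
  transcript: string;
}
interface ResultLike {
  isFinal: boolean;
  length: number;
  [index: number]: AlternativeLike;
}
interface ResultListLike {
  length: number;
  [index: number]: ResultLike;
}

interface TranscriptView {
  finalText: string;   // committed text, rendered plain
  interimText: string; // provisional text, rendered italic
}

// Walk every result and put its top alternative into the committed or the
// still-changing bucket, preserving the order in which results arrived.
function splitTranscript(results: ResultListLike): TranscriptView {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    const best = result?.[0];
    if (!result || !best) continue; // skip empty slots defensively
    if (result.isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }
  return { finalText, interimText };
}
```

In a browser this would typically be called with `event.results` inside the recognizer's `onresult` handler, re-rendering both parts on every event.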

The tool uses your browser's built-in SpeechRecognition API, which on Chrome / Edge / Brave / Arc routes audio through Google's hosted recognition service. That means it requires an internet connection, and your audio is briefly transmitted to Google for processing. Per Google's developer docs, the audio isn't tied to your account or retained for model training, but if fully local processing is a hard requirement, this isn't the right tool. Safari has a partial implementation; Firefox doesn't support the API at all.
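Because support differs between browsers, a page using this API usually checks for it before showing a start button. The sketch below assumes the constructor is exposed either as `SpeechRecognition` or under the prefixed name `webkitSpeechRecognition` (the name Chromium-based browsers use). The `SpeechGlobals` type and the `findRecognitionCtor` helper are illustrative names, not part of the API.

```typescript
// A constructor that takes no arguments; the instance type is left opaque
// because this sketch only cares whether the constructor exists.
type RecognitionCtor = new () => unknown;

// The two global names under which browsers may expose the constructor.
interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Prefer the unprefixed name, fall back to the prefixed one, and return null
// when the browser offers neither (for example Firefox).
function findRecognitionCtor(scope: SpeechGlobals): RecognitionCtor | null {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}

// In a browser, the scope would be the window object, for example:
//   const Ctor = findRecognitionCtor(window as unknown as SpeechGlobals);
//   if (Ctor === null) { /* show an "unsupported browser" message */ }
```

Passing the scope in as a parameter, rather than reading `window` directly, keeps the check easy to test outside a browser.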

Supports 30+ languages; pick yours from the dropdown before starting. Accuracy is best for clear adult speech in a quiet room (roughly 95% or better word accuracy in English) and drops with background noise, strong accents, technical jargon, or mumbled speech. If a word comes out wrong, you can edit the transcript directly before copying.

Embed this tool on your site

Paste this snippet into any page. It loads on demand (lazy), includes no tracking scripts, and is sized to fit most dashboards. Adjust the height value to fit your layout.

<iframe src="https://freetoolarena.com/embed/speech-to-text" width="100%" height="720" frameborder="0" loading="lazy" title="Speech to Text" style="border:1px solid #e2e8f0;border-radius:12px;max-width:720px;"></iframe>

How to use it

Pick your spoken language from the dropdown (defaults to your browser's locale).
Click Start. The browser asks for microphone permission the first time — grant it.
Speak naturally. Text appears word by word; pause at sentence boundaries to commit segments.
Edit the transcript directly if you spot a recognition error — the box is fully editable.
Click Stop when you're done, then Copy to put the text on your clipboard.
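For readers curious how the steps above map onto the browser API, here is a minimal sketch of the setup a page like this might perform. It rests on several assumptions. `RecognizerLike` is a local stand-in for the real recognizer object, listing only the `lang`, `continuous`, `interimResults`, `start()` and `stop()` members it uses. `SUPPORTED_LANGS` is a hypothetical, shortened dropdown list. `pickDefaultLang` and `configureRecognizer` are illustrative helper names.

```typescript
// Local stand-in for the browser's recognizer object (assumption: only the
// members used below are listed).
interface RecognizerLike {
  lang: string;            // BCP 47 language tag, e.g. "en-US"
  continuous: boolean;     // keep listening across pauses
  interimResults: boolean; // deliver provisional text while speaking
  start(): void;
  stop(): void;
}

// Hypothetical, shortened version of the dropdown's language list.
const SUPPORTED_LANGS: string[] = ["en-US", "en-GB", "es-ES", "fr-FR", "de-DE"];

// Step 1: default to the browser's locale. Try an exact match first, then any
// entry with the same base language, then a fixed fallback.
function pickDefaultLang(
  browserLocale: string,
  supported: string[],
  fallback: string = "en-US",
): string {
  const wanted = browserLocale.toLowerCase();
  const exact = supported.find((tag) => tag.toLowerCase() === wanted);
  if (exact !== undefined) return exact;
  const base = wanted.split("-")[0];
  const sameBase = supported.find((tag) => tag.toLowerCase().split("-")[0] === base);
  return sameBase ?? fallback;
}

// Steps 2 and 3: configure the recognizer so that text streams in while the
// user speaks and listening continues across sentence pauses.
function configureRecognizer(rec: RecognizerLike, lang: string): RecognizerLike {
  rec.lang = lang;
  rec.continuous = true;
  rec.interimResults = true;
  return rec;
}
```

In a page, the Start button would then call `rec.start()` (which is when the browser prompts for microphone permission) and the Stop button would call `rec.stop()`.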

When to use this tool

Dictating a quick email, message, or note when typing is slower than speaking.
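Capturing a spoken memo as text without recording or uploading an audio file (the live audio is still streamed to the recognition service for processing).
Transcribing a one-on-one meeting for your own notes (place the phone or laptop on the table); the transcript won't label who said what.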
Drafting in a different language than you can type in fluently — speak it, edit the result.

When not to use it

Sensitive content (legal, medical, financial details) — audio is transmitted to Google's recognition service. For local-only STT, use Whisper running locally or a privacy-focused alternative.
Multi-speaker transcripts (meetings, interviews) — the API doesn't separate speakers. Use Otter, Rev, or Whisper-based tools for diarization.
Long audio files — the API is designed for live mic input, not file upload. For audio-file transcription use a Whisper-based tool.
Noisy environments — accuracy drops sharply with background music, traffic, or multiple voices.

Frequently asked questions

Why does it work in Chrome but not Firefox?

Firefox hasn't implemented the SpeechRecognition API as of 2026, mainly because the Chromium implementation depends on Google's hosted service and Mozilla wants a different approach. Firefox users can install a desktop tool that uses a local model such as Whisper.

Is my audio sent to Google?

Yes. Chrome / Edge / Arc / Brave all route SpeechRecognition through Google's recognition service. Per Google's documentation, the audio isn't tied to your account or used for training, but it is transmitted off your device. If you need local-only STT (no audio leaves your machine), this isn't the right tool.

What's "interim" vs "final" text?

Interim results (italicized) are the model's best guess as you speak; they update in real time as more context arrives. Final results (plain text) are committed once the model is confident, typically after a brief pause. You can keep speaking past a final result; new finals are appended.

Can I get word-level timestamps?

Not from the standard browser API, which only returns text. For timestamped transcripts (caption files, time-coded notes), use Whisper or a service such as AssemblyAI or Deepgram that exposes timestamps explicitly.

Why is accuracy lower than I expected?

Common causes are background noise, a microphone too far from your mouth, a strong accent or fast speech, technical jargon (proper nouns, brand names, acronyms), or speaking in a language other than the one selected in the dropdown. Try a closer mic and a slower pace; for jargon, dictate around the term and edit it in afterward.

Does it work offline?

No. The browser needs an internet connection because recognition runs on Google's servers (in Chrome's case). For offline STT, install a Whisper-based desktop tool.
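Several of the answers above (no connection, blocked microphone, wrong language) correspond to error codes the recognizer reports through its error event. The sketch below shows one way a page might turn those codes into readable hints. The code strings are the ones the Web Speech API defines; the `explainRecognitionError` name and the message wording are assumptions made for illustration.

```typescript
// Translate an error code from the recognizer's error event into a hint for
// the user. Unknown codes fall through to a generic message.
function explainRecognitionError(code: string): string {
  switch (code) {
    case "network":
      return "Could not reach the recognition service. Check your internet connection.";
    case "not-allowed":
    case "service-not-allowed":
      return "Speech recognition was blocked. Allow microphone access for this site.";
    case "audio-capture":
      return "No microphone was found. Connect one or pick another input device.";
    case "no-speech":
      return "No speech was detected. Move closer to the mic and try again.";
    case "language-not-supported":
      return "The selected language isn't supported. Pick another one from the dropdown.";
    default:
      return `Recognition stopped unexpectedly (${code}).`;
  }
}
```

A page would typically call this from the recognizer's `onerror` handler, passing the event's `error` field, and show the returned text next to the transcript box.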
