STT Benchmark

Speech-to-Text Battle Arena

Compare transcription providers head-to-head with live streaming or batch processing

Choose Your Fighters

Select streaming providers to battle (2 per row)

Audio Settings

Provider-agnostic audio configuration

Sample Rate

Higher = better quality, more bandwidth

Encoding

PCM16 is most compatible

Channels

Mono recommended for speech

Language

Primary language in audio
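
The bandwidth trade-off noted under Sample Rate can be estimated for uncompressed PCM audio. The sketch below is illustrative only: the function name `pcm_bytes_per_second` is hypothetical and not part of this tool, and it assumes raw PCM with no container or protocol overhead.

```python
def pcm_bytes_per_second(sample_rate_hz: int, channels: int = 1, bit_depth: int = 16) -> int:
    """Estimate the raw data rate of uncompressed PCM audio (hypothetical helper)."""
    if sample_rate_hz <= 0 or channels <= 0:
        raise ValueError("sample_rate_hz and channels must be positive")
    if bit_depth <= 0 or bit_depth % 8 != 0:
        raise ValueError("bit_depth must be a positive multiple of 8")
    # Each sample occupies bit_depth / 8 bytes, once per channel.
    return sample_rate_hz * channels * (bit_depth // 8)


# Mono PCM16 at 16 kHz versus stereo PCM16 at 48 kHz.
mono_16k = pcm_bytes_per_second(16_000)
stereo_48k = pcm_bytes_per_second(48_000, channels=2)
```

Under these assumptions, moving from 16 kHz mono to 48 kHz stereo multiplies the raw stream size by six, which is one reason mono at a moderate sample rate is a common choice for speech.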

gladia

Ready. Waiting for audio...

Latency: - | Words: - | Conf: - | WER: -

deepgram

Ready. Waiting for audio...

Latency: - | Words: - | Conf: - | WER: -

Provider Options

gladia

Configure options

Encoding

Audio encoding format

Sample rate

Bit depth

Interim results

Receive partial transcripts

Channels

Audio channels (1-8)

Endpointing (s)

Silence before ending segment

deepgram

Configure options

Model

Encoding

Audio encoding format

Interim results

Punctuation

Smart format

Format dates, numbers, etc.

Diarization

Speaker identification

Filler words

Detect "uh", "um"

Numerals

"twenty" → "20"

Measurements

"five meters" → "5m"

Profanity filter

Language detection

Dictation mode

Sentiment analysis

Entity detection

Topic detection

Utterance split (ms)

Silence threshold for splitting
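
One plausible reading of a silence threshold for splitting is that a new utterance starts whenever the gap between consecutive words reaches the threshold. The sketch below illustrates that idea only. It is not the provider's actual algorithm, and the `Word` tuple layout and the `split_utterances` name are assumptions made for the example.

```python
from typing import List, Tuple

# Assumed word shape for this sketch: (text, start_ms, end_ms).
Word = Tuple[str, float, float]


def split_utterances(words: List[Word], silence_ms: float) -> List[List[str]]:
    """Group words into utterances, splitting where the silent gap >= silence_ms."""
    utterances: List[List[str]] = []
    current: List[str] = []
    prev_end = None
    for text, start_ms, end_ms in words:
        # Start a new utterance when the pause since the previous word is long enough.
        if prev_end is not None and start_ms - prev_end >= silence_ms:
            utterances.append(current)
            current = []
        current.append(text)
        prev_end = end_ms
    if current:
        utterances.append(current)
    return utterances


sample_words = [("hello", 0, 400), ("world", 450, 900), ("next", 2000, 2400)]
groups = split_utterances(sample_words, silence_ms=800)
```

With the sample timings above, the 1100 ms pause before "next" exceeds the 800 ms threshold, so the words fall into two groups. A lower threshold produces more, shorter utterances.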

Compare Accuracy

Calculate WER and CER metrics against ground truth transcripts
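
As a rough illustration of these metrics, WER and CER are commonly defined as the edit distance (substitutions, insertions, deletions) between hypothesis and reference, divided by the reference length in words or characters. The sketch below follows that standard definition. The normalization (lowercasing and collapsing whitespace) is an assumption made for the example, and this tool may normalize differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences, using two rolling rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    # Assumption: whitespace is collapsed to single spaces, and spaces count as characters.
    ref_chars = " ".join(reference.lower().split())
    hyp_chars = " ".join(hypothesis.lower().split())
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


example_wer = wer("the cat sat on the mat", "the cat sat on mat")
example_cer = cer("hello", "hallo")
```

Because insertions count as errors, both rates can exceed 1.0 when the hypothesis is much longer than the reference.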

Real-Time Battle

Watch providers compete head-to-head with live transcription

Batch Processing

Upload hundreds of files with annotations for bulk comparison