How it works
Audio files are automatically transcribed to text using speech recognition, then the transcript is analyzed by all enabled text-based policies. This means any policy that works on text (toxicity, hate, PII, wordlists, guidelines, etc.) also works on audio with zero additional configuration.

Supported audio formats
Any format FFmpeg can decode is supported. All audio is internally converted to 16 kHz mono WAV before transcription.

| Format | Extensions |
|---|---|
| MP3 | .mp3 |
| WAV | .wav |
| AAC | .aac, .m4a |
| OGG | .ogg, .oga |
| Opus | .opus |
| FLAC | .flac |
| WebM | .webm |
| AMR | .amr |
| WMA | .wma |
| MP4 | .mp4, .m4a, .mov |
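
To check locally that a file decodes, or to preview what the transcriber receives, the 16 kHz mono WAV conversion can be approximated with the FFmpeg command-line tool. This is a minimal sketch, assuming `ffmpeg` is installed and on `PATH`. The function names and file paths are illustrative, and the exact command the service runs internally is not documented here.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command that converts a decodable input to 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", src,       # input file; FFmpeg detects the format itself
        "-vn",           # ignore any video stream (.mp4, .mov, .webm)
        "-ac", "1",      # downmix to a single channel (mono)
        "-ar", "16000",  # resample to 16 kHz
        "-f", "wav",     # write a WAV container
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> Path:
    """Run the conversion; raises CalledProcessError if FFmpeg cannot decode src."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```

A failed conversion here is a reasonable hint that the file would also be rejected on upload, although the service's FFmpeg build may support a different set of codecs than a local install.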
Limits
| Constraint | Value |
|---|---|
| Max file size | 50 MB |
| Max audio duration | 10 minutes |
| Processing timeout | 30 seconds |
| URL schemes | http, https only |
| Private/internal IPs | Blocked (SSRF protection) |
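
Checking these limits on the client before submitting can save a round trip. The sketch below mirrors the table as a preflight check. It is an approximation under stated assumptions: 1 MB is taken as 1024 × 1024 bytes, only IP-literal hosts are inspected (hostnames are not resolved, so it does not replace the server-side SSRF protection), and the function names are hypothetical rather than part of any SDK.

```python
import ipaddress
from urllib.parse import urlparse

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_DURATION_SECONDS = 10 * 60
ALLOWED_SCHEMES = {"http", "https"}


def check_audio_url(url: str) -> list[str]:
    """Return a list of problems with the URL; an empty list means it looks acceptable."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        problems.append(f"unsupported URL scheme: {parsed.scheme or '(none)'}")
    host = parsed.hostname
    if not host:
        problems.append("URL has no host")
        return problems
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return problems  # a hostname, not an IP literal; DNS is not resolved here
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        problems.append(f"private or internal address: {host}")
    return problems


def check_audio_file(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of limit violations for a file of the given size and duration."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 50 MB")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio is longer than 10 minutes")
    return problems
```

For example, `check_audio_url("http://10.0.0.5/clip.mp3")` reports a private address, while `check_audio_url("https://example.com/clip.mp3")` returns an empty list.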
Transcription quality
You can configure transcription quality per channel in the dashboard under Content > Audio > Transcription quality.

| Setting | Label | Use case | Relative speed |
|---|---|---|---|
| SPEED (default) | Fast | Real-time moderation, high volume | Fastest |
| BALANCED | Balanced | General purpose, good accuracy | ~2x slower |
| ACCURACY | Accurate | Noisy audio, critical content review | ~3x slower |
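
The relative speeds can be used for a rough estimate of which setting fits a latency budget. The sketch below treats "~2x slower" and "~3x slower" as multipliers of 2 and 3 on a SPEED-mode duration you have measured yourself, and compares the result with the 30-second processing timeout. Both the multipliers and the idea that transcription time counts fully against that timeout are assumptions, and the names are illustrative.

```python
from enum import Enum
from typing import Optional

PROCESSING_TIMEOUT_SECONDS = 30.0


class TranscriptionQuality(Enum):
    # Values are approximate slowdown factors relative to SPEED (assumed from the table).
    SPEED = 1.0
    BALANCED = 2.0
    ACCURACY = 3.0


def estimated_seconds(speed_seconds: float, quality: TranscriptionQuality) -> float:
    """Scale a measured SPEED-mode duration by the setting's slowdown factor."""
    return speed_seconds * quality.value


def most_accurate_within_timeout(speed_seconds: float) -> Optional[TranscriptionQuality]:
    """Pick the most accurate setting whose estimate fits the timeout, or None."""
    for quality in (
        TranscriptionQuality.ACCURACY,
        TranscriptionQuality.BALANCED,
        TranscriptionQuality.SPEED,
    ):
        if estimated_seconds(speed_seconds, quality) <= PROCESSING_TIMEOUT_SECONDS:
            return quality
    return None
```

If a typical clip takes 12 seconds in SPEED mode, the estimate is 24 seconds for BALANCED and 36 seconds for ACCURACY, so BALANCED is the most accurate setting that stays under the timeout by this estimate.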