Audio moderation is available to customers on custom plans. If you’re interested in using audio moderation, please reach out here.

How it works

Audio files are automatically transcribed to text using speech recognition, then the transcript is analyzed by all enabled text-based policies. This means any policy that works on text (toxicity, hate, PII, wordlists, guidelines, etc.) also works on audio with zero additional configuration.
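// Assumes `moderationApi` is an already-initialized client for the moderation API.
// The audio at `url` is transcribed, then the transcript is checked by every enabled text policy.
const result = await moderationApi.content.submit({
  content: {
    type: "audio",
    url: "https://example.com/audio.mp3", // must be a publicly reachable http(s) URL
  },
});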
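The flow described above (transcribe first, then run every enabled text policy on the transcript) can be sketched as follows. This is a conceptual model only: `moderateAudio`, `Transcriber`, `TextPolicy`, and `wordlistPolicy` are hypothetical names, not part of the moderation API, and the real service performs these steps server-side.

```typescript
// Hypothetical types for illustration only; not the service's API.
type Transcriber = (audioUrl: string) => Promise<string>;

interface TextPolicy {
  name: string;
  // Returns true if this policy flags the given text.
  flags: (text: string) => boolean;
}

interface AudioModerationResult {
  transcript: string;
  flaggedBy: string[];
}

// Step 1: transcribe the audio. Step 2: run every enabled text policy on the transcript.
async function moderateAudio(
  audioUrl: string,
  transcribe: Transcriber,
  enabledPolicies: TextPolicy[],
): Promise<AudioModerationResult> {
  const transcript = await transcribe(audioUrl);
  const flaggedBy = enabledPolicies
    .filter((policy) => policy.flags(transcript))
    .map((policy) => policy.name);
  return { transcript, flaggedBy };
}

// A simple wordlist policy: flags text containing any listed word (case-insensitive).
function wordlistPolicy(name: string, words: string[]): TextPolicy {
  const lowered = words.map((w) => w.toLowerCase());
  return {
    name,
    flags: (text) => {
      const tokens = text.toLowerCase().split(/\W+/);
      return lowered.some((w) => tokens.includes(w));
    },
  };
}
```

Because the policies only ever see text, a wordlist policy written for text submissions flags a transcript in exactly the same way, which is why audio needs no policy-specific configuration.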

Supported audio formats

Any format FFmpeg can decode is supported. All audio is internally converted to 16 kHz mono WAV before transcription.
| Format | Extensions |
| --- | --- |
| MP3 | .mp3 |
| WAV | .wav |
| AAC | .aac, .m4a |
| OGG | .ogg, .oga |
| Opus | .opus |
| FLAC | .flac |
| WebM | .webm |
| AMR | .amr |
| WMA | .wma |
| MP4 | .mp4, .m4a, .mov |
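If you want to give users early feedback on uploads, a small lookup over the table above can map a URL's file extension to the listed formats. This is a hypothetical client-side helper (`listedFormatsFor` is not part of the API). Because any FFmpeg-decodable format is accepted, an empty result means only that the extension is not in this table, not that the file will be rejected.

```typescript
// Extensions copied from the "Supported audio formats" table.
const LISTED_FORMATS: Record<string, string[]> = {
  MP3: [".mp3"],
  WAV: [".wav"],
  AAC: [".aac", ".m4a"],
  OGG: [".ogg", ".oga"],
  Opus: [".opus"],
  FLAC: [".flac"],
  WebM: [".webm"],
  AMR: [".amr"],
  WMA: [".wma"],
  MP4: [".mp4", ".m4a", ".mov"],
};

// Returns the listed format names whose extensions match the URL's path.
// Query strings are ignored; .m4a matches both AAC and MP4, as in the table.
function listedFormatsFor(audioUrl: string): string[] {
  const path = new URL(audioUrl).pathname.toLowerCase();
  const dot = path.lastIndexOf(".");
  if (dot === -1) return [];
  const ext = path.slice(dot);
  return Object.entries(LISTED_FORMATS)
    .filter(([, exts]) => exts.includes(ext))
    .map(([format]) => format);
}
```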

Limits

| Constraint | Value |
| --- | --- |
| Max file size | 50 MB |
| Max audio duration | 10 minutes |
| Processing timeout | 30 seconds |
| URL schemes | http, https only |
| Private/internal IPs | Blocked (SSRF protection) |
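To fail fast before submitting, the documented limits can be mirrored in a client-side check. This is a sketch under stated assumptions: `limitViolations` and `AudioCandidate` are hypothetical names, "50 MB" is read as 50,000,000 bytes (the stricter of the decimal and binary readings, so anything passing it also passes the other), and size and duration are checked only when you already know them. The private/internal IP block is enforced by the service and is not replicated here.

```typescript
// Hypothetical client-side pre-check mirroring the documented limits.
const MAX_FILE_BYTES = 50_000_000; // assumption: 50 MB read as decimal megabytes
const MAX_DURATION_SECONDS = 10 * 60;
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);

interface AudioCandidate {
  url: string;
  sizeBytes?: number; // e.g. from a Content-Length header, if known
  durationSeconds?: number; // if known locally
}

// Returns null instead of throwing when the URL cannot be parsed.
function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

// Returns a list of human-readable problems; an empty list means no known limit is exceeded.
function limitViolations(candidate: AudioCandidate): string[] {
  const parsed = parseUrl(candidate.url);
  if (parsed === null) return ["URL is not valid"];

  const problems: string[] = [];
  if (!ALLOWED_SCHEMES.has(parsed.protocol)) {
    problems.push(`URL scheme ${parsed.protocol} is not http or https`);
  }
  if (candidate.sizeBytes !== undefined && candidate.sizeBytes > MAX_FILE_BYTES) {
    problems.push("file exceeds 50 MB");
  }
  if (candidate.durationSeconds !== undefined && candidate.durationSeconds > MAX_DURATION_SECONDS) {
    problems.push("audio exceeds 10 minutes");
  }
  return problems;
}
```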

Transcription quality

You can configure transcription quality per channel in the dashboard under Content > Audio > Transcription quality.
| Setting | Label | Use case | Relative speed (vs. SPEED) |
| --- | --- | --- | --- |
| SPEED (default) | Fast | Real-time moderation, high volume | Fastest |
| BALANCED | Balanced | General purpose, good accuracy | ~2x slower |
| ACCURACY | Accurate | Noisy audio, critical content review | ~3x slower |
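The relative speeds can be turned into a rough estimate of whether a slower setting is likely to fit within the 30-second processing timeout listed under Limits. This sketch assumes that the timeout covers the whole transcription step and that the ~2x and ~3x multipliers hold for your audio; `estimateSeconds`, `likelyWithinTimeout`, and the SPEED baseline you pass in (a time you measured yourself) are hypothetical.

```typescript
type TranscriptionQuality = "SPEED" | "BALANCED" | "ACCURACY";

// Approximate multipliers from the table, relative to SPEED.
const RELATIVE_COST: Record<TranscriptionQuality, number> = {
  SPEED: 1,
  BALANCED: 2,
  ACCURACY: 3,
};

const PROCESSING_TIMEOUT_SECONDS = 30;

// Scales a processing time measured with SPEED to another setting.
function estimateSeconds(speedSeconds: number, quality: TranscriptionQuality): number {
  return speedSeconds * RELATIVE_COST[quality];
}

// True if the estimated time does not exceed the processing timeout.
function likelyWithinTimeout(speedSeconds: number, quality: TranscriptionQuality): boolean {
  return estimateSeconds(speedSeconds, quality) <= PROCESSING_TIMEOUT_SECONDS;
}
```

For example, audio that takes about 8 seconds with SPEED would take roughly 24 seconds with ACCURACY, still under the timeout; at 12 seconds with SPEED, ACCURACY would take roughly 36 seconds and is likely to time out.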

Usage and billing

Each audio moderation request costs 2 units: 1 for transcription + 1 for policy analysis.
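Because every audio request is billed the same 2 units, estimating usage is a multiplication. A minimal sketch; `audioUnits` is a hypothetical helper that assumes the per-request split stated above.

```typescript
// Per-request units for audio moderation, as stated in the billing note.
const TRANSCRIPTION_UNITS = 1;
const POLICY_ANALYSIS_UNITS = 1;

interface AudioUnitEstimate {
  transcription: number;
  policyAnalysis: number;
  total: number;
}

// Hypothetical helper: estimates units consumed by a number of audio requests.
function audioUnits(requests: number): AudioUnitEstimate {
  if (!Number.isInteger(requests) || requests < 0) {
    throw new RangeError("requests must be a non-negative integer");
  }
  const transcription = requests * TRANSCRIPTION_UNITS;
  const policyAnalysis = requests * POLICY_ANALYSIS_UNITS;
  return { transcription, policyAnalysis, total: transcription + policyAnalysis };
}
```

For instance, 1,000 audio requests come to 1,000 transcription units plus 1,000 policy analysis units, for 2,000 units in total.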