What is Munsit Speech to Text?
Munsit Speech to Text is an advanced AI-powered service that converts spoken Arabic into accurate, structured text. It is built for high-accuracy Arabic recognition across dialects and real-world speech conditions, helping teams turn audio into searchable and actionable information.
Key Features
- High-Accuracy Arabic Transcription: Transcribes spoken Arabic with strong performance across dialects and accents
- Speaker Diarization: Separates and labels speakers for clearer multi-speaker conversations
- Meeting Intelligence Ready: Supports minutes of meetings and downstream analysis workflows
- Flexible Audio Support: Works with common audio formats from different recording platforms
Core Workflows
- Transcriber: Convert pre-recorded audio files into text with word-level timing
- Diarization: Identify who spoke and return speaker-labeled transcript segments
- Minutes of Meetings: Generate structured transcripts optimized for meeting use cases
Available Models
| Model | ID | Description |
|---|---|---|
| Munsit | munsit | Default model, optimized for Arabic speech recognition |
| Munsit En-Ar | munsit-en-ar | Model for mixed Arabic-English spoken content with code-switching support |
Select a model with the model parameter. If omitted, munsit is used by default.
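The default-model behavior can be mirrored on the client side. The following is a minimal sketch, not part of the Munsit API: the helper name `resolve_model` and the constants are hypothetical, and only the two model IDs and the `munsit` default come from the table above.

```python
from typing import Optional

# Model IDs taken from the "Available Models" table.
KNOWN_MODELS = {"munsit", "munsit-en-ar"}
DEFAULT_MODEL = "munsit"


def resolve_model(model: Optional[str] = None) -> str:
    """Return the model ID to send, falling back to the documented default.

    Hypothetical client-side helper; the service applies the same default
    when the model parameter is omitted.
    """
    if model is None:
        return DEFAULT_MODEL
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown model ID: {model!r}")
    return model
```

Calling `resolve_model()` yields `munsit`, while `resolve_model("munsit-en-ar")` selects the code-switching model.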
Supported Language
Munsit is optimized for Arabic speech recognition and transcription, with strong coverage for dialect and accent variation. The munsit-en-ar model is designed for mixed Arabic-English spoken content, supporting code-switching where speakers naturally alternate between the two languages within the same conversation or utterance.
Supported Audio Formats
.aac, .wma, .amr, .flac, .m4a, .ogg, .mp2, .opus, .m4r, .webm, .mp3, .wav
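Checking a file's extension before upload can catch unsupported formats early. This is a small sketch under the assumption that the extension reflects the actual audio format; the function name `is_supported_audio` is hypothetical, and the extension set is copied from the list above.

```python
from pathlib import Path

# Extensions copied from the "Supported Audio Formats" list.
SUPPORTED_EXTENSIONS = {
    ".aac", ".wma", ".amr", ".flac", ".m4a", ".ogg",
    ".mp2", ".opus", ".m4r", ".webm", ".mp3", ".wav",
}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is in the supported list.

    The comparison is case-insensitive; the file contents are not inspected.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_audio("call.MP3")` is true, while `is_supported_audio("notes.txt")` is false.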
Duration Limits
- Audio Transcription: Supports files shorter than 60 minutes
- Minutes of Meetings: Supports files shorter than 30 minutes
- Streaming: No duration limit while the connection stays active
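The limits above can be encoded as a pre-flight check. The sketch below is illustrative only: the workflow keys and the function `within_duration_limit` are hypothetical names, and the audio duration is assumed to be known in seconds from some other tool. "Shorter than" is treated as a strict comparison.

```python
from typing import Dict, Optional

# Limits in seconds, from the "Duration Limits" list; None means no limit.
# The dictionary keys are illustrative labels, not API identifiers.
LIMITS_SECONDS: Dict[str, Optional[int]] = {
    "transcription": 60 * 60,
    "minutes_of_meetings": 30 * 60,
    "streaming": None,
}


def within_duration_limit(workflow: str, duration_seconds: float) -> bool:
    """Return True if audio of this length is allowed for the workflow."""
    limit = LIMITS_SECONDS[workflow]
    # Files must be strictly shorter than the limit when one applies.
    return limit is None or duration_seconds < limit
```

A 45-minute recording would pass for transcription but not for Minutes of Meetings.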
