Get Started

What is Munsit Speech to Text?

Munsit Speech to Text is an advanced AI-powered service that converts spoken Arabic into accurate, structured text. It is built for high-accuracy Arabic recognition across dialects and real-world speech conditions, helping teams turn audio into searchable and actionable information.

Key Features

High-Accuracy Arabic Transcription: Transcribes spoken Arabic with strong performance across dialects and accents
Speaker Diarization: Separates and labels speakers for clearer multi-speaker conversations
Meeting Intelligence Ready: Supports minutes of meetings and downstream analysis workflows
Flexible Audio Support: Works with common audio formats from different recording platforms

Munsit Speech to Text helps you process interviews, meetings, calls, and media content at scale while keeping Arabic language quality and context at the center.

Core Workflows

Transcriber: Convert pre-recorded audio files into text with word-level timing
Diarization: Identify who spoke and return speaker-labeled transcript segments
Minutes of Meetings: Generate structured transcripts optimized for meeting use cases

Available Models

Model	ID	Description
Munsit	`munsit`	Default model, optimized for Arabic speech recognition
Munsit En-Ar	`munsit-en-ar`	Model for mixed Arabic-English spoken content with code-switching support

All API endpoints accept an optional `model` parameter. If omitted, `munsit` is used by default.
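The default-model rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above: `resolve_model` and the constants are hypothetical client-side names, not part of the Munsit API, and the only facts taken from this page are the two model IDs and the `munsit` default.

```python
from typing import Optional

# Model IDs taken from the Available Models table.
AVAILABLE_MODELS = frozenset({"munsit", "munsit-en-ar"})
DEFAULT_MODEL = "munsit"


def resolve_model(model: Optional[str] = None) -> str:
    """Return the model ID a request would use (hypothetical helper).

    Mirrors the documented rule: when no model is given, `munsit` applies.
    """
    if model is None:
        return DEFAULT_MODEL
    if model not in AVAILABLE_MODELS:
        # Rejecting unknown IDs locally is an assumption of this sketch.
        raise ValueError(f"Unknown model ID: {model!r}")
    return model


if __name__ == "__main__":
    print(resolve_model())                # no model given, default applies
    print(resolve_model("munsit-en-ar"))  # explicit mixed Arabic-English model
```

Validating the ID before sending a request is a design choice of this sketch; how the service itself responds to an unknown model ID is not stated on this page.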

Supported Language

Munsit is optimized for Arabic speech recognition and transcription, with strong coverage for dialect and accent variation. The `munsit-en-ar` model is designed for mixed Arabic-English spoken content, supporting code-switching where speakers naturally alternate between the two languages within the same conversation or utterance.

Supported Audio Formats

.aac, .wma, .amr, .flac, .m4a, .ogg, .mp2, .opus, .m4r, .webm, .mp3, .wav
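A client can check a file's extension against this list before uploading. The sketch below is a minimal example of that check; `is_supported_audio` is a hypothetical helper name, and treating extensions as case-insensitive is an assumption rather than documented behavior.

```python
from pathlib import Path

# Extensions copied from the Supported Audio Formats list.
SUPPORTED_EXTENSIONS = frozenset({
    ".aac", ".wma", ".amr", ".flac", ".m4a", ".ogg",
    ".mp2", ".opus", ".m4r", ".webm", ".mp3", ".wav",
})


def is_supported_audio(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list."""
    # Lowercasing is an assumption: it treats "CALL.MP3" like "call.mp3".
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("meeting.wav", "CALL.MP3", "notes.txt"):
        print(name, is_supported_audio(name))
```

This only inspects the file name. It does not verify that the file contents actually match the declared format.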

Duration Limits

Audio Transcription: Supports files shorter than 60 minutes
Minutes of Meetings: Supports files shorter than 30 minutes
Streaming: No duration limit while the connection stays active

For longer recordings in Audio Transcription or Minutes of Meetings, split the audio into shorter segments for best performance.
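Planning the split is simple arithmetic. The sketch below computes equal-length segment boundaries that each stay under the relevant limit; the function name `plan_segments`, the workflow keys, and the one-minute safety margin are all assumptions made for illustration. It only produces start and end times in seconds and leaves the actual cutting to whatever audio tool you already use.

```python
import math
from typing import List, Tuple

# Limits from the Duration Limits section, in seconds.
# The dictionary keys are hypothetical labels, not API values.
LIMIT_SECONDS = {
    "transcription": 60 * 60,        # files shorter than 60 minutes
    "minutes_of_meetings": 30 * 60,  # files shorter than 30 minutes
}


def plan_segments(
    total_seconds: float,
    workflow: str,
    margin_seconds: float = 60.0,
) -> List[Tuple[float, float]]:
    """Split a recording into equal segments that stay under the limit.

    The margin (an assumption) keeps every segment strictly shorter than
    the documented limit.
    """
    max_len = LIMIT_SECONDS[workflow] - margin_seconds
    if max_len <= 0:
        raise ValueError("margin_seconds must be smaller than the limit")
    if total_seconds <= 0:
        return []
    count = math.ceil(total_seconds / max_len)
    length = total_seconds / count
    return [
        (i * length, min((i + 1) * length, total_seconds))
        for i in range(count)
    ]


if __name__ == "__main__":
    # A 90-minute recording for Audio Transcription.
    for start, end in plan_segments(90 * 60, "transcription"):
        print(f"{start / 60:.1f} min to {end / 60:.1f} min")
```

Equal-length segments are the simplest choice. Cutting at silences or speaker turns would likely avoid splitting words mid-utterance, but that requires inspecting the audio itself and is outside this sketch.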
