Diarization - Munsit Documentation

Munsit provides speaker diarization merged with transcription through a simple file-upload workflow.
It automatically identifies and labels different speakers in your Arabic audio recordings, then aligns each speaker segment with transcribed text and timestamps.
This helps you clearly understand who said what in meetings, interviews, podcasts, and multi-speaker conversations.

How it works

Upload audio: Send a supported multi-speaker audio file.
Speaker detection: Munsit identifies speaker turns and assigns speaker labels.
Merged output: You receive transcription, diarization segments, and merged speaker-labeled text with timing.
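This page does not document how the merge step works internally. The sketch below shows one common way to combine word timestamps with speaker segments. It assumes a word belongs to the diarization segment whose time range contains the word's midpoint. This is an illustration, not Munsit's actual algorithm. The field names (`word`, `start`, `end`, `speaker`, `text`) mirror the sample response below. `merge_words` is a hypothetical helper, and the example timings for `SPEAKER_01` are made up.

```python
def merge_words(words, segments):
    """Group word timestamps into speaker-labeled segments.

    Hypothetical helper: a word is assigned to the segment whose
    [start, end) interval contains the word's midpoint. Words outside
    every segment are dropped, and segments with no words are omitted.
    Output follows the order of `segments`.
    """
    merged = []
    for seg in segments:
        seg_words = [
            w["word"]
            for w in words
            if seg["start"] <= (w["start"] + w["end"]) / 2 < seg["end"]
        ]
        if seg_words:
            merged.append({
                "start": seg["start"],
                "end": seg["end"],
                "speaker": seg["speaker"],
                "text": " ".join(seg_words),
            })
    return merged


# Illustrative input; the SPEAKER_01 segment and later word timings are invented.
words = [
    {"word": "السلام", "start": 0.0, "end": 1.2},
    {"word": "عليكم", "start": 1.2, "end": 2.0},
    {"word": "كيف", "start": 9.0, "end": 9.5},
    {"word": "حالك", "start": 9.5, "end": 10.1},
]
segments = [
    {"start": 0.0, "end": 8.5, "speaker": "SPEAKER_00"},
    {"start": 8.5, "end": 12.0, "speaker": "SPEAKER_01"},
]

for turn in merge_words(words, segments):
    print(turn["speaker"], turn["text"])
```

Assigning by midpoint means a word that straddles a speaker boundary goes to exactly one speaker instead of being duplicated. Each merged entry keeps the segment's own start and end times. This matches the shape of the `merged` array in the sample response.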

Sample response

{
  "statusCode": 201,
  "data": {
    "transcription": {
      "transcription": "السلام عليكم ورحمة الله وبركاته. كيف حالك اليوم؟",
      "timestamps": [
        { "word": "السلام", "start": 0.0, "end": 1.2 }
      ]
    },
    "diarization": {
      "segments": [
        { "start": 0.0, "end": 8.5, "speaker": "SPEAKER_00" }
      ]
    },
    "merged": [
      {
        "start": 0.0,
        "end": 8.5,
        "speaker": "SPEAKER_00",
        "text": "السلام عليكم ورحمة الله وبركاته"
      }
    ],
    "duration": 53.661375
  },
  "message": "Success"
}
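One way to consume a response like the one above is to print a speaker-labeled transcript and total each speaker's talk time. The sketch below uses only the Python standard library and a trimmed copy of the sample body. It assumes that `start`, `end`, and `duration` are in seconds, which this page does not state. `speaker_lines` and `talk_time` are hypothetical helper names, not part of any Munsit SDK.

```python
import json
from collections import defaultdict

# Trimmed copy of the sample response (the "transcription" object is omitted).
RAW_BODY = """
{
  "statusCode": 201,
  "data": {
    "diarization": {"segments": [{"start": 0.0, "end": 8.5, "speaker": "SPEAKER_00"}]},
    "merged": [
      {"start": 0.0, "end": 8.5, "speaker": "SPEAKER_00",
       "text": "السلام عليكم ورحمة الله وبركاته"}
    ],
    "duration": 53.661375
  },
  "message": "Success"
}
"""


def speaker_lines(response):
    """Render each merged segment as '[start-end] SPEAKER: text'."""
    return [
        f"[{m['start']:.1f}-{m['end']:.1f}] {m['speaker']}: {m['text']}"
        for m in response["data"]["merged"]
    ]


def talk_time(response):
    """Sum speaking time per speaker label from the diarization segments."""
    totals = defaultdict(float)
    for seg in response["data"]["diarization"]["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


response = json.loads(RAW_BODY)
for line in speaker_lines(response):
    print(line)
print(talk_time(response))
```

Talk time is computed from `diarization.segments` rather than `merged`. This way, segments that produced no transcribed text still count toward a speaker's total.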

Related API

Diarization API Reference
