Munsit provides speaker diarization merged with transcription through a simple file-upload workflow.
It automatically identifies and labels different speakers in your Arabic audio recordings, then aligns each speaker segment with transcribed text and timestamps.
This helps you clearly understand who said what in meetings, interviews, podcasts, and multi-speaker conversations.

How it works

  1. Upload audio: Send a supported multi-speaker audio file.
  2. Speaker detection: Munsit identifies speaker turns and assigns speaker labels.
  3. Merged output: You receive transcription, diarization segments, and merged speaker-labeled text with timing.
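
To make the merged output in step 3 concrete, here is a minimal sketch of one way word timestamps could be combined with speaker segments. It is not Munsit's actual implementation: the function name `merge_words_with_segments` and the midpoint rule are assumptions for illustration, and every timing value except the first word's is made up. Only the field names (`word`, `start`, `end`, `speaker`, `text`) follow the sample response.

```python
from typing import Dict, List


def merge_words_with_segments(words: List[Dict], segments: List[Dict]) -> List[Dict]:
    """Hypothetical helper: attach each word to the speaker segment
    that contains the word's midpoint (an assumed rule, for illustration)."""
    merged = []
    for seg in segments:
        seg_words = [
            w["word"]
            for w in words
            if seg["start"] <= (w["start"] + w["end"]) / 2 < seg["end"]
        ]
        # Skip segments that contain no words.
        if seg_words:
            merged.append({
                "start": seg["start"],
                "end": seg["end"],
                "speaker": seg["speaker"],
                "text": " ".join(seg_words),
            })
    return merged


# Illustrative input: timings after the first word are invented.
words = [
    {"word": "السلام", "start": 0.0, "end": 1.2},
    {"word": "عليكم", "start": 1.2, "end": 2.0},
    {"word": "كيف", "start": 9.0, "end": 9.4},
]
segments = [
    {"start": 0.0, "end": 8.5, "speaker": "SPEAKER_00"},
    {"start": 8.5, "end": 12.0, "speaker": "SPEAKER_01"},
]

merged = merge_words_with_segments(words, segments)
for item in merged:
    print(item["speaker"], item["text"])
```

Using the word midpoint keeps a word that straddles a speaker boundary from being assigned to both segments. A production system may use a different overlap rule.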

Sample response

{
  "statusCode": 201,
  "data": {
    "transcription": {
      "transcription": "السلام عليكم ورحمة الله وبركاته. كيف حالك اليوم؟",
      "timestamps": [
        { "word": "السلام", "start": 0.0, "end": 1.2 }
      ]
    },
    "diarization": {
      "segments": [
        { "start": 0.0, "end": 8.5, "speaker": "SPEAKER_00" }
      ]
    },
    "merged": [
      {
        "start": 0.0,
        "end": 8.5,
        "speaker": "SPEAKER_00",
        "text": "السلام عليكم ورحمة الله وبركاته"
      }
    ],
    "duration": 53.661375
  },
  "message": "Success"
}
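
A short sketch of reading this response on the client side follows. It parses the sample body shown above and formats each entry of `merged` as a speaker-labeled line. The helper name `format_turns` is hypothetical, and treating `start`, `end`, and `duration` as seconds is an assumption based on the sample values, not something the response states.

```python
import json

# The sample response body from above, pasted in as a string.
response_body = """
{
  "statusCode": 201,
  "data": {
    "transcription": {
      "transcription": "السلام عليكم ورحمة الله وبركاته. كيف حالك اليوم؟",
      "timestamps": [
        { "word": "السلام", "start": 0.0, "end": 1.2 }
      ]
    },
    "diarization": {
      "segments": [
        { "start": 0.0, "end": 8.5, "speaker": "SPEAKER_00" }
      ]
    },
    "merged": [
      {
        "start": 0.0,
        "end": 8.5,
        "speaker": "SPEAKER_00",
        "text": "السلام عليكم ورحمة الله وبركاته"
      }
    ],
    "duration": 53.661375
  },
  "message": "Success"
}
"""


def format_turns(merged):
    """Hypothetical helper: one readable line per merged speaker turn.
    Times are assumed to be in seconds."""
    return [
        f'[{turn["start"]:.1f}s - {turn["end"]:.1f}s] {turn["speaker"]}: {turn["text"]}'
        for turn in merged
    ]


payload = json.loads(response_body)
data = payload["data"]

turns = format_turns(data["merged"])
speakers = sorted({seg["speaker"] for seg in data["diarization"]["segments"]})

for line in turns:
    print(line)
print("Speakers:", ", ".join(speakers))
```

The `merged` array is usually the most convenient entry point for "who said what" views. The separate `transcription` and `diarization` objects are useful when you need word-level timing or raw speaker segments.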