Endpoint
`WS /websocket/speech-to-text`
Authentication
The server accepts any one of the following:

| Field | Type | Required | Description |
|---|---|---|---|
| `x-api-key` | string | No | API key, in a header or query parameter |
| `Authorization` | string | No | Bearer token header |
| `token` | string | No | Token query parameter, as a fallback for browsers |
Connection example
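A minimal sketch, assuming the endpoint speaks plain WebSocket and accepts the `token` query parameter described above; the base URL, token value, and the `buildSttUrl` helper name are placeholders, not part of the API:

```javascript
// Builds the connection URL with token query-param auth (browser fallback).
function buildSttUrl(baseUrl, token) {
  const url = new URL("/websocket/speech-to-text", baseUrl);
  url.searchParams.set("token", token);
  return url.toString();
}

// Only attempt a live connection when a real server URL is supplied
// (e.g. STT_BASE_URL=ws://localhost:8080 in Node, or inline in a browser).
const base = typeof process !== "undefined" && process.env.STT_BASE_URL;
if (base) {
  const ws = new WebSocket(buildSttUrl(base, process.env.STT_TOKEN));
  ws.addEventListener("open", () => console.log("connected"));
  ws.addEventListener("message", (e) => console.log("server:", e.data));
}
```

Header-based auth (`x-api-key` or `Authorization`) requires a client that can set headers on the WebSocket handshake; browsers cannot, hence the `token` fallback.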
Supported audio input
- First chunk must be WAV (with headers)
- Subsequent chunks can be WAV or raw PCM
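Because only the first chunk must carry WAV headers, a client can sanity-check its opening buffer before sending. A small sketch based on the standard RIFF/WAVE magic bytes (`hasWavHeader` is a hypothetical helper, not part of the API):

```javascript
// Returns true when the buffer starts with a standard WAV (RIFF/WAVE) header,
// which the first chunk must include; later chunks may be raw PCM.
function hasWavHeader(bytes) {
  if (bytes.length < 12) return false;
  const ascii = (start, len) =>
    String.fromCharCode(...bytes.slice(start, start + len));
  return ascii(0, 4) === "RIFF" && ascii(8, 4) === "WAVE";
}
```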
Client message format
Primary format (event + data):
`audioBuffer` must be an array of byte values (0-255).
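The primary format can be sketched as a small builder; `buildAudioChunkMessage` is a hypothetical helper name, not part of the API:

```javascript
// Builds the primary client message: { event, data } with audioBuffer as an
// array of byte values (0-255). Accepts any byte source (Buffer, Uint8Array).
function buildAudioChunkMessage(bytes) {
  return JSON.stringify({
    event: "audio_chunk",
    data: { audioBuffer: Array.from(bytes) },
  });
}
```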
Events to emit
audio_chunk
- Direction: client -> server
- Payload: `audioBuffer` as `Array<Uint8>`
- Guidance:
  - send a chunk approximately every second
  - the first chunk should include full WAV headers
  - after the first chunk, raw PCM chunks are accepted
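The roughly-one-second cadence can be implemented by slicing the capture buffer by bytes per second; the 32000 bytes/s figure in the comment assumes 16 kHz 16-bit mono PCM and is only illustrative:

```javascript
// Splits raw audio bytes into ~1-second chunks for audio_chunk emission.
// bytesPerSecond depends on the capture format, e.g. 16000 Hz * 2 bytes
// (16-bit mono PCM) = 32000 bytes/s; that figure is an assumption here.
function chunkAudio(bytes, bytesPerSecond) {
  const chunks = [];
  for (let i = 0; i < bytes.length; i += bytesPerSecond) {
    chunks.push(bytes.slice(i, i + bytesPerSecond));
  }
  return chunks;
}
```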
Events to listen to
transcription
- Direction: server -> client
- Type: `string`
- Meaning: cumulative Arabic transcript generated from all received chunks
transcription_error
- Direction: server -> client
- Type: `string`
- Meaning: error details during streaming transcription
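Both server events can be routed from a single message handler. This sketch assumes server messages mirror the client's `{ event, data }` JSON envelope, which is an assumption about the wire format, not something the backend guarantees:

```javascript
// Routes incoming server messages to handlers. Adjust the parsing if the
// server's wire format differs from the { event, data } envelope.
function handleServerMessage(raw, { onTranscription, onError }) {
  const msg = JSON.parse(raw);
  if (msg.event === "transcription") onTranscription(msg.data); // cumulative transcript (string)
  else if (msg.event === "transcription_error") onError(msg.data); // error details (string)
}
```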
Connection/auth notes
- `connect`: client-level WebSocket open event (successful handshake)
- `authentication_error`: not emitted as a dedicated WS event in the current backend; invalid auth is handled by rejecting or closing the connection
Recommended flow
- Connect to `WS /websocket/speech-to-text` with auth
- Confirm the socket is open on the client side
- Emit `audio_chunk` payloads (roughly every second)
- Listen for `transcription_error` and handle failures
- Listen for `transcription` and render live text updates
- Safely disconnect when transcription is complete
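The flow above can be sketched as a small streamer. `makeStreamer` and `getNextChunk` are hypothetical names; in a real client you would drive `tick()` from `setInterval(tick, 1000)` to get the roughly-one-second cadence:

```javascript
// Wires up the recommended flow: stream chunks, render live transcripts,
// surface errors, and disconnect when the audio source is exhausted.
function makeStreamer(ws, getNextChunk, render) {
  ws.onmessage = (e) => {
    const msg = JSON.parse(e.data);
    if (msg.event === "transcription") render(msg.data); // live cumulative text
    else if (msg.event === "transcription_error") console.error(msg.data);
  };
  // One tick sends one chunk; returns false once the source is exhausted.
  return function tick() {
    const chunk = getNextChunk(); // first chunk: full WAV; later chunks: PCM
    if (!chunk) {
      ws.close(); // safely disconnect when transcription input is complete
      return false;
    }
    ws.send(
      JSON.stringify({ event: "audio_chunk", data: { audioBuffer: Array.from(chunk) } })
    );
    return true;
  };
}
```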
