WSS wss://api.munsit.ai/api/v1/websocket/speech-to-text
Messages

headers (object)
  x-api-key (string): API key header
  Authorization (string): Bearer token header

query (object)
  x-api-key (string): Your Munsit API key
  token (string): Optional auth token for browser WebSocket fallback

Transcription (object): Transcription text event from server

Transcription Error (object): Error event from server

Audio Chunk: Send a chunk of audio bytes for transcription

Endpoint

WS /websocket/speech-to-text

Authentication

The server accepts any of the following auth methods:

Field          Type    Required  Description
x-api-key      string  No        API key in header or query
Authorization  string  No        Bearer token header
token          string  No        Token query param fallback for browsers

At least one auth method is required. If auth is invalid, the connection is rejected or closed.

Connection example

wss://api.munsit.ai/api/v1/websocket/speech-to-text?x-api-key=YOUR_MUNSIT_API_KEY
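Browsers cannot attach custom headers to a WebSocket handshake, which is why the query-string form exists. A minimal sketch of building that URL (the key value is a placeholder, and `connectionUrl` is an illustrative helper name):

```javascript
// Build the connection URL with the API key as a query parameter,
// the browser-friendly alternative to the x-api-key header.
const BASE = "wss://api.munsit.ai/api/v1/websocket/speech-to-text";

function connectionUrl(apiKey) {
  const url = new URL(BASE);
  url.searchParams.set("x-api-key", apiKey);
  return url.toString();
}
```

Server-side clients can instead pass the same key as an `x-api-key` header and keep the URL free of credentials.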

Supported audio input

  • First chunk must be WAV (with headers)
  • Subsequent chunks can be WAV or raw PCM
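Because the first chunk must be a valid WAV payload, a client can sanity-check the leading RIFF/WAVE magic bytes before sending. A small sketch (`looksLikeWav` is an illustrative helper, not part of the API):

```javascript
// Check the 12-byte RIFF header: bytes 0-3 spell "RIFF" and
// bytes 8-11 spell "WAVE" in any well-formed WAV file.
function looksLikeWav(bytes) {
  const tag = (offset) =>
    String.fromCharCode(...bytes.slice(offset, offset + 4));
  return bytes.length >= 12 && tag(0) === "RIFF" && tag(8) === "WAVE";
}
```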

Client message format

Primary format (event + data):
{
  "event": "audio_chunk",
  "data": {
    "audioBuffer": [1, 2, 3]
  }
}
Compatibility format also accepted:
{
  "audioBuffer": [1, 2, 3]
}
audioBuffer must be an array of byte values (0-255).
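The primary format can be produced from a Uint8Array like this (a sketch; `audioChunkMessage` is an illustrative helper name):

```javascript
// Wrap raw audio bytes in the primary { event, data } envelope.
// audioBuffer must be plain numbers 0-255, so the typed array is
// converted with Array.from before serialization.
function audioChunkMessage(bytes) {
  return JSON.stringify({
    event: "audio_chunk",
    data: { audioBuffer: Array.from(bytes) },
  });
}
```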

Events to emit

audio_chunk

  • Direction: client -> server
  • Payload: audioBuffer as an array of byte values (0-255)
  • Guidance:
    • send a chunk roughly every second
    • first chunk should include full WAV headers
    • after the first chunk, raw PCM chunks are accepted
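One way to honor the ~1-second cadence is to size PCM chunks by the byte rate of the stream. The sample rate, bit depth, and channel count below are illustrative assumptions, not values mandated by the API:

```javascript
// Slice raw PCM into ~1-second chunks. bytesPerSecond =
// sampleRate * bytesPerSample * channels, i.e. 32000 for the
// assumed 16 kHz, 16-bit, mono stream.
function pcmChunks(pcm, sampleRate = 16000, bytesPerSample = 2, channels = 1) {
  const bytesPerSecond = sampleRate * bytesPerSample * channels;
  const chunks = [];
  for (let i = 0; i < pcm.length; i += bytesPerSecond) {
    chunks.push(pcm.slice(i, i + bytesPerSecond));
  }
  return chunks;
}
```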

Events to listen to

transcription

  • Direction: server -> client
  • Type: string
  • Meaning: cumulative Arabic transcript generated from all received chunks

transcription_error

  • Direction: server -> client
  • Type: string
  • Meaning: error details during streaming transcription
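A small dispatcher for the two server events. The { event, data } frame shape is an assumption here, not confirmed by the docs above; adjust the parsing to match the actual wire format:

```javascript
// Route incoming frames to the matching handler. Unknown events
// are ignored so future server events do not break the client.
function handleServerMessage(raw, { onTranscription, onError }) {
  const { event, data } = JSON.parse(raw); // assumed frame shape
  if (event === "transcription") onTranscription(data);
  else if (event === "transcription_error") onError(data);
}
```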

Connection/auth notes

  • connect: client-level WebSocket open event (successful handshake)
  • authentication_error: not emitted as a dedicated WS event in current backend; invalid auth is handled by rejecting/closing the connection
  1. Connect to WS /websocket/speech-to-text with auth
  2. Confirm the socket is open on the client side
  3. Emit audio_chunk payloads (roughly every second)
  4. Listen for transcription_error and handle failures
  5. Listen for transcription and render live text updates
  6. Safely disconnect when transcription is complete
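The six steps above can be sketched against an injected socket-like object (anything with send(), close(), and an onmessage slot), so the flow can be exercised without a live connection. The { event, data } server frame shape is an assumption, and `startSession` is an illustrative name:

```javascript
// Wire the lifecycle together: listen for transcription and
// transcription_error, send audio_chunk payloads, and disconnect
// when done. The transcript is cumulative, so each transcription
// event replaces the stored text outright.
function startSession(socket, onText, onError) {
  let transcript = "";
  socket.onmessage = (e) => {
    const { event, data } = JSON.parse(e.data); // assumed frame shape
    if (event === "transcription") {
      transcript = data;
      onText(data);
    } else if (event === "transcription_error") {
      onError(data);
    }
  };
  return {
    sendChunk: (bytes) =>
      socket.send(
        JSON.stringify({ event: "audio_chunk", data: { audioBuffer: Array.from(bytes) } })
      ),
    transcript: () => transcript,
    close: () => socket.close(),
  };
}
```

In a browser, the injected object would be a real WebSocket opened against the connection URL shown earlier.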