Endpoint
`WS /websocket/speech-to-text`
Authentication
The server accepts any one of the following:

| Field | Type | Required | Description |
|---|---|---|---|
| `x-api-key` | string | No | API key, in a header or query parameter |
| `Authorization` | string | No | Bearer token header |
| `token` | string | No | Token query parameter, as a fallback for browsers |
Connection example
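A minimal sketch, assuming the endpoint speaks plain WebSocket and accepts the `token` query parameter described above; the base URL, token value, and the `buildSttUrl` helper name are placeholders, not part of the API:

```javascript
// Builds the connection URL with token query-param auth (browser fallback).
function buildSttUrl(baseUrl, token) {
  const url = new URL("/websocket/speech-to-text", baseUrl);
  url.searchParams.set("token", token);
  return url.toString();
}

// Only attempt a live connection when a real server URL is supplied
// (e.g. STT_BASE_URL=ws://localhost:8080 in Node, or inline in a browser).
const base = typeof process !== "undefined" && process.env.STT_BASE_URL;
if (base) {
  const ws = new WebSocket(buildSttUrl(base, process.env.STT_TOKEN));
  ws.addEventListener("open", () => console.log("connected"));
  ws.addEventListener("message", (e) => console.log("server:", e.data));
}
```

Header-based auth (`x-api-key` or `Authorization`) requires a client that can set headers on the WebSocket handshake; browsers cannot, hence the `token` fallback.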
Supported audio input
- First chunk must be WAV (with headers)
- Subsequent chunks can be WAV or raw PCM
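Because only the first chunk must carry WAV headers, a client can sanity-check its opening buffer before sending. A small sketch based on the standard RIFF/WAVE magic bytes (`hasWavHeader` is a hypothetical helper, not part of the API):

```javascript
// Returns true when the buffer starts with a standard WAV (RIFF/WAVE) header,
// which the first chunk must include; later chunks may be raw PCM.
function hasWavHeader(bytes) {
  if (bytes.length < 12) return false;
  const ascii = (start, len) =>
    String.fromCharCode(...bytes.slice(start, start + len));
  return ascii(0, 4) === "RIFF" && ascii(8, 4) === "WAVE";
}
```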
Client message format
Primary format (event + data):
`audioBuffer` must be an array of byte values (0-255).
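The primary format can be sketched as a small builder; `buildAudioChunkMessage` is a hypothetical helper name, not part of the API:

```javascript
// Builds the primary client message: { event, data } with audioBuffer as an
// array of byte values (0-255). Accepts any byte source (Buffer, Uint8Array).
function buildAudioChunkMessage(bytes) {
  return JSON.stringify({
    event: "audio_chunk",
    data: { audioBuffer: Array.from(bytes) },
  });
}
```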
Events to emit
audio_chunk
- Direction: client -> server
- Payload: `audioBuffer` as `Array<Uint8>`
- Guidance:
  - send a chunk approximately every second
  - the first chunk should include full WAV headers
  - after the first chunk, raw PCM chunks are accepted
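The roughly-one-second cadence can be implemented by slicing the capture buffer by bytes per second; the 32000 bytes/s figure in the comment assumes 16 kHz 16-bit mono PCM and is only illustrative:

```javascript
// Splits raw audio bytes into ~1-second chunks for audio_chunk emission.
// bytesPerSecond depends on the capture format, e.g. 16000 Hz * 2 bytes
// (16-bit mono PCM) = 32000 bytes/s; that figure is an assumption here.
function chunkAudio(bytes, bytesPerSecond) {
  const chunks = [];
  for (let i = 0; i < bytes.length; i += bytesPerSecond) {
    chunks.push(bytes.slice(i, i + bytesPerSecond));
  }
  return chunks;
}
```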
Events to listen to
transcription
- Direction: server -> client
- Type: `string`
- Meaning: cumulative Arabic transcript generated from all received chunks
transcription_error
- Direction: server -> client
- Type: `string`
- Meaning: error details during streaming transcription
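Both server events can be routed from a single message handler. This sketch assumes server messages mirror the client's `{ event, data }` JSON envelope, which is an assumption about the wire format, not something the backend guarantees:

```javascript
// Routes incoming server messages to handlers. Adjust the parsing if the
// server's wire format differs from the { event, data } envelope.
function handleServerMessage(raw, { onTranscription, onError }) {
  const msg = JSON.parse(raw);
  if (msg.event === "transcription") onTranscription(msg.data); // cumulative transcript (string)
  else if (msg.event === "transcription_error") onError(msg.data); // error details (string)
}
```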
Connection/auth notes
- `connect`: client-level WebSocket open event (successful handshake)
- `authentication_error`: not emitted as a dedicated WS event in the current backend; invalid auth is handled by rejecting or closing the connection
Recommended flow
- Connect to `WS /websocket/speech-to-text` with auth
- Confirm the socket is open on the client side
- Emit `audio_chunk` payloads (roughly every second)
- Listen for `transcription_error` and handle failures
- Listen for `transcription` and render live text updates
- Safely disconnect when transcription is complete
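The flow above can be sketched as a small streamer. `makeStreamer` and `getNextChunk` are hypothetical names; in a real client you would drive `tick()` from `setInterval(tick, 1000)` to get the roughly-one-second cadence:

```javascript
// Wires up the recommended flow: stream chunks, render live transcripts,
// surface errors, and disconnect when the audio source is exhausted.
function makeStreamer(ws, getNextChunk, render) {
  ws.onmessage = (e) => {
    const msg = JSON.parse(e.data);
    if (msg.event === "transcription") render(msg.data); // live cumulative text
    else if (msg.event === "transcription_error") console.error(msg.data);
  };
  // One tick sends one chunk; returns false once the source is exhausted.
  return function tick() {
    const chunk = getNextChunk(); // first chunk: full WAV; later chunks: PCM
    if (!chunk) {
      ws.close(); // safely disconnect when transcription input is complete
      return false;
    }
    ws.send(
      JSON.stringify({ event: "audio_chunk", data: { audioBuffer: Array.from(chunk) } })
    );
    return true;
  };
}
```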
