- The input text is being streamed or generated in chunks.
- Real-time audio generation is required with low latency.
- You need to send text incrementally as it becomes available.
The WebSocket API may not be the best fit when:
- The entire input text is available upfront. Because audio is generated from partial input, some buffering is involved, which can result in slightly higher latency than a standard HTTP request.
- You want to quickly experiment or prototype. WebSockets are more complex to work with than a standard HTTP API, which can slow down rapid development and testing.
Endpoint
Connection URL
Authentication
Requires API key authentication via the `x-api-key` query parameter or in the initial connection message.
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| x-api-key | string | Yes | Your Munsit API key |
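As a minimal sketch of query-parameter authentication, the helper below appends `x-api-key` to a WebSocket URL using only the standard library. The function name `build_connection_url` and the host `tts.example.invalid` are hypothetical placeholders, not part of the API; substitute the real connection URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_connection_url(base_url: str, api_key: str) -> str:
    """Return base_url with the x-api-key query parameter appended."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    parts = urlsplit(base_url)
    # Preserve any query parameters already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("x-api-key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical placeholder host; replace with the documented connection URL.
url = build_connection_url("wss://tts.example.invalid/stream", "abc123")
```

URL-encoding the key keeps the connection string valid even if the key contains reserved characters.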
Message Types
Initialize Connection
After establishing the WebSocket connection, you must send an initialization message. Request:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Must be "initConnection" |
| model_id | string | No | Model ID to use (default: "faseeh-mini-v1-preview") |
| voice_id | string | Yes | The voice ID to use for synthesis |
| voice_settings | object | No | Voice configuration |
| voice_settings.stability | number | No | Stability setting (default: 0.5) |
| voice_settings.similarity_boost | number | No | Similarity boost (default: 0.75) |
| voice_settings.speed | number | No | Speed setting, range 0.7-1.2 (default: 1.0) |
| output_format | string | No | Audio output format. Options: "pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000" (default: "pcm_24000") |
| x_api_key | string | No | API key (if not provided in query parameter) |
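The fields above could be assembled into an initialization message roughly as follows. This is a sketch, not an official client: the helper name `build_init_message` and the client-side validation are assumptions, while the field names, defaults, and allowed values are taken from the table.

```python
import json
from typing import Optional

ALLOWED_FORMATS = {"pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000"}


def build_init_message(
    voice_id: str,
    model_id: str = "faseeh-mini-v1-preview",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    speed: float = 1.0,
    output_format: str = "pcm_24000",
    api_key: Optional[str] = None,
) -> str:
    """Serialize an initConnection message as a JSON string."""
    if not voice_id:
        raise ValueError("voice_id is required")
    if not 0.7 <= speed <= 1.2:
        raise ValueError("speed must be within 0.7-1.2")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError("unsupported output_format: " + output_format)
    message = {
        "type": "initConnection",
        "model_id": model_id,
        "voice_id": voice_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "speed": speed,
        },
        "output_format": output_format,
    }
    # Only include the key when it was not sent as a query parameter.
    if api_key is not None:
        message["x_api_key"] = api_key
    return json.dumps(message)
```

Validating `speed` and `output_format` before sending is a design choice of this sketch; it surfaces mistakes locally instead of waiting for a server error.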
Send Text
Send text chunks for audio generation. Request:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Must be "text" |
| text | string | Yes | Text to convert to speech |
| flush | boolean | No | Force generation of audio even if buffer is small (default: false) |
| try_trigger_generation | boolean | No | Attempt to trigger generation immediately (default: false) |
Response:
| Field | Type | Description |
|---|---|---|
| audio | string | Base64-encoded PCM audio data |
| sampleRate | number | Sample rate of the audio (typically 24000 Hz) |
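To illustrate the text request and the audio response, here is a small sketch that builds a `text` message and decodes a response into raw PCM bytes. The helper names are hypothetical, and the sketch assumes the response arrives as a JSON text frame containing the `audio` and `sampleRate` fields listed above.

```python
import base64
import json
from typing import Tuple


def build_text_message(
    text: str, flush: bool = False, try_trigger_generation: bool = False
) -> str:
    """Serialize a text message as a JSON string."""
    return json.dumps(
        {
            "type": "text",
            "text": text,
            "flush": flush,
            "try_trigger_generation": try_trigger_generation,
        }
    )


def decode_audio_response(raw: str) -> Tuple[bytes, int]:
    """Return (pcm_bytes, sample_rate) from an audio response frame."""
    payload = json.loads(raw)
    pcm = base64.b64decode(payload["audio"])
    return pcm, int(payload["sampleRate"])
```

Reading `sampleRate` from each response, rather than hard-coding 24000, keeps playback correct if a different `output_format` was requested.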
Clear Buffer
Clear the current text buffer. Request:
Close Connection
Close the WebSocket connection gracefully. Request:
Error Responses
If an error occurs, you’ll receive:
| Field | Type | Description |
|---|---|---|
| type | string | Always "error" |
| errorCode | number | Numeric error code (e.g., 40101, 40001) |
| errorMessage | string | Human-readable error message |
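One way a client might handle the error shape above is to check each incoming frame and raise an exception when `type` is "error". The exception class `TTSServerError` and the function `raise_if_error` are hypothetical names for this sketch, and the error message in the usage line is invented for illustration.

```python
import json


class TTSServerError(Exception):
    """Raised when the server sends a message with type "error"."""

    def __init__(self, code, message):
        super().__init__("[{}] {}".format(code, message))
        self.code = code
        self.message = message


def raise_if_error(raw: str) -> dict:
    """Parse a JSON frame; raise TTSServerError if it is an error message."""
    payload = json.loads(raw)
    if payload.get("type") == "error":
        raise TTSServerError(payload.get("errorCode"), payload.get("errorMessage"))
    return payload


# Usage with an illustrative (made-up) error message:
try:
    raise_if_error('{"type": "error", "errorCode": 40101, "errorMessage": "example"}')
except TTSServerError as exc:
    last_error_code = exc.code
```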
Example Usage
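As a rough sketch of the order of messages in a session, the generator below yields the JSON frames a client might send: initialization first, then the text chunks with `flush` set on the last one, then a close message. The function name is hypothetical, and the shape of the close message (a `type` of "closeConnection" with no other fields) is an assumption. A real client would send these over a WebSocket library of its choice, read audio responses between sends, and likely wait for the remaining audio before sending the close message.

```python
import json
from typing import Iterable, Iterator


def session_messages(voice_id: str, chunks: Iterable[str]) -> Iterator[str]:
    """Yield the JSON messages for one session, in send order."""
    yield json.dumps({"type": "initConnection", "voice_id": voice_id})

    # Hold back one chunk so the final one can carry flush: true.
    pending = None
    for chunk in chunks:
        if pending is not None:
            yield json.dumps({"type": "text", "text": pending})
        pending = chunk
    if pending is not None:
        yield json.dumps({"type": "text", "text": pending, "flush": True})

    # Assumed message shape; only the name closeConnection is documented.
    yield json.dumps({"type": "closeConnection"})
```

Holding back one chunk lets the generator work with streamed input of unknown length while still flushing on the final chunk.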
Best Practices
- Always initialize: Send `initConnection` immediately after opening the connection
- Handle errors: Check for error messages in responses
- Flush when done: Use `flush: true` when sending the last text chunk to ensure all audio is generated
- Close gracefully: Send `closeConnection` before closing the WebSocket
- Buffer audio: Collect audio chunks and play them sequentially for smooth playback
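The audio-buffering advice could look something like the sketch below, which concatenates Base64-encoded chunks in arrival order and wraps them in a WAV container using the standard library. The function name `chunks_to_wav` is hypothetical. The sketch assumes the PCM data is 16-bit mono; the sample width and channel count are not stated here, so verify them before relying on this.

```python
import base64
import io
import wave
from typing import Iterable


def chunks_to_wav(b64_chunks: Iterable[str], sample_rate: int = 24000) -> bytes:
    """Join Base64 PCM chunks in order and return WAV file bytes."""
    pcm = b"".join(base64.b64decode(chunk) for chunk in b64_chunks)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # assumption: mono
        wav.setsampwidth(2)  # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Writing the result to a `.wav` file gives a quick way to check the output in any audio player; for live playback, the same ordered chunks would instead be fed to an audio output stream.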
