Text-To-Speech WebSocket - Munsit Documentation

The Text-to-Speech WebSocket API generates audio from partial text input while keeping the generated audio consistent from start to finish. It is flexible, but it does not suit every use case. It works well when:

The input text is being streamed or generated in chunks.
Real-time audio generation is required with low latency.
You need to send text incrementally as it becomes available.

However, it may not be the best choice when:

The entire input text is available upfront. Because audio is generated from partial text, some buffering is involved, which can add slightly more latency than a standard HTTP request.
You want to experiment or prototype quickly. WebSockets are more complex to work with than a standard HTTP API, which can slow down development and testing.

Endpoint

WSS /websocket/text-to-speech

Connection URL

wss://api.munsit.com/api/v1/websocket/text-to-speech?x-api-key=YOUR_API_KEY

Authentication

Requires API key authentication, either via the `x-api-key` query parameter or via the `x_api_key` field of the initialization message.

Query Parameters

Parameter	Type	Required	Description
`x-api-key`	string	Conditional	Your Munsit API key. Required unless `x_api_key` is sent in the initialization message
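
The connection URL can be assembled programmatically so that the key is percent-encoded rather than concatenated by hand. The following is a minimal sketch, not an official client: `buildTtsUrl` and `TTS_WS_URL` are hypothetical names introduced here, and only the base URL and the `x-api-key` parameter come from this page.

```javascript
// Base URL taken from the Connection URL section of this page.
const TTS_WS_URL = "wss://api.munsit.com/api/v1/websocket/text-to-speech";

// Hypothetical helper (not part of any Munsit SDK): appends the API key
// as the `x-api-key` query parameter, percent-encoding it where needed.
function buildTtsUrl(apiKey) {
  if (typeof apiKey !== "string" || apiKey.length === 0) {
    throw new Error("apiKey must be a non-empty string");
  }
  const url = new URL(TTS_WS_URL);
  url.searchParams.set("x-api-key", apiKey);
  return url.toString();
}
```

The result can be passed straight to the `WebSocket` constructor, for example `new WebSocket(buildTtsUrl(apiKey))`.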

Message Types

Initialize Connection

After establishing the WebSocket connection, you must send an initialization message. Request:

{
  "type": "initConnection",
  "model_id": "faseeh-v1-preview",
  "voice_id": "ar-najdi-male-2",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75,
    "speed": 1.0
  },
  "output_format": "pcm_24000",
  "x_api_key": "YOUR_API_KEY"
}

Request Fields:

Field	Type	Required	Description
`type`	string	Yes	Must be `"initConnection"`
`model_id`	string	No	Model ID to use (default: `"faseeh-mini-v1-preview"`)
`voice_id`	string	Yes	The voice ID to use for synthesis
`voice_settings`	object	No	Voice configuration
`voice_settings.stability`	number	No	Stability setting (default: 0.5)
`voice_settings.similarity_boost`	number	No	Similarity boost (default: 0.75)
`voice_settings.speed`	number	No	Speed setting, range 0.7-1.2 (default: 1.0)
`output_format`	string	No	Audio output format. Options: `"pcm_8000"`, `"pcm_16000"`, `"pcm_22050"`, `"pcm_24000"` (default: `"pcm_24000"`)
`x_api_key`	string	No	API key (if not provided in query parameter)

Response:

{
  "type": "connectionInitialized"
}
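
The constraints in the Request Fields table can be checked on the client before the initialization message is sent. The sketch below assumes nothing beyond that table: `buildInitMessage` and its camelCase option names are hypothetical, and whether the server performs the same validation is not stated on this page.

```javascript
// Values copied from the Request Fields table.
const OUTPUT_FORMATS = ["pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000"];
const SPEED_MIN = 0.7;
const SPEED_MAX = 1.2;

// Hypothetical helper: builds an initConnection message and rejects values
// that the table documents as unsupported or out of range.
function buildInitMessage({
  voiceId,
  modelId,
  voiceSettings = {},
  outputFormat = "pcm_24000",
  apiKey,
} = {}) {
  if (!voiceId) throw new Error("voice_id is required");
  if (!OUTPUT_FORMATS.includes(outputFormat)) {
    throw new Error(`Unsupported output_format: ${outputFormat}`);
  }
  const { speed } = voiceSettings;
  if (speed !== undefined && (speed < SPEED_MIN || speed > SPEED_MAX)) {
    throw new RangeError(`speed must be between ${SPEED_MIN} and ${SPEED_MAX}`);
  }
  const message = {
    type: "initConnection",
    voice_id: voiceId,
    output_format: outputFormat,
  };
  // Optional fields are omitted so that the documented defaults apply.
  if (modelId) message.model_id = modelId;
  if (Object.keys(voiceSettings).length > 0) {
    message.voice_settings = { ...voiceSettings };
  }
  // Only needed when the key was not passed as a query parameter.
  if (apiKey) message.x_api_key = apiKey;
  return message;
}
```

A call such as `ws.send(JSON.stringify(buildInitMessage({ voiceId: "ar-najdi-male-2" })))` would then send a message that relies on the defaults for every optional field.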

Send Text

Send text chunks for audio generation. Request:

{
  "type": "text",
  "text": "مرحبا بك في فصيح ",
  "flush": false,
  "try_trigger_generation": false
}

Request Fields:

Field	Type	Required	Description
`type`	string	Yes	Must be `"text"`
`text`	string	Yes	Text to convert to speech
`flush`	boolean	No	Force generation of audio even if buffer is small (default: `false`)
`try_trigger_generation`	boolean	No	Attempt to trigger generation immediately (default: `false`)
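
When text arrives in chunks, the fields above can be combined so that only the final chunk forces generation. This is a minimal sketch that assumes the chunks are already collected in an array; `toTextMessages` is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: turns a list of text chunks into "text" messages,
// setting flush: true only on the final chunk.
function toTextMessages(chunks) {
  return chunks.map((text, index) => ({
    type: "text",
    text,
    flush: index === chunks.length - 1,
  }));
}

// Example: chunks as they might arrive from an upstream text generator.
const messages = toTextMessages(["مرحبا ", "بك في ", "فصيح "]);
// Each message would then be sent with ws.send(JSON.stringify(message)).
```

In a truly incremental pipeline the last chunk is often not known in advance; in that case `flush` would be set on whichever chunk the caller identifies as the end of the input.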

Response:

{
  "audio": "base64_encoded_audio_data",
  "sampleRate": 24000
}

Response Fields:

Field	Type	Description
`audio`	string	Base64-encoded PCM audio data
`sampleRate`	number	Sample rate of the audio in Hz (24000 with the default `pcm_24000` output format)
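
Decoding the `audio` field takes two steps: Base64 to bytes, then bytes to samples. The sketch below assumes the PCM data is signed 16-bit, little-endian, and mono; this page does not state the sample width, byte order, or channel count, so that assumption should be verified against real output. `decodePcm16` and `pcm16ToFloat32` are hypothetical helper names.

```javascript
// Decodes a Base64 audio payload into 16-bit PCM samples.
// ASSUMPTION: samples are signed 16-bit little-endian, single channel.
function decodePcm16(base64Audio) {
  const binary = atob(base64Audio);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);

  const view = new DataView(bytes.buffer);
  const samples = new Int16Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return samples;
}

// Converts 16-bit samples to floats in [-1, 1), the range the Web Audio API uses.
function pcm16ToFloat32(samples) {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) out[i] = samples[i] / 32768;
  return out;
}
```

In a browser, the resulting `Float32Array` could be copied into an `AudioBuffer` created with the `sampleRate` value from the same message.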

Clear Buffer

Clear the current text buffer. Request:

{
  "type": "clear"
}

Response: No response message.
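
A typical use for `clear` is interruption, for example when a user starts speaking over playback. The sketch below is an assumption about how a client might pair the message with its own local cleanup: `interrupt` and `audioQueue` are hypothetical names, and `ws` is any object with a `send(string)` method.

```javascript
// Hypothetical helper: asks the server to drop its pending text buffer and
// discards audio that is queued locally but not yet played.
function interrupt(ws, audioQueue) {
  ws.send(JSON.stringify({ type: "clear" }));
  audioQueue.length = 0; // empty the local playback queue in place
}
```

Because no response message is sent, the client has no confirmation of when the buffer was cleared, so it should be prepared to ignore audio messages that arrive shortly after the call.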

Close Connection

Close the WebSocket connection gracefully. Request:

{
  "type": "closeConnection"
}

Response: Connection closes.

Error Responses

If an error occurs, you’ll receive:

{
  "type": "error",
  "errorCode": 40101,
  "errorMessage": "Invalid API key"
}

Error Response Fields:

Field	Type	Description
`type`	string	Always `"error"`
`errorCode`	number	Numeric error code (e.g., 40101, 40001)
`errorMessage`	string	Human-readable error message
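
Because audio messages carry no `type` field while the other messages do, it can help to classify each incoming message in one place. This is a minimal sketch using only the fields documented on this page; `classifyMessage` and the `kind` labels are hypothetical.

```javascript
// Hypothetical helper: classifies a raw server message by its documented fields.
function classifyMessage(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return { kind: "invalid", raw };
  }
  if (data === null || typeof data !== "object") {
    return { kind: "invalid", raw };
  }
  // Errors are checked first so that they are never mistaken for audio.
  if (data.type === "error" || data.errorCode !== undefined) {
    return { kind: "error", code: data.errorCode, message: data.errorMessage };
  }
  if (data.type === "connectionInitialized") {
    return { kind: "initialized" };
  }
  if (typeof data.audio === "string") {
    return { kind: "audio", audio: data.audio, sampleRate: data.sampleRate };
  }
  return { kind: "unknown", data };
}
```

An `onmessage` handler can then switch on `classifyMessage(event.data).kind` instead of probing fields inline.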

Example Usage

const ws = new WebSocket('wss://api.munsit.com/api/v1/websocket/text-to-speech?x-api-key=YOUR_API_KEY');

ws.onopen = () => {
  // Initialize connection
  ws.send(JSON.stringify({
    type: "initConnection",
    model_id: "faseeh-v1-preview",
    voice_id: "ar-najdi-male-2",
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.75,
      speed: 1.0
    },
    output_format: "pcm_24000"
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  
  if (data.type === "connectionInitialized") {
    // Connection ready, send text
    ws.send(JSON.stringify({
      type: "text",
      text: "مرحبا بك في فصيح ",
      flush: true // this is the only (and therefore last) chunk
    }));
  } else if (data.audio) {
    // Decode the Base64 payload into raw PCM bytes
    const audioData = Uint8Array.from(atob(data.audio), (c) => c.charCodeAt(0));
    // Handle audio playback (audioData holds the PCM bytes)
  } else if (data.type === "error" || data.errorCode) {
    console.error("Error:", data.errorMessage);
  }
};

ws.onerror = (error) => {
  console.error("WebSocket error:", error);
};

ws.onclose = () => {
  console.log("WebSocket closed");
};

Best Practices

Always initialize: Send `initConnection` immediately after opening the connection.
Handle errors: Check for error messages in responses.
Flush when done: Use `flush: true` when sending the last text chunk to ensure all audio is generated.
Close gracefully: Send `closeConnection` before closing the WebSocket.
Buffer audio: Collect audio chunks and play them sequentially for smooth playback.
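
For the buffering advice, one simple approach is to keep decoded chunks in arrival order and join them before playback or saving. The sketch below assumes each chunk has already been decoded into an `Int16Array` of mono 16-bit samples, which this page does not confirm; `concatPcm` and `durationSeconds` are hypothetical helpers.

```javascript
// Joins Int16Array chunks into one buffer, preserving arrival order.
function concatPcm(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Int16Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Duration in seconds of single-channel PCM at the given sample rate.
function durationSeconds(sampleCount, sampleRate) {
  return sampleCount / sampleRate;
}
```

Knowing the duration of what has been buffered makes it easier to schedule the next chunk so that it starts exactly where the previous one ends.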
