The Text-to-Speech WebSocket API is designed to generate audio from partial text input while ensuring consistency throughout the generated audio. Although highly flexible, the WebSocket API isn’t a one-size-fits-all solution. It’s well-suited for scenarios where:
  • The input text is being streamed or generated in chunks.
  • Real-time audio generation is required with low latency.
  • You need to send text incrementally as it becomes available.
However, it may not be the best choice when:
  • The entire input text is available upfront. Because audio is generated from partial input, the API buffers text before synthesis, which can add slight latency compared with a standard HTTP request.
  • You want to experiment or prototype quickly. WebSockets are more complex to work with than a standard HTTP API, which can slow down development and testing.

Endpoint

WSS /websocket/text-to-speech

Connection URL

wss://api.munsit.ai/api/v1/websocket/text-to-speech?x-api-key=YOUR_API_KEY
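
The connection URL above can be assembled programmatically. The following is a minimal sketch, assuming the key should be percent-encoded when placed in the query string; the helper name `buildTtsUrl` is hypothetical and not part of the Munsit API.

```javascript
// Endpoint taken from the Connection URL above.
const TTS_ENDPOINT = "wss://api.munsit.ai/api/v1/websocket/text-to-speech";

// Hypothetical helper: appends the API key as the x-api-key query parameter.
function buildTtsUrl(apiKey) {
  if (typeof apiKey !== "string" || apiKey.length === 0) {
    throw new Error("apiKey must be a non-empty string");
  }
  const url = new URL(TTS_ENDPOINT);
  // searchParams.set percent-encodes characters such as "&" and "=".
  url.searchParams.set("x-api-key", apiKey);
  return url.toString();
}
```

The result can be passed directly to the `WebSocket` constructor, for example `new WebSocket(buildTtsUrl("YOUR_API_KEY"))`.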

Authentication

Requires API key authentication. Pass the key either as the x-api-key query parameter or as the x_api_key field of the initial connection message.

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| x-api-key | string | Yes | Your Munsit API key |

Message Types

Initialize Connection

After establishing the WebSocket connection, you must send an initialization message.
Request:
{
  "type": "initConnection",
  "model_id": "faseeh-v1-preview",
  "voice_id": "ar-najdi-male-2",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75,
    "speed": 1.0
  },
  "output_format": "pcm_24000",
  "x_api_key": "YOUR_API_KEY"
}
Request Fields:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Must be "initConnection" |
| model_id | string | No | Model ID to use (default: "faseeh-mini-v1-preview") |
| voice_id | string | Yes | The voice ID to use for synthesis |
| voice_settings | object | No | Voice configuration |
| voice_settings.stability | number | No | Stability setting (default: 0.5) |
| voice_settings.similarity_boost | number | No | Similarity boost (default: 0.75) |
| voice_settings.speed | number | No | Speed setting, range 0.7-1.2 (default: 1.0) |
| output_format | string | No | Audio output format. Options: "pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000" (default: "pcm_24000") |
| x_api_key | string | No | API key (if not provided in query parameter) |
Response:
{
  "type": "connectionInitialized"
}
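
The request fields above can be checked on the client before the message is sent. The following is a minimal sketch; `buildInitMessage` and its option names are hypothetical, while the defaults, the speed range, and the output formats are taken from the Request Fields table.

```javascript
// Allowed values taken from the Request Fields table above.
const OUTPUT_FORMATS = ["pcm_8000", "pcm_16000", "pcm_22050", "pcm_24000"];

// Hypothetical helper: fills documented defaults and checks documented limits.
function buildInitMessage({ voiceId, modelId, voiceSettings = {}, outputFormat = "pcm_24000" }) {
  if (!voiceId) {
    throw new Error("voice_id is required");
  }
  if (!OUTPUT_FORMATS.includes(outputFormat)) {
    throw new Error("unsupported output_format: " + outputFormat);
  }
  const { stability = 0.5, similarity_boost = 0.75, speed = 1.0 } = voiceSettings;
  if (speed < 0.7 || speed > 1.2) {
    throw new RangeError("speed must be between 0.7 and 1.2");
  }
  const message = {
    type: "initConnection",
    voice_id: voiceId,
    voice_settings: { stability, similarity_boost, speed },
    output_format: outputFormat,
  };
  // Omit model_id so the server applies its documented default.
  if (modelId) {
    message.model_id = modelId;
  }
  return message;
}
```

A typical call would be `ws.send(JSON.stringify(buildInitMessage({ voiceId: "ar-najdi-male-2" })))`.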

Send Text

Send text chunks for audio generation.
Request:
{
  "type": "text",
  "text": "مرحبا بك في فصيح ",
  "flush": false,
  "try_trigger_generation": false
}
Request Fields:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Must be "text" |
| text | string | Yes | Text to convert to speech |
| flush | boolean | No | Force generation of audio even if buffer is small (default: false) |
| try_trigger_generation | boolean | No | Attempt to trigger generation immediately (default: false) |
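
When text arrives incrementally, each chunk becomes one "text" message, and only the last one needs `flush: true`. The following is a minimal sketch of that pattern; `toTextMessages` is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: turns an array of text chunks into "text" messages,
// setting flush: true only on the final chunk so buffered text is synthesized.
function toTextMessages(chunks) {
  return chunks.map((text, index) => ({
    type: "text",
    text,
    flush: index === chunks.length - 1,
  }));
}
```

Each message would then be sent with `ws.send(JSON.stringify(message))` once the connection has been initialized.
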
Response:
{
  "audio": "base64_encoded_audio_data",
  "sampleRate": 24000
}
Response Fields:
| Field | Type | Description |
| --- | --- | --- |
| audio | string | Base64-encoded PCM audio data |
| sampleRate | number | Sample rate of the audio (typically 24000 Hz) |
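
An audio response has to be base64-decoded before the samples can be used. The following is a minimal sketch of that step. It assumes the PCM data is 16-bit signed little-endian, which this page does not state, and the function names are hypothetical.

```javascript
// Decode a base64 string to raw bytes (works in Node.js and in browsers).
function base64ToBytes(b64) {
  if (typeof Buffer !== "undefined") {
    return Uint8Array.from(Buffer.from(b64, "base64"));
  }
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// ASSUMPTION: samples are 16-bit signed little-endian; the docs only say "PCM".
function decodeAudioMessage(message) {
  const bytes = base64ToBytes(message.audio);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Int16Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return { samples, sampleRate: message.sampleRate };
}
```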

Clear Buffer

Clear the current text buffer.
Request:
{
  "type": "clear"
}
Response: No response message.

Close Connection

Close the WebSocket connection gracefully.
Request:
{
  "type": "closeConnection"
}
Response: Connection closes.
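
A graceful shutdown can be sketched as follows. `closeGracefully` is a hypothetical helper, and the local fallback close after a timeout is an assumption added for robustness, not documented server behavior.

```javascript
// Hypothetical helper. `socket` is any object exposing the WebSocket
// members readyState, send() and close().
function closeGracefully(socket, timeoutMs = 2000) {
  const OPEN = 1; // numeric value of WebSocket.OPEN
  if (socket.readyState !== OPEN) {
    return null; // nothing to do: socket is not open
  }
  // Ask the server to close the connection.
  socket.send(JSON.stringify({ type: "closeConnection" }));
  // ASSUMPTION: if the server has not closed the socket within timeoutMs,
  // close it from the client side instead.
  return setTimeout(() => {
    if (socket.readyState === OPEN) {
      socket.close();
    }
  }, timeoutMs);
}
```

The returned timer handle can be passed to `clearTimeout` from the socket's `onclose` handler once the server has closed the connection.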

Error Responses

If an error occurs, you’ll receive:
{
  "type": "error",
  "errorCode": 40101,
  "errorMessage": "Invalid API key"
}
Error Response Fields:
| Field | Type | Description |
| --- | --- | --- |
| type | string | Always "error" |
| errorCode | number | Numeric error code (e.g., 40101, 40001) |
| errorMessage | string | Human-readable error message |
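
Because audio responses carry no `type` field while the other messages do, incoming frames need to be told apart by shape. The following is a minimal sketch of such a dispatcher; `classifyMessage` and its `kind` labels are hypothetical names.

```javascript
// Hypothetical helper: parses a raw frame and labels it by message shape.
function classifyMessage(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return { kind: "unknown", data: null }; // not valid JSON
  }
  if (data === null || typeof data !== "object") {
    return { kind: "unknown", data };
  }
  // Check errors first so a failure is never mistaken for another message.
  if (data.type === "error" || data.errorCode !== undefined) {
    return { kind: "error", data };
  }
  if (data.type === "connectionInitialized") {
    return { kind: "initialized", data };
  }
  if (typeof data.audio === "string") {
    return { kind: "audio", data };
  }
  return { kind: "unknown", data };
}
```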

Example Usage

const ws = new WebSocket('wss://api.munsit.ai/api/v1/websocket/text-to-speech?x-api-key=YOUR_API_KEY');

ws.onopen = () => {
  // Initialize connection
  ws.send(JSON.stringify({
    type: "initConnection",
    model_id: "faseeh-v1-preview",
    voice_id: "ar-najdi-male-2",
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.75,
      speed: 1.0
    },
    output_format: "pcm_24000"
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  
  if (data.type === "connectionInitialized") {
    // Connection ready, send text
    ws.send(JSON.stringify({
      type: "text",
      text: "مرحبا بك في فصيح ",
      flush: true // this is the only (and therefore last) chunk
    }));
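  } else if (data.audio) {
    // Decode the base64 payload into raw PCM bytes
    // (atob alone returns a binary string, not a byte array)
    const audioBytes = Uint8Array.from(atob(data.audio), (c) => c.charCodeAt(0));
    // Handle audio playback, e.g. queue audioBytes for sequential playback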
  } else if (data.type === "error" || data.errorCode) {
    console.error("Error:", data.errorMessage);
  }
};

ws.onerror = (error) => {
  console.error("WebSocket error:", error);
};

ws.onclose = () => {
  console.log("WebSocket closed");
};

Best Practices

  1. Always initialize: Send initConnection immediately after opening the connection
  2. Handle errors: Check for error messages in responses
  3. Flush when done: Use flush: true when sending the last text chunk to ensure all audio is generated
  4. Close gracefully: Send closeConnection before closing the WebSocket
  5. Buffer audio: Collect audio chunks and play them sequentially for smooth playback
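
The buffering advice in item 5 can be sketched as follows: decoded PCM chunks are concatenated in arrival order and wrapped in a WAV header so that standard players can read the result. This sketch assumes 16-bit mono PCM, which this page does not state, and `pcmChunksToWav` is a hypothetical helper.

```javascript
// Concatenate raw PCM byte chunks (Uint8Array) and prepend a 44-byte WAV header.
// ASSUMPTION: 16-bit mono PCM; the docs do not state bit depth or channel count.
function pcmChunksToWav(chunks, sampleRate = 24000) {
  const channels = 1;
  const bytesPerSample = 2;
  const dataLength = chunks.reduce((total, chunk) => total + chunk.byteLength, 0);
  const wav = new Uint8Array(44 + dataLength);
  const view = new DataView(wav.buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataLength, true); // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataLength, true);
  // Copy chunks in the order they were received.
  let offset = 44;
  for (const chunk of chunks) {
    wav.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return wav;
}
```

The `sampleRate` argument should match the `sampleRate` value reported in the audio responses.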