Munsit’s Voice Isolation technology separates clean speech from background noise, improving audio quality for transcription and voice cloning applications.

Overview

Voice Isolation helps you:
  • Remove background noise from audio recordings
  • Enhance speech clarity for better transcription
  • Prepare clean audio for voice cloning
  • Improve audio quality for downstream processing

Features

Advanced Noise Reduction

AI-powered noise removal while preserving speech quality

Async Job Processing

Upload once and stream progress events while the job runs

Audio & Video Support

Accepts both audio files and video files (audio track is extracted)

High Fidelity

Maintains natural speech characteristics

Use Cases

  • Pre-processing for Transcription: Clean audio before sending to speech-to-text
  • Voice Cloning Preparation: Isolate clean speech for better voice cloning results
  • Podcast Production: Remove background noise from podcast recordings
  • Call Quality Enhancement: Improve audio quality in telephony applications

How It Works

Voice Isolation is an asynchronous job-based pipeline:
  1. Submit the file: send POST /denoise with a multipart/form-data body containing the audio field. The response returns a jobId and a denoiseId.
  2. Track progress: open an SSE connection to GET /denoise/{denoiseId}/progress to receive processing, done, and error events in real time.
  3. Retrieve the result: when the done event fires, it carries the url of the denoised audio. You can also list past jobs and their final URLs via GET /denoise.
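
The three steps above can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the official client. It assumes the event name (processing, done, error) arrives in the SSE event: field, that the done event's data is a JSON object with a url key, and that the submit response is JSON. The helper names, the base_url parameter, and the headers parameter (for whatever authentication your account uses) are hypothetical. The third-party requests library is used only in the final function.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from the lines of an SSE stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def wait_for_url(lines: Iterable[str]) -> str:
    """Return the url from the done event, or raise on an error event."""
    for event, data in iter_sse_events(lines):
        if event == "done":
            return json.loads(data)["url"]  # assumed payload shape
        if event == "error":
            raise RuntimeError(f"denoise job failed: {data}")
        # processing events are progress updates and are ignored here
    raise RuntimeError("stream ended before a done event")


def denoise_file(path: str, base_url: str, headers: dict) -> str:
    """Submit a file, follow its progress stream, and return the result URL."""
    import requests  # third-party; imported here so the parser stays stdlib-only

    # Step 1: multipart/form-data upload with the file in the audio field.
    with open(path, "rb") as f:
        resp = requests.post(f"{base_url}/denoise", files={"audio": f}, headers=headers)
    resp.raise_for_status()
    denoise_id = resp.json()["denoiseId"]

    # Steps 2 and 3: read the SSE stream until done or error.
    with requests.get(f"{base_url}/denoise/{denoise_id}/progress",
                      headers=headers, stream=True) as stream:
        stream.raise_for_status()
        stream.encoding = "utf-8"  # make iter_lines yield str, not bytes
        return wait_for_url(stream.iter_lines(decode_unicode=True))
```

Keeping the SSE parsing separate from the HTTP calls means it can be exercised with a plain list of lines, without a network connection.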

Limits

  • Maximum file size: 200 MB
  • Maximum duration: 15 minutes
  • Accepted source types: audio files and common video containers (mp4, mov, mkv, webm, avi, m4v). Video uploads have their audio track extracted automatically.
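
A client-side pre-check against these limits can avoid a rejected upload. The sketch below is illustrative only: the function names are hypothetical, it assumes "200 MB" means 200,000,000 bytes (the more conservative reading), and it expects the caller to supply the duration, since measuring it requires a media library.

```python
from pathlib import Path
from typing import List, Optional

MAX_BYTES = 200 * 1000 * 1000  # assumption: decimal megabytes
MAX_SECONDS = 15 * 60
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm", ".avi", ".m4v"}


def is_listed_video(filename: str) -> bool:
    """True if the extension is one of the video containers listed above."""
    return Path(filename).suffix.lower() in VIDEO_EXTENSIONS


def limit_violations(size_bytes: int,
                     duration_seconds: Optional[float] = None) -> List[str]:
    """Return human-readable problems; an empty list means within limits."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; maximum is {MAX_BYTES}")
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append(
            f"duration is {duration_seconds:.0f} s; maximum is {MAX_SECONDS} s"
        )
    return problems
```

For a local file, size_bytes can come from Path(path).stat().st_size. The server remains the authority on what it accepts, so treat this check as an early warning only.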

API Reference