A web-based voice cloning application powered by Qwen3-TTS. Generate natural-sounding speech in any voice using just a short audio sample.
First-Run Setup — The app guides you through downloading required models on first launch.
Voice Profiles — Create and manage voice profiles from reference audio samples.
Generate Speech — Synthesize speech with your cloned voice and preview with word-level timestamps.
Audio Library & Studio — Browse generated clips and combine them into sequences with timeline editing.
- Zero-shot voice cloning - Clone any voice with just 5-15 seconds of reference audio
- Multiple model support - Choose between 0.6B and 1.7B parameter models
  - `Qwen/Qwen3-TTS-12Hz-1.7B-Base` - Higher quality, more VRAM required
  - `Qwen/Qwen3-TTS-12Hz-0.6B-Base` - Faster, lower resource usage
- Voice profiles - Save reference audio + transcript for quick reuse
- Word-level timestamps - Automatic alignment for precise editing
- Timeline editor - Combine multiple clips into sequences with drag-and-drop
- Word-level editing - Select and delete words directly from the transcript
- Waveform trimming - Visual trim controls with real-time preview
- Gap controls - Add silence between clips
- Undo/redo - Full edit history support
- Seamless preview - Server-side audio combining for gapless playback
- Export - Download as WAV or MP3
- Project management - Save and load studio projects
- Cross-platform Whisper - Auto-detects and uses the best backend for your hardware:
  - Apple Silicon: `mlx-whisper` with `mlx-community/whisper-large-v3-turbo`
  - NVIDIA GPU / CPU: `faster-whisper` with `Systran/faster-whisper-large-v3-turbo`
- Word timestamps - Precise timing for each word
- Python 3.10-3.12 (3.13+ not yet supported due to onnxruntime dependency)
- FFmpeg (for audio processing)
- CUDA-compatible GPU recommended (CPU works but slower)
- Clone the repository:

  ```bash
  git clone https://github.com/transcriptionstream/mimic.git
  cd mimic
  ```

- Create a virtual environment (use Python 3.10-3.12):

  ```bash
  python3.12 -m venv venv          # or python3.11, python3.10
  source venv/bin/activate         # On Windows: venv\Scripts\activate
  ```

- Install dependencies (automatically installs the correct Whisper backend for your platform):

  ```bash
  pip install -r requirements.txt
  ```

- Start the server:

  ```bash
  python app.py
  ```

- Open http://localhost:8000 in your browser
On first launch, the app will guide you through downloading the required models:
- TTS Model (~3.4 GB) - For voice synthesis
- Whisper Model (~1.5 GB) - For transcription
- Upload reference audio - Record or upload 5-15 seconds of the voice you want to clone
- Add transcript - Type what was said in the reference audio (or use auto-transcribe)
- Enter target text - Type what you want the cloned voice to say
- Generate - Click generate and wait for the audio
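
Conceptually, the four steps above boil down to one generation request carrying the reference audio, its transcript, and the target text. The sketch below shows a plausible request shape; the field names and helper function are illustrative assumptions, not mimic's documented API:

```python
# Hypothetical shape of a generation request (field names are illustrative,
# not mimic's documented API).
def build_generation_request(
    ref_audio_path: str,
    ref_transcript: str,
    target_text: str,
    model: str = "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
) -> dict:
    return {
        "reference_audio": ref_audio_path,  # 5-15 s sample of the voice to clone
        "reference_text": ref_transcript,   # what is said in the sample
        "text": target_text,                # what the cloned voice should say
        "model": model,                     # which TTS checkpoint to use
    }


req = build_generation_request(
    "uploads/voice.wav", "Hello there.", "Welcome to mimic."
)
```
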
- After uploading reference audio and transcript, click "Save as Profile"
- Give your profile a name
- Select the profile from the dropdown for future generations
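
Since a profile bundles reference audio with its transcript, the saved metadata plausibly looks something like the JSON below. This layout is an illustrative assumption; field names and paths are not taken from mimic's actual `profiles.py`:

```json
{
  "name": "narrator",
  "reference_audio": "data/profiles/narrator/reference.wav",
  "transcript": "The quick brown fox jumps over the lazy dog."
}
```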
- Click "Audio Studio" to open the editor
- Add clips from your generation history to the timeline
- Drag to reorder, click to select and trim
- Use word-level editing to remove unwanted words
- Preview your sequence, then export
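
The server-side combining behind the seamless preview can be sketched with NumPy: concatenate the clips into one buffer, inserting the configured silence gap between them. This is a minimal sketch under assumed names and a 24 kHz mono format, not mimic's actual implementation:

```python
# Minimal sketch of server-side clip combining (assumed approach):
# join mono float32 clips, inserting silence between them, so the
# preview plays back gaplessly from a single buffer.
import numpy as np


def combine_clips(clips, gap_seconds=0.0, sample_rate=24000):
    """Concatenate mono clips with gap_seconds of silence between each pair."""
    gap = np.zeros(int(gap_seconds * sample_rate), dtype=np.float32)
    parts = []
    for i, clip in enumerate(clips):
        if i > 0 and gap.size:
            parts.append(gap)
        parts.append(np.asarray(clip, dtype=np.float32))
    return np.concatenate(parts) if parts else np.zeros(0, dtype=np.float32)


a = np.ones(24000, dtype=np.float32)  # 1.0 s clip
b = np.ones(12000, dtype=np.float32)  # 0.5 s clip
combined = combine_clips([a, b], gap_seconds=0.25)
# 24000 + 6000 (gap) + 12000 = 42000 samples
```

The combined buffer could then be written out with soundfile (e.g. `sf.write("preview.wav", combined, 24000)`) for the WAV export path.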
- Backend: FastAPI, Python
- Frontend: Vanilla JavaScript, CSS
- TTS Engine: Qwen3-TTS
- Transcription: Whisper Large V3 Turbo via mlx-whisper or faster-whisper
- Audio Processing: soundfile, numpy, pydub
```
mimic/
├── app.py                 # FastAPI application
├── mimic/
│   ├── models.py          # Model download management
│   ├── tts.py             # Qwen3-TTS integration
│   ├── transcribe.py      # Whisper transcription
│   ├── profiles.py        # Voice profile management
│   ├── history.py         # Generation history
│   └── studio.py          # Audio studio backend
├── static/
│   ├── index.html         # Main UI
│   ├── styles.css         # Styling
│   └── js/
│       ├── main.js        # Entry point
│       ├── app.js         # Main application class
│       ├── studio.js      # Audio studio module
│       └── ui.js          # UI utilities
└── data/
    ├── models/            # Downloaded TTS models
    ├── profiles/          # Saved voice profiles
    ├── uploads/           # Uploaded audio files
    └── history/           # Generated audio history
```
MIT License








