Suno at home. Local AI music generation studio — songs, vocals, lyrics, covers, music videos.
Create full songs with vocals, lyrics, covers, remixes and music videos: 100% local, no cloud, no subscriptions, and no internet required once the models have been downloaded. One-click install on Windows, runs on any NVIDIA GPU with 12+ GB VRAM.
Built on ACE-Step 1.5 XL, the open-source 4B-parameter DiT music generation model.
- Free forever — no API keys, no credits, no usage limits
- Private — your music never leaves your machine
- Portable — everything in one folder, copy to USB, delete = uninstall
- One-click — `install.bat` → `run.bat` → make music
- Full songs with vocals — up to 8 minutes, any language, any genre
- Simple & Custom modes — describe what you want or fine-tune every parameter
- 3 XL Models — XL Turbo (8 steps, fast), XL SFT (50 steps, max quality), XL Turbo BF16 (compact, 7.5 GB)
- AI Lyrics & Style — LLM generates lyrics and enriches style descriptions
- Hot Model Switching — change DiT/LM models without restart
- Batch generation — create multiple variations at once
- 10 samplers, 7 schedulers — euler, heun, midpoint, dopri5, deis, ipndm, and more
- LoRA support — load LoRA weights at inference time
- ID3 tags — MP3 files include title, artist, cover art, lyrics, BPM
- Cover mode — transform existing audio into a new style while keeping the melody
- Repaint mode — regenerate specific sections of a song (region selection on waveform)
- Reference audio — use a reference track to guide the generation style
- Audio strength control — blend between source and generated audio
- Music video generator — NCS-style visualizers with 10 presets
- Karaoke lyrics — synchronized LRC subtitles with 3 styles (lines, scroll, karaoke fill)
- WYSIWYG editor — drag elements, scroll to resize, selection frames
- Aspect ratios — 16:9, 9:16 (Reels/TikTok), 1:1 (Instagram)
- 12 effects — shake, glitch, VHS, CCTV, scanlines, bloom, film grain, strobe, vignette, hue shift, letterbox, pixelate
- Background — random, custom image, Pexels search, video backgrounds
- Server-side rendering — native ffmpeg with NVENC GPU acceleration
- Audio editor — trim, fade, effects (AudioMass)
- Stem extraction — separate vocals, drums, bass, other (Demucs)
- LRC download — export synchronized lyrics
- BF16 Converter — convert safetensors from FP32/FP16 to BFloat16 (~50% size reduction when the source is FP32; FP16 files stay the same size)
- Model Merger — merge two ACE-Step models with adjustable alpha (3 methods)
- Bake LoRA — bake LoRA weights into base model
- Single terminal — one `run.bat`, Express manages Python/Gradio automatically
- Portable — everything in one folder, no system-wide installs
- 5 languages — English, Russian, Chinese, Japanese, Korean
- LAN access — use from any device on your network (phone, tablet)
- GPU monitoring — live VRAM, RAM, CPU, temperature stats
- Dark/Light theme
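The karaoke and LRC download items above rely on synchronized lyrics. As a rough illustration of the widely used LRC layout (`[mm:ss.xx]text`, one timestamped line per lyric), here is a minimal sketch. The function name `format_lrc` and the input shape are hypothetical and not taken from the studio's code, and the studio's actual output may differ in detail.

```python
def format_lrc(lines):
    """Render (start_seconds, text) pairs as LRC lines: [mm:ss.xx]text.

    Hypothetical helper for illustration; not part of ACE-Step Studio.
    """
    out = []
    for start, text in sorted(lines):
        centis = round(start * 100)             # LRC timestamps use 1/100 s
        minutes, rem = divmod(centis, 6000)     # 6000 centiseconds per minute
        seconds, hundredths = divmod(rem, 100)
        out.append(f"[{minutes:02d}:{seconds:02d}.{hundredths:02d}]{text}")
    return "\n".join(out)
```

Calling `format_lrc([(0.0, "Hello"), (75.5, "World")])` yields two lines, the second stamped `[01:15.50]`.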
| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 12 GB | 20+ GB |
| RAM | 16 GB | 32 GB |
| Disk | 30 GB | 60 GB (all models) |
| OS | Windows 10/11 | Windows 11 |
| GPU | RTX 3060+ | RTX 4090 |
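To check a machine against the VRAM minimum in the table before installing, something like the following sketch may help. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=memory.total` query, which reports MiB. The function names and the 5% tolerance (drivers often report slightly less than the nominal capacity) are assumptions made for illustration, not part of the installer.

```python
import subprocess

MIN_VRAM_GB = 12  # minimum from the requirements table


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one MiB value per GPU from nvidia-smi CSV output (noheader, nounits)."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of every GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_vram_mib(result.stdout)


def meets_minimum(vram_mib: int, minimum_gb: int = MIN_VRAM_GB,
                  tolerance: float = 0.05) -> bool:
    """True if the GPU has at least minimum_gb of VRAM, within the assumed tolerance."""
    return vram_mib >= minimum_gb * 1024 * (1 - tolerance)
```

On a machine with an NVIDIA driver installed, `any(meets_minimum(v) for v in query_vram_mib())` indicates whether at least one GPU clears the 12 GB bar.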
git clone https://github.com/timoncool/ACE-Step-Studio.git
cd ACE-Step-Studio
install.bat
Select your GPU type (CUDA 12.8 / 12.6 / 12.4). Installs portable Python 3.12, PyTorch, Node.js 22, and all dependencies — nothing system-wide.
run.bat
Browser opens automatically at http://localhost:3001. Models download on first run (~7.5 GB for default BF16 model).
| Script | Description |
|---|---|
| `run.bat` | Standard launch — DiT + LM (0.6B PT), full features |
| `run-no-lm.bat` | Launch without LM — more VRAM for DiT, cover/repaint work, no AI lyrics/thinking |
| `run-dev.bat` | Dev mode — 3 terminals with Vite HMR |
| `install.bat` | One-click installer |
| `update.bat` | Update code + deps + rebuild frontend |
| `reinstall.bat` | Clean reinstall (preserves models and data) |
| `download_model.bat` | Pre-download models |
| Model | Size | Steps | Speed | Quality |
|---|---|---|---|---|
| XL Turbo BF16 | 7.5 GB | 8 | Fast | High |
| XL Turbo | 18.8 GB | 8 | Fast | Very High |
| XL SFT | 18.8 GB | 50 | Slow | Highest |
| XL Merge SFT+Turbo | 18.8 GB | 12 | Medium | Very High |
| Model | VRAM | Quality |
|---|---|---|
| 0.6B | ~0.5 GB | Basic |
| 1.7B | ~1.5 GB | Good |
| 4B | ~4 GB | Best |
LM backend: PT (PyTorch, lighter) or vLLM (faster inference, more VRAM).
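The LM table above can be turned into a quick back-of-the-envelope budget: how much VRAM is left for the DiT after loading a given LM. The sketch below only subtracts the approximate figures from the table. It ignores driver and OS overhead and the extra memory the vLLM backend needs, and the names `LM_VRAM_GB` and `dit_budget_gb` are made up for this example.

```python
# Approximate LM footprints copied from the table above (GB).
LM_VRAM_GB = {"0.6B": 0.5, "1.7B": 1.5, "4B": 4.0}


def dit_budget_gb(total_vram_gb, lm_model=None):
    """Rough VRAM left for the DiT; lm_model=None mirrors launching without an LM."""
    lm_cost = 0.0 if lm_model is None else LM_VRAM_GB[lm_model]
    return total_vram_gb - lm_cost
```

For a 12 GB card, `dit_budget_gb(12, "0.6B")` gives 11.5, while `dit_budget_gb(12, "4B")` gives 8.0, which is one reason to prefer the smaller LM or `run-no-lm.bat` on minimum-spec hardware.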
ACE-Step-Studio/
├── app/ # React + Express frontend & backend
├── ACE-Step-1.5/ # Python ML pipeline
├── python/ # Portable Python 3.12 (created by install.bat)
├── node/ # Portable Node.js 22 (created by install.bat)
├── models/ # HuggingFace cache (created at runtime)
├── run.bat # Standard launcher
├── run-no-lm.bat # Launch without LM
├── install.bat # One-click installer
├── update.bat # Updater
└── CHANGELOG.md # Version history
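Since the tree above marks `python/` and `node/` as created by install.bat, their presence is a simple signal that the installer has already run. A minimal sketch of that check follows. The helper name `missing_installer_dirs` is hypothetical and the studio itself may detect this differently.

```python
from pathlib import Path

# Folders the tree above lists as "created by install.bat".
INSTALLER_DIRS = ("python", "node")


def missing_installer_dirs(root):
    """Return the installer-created folders that do not exist under root."""
    root = Path(root)
    return [name for name in INSTALLER_DIRS if not (root / name).is_dir()]
```

An empty list from `missing_installer_dirs("ACE-Step-Studio")` suggests the portable runtimes are in place. The `models/` folder is left out on purpose because it only appears at runtime.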
update.bat
Pulls latest code, updates Python/Node deps, rebuilds frontend.
Contributions welcome! Here's how to help:
- Report bugs — open an issue
- Suggest features — start a discussion
- Submit PRs — see AGENTS.md for architecture, coding conventions, and pitfalls
Areas where help is especially needed:
- macOS / Linux support
- New visualizer presets for Video Studio
- Translations (i18n)
- LoRA training UI improvements
- Documentation & tutorials
| Project | Description |
|---|---|
| Foundation Music Lab | Music generation + timeline editor |
| VibeVoice ASR | Speech recognition (ASR) |
| LavaSR | Audio quality enhancement |
| Qwen3-TTS | Text-to-speech by Qwen |
| SuperCaption Qwen3-VL | Image captioning |
| VideoSOS | AI video production |
| RC Stable Audio Tools | Music and audio generation |
- Nerual Dreming — Telegram | neuro-cartel.com | ArtGeneration.me
- Neiro-Soft — Telegram | portable neural network builds
- ACE-Step Team — open source ACE-Step 1.5 music generation model
- fspecii — original ACE-Step UI
- AudioMass — browser audio editor
- Demucs — stem extraction by Meta
- Pexels — free stock photos/videos
- Gradio — ML model serving
- FFmpeg — video encoding
I build software and do research in AI and music generation. Most of what I create is free and open source. Your donations allow me to keep creating and exploring without worrying about where the next meal comes from =)
All donation methods | dalink.to/nerual_dreming | boosty.to/neuro_art
- BTC: `1E7dHL22RpyhJGVpcvKdbyZgksSYkYeEBC`
- ETH (ERC20): `0xb5db65adf478983186d4897ba92fe2c25c594a0c`
- USDT (TRC20): `TQST9Lp2TjK6FiVkn4fwfGUee7NmkxEE7C`

