Features • Installation • Usage • Architecture • Development • License
SPECTRA is a local-first desktop application for audio signal intelligence. It captures audio from microphones or files, isolates human voices, enhances voice quality, transcribes speech, and performs speaker verification — all running on your machine with zero cloud dependencies.
Built with a sleek, dark interface inspired by signal-intelligence platforms, SPECTRA provides sinusoidal silk-thread waveform visualization, real-time spectrum analysis, automatic speaker diarization, and project-based workflow management.
## Features

- Audio Capture — Record from any system microphone or import audio in 25+ formats (WAV, FLAC, OGG, MP3, OPUS, AAC, M4A, etc.)
- Signal Processing — FFT, STFT, spectral analysis, configurable Butterworth filters (bandpass, highpass, lowpass)
- Voice Separation — Harmonic/percussive source separation (HPSS) to isolate human voices from background audio
- Voice Enhancement — Spectral gating noise reduction, parametric EQ with voice-optimized presets, dynamic range compression
- Transcription — Local speech-to-text via faster-whisper (optional) or Ollama-powered analysis
- Voice Matching — MFCC-based speaker embeddings with cosine similarity scoring for voice concordance analysis
- Automatic Speaker Diarization — VAD → MFCC embeddings → agglomerative clustering to identify and label distinct speakers
- NSA Signal Intelligence Aesthetic — Dark theme with IBM Plex Sans typography, muted cool tones, rounded corners, generous negative space
- Sinusoidal Silk-Thread Waveform — Smooth Bézier-curve waveform with translucent glow, multi-track overlay per speaker (each with distinct colour)
- Visual Voice Comparison — Superposed silk threads with normalised cross-correlation concordance scoring, error margins, and colour-coded overlays
- Spectrum Analyzer — Logarithmic-frequency FFT bar display with peak hold decay and colour gradient
- Project Management — Each analysis session is a self-contained project with labelled audio segments and voice profiles
- Transport Controls — Record, play, pause, stop with position scrubbing and volume control
- SQLite Database — All project data stored locally with WAL mode for performance
- Voice Profiles — Create named speaker profiles with computed embeddings, compare any two for concordance
- Auto-Detected Speakers — Diarization automatically creates voice profiles per detected speaker
- Labelled Segments — Tag and organize audio extracts with custom names and labels
- Export — Export processed audio segments and transcription text
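The concordance score behind voice matching is plain cosine similarity between speaker-embedding vectors. A minimal sketch (the function name is illustrative, not SPECTRA's actual API):

```python
import numpy as np

def concordance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings, in [-1, 1]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # a zero embedding matches nothing
    return float(np.dot(a, b) / denom)

# Collinear embeddings score 1.0; orthogonal ones score 0.0.
same = concordance(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0]))  # → 1.0
diff = concordance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))            # → 0.0
```

Cosine similarity is scale-invariant, so embeddings from recordings at different volumes still compare cleanly.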
## Installation

### Requirements

- Python 3.11+ (tested on 3.11, 3.12, 3.13)
- macOS or Linux (Windows support untested)
- PortAudio (for microphone access)
Install the system audio libraries first.

macOS:

```bash
brew install portaudio libsndfile
```

Debian/Ubuntu:

```bash
sudo apt-get install libportaudio2 libsndfile1
```

Then install SPECTRA itself:

```bash
# Clone the repository
git clone https://github.com/killmonger/spectra.git
cd spectra

# Install in development mode
pip install -e ".[dev]"

# Optional: Install whisper transcription backend
pip install -e ".[transcription]"
```

SPECTRA can use Ollama for transcript analysis and summarization:
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.2
```

## Usage

```bash
# Via command line
spectra

# Or directly
python -m spectra
```

- Create a Project — Click `+ NEW` in the project panel, name your analysis session
- Import or Record Audio — Use `FILE > Import Audio…` or click `⏺ REC` to record from microphone
- Visualize — Waveform and spectrum displays update in real time
- Process:
  - `PROCESS > Separate Voice` — Isolate human voice from background
  - `PROCESS > Enhance Voice` — Apply noise reduction and EQ optimization
  - `PROCESS > Transcribe` — Convert speech to text
- Voice Profiles — In the VOICES tab, create speaker profiles from audio selections
- Compare Voices — Select two profiles and click `⇌ COMPARE` for concordance scoring
- Label & Organize — Name segments, add labels, build your analysis
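The Enhance Voice step's spectral gating can be sketched as follows — a minimal illustration that assumes the opening of the recording is noise-only (SPECTRA's actual noise estimator may differ):

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_gate(x, fs, noise_secs=0.25, gain_db=-30.0):
    """Spectral-gating noise reduction (generic sketch, not SPECTRA's code).

    Uses the first `noise_secs` of audio to estimate a per-frequency noise
    floor; bins that never rise well above it are attenuated by `gain_db`.
    """
    _, _, Z = stft(x, fs=fs, nperseg=1024)           # hop = nperseg // 2 = 512
    mag = np.abs(Z)
    noise_frames = max(1, int(noise_secs * fs / 512))
    floor = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    keep = mag > 3.0 * floor                          # bins well above the floor
    atten = 10.0 ** (gain_db / 20.0)                  # residual gain for gated bins
    _, y = istft(np.where(keep, Z, Z * atten), fs=fs, nperseg=1024)
    return y[: len(x)]
```

The noise-only-prefix assumption is the classic simplification; production gates track the floor adaptively.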
### Configuration

SPECTRA reads configuration from `spectra.json` in the working directory:
```json
{
  "audio": {
    "sample_rate": 44100,
    "channels": 1,
    "block_size": 1024,
    "dtype": "float32"
  },
  "ollama": {
    "url": "http://localhost:11434",
    "model": "llama3.2",
    "timeout": 30
  },
  "projects_dir": "projects",
  "log_level": "INFO",
  "db_path": "spectra.db"
}
```

## Architecture

```
src/spectra/
├── app.py                    # Application entry point
├── core/                     # Audio processing pipeline
│   ├── audio_capture.py      # Mic/file input (sounddevice)
│   ├── audio_player.py       # Audio playback engine (sounddevice)
│   ├── signal_processor.py   # DSP: FFT, STFT, filtering (numpy/scipy)
│   ├── voice_separator.py    # HPSS + VAD + speaker clustering
│   ├── voice_enhancer.py     # Noise reduction, EQ, compression
│   ├── transcriber.py        # Whisper + Ollama speech-to-text
│   ├── voice_matcher.py      # MFCC speaker embeddings + cosine similarity
│   ├── speaker_diarizer.py   # Auto speaker diarization (VAD → MFCC → clustering)
│   └── project_manager.py    # Project lifecycle management
├── db/                       # Persistence layer
│   ├── database.py           # SQLite manager with full CRUD
│   ├── models.py             # Dataclass entities
│   └── schema.py             # DDL + migrations
├── ui/                       # PySide6 interface
│   ├── main_window.py        # Central window with flexible splitter layout
│   ├── waveform_widget.py    # Sinusoidal silk-thread waveform with multi-track overlay
│   ├── comparison_widget.py  # Visual voice comparison (superposed silk threads)
│   ├── spectrum_widget.py    # FFT spectrum analyzer
│   ├── transport_bar.py      # Playback/recording controls
│   ├── project_panel.py      # Project management sidebar
│   ├── voice_panel.py        # Voice profiles & matching
│   ├── transcript_panel.py   # Transcription viewer
│   └── theme.py              # NSA signal-intelligence theme (IBM Plex Sans)
└── utils/                    # Shared utilities
    ├── audio_utils.py        # Format conversion, resampling (25+ formats)
    ├── config.py             # Configuration management
    └── logger.py             # Structured logging
```
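The configurable Butterworth filters handled by `signal_processor.py` can be sketched with SciPy's second-order-section form. The band edges and order below are illustrative defaults, not SPECTRA's actual configuration:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def voice_bandpass(x, fs, low_hz=300.0, high_hz=3400.0, order=4):
    """Zero-phase Butterworth band-pass over the telephone voice band.

    SOS form avoids the numerical instability of high-order transfer
    functions; sosfiltfilt runs the filter forward and backward for
    zero phase distortion.
    """
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

# A 50 Hz mains hum is strongly attenuated while a 1 kHz tone passes.
fs = 16000
t = np.arange(fs) / fs
hum = np.sin(2 * np.pi * 50 * t)
tone = np.sin(2 * np.pi * 1000 * t)
cleaned = voice_bandpass(hum + tone, fs)
```

Swapping `btype` to `"highpass"` or `"lowpass"` (with a scalar cutoff) covers the other filter shapes the feature list mentions.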
```
┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ Audio Capture │───▸│ Signal Processor │───▸│ Voice Separator │
│  (mic/file)   │     │  (FFT, filter)   │     │  (HPSS + VAD)   │
└──────────────┘     └──────────────────┘     └────────┬────────┘
                                                        │
     ┌──────────────────┐                               │
     │   Transcriber    │◂─────────────────────────────┤
     │ (whisper/ollama) │                               │
     └──────────────────┘                      ┌────────▾────────┐
                                               │ Voice Enhancer  │
     ┌──────────────────┐                      │ (denoise + EQ)  │
     │  Voice Matcher   │◂────────────────────└─────────────────┘
     │ (MFCC + cosine)  │
     └────────┬─────────┘
              │
     ┌────────▾─────────┐
     │ Speaker Diarizer │
     │(VAD→MFCC→cluster)│
     └──────────────────┘
```
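The Voice Separator stage above relies on harmonic/percussive source separation. A generic sketch of the median-filtering (Fitzgerald-style) technique, not SPECTRA's exact implementation:

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft, istft

def hpss(x, fs, kernel=17):
    """Split audio into harmonic and percussive layers with soft masks.

    Harmonic content is smooth across time; percussive content is smooth
    across frequency. Median-filtering the spectrogram along each axis
    enhances one layer and suppresses the other.
    """
    _, _, Z = stft(x, fs=fs, nperseg=1024)
    S = np.abs(Z)                                 # shape: (freq, time)
    H = median_filter(S, size=(1, kernel))        # smooth across time  → harmonic
    P = median_filter(S, size=(kernel, 1))        # smooth across freq  → percussive
    eps = 1e-10
    mask_h = (H ** 2 + eps) / (H ** 2 + P ** 2 + 2 * eps)   # Wiener-like soft mask
    _, harm = istft(Z * mask_h, fs=fs, nperseg=1024)
    _, perc = istft(Z * (1.0 - mask_h), fs=fs, nperseg=1024)
    return harm[: len(x)], perc[: len(x)]
```

Because the two masks sum to one, the harmonic and percussive layers add back up to the original signal.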
| Table | Description |
|---|---|
| `projects` | Analysis sessions with metadata |
| `audio_segments` | Individual audio clips within a project |
| `voice_profiles` | Named speaker embeddings (MFCC vectors) |
| `transcriptions` | Speech-to-text results per segment |
| `match_results` | Voice concordance scores between profiles |
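All of these tables live in a local SQLite file accessed through parameterized statements with WAL enabled. A minimal sketch — the column set here is illustrative, not SPECTRA's actual schema:

```python
import sqlite3

# In-memory DB for illustration; SPECTRA points this at db_path (spectra.db).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA journal_mode=WAL")  # enables WAL on a file-backed DB; no-op here
conn.execute(
    """
    CREATE TABLE projects (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)
# Parameterized query: user input is bound as a parameter, never interpolated.
conn.execute("INSERT INTO projects (name) VALUES (?)", ("demo-session",))
conn.commit()
row = conn.execute(
    "SELECT name FROM projects WHERE name = ?", ("demo-session",)
).fetchone()
print(row[0])  # demo-session
```

Binding values with `?` placeholders is what the "no SQL injection" guarantee in the code-style notes refers to.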
| Component | Technology | Purpose |
|---|---|---|
| UI Framework | PySide6 (Qt 6) | Desktop application with custom widgets |
| DSP Engine | NumPy + SciPy | FFT, filtering, spectral analysis, clustering |
| Audio I/O | sounddevice + soundfile | Mic capture, file loading/saving (25+ formats) |
| Audio Convert | FFmpeg (subprocess) | Transcode OPUS, AAC, M4A, etc. to WAV |
| Transcription | faster-whisper / Ollama | Local speech-to-text |
| Speaker Diarization | scipy.cluster.hierarchy | Automatic multi-speaker identification |
| Database | SQLite3 (stdlib) | Project data persistence |
| Testing | pytest + pytest-cov | 96%+ coverage |
| Linting | ruff | Fast Python linter |
| Type Checking | mypy | Static type analysis |
| CI/CD | GitHub Actions | Automated lint, test, build |
## Development

```bash
git clone https://github.com/killmonger/spectra.git
cd spectra
pip install -e ".[dev]"
```

```bash
# Run the app
make run

# Run tests
make test

# Run tests with coverage report
make test-cov

# Run fast tests only (skip slow/integration)
make test-fast

# Lint code
make lint

# Auto-format code
make format

# Type check
make typecheck

# Clean build artifacts
make clean
```

The test suite targets 95% code coverage using pytest:

```bash
# Full test suite with coverage
pytest tests/ -v --cov=spectra --cov-report=term-missing

# Run specific test modules
pytest tests/core/test_signal_processor.py -v
pytest tests/db/ -v
pytest tests/ui/ -v

# Skip tests requiring hardware
pytest tests/ -m "not integration"
```

GitHub Actions pipelines run automatically on push/PR:
- Lint — ruff format check + lint rules
- Type Check — mypy strict mode
- Test — pytest across Python 3.11/3.12/3.13 on Ubuntu + macOS
- Build — Package verification (sdist + wheel)
### Code Style

- All modules use `from __future__ import annotations` for forward references
- Type hints on all public functions
- Logging via `logging.getLogger(__name__)`
- Database operations use parameterized queries (no SQL injection)
- Audio processing uses pure NumPy/SciPy (no heavy ML frameworks required)
## License

MIT License — see LICENSE for details.
SPECTRA — Local-first audio intelligence.
No cloud. No telemetry. Your audio stays on your machine.