MiltonJ23/Spectra: A local-first application for audio signal intelligence

Features • Installation • Usage • Architecture • Development • License

Overview

SPECTRA is a local-first desktop application for audio signal intelligence. It captures audio from microphones or files, isolates human voices, enhances voice quality, transcribes speech, and performs speaker verification — all running on your machine with zero cloud dependencies.

Built with a sleek, dark interface inspired by signal-intelligence platforms, SPECTRA provides sinusoidal silk-thread waveform visualization, real-time spectrum analysis, automatic speaker diarization, and project-based workflow management.

Features

Core Audio Processing

Audio Capture — Record from any system microphone or import 25+ audio formats (WAV, FLAC, OGG, MP3, OPUS, AAC, M4A, etc.)
Signal Processing — FFT, STFT, spectral analysis, configurable Butterworth filters (bandpass, highpass, lowpass)
Voice Separation — Harmonic/percussive source separation (HPSS) to isolate human voices from background audio
Voice Enhancement — Spectral gating noise reduction, parametric EQ with voice-optimized presets, dynamic range compression
Transcription — Local speech-to-text via faster-whisper (optional), with optional Ollama-powered transcript analysis and summarization
Voice Matching — MFCC-based speaker embeddings with cosine similarity scoring for voice concordance analysis
Automatic Speaker Diarization — VAD → MFCC embeddings → agglomerative clustering to identify and label distinct speakers
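The feature list mentions configurable Butterworth filters without showing how one is set up. A minimal sketch with SciPy follows, assuming a helper named `bandpass` and telephone-band cutoffs of 300 to 3400 Hz. Both are illustrative choices, not SPECTRA's actual API or defaults.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(samples: np.ndarray, sample_rate: int,
             low_hz: float = 300.0, high_hz: float = 3400.0,
             order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth bandpass (hypothetical helper, assumed cutoffs)."""
    # Second-order sections are numerically safer than (b, a) coefficients.
    sos = butter(order, [low_hz, high_hz], btype="bandpass",
                 fs=sample_rate, output="sos")
    # Forward-backward filtering cancels the phase shift of a single pass.
    return sosfiltfilt(sos, samples)


# One second of a 1 kHz tone (inside the band) mixed with 60 Hz hum (outside it).
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
hum = np.sin(2 * np.pi * 60 * t)
filtered = bandpass(tone + hum, sr)
```

Switching `btype` to `"highpass"` or `"lowpass"` with a single cutoff gives the other two filter shapes named above.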

User Interface

NSA Signal Intelligence Aesthetic — Dark theme with IBM Plex Sans typography, muted cool tones, rounded corners, generous negative space
Sinusoidal Silk-Thread Waveform — Smooth Bézier-curve waveform with translucent glow, multi-track overlay per speaker (each with distinct colour)
Visual Voice Comparison — Superposed silk threads with normalised cross-correlation concordance scoring, error margins, and colour-coded overlays
Spectrum Analyzer — Logarithmic-frequency FFT bar display with peak hold decay and colour gradient
Project Management — Each analysis session is a self-contained project with labelled audio segments and voice profiles
Transport Controls — Record, play, pause, stop with position scrubbing and volume control
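The concordance score behind the visual comparison is described as a normalised cross-correlation. One way to compute such a score is sketched below. The function name `concordance` and the choice to take the peak over all lags are assumptions, not details confirmed by this README.

```python
import numpy as np


def concordance(a: np.ndarray, b: np.ndarray) -> float:
    """Peak of the normalised cross-correlation between two signals."""
    # Remove the DC offset so a constant bias does not inflate the score.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # at least one signal is constant, so nothing to compare
    # Full cross-correlation covers every relative shift of the two signals.
    xcorr = np.correlate(a, b, mode="full")
    return float(xcorr.max() / denom)


rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 5 * np.arange(2000) / 2000)
same = concordance(x, x)                      # identical signals score close to 1.0
shifted = concordance(x, np.roll(x, 100))     # a time shift barely lowers the peak
unrelated = concordance(x, rng.normal(size=2000))
```

Taking the peak over lags makes the score tolerant of small timing offsets between two recordings, at the cost of quadratic work in the signal length for `np.correlate`.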

Data Management

SQLite Database — All project data stored locally with WAL mode for performance
Voice Profiles — Create named speaker profiles with computed embeddings, compare any two for concordance
Auto-Detected Speakers — Diarization automatically creates voice profiles per detected speaker
Labelled Segments — Tag and organize audio extracts with custom names and labels
Export — Export processed audio segments and transcription text
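WAL (write-ahead logging) lets readers keep working while a write is in progress, which suits a UI that queries the database during processing. A minimal sketch of enabling it with the standard library follows. The helper name `open_database` is hypothetical, and the temporary directory is only there to keep the example self-contained.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_database(path: Path) -> sqlite3.Connection:
    """Open a SQLite file and switch it to WAL journaling (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect after the request.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"WAL not available, journal mode is {mode!r}")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_database(Path(tmp) / "spectra.db")
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()
```

WAL requires a database file on disk. An in-memory database reports `memory` as its journal mode instead.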

Installation

Prerequisites

Python 3.11+ (tested on 3.11, 3.12, 3.13)
macOS or Linux (Windows support untested)
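PortAudio (for microphone access) and libsndfile (for reading and writing audio files)
FFmpeg on your PATH (used to transcode OPUS, AAC, M4A, and similar formats to WAV)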

macOS

brew install portaudio libsndfile

Ubuntu/Debian

sudo apt-get install libportaudio2 libsndfile1

Install SPECTRA

# Clone the repository
git clone https://github.com/MiltonJ23/Spectra.git
cd Spectra

# Install in development mode
pip install -e ".[dev]"

# Optional: Install whisper transcription backend
pip install -e ".[transcription]"

Ollama Integration (Optional)

SPECTRA can use Ollama for transcript analysis and summarization:

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.2
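Once a model is pulled, transcript analysis comes down to a POST against Ollama's local HTTP API. The sketch below uses only the standard library. The function names and the prompt wording are assumptions for illustration, and SPECTRA's own client code may look different.

```python
import json
import urllib.request


def build_summary_request(transcript: str,
                          url: str = "http://localhost:11434",
                          model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        # Prompt wording is illustrative only.
        "prompt": f"Summarize this transcript in three sentences:\n\n{transcript}",
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        f"{url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(transcript: str, timeout: float = 30.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = build_summary_request(transcript)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Building the request needs no server; summarize() does.
request = build_summary_request("Speaker 1: hello. Speaker 2: hi.")
```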

Usage

Launch the Application

# Via command line
spectra

# Or directly
python -m spectra

Workflow

1. Create a Project — Click + NEW in the project panel, name your analysis session
2. Import or Record Audio — Use FILE > Import Audio… or click ⏺ REC to record from microphone
3. Visualize — Waveform and spectrum displays update in real-time
4. Process:
   - PROCESS > Separate Voice — Isolate human voice from background
   - PROCESS > Enhance Voice — Apply noise reduction and EQ optimization
   - PROCESS > Transcribe — Convert speech to text
5. Voice Profiles — In the VOICES tab, create speaker profiles from audio selections
6. Compare Voices — Select two profiles and click ⇌ COMPARE for concordance scoring
7. Label & Organize — Name segments, add labels, build your analysis

Configuration

SPECTRA reads configuration from spectra.json in the working directory:

{
  "audio": {
    "sample_rate": 44100,
    "channels": 1,
    "block_size": 1024,
    "dtype": "float32"
  },
  "ollama": {
    "url": "http://localhost:11434",
    "model": "llama3.2",
    "timeout": 30
  },
  "projects_dir": "projects",
  "log_level": "INFO",
  "db_path": "spectra.db"
}
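A partial `spectra.json` is easier to maintain than a full copy of every key. The sketch below shows one way a loader could layer a user file over the defaults shown above. Whether SPECTRA actually fills in missing keys this way is an assumption, and `merge` and `load_config` are hypothetical names.

```python
import copy
import json
from pathlib import Path
from typing import Any

# Default values copied from the sample configuration above.
DEFAULTS: dict[str, Any] = {
    "audio": {"sample_rate": 44100, "channels": 1,
              "block_size": 1024, "dtype": "float32"},
    "ollama": {"url": "http://localhost:11434",
               "model": "llama3.2", "timeout": 30},
    "projects_dir": "projects",
    "log_level": "INFO",
    "db_path": "spectra.db",
}


def merge(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of defaults with overrides applied, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Path = Path("spectra.json")) -> dict[str, Any]:
    """Read spectra.json if it exists; otherwise fall back to the defaults."""
    if not path.exists():
        return copy.deepcopy(DEFAULTS)
    with path.open(encoding="utf-8") as handle:
        return merge(DEFAULTS, json.load(handle))


# Overriding one nested key leaves the rest of the "audio" section intact.
config = merge(DEFAULTS, {"audio": {"sample_rate": 16000}})
```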

Architecture

src/spectra/
├── app.py                     # Application entry point
├── core/                      # Audio processing pipeline
│   ├── audio_capture.py       # Mic/file input (sounddevice)
│   ├── audio_player.py        # Audio playback engine (sounddevice)
│   ├── signal_processor.py    # DSP: FFT, STFT, filtering (numpy/scipy)
│   ├── voice_separator.py     # HPSS + VAD + speaker clustering
│   ├── voice_enhancer.py      # Noise reduction, EQ, compression
│   ├── transcriber.py         # Whisper + Ollama speech-to-text
│   ├── voice_matcher.py       # MFCC speaker embeddings + cosine similarity
│   ├── speaker_diarizer.py    # Auto speaker diarization (VAD → MFCC → clustering)
│   └── project_manager.py     # Project lifecycle management
├── db/                        # Persistence layer
│   ├── database.py            # SQLite manager with full CRUD
│   ├── models.py              # Dataclass entities
│   └── schema.py              # DDL + migrations
├── ui/                        # PySide6 interface
│   ├── main_window.py         # Central window with flexible splitter layout
│   ├── waveform_widget.py     # Sinusoidal silk-thread waveform with multi-track overlay
│   ├── comparison_widget.py   # Visual voice comparison (superposed silk threads)
│   ├── spectrum_widget.py     # FFT spectrum analyzer
│   ├── transport_bar.py       # Playback/recording controls
│   ├── project_panel.py       # Project management sidebar
│   ├── voice_panel.py         # Voice profiles & matching
│   ├── transcript_panel.py    # Transcription viewer
│   └── theme.py               # NSA signal-intelligence theme (IBM Plex Sans)
└── utils/                     # Shared utilities
    ├── audio_utils.py         # Format conversion, resampling (25+ formats)
    ├── config.py              # Configuration management
    └── logger.py              # Structured logging

Processing Pipeline

┌──────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Audio Capture │───▸│ Signal Processor │───▸│ Voice Separator │
│  (mic/file)  │    │  (FFT, filter)   │    │  (HPSS + VAD)   │
└──────────────┘    └──────────────────┘    └────────┬────────┘
                                                      │
                    ┌──────────────────┐              │
                    │   Transcriber    │◂─────────────┤
                    │ (whisper/ollama) │              │
                    └──────────────────┘    ┌────────▾────────┐
                                           │ Voice Enhancer  │
                    ┌──────────────────┐   │ (denoise + EQ)  │
                    │  Voice Matcher   │◂──└─────────────────┘
                    │ (MFCC + cosine)  │
                    └────────┬─────────┘
                             │
                    ┌────────▾─────────┐
                    │Speaker Diarizer  │
                    │(VAD→MFCC→cluster)│
                    └──────────────────┘
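The VAD (voice activity detection) stage in the diagram decides which stretches of audio contain speech before embeddings are computed. The README does not say which detector SPECTRA uses, so the sketch below shows the simplest common approach, an energy threshold per frame. The function name, frame length, and threshold are all assumed values.

```python
import numpy as np


def voice_activity(samples: np.ndarray, sample_rate: int,
                   frame_ms: float = 30.0,
                   threshold_db: float = -35.0) -> np.ndarray:
    """Return one boolean per frame: True where RMS level exceeds the threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Drop the incomplete trailing frame, then view the signal as rows of frames.
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Level in dB relative to full scale (1.0); the floor avoids log10(0).
    level_db = 20 * np.log10(np.maximum(rms, 1e-10))
    return level_db > threshold_db


# Half a second of silence, half a second of tone, half a second of silence.
sr = 16000
silence = np.zeros(sr // 2)
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr // 2) / sr)
active = voice_activity(np.concatenate([silence, tone, silence]), sr)
```

An energy threshold cannot tell speech from other loud sounds, which is presumably why the pipeline pairs VAD with voice separation and MFCC embeddings.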

Database Schema

| Table | Description |
|---|---|
| `projects` | Analysis sessions with metadata |
| `audio_segments` | Individual audio clips within a project |
| `voice_profiles` | Named speaker embeddings (MFCC vectors) |
| `transcriptions` | Speech-to-text results per segment |
| `match_results` | Voice concordance scores between profiles |
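To make the relationships between these tables concrete, here is a sketch of DDL that would fit the descriptions. Only the five table names come from this README. Every column name, type, and foreign key below is a guess for illustration, and the real definitions live in `db/schema.py`.

```python
import sqlite3

# Hypothetical columns; only the table names are taken from the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS audio_segments (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    label TEXT,
    file_path TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS voice_profiles (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    embedding BLOB NOT NULL
);
CREATE TABLE IF NOT EXISTS transcriptions (
    id INTEGER PRIMARY KEY,
    segment_id INTEGER NOT NULL REFERENCES audio_segments(id) ON DELETE CASCADE,
    text TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS match_results (
    id INTEGER PRIMARY KEY,
    profile_a_id INTEGER NOT NULL REFERENCES voice_profiles(id) ON DELETE CASCADE,
    profile_b_id INTEGER NOT NULL REFERENCES voice_profiles(id) ON DELETE CASCADE,
    score REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys=ON")  # SQLite leaves foreign keys off by default
conn.executescript(SCHEMA)

# Values are bound through ? placeholders, never formatted into the SQL text.
project_id = conn.execute(
    "INSERT INTO projects (name) VALUES (?)", ("Interview 01",)
).lastrowid
conn.execute(
    "INSERT INTO audio_segments (project_id, label, file_path) VALUES (?, ?, ?)",
    (project_id, "intro", "segments/intro.wav"),
)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```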

Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| UI Framework | PySide6 (Qt 6) | Desktop application with custom widgets |
| DSP Engine | NumPy + SciPy | FFT, filtering, spectral analysis, clustering |
| Audio I/O | sounddevice + soundfile | Mic capture, file loading/saving (25+ formats) |
| Audio Convert | FFmpeg (subprocess) | Transcode OPUS, AAC, M4A, etc. to WAV |
| Transcription | faster-whisper / Ollama | Local speech-to-text |
| Speaker Diarization | scipy.cluster.hierarchy | Automatic multi-speaker identification |
| Database | SQLite3 (stdlib) | Project data persistence |
| Testing | pytest + pytest-cov | 96%+ coverage |
| Linting | ruff | Fast Python linter |
| Type Checking | mypy | Static type analysis |
| CI/CD | GitHub Actions | Automated lint, test, build |
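The table pairs `scipy.cluster.hierarchy` with speaker diarization, and the feature list pairs cosine similarity with voice matching. The sketch below shows how those two pieces can fit together on precomputed embeddings. The function names, the average-linkage method, the 0.5 distance cutoff, and the synthetic 13-dimensional vectors are all assumptions. Real MFCC embeddings would come from the audio itself.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cluster_speakers(embeddings: np.ndarray, max_distance: float = 0.5) -> np.ndarray:
    """Assign a 1-based speaker label to each embedding row."""
    # Agglomerative clustering: merge the closest groups step by step.
    tree = linkage(embeddings, method="average", metric="cosine")
    # Cut the tree wherever two groups are further apart than max_distance.
    return fcluster(tree, t=max_distance, criterion="distance")


# Synthetic stand-ins for per-segment embeddings: two speakers, five segments each.
rng = np.random.default_rng(0)
speaker_a = np.zeros(13)
speaker_a[0] = 1.0
speaker_b = np.zeros(13)
speaker_b[1] = 1.0
embeddings = np.vstack([
    speaker_a + 0.02 * rng.normal(size=(5, 13)),
    speaker_b + 0.02 * rng.normal(size=(5, 13)),
])
labels = cluster_speakers(embeddings)
```

Cutting the tree by distance rather than by a fixed cluster count means the number of speakers does not have to be known in advance.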

Development

Setup

git clone https://github.com/MiltonJ23/Spectra.git
cd Spectra
pip install -e ".[dev]"

Commands

# Run the app
make run

# Run tests
make test

# Run tests with coverage report
make test-cov

# Run fast tests only (skip slow/integration)
make test-fast

# Lint code
make lint

# Auto-format code
make format

# Type check
make typecheck

# Clean build artifacts
make clean

Testing

The test suite targets 95% code coverage using pytest:

# Full test suite with coverage
pytest tests/ -v --cov=spectra --cov-report=term-missing

# Run specific test modules
pytest tests/core/test_signal_processor.py -v
pytest tests/db/ -v
pytest tests/ui/ -v

# Skip tests requiring hardware
pytest tests/ -m "not integration"

CI/CD

GitHub Actions pipelines run automatically on push/PR:

Lint — ruff format check + lint rules
Type Check — mypy strict mode
Test — pytest across Python 3.11/3.12/3.13 on Ubuntu + macOS
Build — Package verification (sdist + wheel)

Project Structure Conventions

All modules use from __future__ import annotations for forward references
Type hints on all public functions
Logging via logging.getLogger(__name__)
Database operations use parameterized queries (values are bound, never interpolated, which guards against SQL injection)
Audio processing uses pure NumPy/SciPy (no heavy ML frameworks required)

License

MIT License — see LICENSE for details.

SPECTRA — Local-first audio intelligence.
No cloud. No telemetry. Your audio stays on your machine.
