MiltonJ23/Spectra

SPECTRA Banner

Features · Installation · Usage · Architecture · Development · License


Overview

SPECTRA is a local-first desktop application for audio signal intelligence. It captures audio from microphones or files, isolates human voices, enhances voice quality, transcribes speech, and performs speaker verification — all running on your machine with zero cloud dependencies.

Built with a sleek, dark interface inspired by signal-intelligence platforms, SPECTRA provides sinusoidal silk-thread waveform visualization, real-time spectrum analysis, automatic speaker diarization, and project-based workflow management.

Features

Core Audio Processing

  • Audio Capture — Record from any system microphone or import 25+ audio formats (WAV, FLAC, OGG, MP3, OPUS, AAC, M4A, etc.)
  • Signal Processing — FFT, STFT, spectral analysis, configurable Butterworth filters (bandpass, highpass, lowpass)
  • Voice Separation — Harmonic/percussive source separation (HPSS) to isolate human voices from background audio
  • Voice Enhancement — Spectral gating noise reduction, parametric EQ with voice-optimized presets, dynamic range compression
  • Transcription — Local speech-to-text via faster-whisper (optional) or Ollama-powered analysis
  • Voice Matching — MFCC-based speaker embeddings with cosine similarity scoring for voice concordance analysis
  • Automatic Speaker Diarization — VAD → MFCC embeddings → agglomerative clustering to identify and label distinct speakers
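As an illustration of the configurable Butterworth filtering listed above, here is a minimal sketch built on `scipy.signal`. The function name `bandpass`, the 300 to 3400 Hz voice band, and the filter order are assumptions chosen for the example, not SPECTRA's actual defaults.

```python
import numpy as np
from scipy import signal


def bandpass(audio: np.ndarray, sample_rate: int,
             low_hz: float = 300.0, high_hz: float = 3400.0,
             order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth bandpass over a mono float signal (hypothetical helper)."""
    # Second-order sections are numerically safer than (b, a) coefficients.
    sos = signal.butter(order, [low_hz, high_hz], btype="bandpass",
                        fs=sample_rate, output="sos")
    # sosfiltfilt runs the filter forward and backward, cancelling phase shift.
    return signal.sosfiltfilt(sos, audio)


sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
# One second of 50 Hz hum (outside the band) plus a 1 kHz tone (inside it).
mixed = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 1000 * t)
filtered = bandpass(mixed, sample_rate)

# With one second of audio, FFT bin k corresponds to k Hz.
spectrum = np.abs(np.fft.rfft(filtered))
freqs = np.fft.rfftfreq(filtered.size, d=1 / sample_rate)
peak_hz = freqs[np.argmax(spectrum)]
```

After filtering, the strongest spectral component is the 1 kHz tone and the hum is heavily attenuated. Switching `btype` to `"highpass"` or `"lowpass"` with a single cutoff gives the other two filter types.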

User Interface

  • NSA Signal Intelligence Aesthetic — Dark theme with IBM Plex Sans typography, muted cool tones, rounded corners, generous negative space
  • Sinusoidal Silk-Thread Waveform — Smooth Bézier-curve waveform with translucent glow, multi-track overlay per speaker (each with distinct colour)
  • Visual Voice Comparison — Superposed silk threads with normalised cross-correlation concordance scoring, error margins, and colour-coded overlays
  • Spectrum Analyzer — Logarithmic-frequency FFT bar display with peak hold decay and colour gradient
  • Project Management — Each analysis session is a self-contained project with labelled audio segments and voice profiles
  • Transport Controls — Record, play, pause, stop with position scrubbing and volume control
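The normalised cross-correlation score behind the visual comparison can be sketched in a few lines of NumPy. This is one plausible formulation under stated assumptions: the name `concordance`, the mean removal, and the use of the absolute peak (so an inverted-polarity copy still counts as a match) are illustrative choices, not confirmed details of SPECTRA's widget.

```python
import numpy as np


def concordance(a: np.ndarray, b: np.ndarray) -> float:
    """Peak normalised cross-correlation of two signals, between 0 and 1."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # a flat (silent) signal has no shape to compare
    # "full" mode evaluates every lag, so a time-shifted copy can still score high.
    xcorr = np.correlate(a, b, mode="full")
    return float(np.max(np.abs(xcorr)) / denom)


t = np.linspace(0.0, 1.0, 2000, endpoint=False)
tone = np.sin(2 * np.pi * 5 * t)
noise = np.random.default_rng(0).standard_normal(2000)

same_score = concordance(tone, tone)    # identical signals
noise_score = concordance(tone, noise)  # unrelated signals
```

Identical signals score 1 (up to floating-point rounding), while unrelated signals score far lower. The Cauchy-Schwarz inequality guarantees the result never exceeds 1.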

Data Management

  • SQLite Database — All project data stored locally with WAL mode for performance
  • Voice Profiles — Create named speaker profiles with computed embeddings, compare any two for concordance
  • Auto-Detected Speakers — Diarization automatically creates voice profiles per detected speaker
  • Labelled Segments — Tag and organize audio extracts with custom names and labels
  • Export — Export processed audio segments and transcription text
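Comparing two voice profiles for concordance reduces to a cosine similarity between their embeddings. The sketch below uses only the standard library; the `VoiceProfile` dataclass and the three-element vectors are hypothetical stand-ins, since real MFCC embeddings are longer and SPECTRA's own model classes are not shown in this README.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical stand-in for a stored profile: a name plus an embedding."""
    name: str
    embedding: tuple[float, ...]


def cosine_similarity(a: VoiceProfile, b: VoiceProfile) -> float:
    """Cosine of the angle between two embeddings (1.0 = same direction)."""
    if len(a.embedding) != len(b.embedding):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a.embedding, b.embedding))
    norm_a = math.sqrt(sum(x * x for x in a.embedding))
    norm_b = math.sqrt(sum(y * y for y in b.embedding))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero embedding cannot be compared meaningfully
    return dot / (norm_a * norm_b)


speaker_1 = VoiceProfile("Speaker 1", (1.0, 0.0, 2.0))
speaker_1_again = VoiceProfile("Speaker 1 (second take)", (2.0, 0.0, 4.0))
speaker_2 = VoiceProfile("Speaker 2", (0.0, 3.0, 0.0))

match_score = cosine_similarity(speaker_1, speaker_1_again)
mismatch_score = cosine_similarity(speaker_1, speaker_2)
```

Because cosine similarity ignores vector length, a louder recording of the same voice (a scaled embedding) still scores close to 1, while orthogonal embeddings score 0.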

Installation

Prerequisites

  • Python 3.11+ (tested on 3.11, 3.12, 3.13)
  • macOS or Linux (Windows support untested)
  • PortAudio (for microphone access)

macOS

brew install portaudio libsndfile

Ubuntu/Debian

sudo apt-get install libportaudio2 libsndfile1

Install SPECTRA

# Clone the repository
git clone https://github.com/killmonger/spectra.git
cd spectra

# Install in development mode
pip install -e ".[dev]"

# Optional: Install whisper transcription backend
pip install -e ".[transcription]"

Ollama Integration (Optional)

SPECTRA can use Ollama for transcript analysis and summarization:

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3.2

Usage

Launch the Application

# Via command line
spectra

# Or directly
python -m spectra

Workflow

  1. Create a Project — Click + NEW in the project panel, name your analysis session
  2. Import or Record Audio — Use FILE > Import Audio… or click ⏺ REC to record from microphone
  3. Visualize — Waveform and spectrum displays update in real-time
  4. Process:
    • PROCESS > Separate Voice — Isolate human voice from background
    • PROCESS > Enhance Voice — Apply noise reduction and EQ optimization
    • PROCESS > Transcribe — Convert speech to text
  5. Voice Profiles — In the VOICES tab, create speaker profiles from audio selections
  6. Compare Voices — Select two profiles and click ⇌ COMPARE for concordance scoring
  7. Label & Organize — Name segments, add labels, build your analysis

Configuration

SPECTRA reads configuration from spectra.json in the working directory:

{
  "audio": {
    "sample_rate": 44100,
    "channels": 1,
    "block_size": 1024,
    "dtype": "float32"
  },
  "ollama": {
    "url": "http://localhost:11434",
    "model": "llama3.2",
    "timeout": 30
  },
  "projects_dir": "projects",
  "log_level": "INFO",
  "db_path": "spectra.db"
}
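A loader for this file might look like the sketch below. It assumes the values shown above act as defaults and that a partial `spectra.json` overrides only the keys it contains; the README does not state how SPECTRA handles missing keys, and `load_config` and `deep_merge` are hypothetical names.

```python
import copy
import json
from pathlib import Path
from typing import Any

# Defaults mirror the example configuration above.
DEFAULTS: dict[str, Any] = {
    "audio": {"sample_rate": 44100, "channels": 1,
              "block_size": 1024, "dtype": "float32"},
    "ollama": {"url": "http://localhost:11434",
               "model": "llama3.2", "timeout": 30},
    "projects_dir": "projects",
    "log_level": "INFO",
    "db_path": "spectra.db",
}


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with override applied, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Path = Path("spectra.json")) -> dict[str, Any]:
    """Read the JSON file if it exists, falling back to defaults otherwise."""
    if not path.exists():
        return copy.deepcopy(DEFAULTS)
    with path.open(encoding="utf-8") as handle:
        return deep_merge(DEFAULTS, json.load(handle))


# Overriding one nested key leaves its siblings at their defaults.
custom = deep_merge(DEFAULTS, {"audio": {"sample_rate": 16000}})
```

Merging recursively means a file that sets only `audio.sample_rate` does not wipe out `channels`, `block_size`, or `dtype`.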

Architecture

src/spectra/
├── app.py                     # Application entry point
├── core/                      # Audio processing pipeline
│   ├── audio_capture.py       # Mic/file input (sounddevice)
│   ├── audio_player.py        # Audio playback engine (sounddevice)
│   ├── signal_processor.py    # DSP: FFT, STFT, filtering (numpy/scipy)
│   ├── voice_separator.py     # HPSS + VAD + speaker clustering
│   ├── voice_enhancer.py      # Noise reduction, EQ, compression
│   ├── transcriber.py         # Whisper + Ollama speech-to-text
│   ├── voice_matcher.py       # MFCC speaker embeddings + cosine similarity
│   ├── speaker_diarizer.py    # Auto speaker diarization (VAD → MFCC → clustering)
│   └── project_manager.py     # Project lifecycle management
├── db/                        # Persistence layer
│   ├── database.py            # SQLite manager with full CRUD
│   ├── models.py              # Dataclass entities
│   └── schema.py              # DDL + migrations
├── ui/                        # PySide6 interface
│   ├── main_window.py         # Central window with flexible splitter layout
│   ├── waveform_widget.py     # Sinusoidal silk-thread waveform with multi-track overlay
│   ├── comparison_widget.py   # Visual voice comparison (superposed silk threads)
│   ├── spectrum_widget.py     # FFT spectrum analyzer
│   ├── transport_bar.py       # Playback/recording controls
│   ├── project_panel.py       # Project management sidebar
│   ├── voice_panel.py         # Voice profiles & matching
│   ├── transcript_panel.py    # Transcription viewer
│   └── theme.py               # NSA signal-intelligence theme (IBM Plex Sans)
└── utils/                     # Shared utilities
    ├── audio_utils.py         # Format conversion, resampling (25+ formats)
    ├── config.py              # Configuration management
    └── logger.py              # Structured logging
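To make the noise-reduction role of `voice_enhancer.py` more concrete, here is a rough sketch of spectral gating with `scipy.signal`. It assumes a separate noise-only clip is available for estimating the noise floor and uses a hard on/off mask; the function name, the threshold, and the window size are illustrative, and the actual module may differ (for example by smoothing the mask).

```python
import numpy as np
from scipy import signal


def spectral_gate(audio: np.ndarray, noise_clip: np.ndarray, sample_rate: int,
                  nperseg: int = 512, threshold: float = 2.0) -> np.ndarray:
    """Zero out time-frequency bins that do not rise above the noise floor."""
    # Per-frequency noise floor: mean magnitude across the noise clip's frames.
    _, _, noise_spec = signal.stft(noise_clip, fs=sample_rate, nperseg=nperseg)
    noise_floor = np.mean(np.abs(noise_spec), axis=1, keepdims=True)

    _, _, spec = signal.stft(audio, fs=sample_rate, nperseg=nperseg)
    # Keep only bins whose magnitude exceeds the scaled noise floor.
    mask = np.abs(spec) > threshold * noise_floor
    _, cleaned = signal.istft(spec * mask, fs=sample_rate, nperseg=nperseg)
    return cleaned[: len(audio)]


sample_rate = 16000
rng = np.random.default_rng(0)
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)                    # stand-in for a voice
noisy = tone + 0.05 * rng.standard_normal(tone.size)  # voice plus hiss
noise_clip = 0.05 * rng.standard_normal(sample_rate)  # noise-only reference

cleaned = spectral_gate(noisy, noise_clip, sample_rate)
error_before = np.mean((noisy - tone) ** 2)
error_after = np.mean((cleaned - tone) ** 2)
```

On this synthetic input the gated signal sits closer to the clean tone than the noisy one does. A hard mask can leave audible artefacts on real speech, which is one reason production denoisers usually smooth the mask over time and frequency.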

Processing Pipeline

┌───────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Audio Capture │───▸│ Signal Processor │───▸│ Voice Separator │
│  (mic/file)   │    │  (FFT, filter)   │    │  (HPSS + VAD)   │
└───────────────┘    └──────────────────┘    └────────┬────────┘
                                                      │
                    ┌──────────────────┐              │
                    │   Transcriber    │◂─────────────┤
                    │ (whisper/ollama) │              │
                    └──────────────────┘     ┌────────▾────────┐
                                             │ Voice Enhancer  │
                    ┌──────────────────┐     │ (denoise + EQ)  │
                    │  Voice Matcher   │◂────└─────────────────┘
                    │ (MFCC + cosine)  │
                    └────────┬─────────┘
                             │
                    ┌────────▾─────────┐
                    │ Speaker Diarizer │
                    │(VAD→MFCC→cluster)│
                    └──────────────────┘
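The last stage of the pipeline (VAD → embeddings → agglomerative clustering) can be sketched with NumPy and `scipy.cluster.hierarchy`. This is a simplified illustration, not SPECTRA's implementation: the VAD is a plain energy threshold, windowed magnitude spectra stand in for MFCC vectors, the number of speakers is given up front, and all names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def frame_signal(audio: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Slice audio into overlapping frames of frame_len samples."""
    count = 1 + (len(audio) - frame_len) // hop
    return np.stack([audio[i * hop : i * hop + frame_len] for i in range(count)])


def diarize(audio: np.ndarray, frame_len: int = 1024, hop: int = 512,
            n_speakers: int = 2) -> np.ndarray:
    """Return one label per frame: -1 for non-speech, 1..n_speakers otherwise."""
    frames = frame_signal(audio, frame_len, hop)
    # 1. VAD: keep frames whose RMS energy clears 10% of the loudest frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    voiced = rms > 0.1 * rms.max()
    # 2. Embeddings: windowed magnitude spectra (a stand-in for MFCC vectors).
    window = np.hanning(frame_len)
    spectra = np.abs(np.fft.rfft(frames[voiced] * window, axis=1))
    # 3. Agglomerative clustering on cosine distance between frame embeddings.
    tree = linkage(spectra, method="average", metric="cosine")
    labels = np.full(len(frames), -1)
    labels[voiced] = fcluster(tree, t=n_speakers, criterion="maxclust")
    return labels


sample_rate = 16000
n = 16 * 1024  # each synthetic "speaker" talks for 16 frame lengths
time = np.arange(n) / sample_rate
speaker_a = np.sin(2 * np.pi * 220 * time)  # low-pitched voice stand-in
speaker_b = np.sin(2 * np.pi * 880 * time)  # high-pitched voice stand-in
silence = np.zeros(8 * 1024)
labels = diarize(np.concatenate([speaker_a, silence, speaker_b]))
```

On this synthetic input every frame of the first tone shares one label, every frame of the second tone shares another, and the gap is marked `-1`. With real speech the number of speakers is usually unknown, so cutting the tree with a distance threshold (`criterion="distance"`) is a common alternative to `maxclust`.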

Database Schema

Table            Description
projects         Analysis sessions with metadata
audio_segments   Individual audio clips within a project
voice_profiles   Named speaker embeddings (MFCC vectors)
transcriptions   Speech-to-text results per segment
match_results    Voice concordance scores between profiles
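A rough idea of how these tables could be declared with the standard-library `sqlite3` module follows. Only the table names come from the list above; every column, the JSON-encoded embedding, and the `open_database` helper are assumptions made for the sketch, so the real DDL in `schema.py` will differ.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical columns: only the table names are taken from the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS audio_segments (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    label TEXT,
    start_s REAL NOT NULL,
    end_s REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS voice_profiles (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    embedding TEXT NOT NULL  -- JSON-encoded embedding vector
);
CREATE TABLE IF NOT EXISTS transcriptions (
    id INTEGER PRIMARY KEY,
    segment_id INTEGER NOT NULL REFERENCES audio_segments(id) ON DELETE CASCADE,
    text TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS match_results (
    id INTEGER PRIMARY KEY,
    profile_a_id INTEGER NOT NULL REFERENCES voice_profiles(id) ON DELETE CASCADE,
    profile_b_id INTEGER NOT NULL REFERENCES voice_profiles(id) ON DELETE CASCADE,
    score REAL NOT NULL
);
"""


def open_database(path: Path) -> sqlite3.Connection:
    """Open (or create) the database, enable WAL mode, and apply the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("PRAGMA journal_mode = WAL")  # write-ahead logging
    conn.executescript(SCHEMA)
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_database(Path(tmp) / "spectra.db")
    # Parameterized queries: values travel separately from the SQL text.
    cur = conn.execute("INSERT INTO projects (name) VALUES (?)", ("Interview 01",))
    conn.execute(
        "INSERT INTO voice_profiles (project_id, name, embedding) VALUES (?, ?, ?)",
        (cur.lastrowid, "Speaker 1", json.dumps([0.1, 0.2, 0.3])),
    )
    conn.commit()
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    profile_names = [row[0] for row in conn.execute("SELECT name FROM voice_profiles")]
    conn.close()
```

WAL mode only takes effect for file-backed databases, which is why the example writes to a temporary directory rather than `:memory:`.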

Technology Stack

Component            Technology               Purpose
UI Framework         PySide6 (Qt 6)           Desktop application with custom widgets
DSP Engine           NumPy + SciPy            FFT, filtering, spectral analysis, clustering
Audio I/O            sounddevice + soundfile  Mic capture, file loading/saving (25+ formats)
Audio Convert        FFmpeg (subprocess)      Transcode OPUS, AAC, M4A, etc. to WAV
Transcription        faster-whisper / Ollama  Local speech-to-text
Speaker Diarization  scipy.cluster.hierarchy  Automatic multi-speaker identification
Database             SQLite3 (stdlib)         Project data persistence
Testing              pytest + pytest-cov      96%+ coverage
Linting              ruff                     Fast Python linter
Type Checking        mypy                     Static type analysis
CI/CD                GitHub Actions           Automated lint, test, build
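The FFmpeg transcoding step can be pictured as a thin `subprocess` wrapper. The sketch below is an assumption about how such a wrapper might look: the helper names are hypothetical, and the README does not document which flags SPECTRA actually passes. The flags used here (`-y`, `-i`, `-ac`, `-ar`) are standard FFmpeg options.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: Path, target: Path,
                         sample_rate: int = 44100, channels: int = 1) -> list[str]:
    """Assemble the argument list for converting any input file to WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the target if it already exists
        "-i", str(source),        # input file (OPUS, AAC, M4A, ...)
        "-ac", str(channels),     # number of output channels
        "-ar", str(sample_rate),  # output sample rate in Hz
        str(target),              # .wav extension selects the WAV muxer
    ]


def transcode_to_wav(source: Path, sample_rate: int = 44100,
                     channels: int = 1) -> Path:
    """Run FFmpeg and return the path of the WAV file it wrote."""
    target = source.with_suffix(".wav")
    command = build_ffmpeg_command(source, target, sample_rate, channels)
    # check=True raises CalledProcessError when FFmpeg exits with an error.
    subprocess.run(command, check=True, capture_output=True)
    return target


command = build_ffmpeg_command(Path("clip.opus"), Path("clip.wav"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces. `transcode_to_wav` requires an `ffmpeg` binary on the `PATH`.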

Development

Setup

git clone https://github.com/killmonger/spectra.git
cd spectra
pip install -e ".[dev]"

Commands

# Run the app
make run

# Run tests
make test

# Run tests with coverage report
make test-cov

# Run fast tests only (skip slow/integration)
make test-fast

# Lint code
make lint

# Auto-format code
make format

# Type check
make typecheck

# Clean build artifacts
make clean

Testing

The test suite targets 95% code coverage using pytest:

# Full test suite with coverage
pytest tests/ -v --cov=spectra --cov-report=term-missing

# Run specific test modules
pytest tests/core/test_signal_processor.py -v
pytest tests/db/ -v
pytest tests/ui/ -v

# Skip tests requiring hardware
pytest tests/ -m "not integration"

CI/CD

GitHub Actions pipelines run automatically on push/PR:

  1. Lint — ruff format check + lint rules
  2. Type Check — mypy strict mode
  3. Test — pytest across Python 3.11/3.12/3.13 on Ubuntu + macOS
  4. Build — Package verification (sdist + wheel)

Project Structure Conventions

  • All modules use from __future__ import annotations for forward references
  • Type hints on all public functions
  • Logging via logging.getLogger(__name__)
  • Database operations use parameterized queries to guard against SQL injection
  • Audio processing uses pure NumPy/SciPy (no heavy ML frameworks required)

License

MIT License — see LICENSE for details.


SPECTRA — Local-first audio intelligence.
No cloud. No telemetry. Your audio stays on your machine.
