Super Over Alchemy

AI-powered video analysis platform using Google Gemini, featuring separated media processing and scene analysis workflows with custom prompt management and context file support.

Features

Media Processing Workflow

Video Compression: Multi-resolution transcoding (360p-2160p) with configurable CRF and presets
Audio Extraction: Multiple formats (MP3, AAC, WAV) with configurable bitrates
Metadata Analysis: Automatic extraction of video/audio properties
Job Management: Track processing status, progress, and results
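
The compression options above map fairly directly onto ffmpeg arguments. The following is a minimal sketch of how such a command might be assembled, not the project's actual `compressor.py`. The function name, the resolution-to-height table, and the choice of the `libx264` encoder are assumptions made for illustration.

```python
from typing import List

# Assumed mapping from resolution label to output height in pixels (hypothetical).
HEIGHTS = {"360p": 360, "480p": 480, "720p": 720,
           "1080p": 1080, "1440p": 1440, "2160p": 2160}


def build_compress_command(src: str, dst: str, resolution: str = "720p",
                           crf: int = 23, preset: str = "medium") -> List[str]:
    """Return an ffmpeg argument list that transcodes src to H.264 at the given height."""
    if resolution not in HEIGHTS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51")
    return [
        "ffmpeg", "-y", "-i", src,
        # scale=-2:H keeps the aspect ratio and rounds the width to an even number
        "-vf", f"scale=-2:{HEIGHTS[resolution]}",
        # Assumption: H.264 via libx264; the project may use a different encoder
        "-c:v", "libx264", "-crf", str(crf), "-preset", preset,
        # Copy the audio stream unchanged
        "-c:a", "copy",
        dst,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.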

Scene Analysis Workflow

Custom Prompts: Create and manage reusable analysis prompts with different types
Context Support: Upload additional context files (text, markdown, JSON) to enhance analysis accuracy
Flexible Chunking: Optional video/audio chunking for long files (or analyze entire files)
Cost & Token Tracking: Detailed tracking of token usage and estimated costs per job and per chunk
AI-Powered Analysis: Leverage Google Gemini 2.5 Pro for intelligent scene understanding
Multiple Analysis Types: Scene detection, object identification, transcription, character identification, key moments, sentiment analysis, and more
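
To illustrate per-chunk and per-job cost tracking, here is a small sketch that sums token usage into an estimated cost. The `ChunkUsage` type, the function names, and especially the two price constants are hypothetical placeholders; they are not real Gemini pricing and not the project's data model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ChunkUsage:
    """Token counts reported for one analyzed chunk (hypothetical shape)."""
    chunk_index: int
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per one million tokens; NOT real Gemini pricing.
INPUT_PRICE_PER_M = 1.0
OUTPUT_PRICE_PER_M = 10.0


def chunk_cost(usage: ChunkUsage) -> float:
    """Estimated cost of a single chunk."""
    return (usage.input_tokens * INPUT_PRICE_PER_M
            + usage.output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def job_cost(chunks: Iterable[ChunkUsage]) -> float:
    """Estimated cost of a whole job: the sum over its chunks."""
    return sum(chunk_cost(c) for c in chunks)
```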

Technical Features

Dual Worker Architecture: Separate workers for media processing and scene analysis
Sequential/Parallel Processing: Configurable processing modes for scene analysis
Cloud-Native: Designed for local development with easy Cloud Run deployment
Modern Frontend: Next.js 15 with TypeScript, TailwindCSS, and shadcn/ui

Architecture

System Components

Frontend Layer

Next.js Frontend - React-based UI for media upload, job management, and results visualization
Agentspace - User interaction and workflow orchestration

API & Messaging

FastAPI - RESTful API handling requests, job creation, and status queries
Pub/Sub - Event-driven messaging for asynchronous job processing

Data Layer

Firestore - NoSQL database storing videos, jobs, prompts, and analysis results
Cloud Storage - Object storage for uploaded media, processed outputs, and context files

Processing Layer

Media Processor - FFmpeg-based worker for video compression and audio extraction
Analysis Engine - AI-powered scene analysis using Google Gemini (Vertex AI)

Analytics

BigQuery - Data warehouse for analytics and reporting on job metrics and results

Data Flow

Upload: Frontend → API → Cloud Storage
Job Creation: API → Firestore → Pub/Sub
Media Processing: Media Processor reads from Pub/Sub, processes files, writes to Cloud Storage
Scene Analysis: Analysis Engine reads from Pub/Sub, calls Vertex AI, stores results in Firestore
Analytics: Results exported to BigQuery for analysis

Sequence Diagrams

Detailed sequence diagrams are available for both workflows:

Media Worker Sequence Diagram - Shows the complete flow from video upload through processing to result delivery
Scene Worker Sequence Diagram - Shows the scene analysis workflow including context file support, chunking strategies, and Gemini integration

Workflows

Workflow 1: Media Processing

Upload Video/Audio → Create Media Job → Worker Processes → Results Available
                            ↓
                    Configure: Resolution, Audio Format, Bitrate, CRF, Preset

Workflow 2: Scene Analysis

Select Processed Media → Choose Prompt → Upload Context (Optional) → Configure Chunking → Start Analysis
                                                                              ↓
                                                                      Worker Analyzes with Gemini
                                                                              ↓
                                                                      View Results & Cost in Frontend

Local Development Setup

Prerequisites

Python 3.9+
```
python --version
```

ffmpeg (for media processing)

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Verify
ffmpeg -version

Google Cloud SDK

# macOS
brew install --cask google-cloud-sdk

# Log in with Application Default Credentials
gcloud auth application-default login

# Optional: log in while impersonating a service account instead
gcloud auth application-default login \
--impersonate-service-account=secshare-service-account@search-and-reco.iam.gserviceaccount.com

# Set the active project
gcloud config set project YOUR_PROJECT_ID

Node.js 18+ and npm (for frontend)
```
node --version
npm --version
```

GCP Setup

Create GCP Project

export PROJECT_ID="your-project-id"
gcloud config set project $PROJECT_ID

Enable APIs

gcloud services enable \
  storage.googleapis.com \
  firestore.googleapis.com

Create GCS Buckets

gsutil mb -l asia-south1 gs://${PROJECT_ID}-uploads
gsutil mb -l asia-south1 gs://${PROJECT_ID}-processed

Create Firestore Database

gcloud firestore databases create --location=asia-south1

Get Gemini API Key
- Visit Google AI Studio
- Create an API key

Backend Setup

Install Python dependencies

cd super-over-alchemy
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Configure environment

cp .env.example .env
# Edit .env with your values

Example .env:

# GCP Configuration
GCP_PROJECT_ID=your-project-id
GCP_REGION=asia-south1

# GCS Buckets
UPLOADS_BUCKET=your-project-id-uploads
PROCESSED_BUCKET=your-project-id-processed

# Gemini API
GEMINI_API_KEY=your-gemini-api-key
GEMINI_MODEL=models/gemini-2.5-pro
GEMINI_MAX_OUTPUT_TOKENS=65536

# Worker Settings
WORKER_POLL_INTERVAL_SECONDS=5
MAX_CONCURRENT_TASKS=3

# Scene Processing Mode
SCENE_PROCESSING_MODE=sequential  # or "parallel"
MAX_GEMINI_WORKERS=10

# Environment
ENVIRONMENT=local
API_URL=http://localhost:8000
FRONTEND_URL=http://localhost:3000

Run the API
```
python api/main.py
```
API available at http://localhost:8000
- Swagger docs: http://localhost:8000/docs
- Health check: http://localhost:8000/health

Run the Workers (in separate terminals)

# Terminal 1: Media Worker
source venv/bin/activate
python workers/media_worker.py

# Terminal 2: Scene Worker
source venv/bin/activate
python workers/ai_worker.py

Frontend Setup

Install dependencies
```
cd frontend
npm install
```

Configure environment

cp .env.local.example .env.local
# Edit .env.local with your API URL

Example .env.local:

NEXT_PUBLIC_API_URL=http://localhost:8000

Run development server
```
npm run dev
```
Frontend available at http://localhost:3000

Build for production
```
npm run build
npm start
```

Usage Guide

1. Upload and Process Media

Navigate to http://localhost:3000/media
Click "Upload Video" and select a file
Configure processing options:
- Compression resolution (360p-2160p)
- Audio format (MP3, AAC, WAV)
- Audio bitrate
- CRF (quality: 0-51, lower = better)
- Preset (speed vs efficiency)
Click "Start Processing"
Monitor job status in the dashboard

2. Create Analysis Prompts

Navigate to http://localhost:3000/prompts
Click "Create Prompt"
Fill in:
- Name (e.g., "Sports Commentary Analysis")
- Type (Scene Analysis, Subtitling, Custom, etc.)
- Prompt text (instructions for Gemini)
- Optional: Check "Supports additional context files"
- Optional: Add context description
Save the prompt

3. Analyze Scenes

Navigate to http://localhost:3000/scene-analysis
Click "Start New Analysis"
Select processed media (compressed video or extracted audio)
Choose a prompt from the dropdown
If the prompt supports context, upload additional context files (text, markdown, JSON)
Configure chunking:
- No chunking (recommended for audio < 5 hours, video < 1 hour)
- Or set chunk duration (60s, 120s, 5min, 10min)
Click "Start Scene Analysis"
View results when processing completes

4. API Usage (Programmatic)

Create Media Job

curl -X POST http://localhost:8000/api/media/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "video_id": "video-uuid",
    "config": {
      "compress": true,
      "compress_resolution": "720p",
      "extract_audio": true,
      "audio_format": "mp3",
      "audio_bitrate": "192k",
      "crf": 23,
      "preset": "medium"
    }
  }'

Create Scene Job

curl -X POST http://localhost:8000/api/scenes/{video_id}/process \
  -H "Content-Type: application/json" \
  -d '{
    "prompt_id": "prompt-uuid",
    "compressed_video_path": "gs://bucket/path/to/video.mp4",
    "chunk_duration": 0,
    "chunk": false,
    "context_items": [
      {
        "context_id": "ctx-uuid",
        "type": "text",
        "gcs_path": "gs://bucket/context/file.txt",
        "filename": "reference.txt",
        "description": "Player statistics",
        "size_bytes": 1024
      }
    ]
  }'
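
The same two requests can be built from Python. This is a sketch using only the standard library; the helper names are made up for illustration, and the payload fields simply mirror the curl examples above. The response format is not documented here, so the sketch stops at constructing the request.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # matches API_URL in the example .env


def build_json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request against the local API."""
    return urllib.request.Request(
        url=f"{API_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def media_job_request(video_id: str) -> urllib.request.Request:
    """Request body mirrors the 'Create Media Job' curl example."""
    config = {
        "compress": True, "compress_resolution": "720p",
        "extract_audio": True, "audio_format": "mp3", "audio_bitrate": "192k",
        "crf": 23, "preset": "medium",
    }
    return build_json_post("/api/media/jobs", {"video_id": video_id, "config": config})


def scene_job_request(video_id: str, prompt_id: str, video_path: str) -> urllib.request.Request:
    """Request body mirrors the 'Create Scene Job' curl example, without context items."""
    payload = {
        "prompt_id": prompt_id,
        "compressed_video_path": video_path,
        "chunk_duration": 0,   # 0 with chunk=False analyzes the whole file
        "chunk": False,
        "context_items": [],
    }
    return build_json_post(f"/api/scenes/{video_id}/process", payload)
```

With the API running, a request can be sent with `urllib.request.urlopen(req)`.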

Project Structure

super-over-alchemy/
├── api/                          # FastAPI REST API
│   ├── main.py                  # Application entry point
│   ├── routes/
│   │   ├── scenes.py            # Scene analysis endpoints
│   │   ├── media.py             # Media processing endpoints
│   │   ├── prompts.py           # Prompt management endpoints
│   │   └── tasks.py             # Task status endpoints
│   └── models/
│       └── schemas.py           # Pydantic request/response models
├── libs/                         # Shared libraries
│   ├── storage.py               # Google Cloud Storage client
│   ├── database.py              # Firestore client with collections
│   ├── video_processing/        # FFmpeg-based processing
│   │   ├── metadata.py          # Probe video/audio metadata
│   │   ├── compressor.py        # Video transcoding
│   │   ├── chunker.py           # Video chunking
│   │   ├── audio.py             # Audio extraction
│   │   └── manifest.py          # Processing manifests
│   ├── gemini/                  # Gemini AI integration
│   │   └── scene_analyzer.py   # Scene analysis with context support
│   └── scene_processing/        # Scene processing strategies
│       ├── base.py              # Abstract processor
│       ├── sequential.py        # Sequential chunk processing
│       ├── parallel.py          # Parallel chunk processing
│       └── factory.py           # Processor factory
├── workers/                      # Background job processors
│   ├── media_worker.py          # Media processing worker
│   └── ai_worker.py             # AI analysis worker (scene & images)
├── frontend/                     # Next.js 15 frontend
│   ├── app/                     # App router pages
│   │   ├── media/               # Media workflow
│   │   ├── prompts/             # Prompt management
│   │   ├── scene-analysis/      # Scene analysis workflow
│   │   └── scene/               # Scene results
│   ├── components/              # React components
│   │   ├── ui/                  # shadcn/ui components
│   │   ├── media/               # Media-specific components
│   │   ├── scene/               # Scene-specific components
│   │   └── prompts/             # Prompt components
│   └── lib/                     # Frontend utilities
│       ├── api-client.ts        # API client with typed methods
│       ├── types.ts             # TypeScript types
│       └── utils.ts             # Helper functions
├── tests/                        # Test suites
├── storage/temp/                 # Local temporary files
├── config.py                     # Centralized configuration
├── requirements.txt              # Python dependencies
├── .env.example                 # Environment template
└── README.md                    # This file

Configuration

Key Environment Variables

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `GEMINI_MODEL` | Gemini model to use | `models/gemini-2.0-flash-exp` | `models/gemini-2.5-pro` |
| `GEMINI_MAX_OUTPUT_TOKENS` | Max tokens in response | `8192` | `65536` (for 2.5 Pro) |
| `SCENE_PROCESSING_MODE` | Sequential or parallel | `sequential` | `parallel` |
| `MAX_GEMINI_WORKERS` | Parallel worker count | `10` | `4` |
| `CHUNK_DURATION_SECONDS` | Default chunk size | `30` | `0` (no chunking) |
| `COMPRESS_RESOLUTION` | Default resolution | `480p` | `720p` |
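
As an illustration of how these variables and defaults could be read, here is a hedged sketch of a settings loader. The `Settings` class and `load_settings` function are hypothetical; the project's own `config.py` may be structured differently. Only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_model: str
    gemini_max_output_tokens: int
    scene_processing_mode: str
    max_gemini_workers: int
    chunk_duration_seconds: int
    compress_resolution: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (defaults to os.environ), applying the documented defaults."""
    env = os.environ if env is None else env
    mode = env.get("SCENE_PROCESSING_MODE", "sequential")
    if mode not in ("sequential", "parallel"):
        raise ValueError(f"SCENE_PROCESSING_MODE must be sequential or parallel, got {mode!r}")
    return Settings(
        gemini_model=env.get("GEMINI_MODEL", "models/gemini-2.0-flash-exp"),
        gemini_max_output_tokens=int(env.get("GEMINI_MAX_OUTPUT_TOKENS", "8192")),
        scene_processing_mode=mode,
        max_gemini_workers=int(env.get("MAX_GEMINI_WORKERS", "10")),
        chunk_duration_seconds=int(env.get("CHUNK_DURATION_SECONDS", "30")),
        compress_resolution=env.get("COMPRESS_RESOLUTION", "480p"),
    )
```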

Processing Modes

Sequential Mode (Default)

Processes one chunk at a time
Lower memory usage
Predictable performance
Recommended for most use cases

Parallel Mode

Processes multiple chunks simultaneously
Higher memory usage
Faster for long videos with many chunks
Uses process-based parallelism (isolated SSL contexts)
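
The difference between the two modes can be sketched with `concurrent.futures`. This is an illustration under assumptions, not the code in `libs/scene_processing/`: the `process_chunks` name and its parameters are hypothetical. It defaults to `ProcessPoolExecutor` to mirror the process-based parallelism described above, which requires `analyze` to be a picklable, module-level function.

```python
import concurrent.futures as cf
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_chunks(chunks: Sequence[T],
                   analyze: Callable[[T], R],
                   mode: str = "sequential",
                   max_workers: int = 10,
                   executor_factory: Callable[..., cf.Executor] = cf.ProcessPoolExecutor,
                   ) -> List[R]:
    """Run analyze over every chunk and return results in chunk order."""
    if mode == "sequential":
        # One chunk at a time: lowest memory use, predictable timing
        return [analyze(chunk) for chunk in chunks]
    if mode == "parallel":
        with executor_factory(max_workers=max_workers) as pool:
            # Executor.map yields results in input order, even if chunks finish out of order
            return list(pool.map(analyze, chunks))
    raise ValueError(f"unknown processing mode: {mode!r}")
```

Passing `executor_factory=cf.ThreadPoolExecutor` swaps in threads, which can be convenient for quick local experiments.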

Advanced Features

Context File Support

Upload additional context files to enhance analysis accuracy:

Create a prompt with "Supports additional context files" enabled
During scene analysis, upload text files (.txt, .md, .json) up to 10MB
Context is loaded once and included with all chunks
Gemini receives: prompt + context + video/audio
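
The upload limits and the ordering of what Gemini receives can be sketched as follows. The function names are hypothetical, the 10MB limit is assumed to mean 10 MiB, and the plain list returned by `build_request_parts` is a stand-in for whatever request structure the project actually sends.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

ALLOWED_EXTENSIONS = {".txt", ".md", ".json"}
MAX_CONTEXT_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB


def validate_context_file(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is outside the documented limits."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported context file type: {suffix or '(none)'}")
    if size_bytes > MAX_CONTEXT_BYTES:
        raise ValueError(f"context file too large: {size_bytes} bytes")


def build_request_parts(prompt: str, context_texts: Sequence[str], media_uri: str) -> List[str]:
    """Order the parts as described above: prompt, then context, then the media reference."""
    return [prompt, *context_texts, media_uri]
```

Because the context is loaded once, the same `context_texts` value would be reused for every chunk of a job.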

Use Cases:

Sports analysis: Upload team rosters, player stats
Educational content: Upload lecture notes, reference material
Product reviews: Upload specification sheets
Interview analysis: Upload company background, job description

Custom Prompts

Create specialized prompts for different analysis types:

Scene Analysis: General scene understanding and visual analysis
Object Identification: Detect and track objects, brands, or specific items
Subtitling: Generate SRT-formatted subtitles and transcriptions
Key Moments: Identify highlights, important events, and memorable scenes
Cliffhanger Analysis: Detect suspenseful moments and dramatic endings
Custom: Any specialized analysis task tailored to your needs
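
For the Subtitling prompt type, output is expected in SRT format, where each cue has an index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text. The helpers below are a small illustrative sketch for producing or checking such cues; their names are hypothetical and they are not part of the project.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue block; cues are separated by a blank line when joined."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```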

Chunking Strategies

No Chunking (Recommended for most cases)

Analyzes entire file as one piece
Better API quota usage (4× more efficient)
No timestamp ordering issues
Best for: Audio < 5 hours, Video < 1 hour

Fixed Duration Chunking

Split into equal segments (60s, 120s, 5min, 10min)
Required for very long files
Each chunk analyzed separately
Results combined in order
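
Fixed-duration chunking amounts to splitting the media timeline into equal ranges, with a shorter final range when the duration is not an exact multiple. The sketch below illustrates the idea; the function names are hypothetical, and treating a chunk duration of `0` as "no chunking" follows the convention used in the API example and configuration table.

```python
from typing import List, Tuple


def chunk_ranges(total_seconds: float, chunk_seconds: float) -> List[Tuple[float, float]]:
    """Return (start, end) ranges covering the file; chunk_seconds <= 0 means no chunking."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if chunk_seconds <= 0:
        return [(0.0, total_seconds)]
    ranges = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges


def to_absolute(chunk_start: float, local_timestamp: float) -> float:
    """Convert a timestamp relative to a chunk into a timestamp in the full file."""
    return chunk_start + local_timestamp
```

When results are combined in order, chunk-relative timestamps need an offset like `to_absolute` so that they line up with the original file.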

Deployment

Cloud Run Deployment

Deployment instructions are coming soon. The application is designed for Cloud Run deployment with:

Containerized API and workers
Cloud Storage for file handling
Firestore for state management
Automatic scaling

Monitoring and Debugging

Logs

API: Console logs with FastAPI
Workers: Detailed logging with progress tracking
Frontend: Browser console and Network tab

Common Issues

Worker not picking up jobs: Check that the worker is running and verify the polling interval (WORKER_POLL_INTERVAL_SECONDS)
Gemini API errors: Verify the API key and that the configured model is available
Video processing fails: Ensure ffmpeg is installed and on the PATH
Context upload fails: Check that the file is under 10MB and uses a supported format (.txt, .md, .json)

Contributing

This is a personal project. Feel free to fork and modify for your own use.

License

MIT
