AI-powered video analysis platform using Google Gemini, featuring separated media processing and scene analysis workflows with custom prompt management and context file support.
- Video Compression: Multi-resolution transcoding (360p-2160p) with configurable CRF and presets
- Audio Extraction: Multiple formats (MP3, AAC, WAV) with configurable bitrates
- Metadata Analysis: Automatic extraction of video/audio properties
- Job Management: Track processing status, progress, and results
- Custom Prompts: Create and manage reusable analysis prompts with different types
- Context Support: Upload additional context files (text, markdown, JSON) to enhance analysis accuracy
- Flexible Chunking: Optional video/audio chunking for long files (or analyze entire files)
- Cost & Token Tracking: detailed tracking of token usage and estimated costs per job and per chunk
- AI-Powered Analysis: Leverage Google Gemini 2.5 Pro for intelligent scene understanding
- Multiple Analysis Types: Scene detection, object identification, transcription, character identification, key moments, sentiment analysis, and more
- Dual Worker Architecture: Separate workers for media processing and scene analysis
- Sequential/Parallel Processing: Configurable processing modes for scene analysis
- Cloud-Native: Designed for local development with easy Cloud Run deployment
- Modern Frontend: Next.js 15 with TypeScript, TailwindCSS, and shadcn/ui
Frontend Layer
- Next.js Frontend - React-based UI for media upload, job management, and results visualization
- Agentspace - User interaction and workflow orchestration
API & Messaging
- FastAPI - RESTful API handling requests, job creation, and status queries
- Pub/Sub - Event-driven messaging for asynchronous job processing
Data Layer
- Firestore - NoSQL database storing videos, jobs, prompts, and analysis results
- Cloud Storage - Object storage for uploaded media, processed outputs, and context files
Processing Layer
- Media Processor - FFmpeg-based worker for video compression and audio extraction
- Analysis Engine - AI-powered scene analysis using Google Gemini (Vertex AI)
Analytics
- BigQuery - Data warehouse for analytics and reporting on job metrics and results
- Upload: Frontend → API → Cloud Storage
- Job Creation: API → Firestore → Pub/Sub
- Media Processing: Media Processor reads from Pub/Sub, processes files, writes to Cloud Storage
- Scene Analysis: Analysis Engine reads from Pub/Sub, calls Vertex AI, stores results in Firestore
- Analytics: Results exported to BigQuery for analysis
Detailed sequence diagrams are available for both workflows:
- Media Worker Sequence Diagram - Shows the complete flow from video upload through processing to result delivery
- Scene Worker Sequence Diagram - Shows the scene analysis workflow including context file support, chunking strategies, and Gemini integration
Upload Video/Audio → Create Media Job → Worker Processes → Results Available
↓
Configure: Resolution, Audio Format, Bitrate, CRF, Preset
Select Processed Media → Choose Prompt → Upload Context (Optional) → Configure Chunking → Start Analysis
↓
Worker Analyzes with Gemini
↓
View Results & Cost in Frontend
-
Python 3.9+
python --version
-
ffmpeg (for media processing)
# macOS brew install ffmpeg # Ubuntu/Debian sudo apt-get install ffmpeg # Verify ffmpeg -version
-
Google Cloud SDK
# macOS brew install --cask google-cloud-sdk # Login and set project gcloud auth application-default login gcloud auth application-default login \ --impersonate-service-account=secshare-service-account@search-and-reco.iam.gserviceaccount.com gcloud config set project YOUR_PROJECT_ID
-
Node.js 18+ and npm (for frontend)
node --version npm --version
-
Create GCP Project
export PROJECT_ID="your-project-id" gcloud config set project $PROJECT_ID
-
Enable APIs
gcloud services enable \ storage.googleapis.com \ firestore.googleapis.com -
Create GCS Buckets
gsutil mb -l asia-south1 gs://${PROJECT_ID}-uploads gsutil mb -l asia-south1 gs://${PROJECT_ID}-processed
-
Create Firestore Database
gcloud firestore databases create --location=asia-south1
-
Get Gemini API Key
- Visit Google AI Studio
- Create an API key
-
Install Python dependencies
cd super-over-alchemy python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate pip install -r requirements.txt
-
Configure environment
cp .env.example .env # Edit .env with your valuesExample
.env:# GCP Configuration GCP_PROJECT_ID=your-project-id GCP_REGION=asia-south1 # GCS Buckets UPLOADS_BUCKET=your-project-id-uploads PROCESSED_BUCKET=your-project-id-processed # Gemini API GEMINI_API_KEY=your-gemini-api-key GEMINI_MODEL=models/gemini-2.5-pro GEMINI_MAX_OUTPUT_TOKENS=65536 # Worker Settings WORKER_POLL_INTERVAL_SECONDS=5 MAX_CONCURRENT_TASKS=3 # Scene Processing Mode SCENE_PROCESSING_MODE=sequential # or "parallel" MAX_GEMINI_WORKERS=10 # Environment ENVIRONMENT=local API_URL=http://localhost:8000 FRONTEND_URL=http://localhost:3000
-
Run the API
python api/main.py
API available at
http://localhost:8000- Swagger docs:
http://localhost:8000/docs - Health check:
http://localhost:8000/health
- Swagger docs:
-
Run the Workers (in separate terminals)
# Terminal 1: Media Worker source venv/bin/activate python workers/media_worker.py # Terminal 2: Scene Worker source venv/bin/activate python workers/ai_worker.py
-
Install dependencies
cd frontend npm install -
Configure environment
cp .env.local.example .env.local # Edit .env.local with your API URLExample
.env.local:NEXT_PUBLIC_API_URL=http://localhost:8000
-
Run development server
npm run dev
Frontend available at
http://localhost:3000 -
Build for production
npm run build npm start
- Navigate to
http://localhost:3000/media - Click "Upload Video" and select a file
- Configure processing options:
- Compression resolution (360p-2160p)
- Audio format (MP3, AAC, WAV)
- Audio bitrate
- CRF (quality: 0-51, lower = better)
- Preset (speed vs efficiency)
- Click "Start Processing"
- Monitor job status in the dashboard
- Navigate to
http://localhost:3000/prompts - Click "Create Prompt"
- Fill in:
- Name (e.g., "Sports Commentary Analysis")
- Type (Scene Analysis, Subtitling, Custom, etc.)
- Prompt text (instructions for Gemini)
- Optional: Check "Supports additional context files"
- Optional: Add context description
- Save the prompt
- Navigate to
http://localhost:3000/scene-analysis - Click "Start New Analysis"
- Select processed media (compressed video or extracted audio)
- Choose a prompt from the dropdown
- If prompt supports context: Upload additional context files (text, markdown, JSON)
- Configure chunking:
- No chunking (recommended for audio < 5 hours, video < 1 hour)
- Or set chunk duration (60s, 120s, 5min, 10min)
- Click "Start Scene Analysis"
- View results when processing completes
curl -X POST http://localhost:8000/api/media/jobs \
-H "Content-Type: application/json" \
-d '{
"video_id": "video-uuid",
"config": {
"compress": true,
"compress_resolution": "720p",
"extract_audio": true,
"audio_format": "mp3",
"audio_bitrate": "192k",
"crf": 23,
"preset": "medium"
}
}'curl -X POST http://localhost:8000/api/scenes/{video_id}/process \
-H "Content-Type: application/json" \
-d '{
"prompt_id": "prompt-uuid",
"compressed_video_path": "gs://bucket/path/to/video.mp4",
"chunk_duration": 0,
"chunk": false,
"context_items": [
{
"context_id": "ctx-uuid",
"type": "text",
"gcs_path": "gs://bucket/context/file.txt",
"filename": "reference.txt",
"description": "Player statistics",
"size_bytes": 1024
}
]
}'super-over-alchemy/
├── api/ # FastAPI REST API
│ ├── main.py # Application entry point
│ ├── routes/
│ │ ├── scenes.py # Scene analysis endpoints
│ │ ├── media.py # Media processing endpoints
│ │ ├── prompts.py # Prompt management endpoints
│ │ └── tasks.py # Task status endpoints
│ └── models/
│ └── schemas.py # Pydantic request/response models
├── libs/ # Shared libraries
│ ├── storage.py # Google Cloud Storage client
│ ├── database.py # Firestore client with collections
│ ├── video_processing/ # FFmpeg-based processing
│ │ ├── metadata.py # Probe video/audio metadata
│ │ ├── compressor.py # Video transcoding
│ │ ├── chunker.py # Video chunking
│ │ ├── audio.py # Audio extraction
│ │ └── manifest.py # Processing manifests
│ ├── gemini/ # Gemini AI integration
│ │ └── scene_analyzer.py # Scene analysis with context support
│ └── scene_processing/ # Scene processing strategies
│ ├── base.py # Abstract processor
│ ├── sequential.py # Sequential chunk processing
│ ├── parallel.py # Parallel chunk processing
│ └── factory.py # Processor factory
├── workers/ # Background job processors
│ ├── media_worker.py # Media processing worker
│ └── ai_worker.py # AI analysis worker (scene & images)
├── frontend/ # Next.js 15 frontend
│ ├── app/ # App router pages
│ │ ├── media/ # Media workflow
│ │ ├── prompts/ # Prompt management
│ │ ├── scene-analysis/ # Scene analysis workflow
│ │ └── scene/ # Scene results
│ ├── components/ # React components
│ │ ├── ui/ # shadcn/ui components
│ │ ├── media/ # Media-specific components
│ │ ├── scene/ # Scene-specific components
│ │ └── prompts/ # Prompt components
│ └── lib/ # Frontend utilities
│ ├── api-client.ts # API client with typed methods
│ ├── types.ts # TypeScript types
│ └── utils.ts # Helper functions
├── tests/ # Test suites
├── storage/temp/ # Local temporary files
├── config.py # Centralized configuration
├── requirements.txt # Python dependencies
├── .env.example # Environment template
└── README.md # This file
| Variable | Description | Default | Example |
|---|---|---|---|
GEMINI_MODEL |
Gemini model to use | models/gemini-2.0-flash-exp |
models/gemini-2.5-pro |
GEMINI_MAX_OUTPUT_TOKENS |
Max tokens in response | 8192 |
65536 (for 2.5 Pro) |
SCENE_PROCESSING_MODE |
Sequential or parallel | sequential |
parallel |
MAX_GEMINI_WORKERS |
Parallel worker count | 10 |
4 |
CHUNK_DURATION_SECONDS |
Default chunk size | 30 |
0 (no chunking) |
COMPRESS_RESOLUTION |
Default resolution | 480p |
720p |
Sequential Mode (Default)
- Processes one chunk at a time
- Lower memory usage
- Predictable performance
- Recommended for most use cases
Parallel Mode
- Processes multiple chunks simultaneously
- Higher memory usage
- Faster for long videos with many chunks
- Uses process-based parallelism (isolated SSL contexts)
Upload additional context files to enhance analysis accuracy:
- Create a prompt with "Supports additional context files" enabled
- During scene analysis, upload text files (.txt, .md, .json) up to 10MB
- Context is loaded once and included with all chunks
- Gemini receives: prompt + context + video/audio
Use Cases:
- Sports analysis: Upload team rosters, player stats
- Educational content: Upload lecture notes, reference material
- Product reviews: Upload specification sheets
- Interview analysis: Upload company background, job description
Create specialized prompts for different analysis types:
- Scene Analysis: General scene understanding and visual analysis
- Object Identification: Detect and track objects, brands, or specific items
- Subtitling: Generate SRT-formatted subtitles and transcriptions
- Key Moments: Identify highlights, important events, and memorable scenes
- Cliffhanger Analysis: Detect suspenseful moments and dramatic endings
- Custom: Any specialized analysis task tailored to your needs
No Chunking (Recommended for most cases)
- Analyzes entire file as one piece
- Better API quota usage (4× more efficient)
- No timestamp ordering issues
- Best for: Audio < 5 hours, Video < 1 hour
Fixed Duration Chunking
- Split into equal segments (60s, 120s, 5min, 10min)
- Required for very long files
- Each chunk analyzed separately
- Results combined in order
Coming soon... The application is designed for easy Cloud Run deployment with:
- Containerized API and workers
- Cloud Storage for file handling
- Firestore for state management
- Automatic scaling
- API: Console logs with FastAPI
- Workers: Detailed logging with progress tracking
- Frontend: Browser console and Network tab
- Worker not picking up jobs: Check worker is running and polling interval
- Gemini API errors: Verify API key and model availability
- Video processing fails: Ensure ffmpeg is installed and accessible
- Context upload fails: Check file size < 10MB and format (.txt, .md, .json)
This is a personal project. Feel free to fork and modify for your own use.
MIT
