Turn Canvas, Panopto, and YouTube-style lectures into searchable notes—on your own laptop.
Open source · 100% local · No cloud transcription
| Repository | LocalStream-Transcriber on GitHub |
|---|---|
| Also known as | MP3 Grabber (`mp3grabber` in `package.json`) — same app, one repo |
The app uses faster-whisper (OpenAI Whisper, optimized) for speech-to-text, with optional NVIDIA GPU acceleration.
- 🎯 Local & web-based transcription — process files from disk or capture from web pages
- 🌊 HLS/DASH stream support — automatically detects and downloads `.m3u8` and `.mpd` streams
- 🔐 Authenticated content — captures session cookies for Canvas, Kaltura, Panopto
- ⚡ GPU acceleration — 4× faster with NVIDIA CUDA; automatic CPU fallback
- 🌐 Browser extension — Chrome extension (Manifest V3) with passive network interception
- 📊 Real-time progress — live WebSocket updates at `http://localhost:8787`
- 🔒 100% local — all processing happens on your machine
- 🎬 Multi-format — audio (MP3, M4A, WAV, FLAC, OGG, WebM) and video (MP4, MKV, AVI)
- 📝 Timestamped output — `[MM:SS.mmm]` format with language and confidence metadata
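The `[MM:SS.mmm]` stamp is easy to reproduce if you want to post-process transcripts yourself; here is a minimal sketch (the `format_timestamp` helper is illustrative, not part of the app):

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as the [MM:SS.mmm] stamp used in transcripts."""
    minutes, secs = divmod(seconds, 60)
    return f"[{int(minutes):02d}:{secs:06.3f}]"

print(format_timestamp(1.234))   # [00:01.234]
print(format_timestamp(65.23))   # [01:05.230]
```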
- Start here (students)
- Quick start (Windows)
- Downloads and releases
- How it works
- Configuration
- Supported formats
- Architecture & file structure
- Troubleshooting
- Output format
- Dependencies
- Manual commands
- Security & privacy
- License
LocalStream turns lecture audio and video into searchable text on your computer—so you can skim, search, and study without rewatching whole recordings.
| Platforms | Canvas · Panopto · Kaltura · YouTube-style pages |
|---|---|
| School login | Works with logged-in streams (e.g. Canvas) |
| Privacy | 100% local — no audio sent to external servers for transcription |
| Speed | GPU is faster when available; CPU works too (just slower) |
Who it's for: college and grad students who want private, searchable lecture notes without being developers.
Two ways to use it:
- Option 1 – Files you have: Put audio/video files in the `media/` folder and run the app.
- Option 2 – Live browser streams: Use a small Chrome add-on plus a local helper to capture Canvas / Panopto / YouTube pages you are already signed into.
Results land in `transcriptions/` as plain `.txt` files with timestamps like `[00:01.234]`.
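Because transcripts are plain text with one stamped line per segment, you can search them with any tool; a small Python sketch (the `search_transcript` helper is illustrative, not shipped with the app):

```python
import re

# Matches lines like "[00:05.230] Today we cover entropy."
STAMP = re.compile(r"^\[(\d{2}:\d{2}\.\d{3})\]\s*(.*)$")

def search_transcript(text: str, keyword: str):
    """Return (timestamp, line) pairs whose text contains the keyword."""
    hits = []
    for line in text.splitlines():
        m = STAMP.match(line)
        if m and keyword.lower() in m.group(2).lower():
            hits.append((m.group(1), m.group(2)))
    return hits

sample = "[00:01.234] Welcome to the lecture.\n[00:05.230] Today we cover entropy."
print(search_transcript(sample, "entropy"))
# [('00:05.230', 'Today we cover entropy.')]
```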
Install once: Node.js (LTS) and Python 3.10–3.12 — during Python setup on Windows, check "Add python.exe to PATH".
| Step | What to do |
|---|---|
| 1 | Get the app: Latest release ZIP or Code → Download ZIP on GitHub, then unzip (e.g. Documents\LocalStream-Transcriber). |
| 2 | Run it: Double-click START.bat. (Fallback: open Command Prompt in the folder and type npm run setup.) The first run installs speech tools — this can take a few minutes. |
| 3 | Set your model (before transcribing — see Choosing your Whisper model below). |
| 4 | Drop your file: Put your audio/video (.mp3, .mp4, .m4a, etc.) in the media/ folder. |
| 5 | Pick option 1 in the menu, then select your file from the list. |
| 6 | Get your transcript: Find the .txt file in transcriptions/ when it finishes. |
No dedicated GPU? CPU works fine; it will just take longer.
This mode runs a local server that your browser sends streams to. Two parts: one-time extension setup, then per-lecture usage.
| Step | What to do |
|---|---|
| 1 | Get the app and run it: Same as steps 1–2 above. |
| 2 | Pick option 2 in the menu. This starts the local relay server. Leave this window open — it must stay running while you transcribe. |
| 3 | Load the Chrome extension: Open a new tab and go to chrome://extensions/. Turn on Developer mode (top-right toggle). Click Load unpacked and select the extension/ folder inside the project. |
| 4 | Pin the extension: Click the puzzle-piece icon in Chrome's toolbar → click the pin icon next to MP3 Sniper so it shows in your toolbar permanently. |
| Step | What to do |
|---|---|
| 1 | Make sure START.bat is already running with option 2 (the relay server). |
| 2 | Open your lecture page in Chrome (Canvas, Panopto, YouTube, etc.) and start playing the video — even a second or two is enough for the extension to detect the stream. |
| 3 | Click the MP3 Sniper extension icon in your Chrome toolbar (the 🎵 icon). A small popup appears. |
| 4 | Click "Queue All Videos" (green button). The extension sends the stream to the local relay for download and transcription. |
| 5 | Open http://localhost:8787 in a new tab to watch live progress. |
| 6 | When done, your transcript is saved under transcriptions/ as a .txt file with timestamps like [00:01.234]. |
Canvas specifically: If "Queue All Videos" doesn't pick up the lecture, try the blue "Select 1 Video (Canvas)" button instead — it targets the video element directly.
Keyboard shortcut: Press `Ctrl+Shift+M` anywhere on the page as an alternative to clicking the popup.

Privacy: All speech-to-text runs on your laptop — nothing is uploaded.
The app auto-selects a model based on whether a GPU is detected, but you can change it.
Open `transcribe.py` and find the section around line 221:

```python
# Use "medium" model for GPU, "base" model for CPU
if gpu_available:
    result = transcribe_audio(audio_file, model_size="medium", use_gpu=True)
    ...
else:
    result = transcribe_audio(audio_file, model_size="base", use_gpu=False)
```

| Model | Best for | Notes |
|---|---|---|
| `tiny` | Very slow CPU, quick tests | Least accurate |
| `base` | CPU default | Good balance of speed and accuracy on CPU |
| `small` | Mid-range CPU or light GPU | Better accuracy, slower on CPU |
| `medium` | GPU default | High accuracy; needs ~3 GB VRAM |
| `large` | High-end GPU | Highest accuracy, very slow on CPU |
CPU-only laptop? Keep `base` (the default). If you want higher accuracy and have time to spare, change `"base"` to `"small"` in the `else` branch.

NVIDIA GPU? `medium` is the default. If you get GPU memory errors, drop to `"small"` or add `compute_type="int8"` (edit the `compute_type` line in `transcribe_audio`).
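The advice above boils down to a small decision rule. This sketch is purely illustrative: the `pick_model` helper and its VRAM thresholds are assumptions for exposition, not values taken from `transcribe.py`:

```python
def pick_model(gpu_available: bool, vram_gb: float = 0.0):
    """Suggest a (model_size, compute_type) pair. Thresholds are illustrative."""
    if not gpu_available:
        return "base", "int8"       # CPU: small model, quantized for speed
    if vram_gb >= 3:
        return "medium", "float16"  # GPU default per the table above
    return "small", "int8"          # low-VRAM GPU: shrink and quantize

print(pick_model(True, vram_gb=2))   # ('small', 'int8')
```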
There is no Windows installer yet — releases are "download ZIP + first-run setup" using START.bat and npm run setup.
| Include | Why |
|---|---|
| Full project | `start.js`, `relay.js`, `transcribe.py`, `package.json`, `requirements.txt`, `extension/`, `START.bat`, empty `media/` and `transcriptions/` |
| No `node_modules/` | Paths and binaries differ per machine — `npm run setup` installs them on the user's PC |
| Optional `INSTALL.txt` | "Install Node + Python → unzip → double-click `START.bat` → option 1 or 2" |
- Open Releases → download the latest source ZIP.
- Unzip anywhere (e.g. `Documents\LocalStream-Transcriber`).
- Install Node.js and Python if needed (see Quick start).
- Double-click `START.bat` (or run `npm run setup`).
- Choose option 1 or option 2 — transcripts appear in `transcriptions/`.
GitHub UI (simplest)
- Push changes to `main`.
- Releases → Create a new release.
- Choose a tag → type e.g. `v1.0.2` → Create new tag on publish, target `main`.
- Add a title and release notes, then Publish release. GitHub auto-attaches source ZIPs.
Via git tag
```shell
git tag -a v1.0.2 -m "Release v1.0.2: short description"
git push origin v1.0.2
```

Then open Releases → Draft a new release, select the tag, add notes, publish.
Tip: Bump `version` in `package.json` to match the tag (e.g. `1.0.2` with tag `v1.0.2`).
- Place audio/video files in `media/`.
- Run `npm run setup` → select option 1.
- Choose a file from the list.
- Watch real-time progress (GPU/CPU status shown).
- Transcript saved to `transcriptions/` with metadata (device, language, confidence).
- Run `npm run setup` → select option 2 to start the relay server.
- Load the extension in Chrome: `chrome://extensions/` → Developer mode → Load unpacked → `extension/` folder.
- Navigate to your lecture page (already signed in). Streams are captured automatically when the page requests them.
- Optional: press `Ctrl+Shift+M` to verify the connection.
- Watch progress at `http://localhost:8787`; transcripts appear in `transcriptions/`.
Supported platforms:
- ✅ Canvas LMS
- ✅ Kaltura
- ✅ Panopto
- ✅ AWS CloudFront signed URLs
- ✅ YouTube (HLS streams)
- ✅ Any platform using HLS/DASH protocols
- ✅ Direct audio/video file links
Extension limitations:
- ❌ DRM-protected content — Widevine encryption cannot be bypassed (Netflix, Disney+, etc.)
- ⚠️ Cookie expiration — session cookies may expire during very long downloads
- ⚠️ Network detection — the page must actively request `.m3u8` or `.mpd` files
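The network-detection rule above amounts to matching manifest extensions on requested URLs. A rough stand-alone sketch of that idea (this is not the actual `bg.js` logic):

```python
from urllib.parse import urlparse

def looks_like_stream(url: str) -> bool:
    """True if the URL path ends in an HLS (.m3u8) or DASH (.mpd) manifest."""
    path = urlparse(url).path.lower()
    return path.endswith((".m3u8", ".mpd"))

print(looks_like_stream("https://cdn.example.edu/lec01/index.m3u8?token=abc"))  # True
print(looks_like_stream("https://example.edu/watch?v=123"))                     # False
```

Note that the query string is ignored, so signed URLs (e.g. CloudFront tokens) still match.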
See Choosing your Whisper model in Quick start for the full table. In short: edit the `model_size` strings at line ~221 of `transcribe.py`.

Edit `relay.js` line 15 and `extension/bg.js` line 1 to change the port from the default `8787`.

In `transcribe_audio()` in `transcribe.py`, change `compute_type="float16"` to `compute_type="int8"` to reduce VRAM usage.
Audio: .mp3 · .wav · .m4a · .flac · .ogg · .webm
Video (audio track extracted automatically): .mp4 · .mkv · .avi
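A quick way to check whether a file will be picked up is to test its extension against these lists; a small illustrative helper (the names here are mine, not the app's):

```python
import os

AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}
VIDEO_EXTS = {".mp4", ".mkv", ".avi"}

def classify(filename: str) -> str:
    """Return 'audio', 'video', or 'unsupported' for a media filename."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return "unsupported"

print(classify("lecture.MP4"))   # video
print(classify("notes.pdf"))     # unsupported
```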
```text
mp3grabber/
├── media/             # Place your audio/video files here
├── transcriptions/    # All transcription results saved here
├── uploads/           # Temporary files from browser extension
├── downloads/         # Temporary cookie files for yt-dlp
├── extension/
│   ├── manifest.json  # Extension configuration (Manifest V3)
│   ├── bg.js          # Background service worker
│   └── README.md
├── whisper-bin/       # Whisper model cache (auto-created)
├── config.json        # Installation state tracking
├── package.json
├── requirements.txt
├── start.js           # Interactive setup and menu
├── relay.js           # Express + WebSocket server
├── transcribe.py      # Python transcription script (faster-whisper)
├── viewer.html        # Real-time web UI at localhost:8787
├── START.bat          # Windows launcher
└── README.md
```
Data flow:
- File mode: `media/` → `start.js` → `transcribe.py` → `transcriptions/`
- Extension mode: webpage → extension → WebSocket → `relay.js` → `yt-dlp` → `transcribe.py` → `transcriptions/`
Stack: Node.js (Express + WebSocket), Python (faster-whisper), yt-dlp, Chrome Extension (Manifest V3), NVIDIA CUDA (optional).
If something isn't working and you need to see what the app is doing internally, run:
```shell
npm run setup:debug
```

This shows all hidden log lines prefixed with `[DBG]` — Python paths, GPU test details, queue internals, raw yt-dlp output, and more. Share this output when reporting issues.
Normal output (no flag) shows only what students need:
```text
✅ Ready · GPU (CUDA) · medium model
📡 Stream detected (1_9i988two)
⬇️ Downloading...
🎙️ Transcribing...
✅ Done → transcriptions/1_9i988two.txt
```
⏭️ Duplicate ignored means the extension sent the same stream twice — this is normal. The first copy is still downloading/transcribing.
"Python not found"
- Install Python 3.10–3.12 from python.org and check "Add Python to PATH" during install.
"Node.js not found"
- Install Node.js 14+ from nodejs.org. Verify with `node --version`.
"yt-dlp not found"
```shell
pip install yt-dlp
# or force reinstall everything:
npm run setup:install
```

"No audio files found"
- Place files in the `media/` folder. Supported: `.mp3 .wav .m4a .flac .ogg .webm .mp4 .mkv .avi`
"GPU not working"
- Install NVIDIA drivers from nvidia.com and CUDA Toolkit 12.x from NVIDIA CUDA Downloads.
- The app falls back to CPU automatically — CPU is 2–4× slower but equally accurate.
"Transcription takes too long"
- On CPU: use the `"base"` model (or at most `"small"`) in `transcribe.py`; close other heavy apps.
- On GPU: ensure CUDA is installed. Consider `compute_type="int8"` if memory is limited.
"WebSocket connection failed"
- Make sure the relay server is running (`npm run setup` → option 2).
- Check that port `8787` is not blocked by a firewall or used by another app.
- Reload the extension in `chrome://extensions/` and try restarting the server.
"No streams detected"
- Open Chrome DevTools → Network tab → filter by `.m3u8` or `.mpd`. If nothing appears, the page is not using HLS/DASH.
- Try loading a direct file URL instead.
"Download succeeded but file not found"
- Check `uploads/` for UUID-prefixed files; verify yt-dlp completed (check the relay server logs).
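If you want to scan `uploads/` programmatically, a sketch like this works. Note the naming convention is an assumption on my part (a UUID prefix on each filename); check the relay server's actual output before relying on it:

```python
import re
from pathlib import Path

# Assumes files are named "<uuid>-originalname.ext"; relay.js may use a
# different convention, so treat this as a search aid, not a spec.
UUID_PREFIX = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)

def uuid_prefixed(folder: str = "uploads"):
    """List files in the folder whose names start with a UUID."""
    return [p.name for p in Path(folder).glob("*") if UUID_PREFIX.match(p.name)]
```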
"Cannot find module './debug'" / "Module not found"
Quick fix:
```shell
npm run fix
```

Manual fix:

```powershell
# Windows PowerShell:
Remove-Item -Recurse -Force node_modules, package-lock.json
npm cache clean --force
npm install
```

Never copy `node_modules/` between machines — always run `npm install` in the new location.
See MIGRATION_GUIDE.md for architecture changes, breaking changes, and rollback instructions.
- GPU: CUDA + `medium` model = ~4× faster than CPU.
- CPU: Use `base` (default) or `small` for a speed/accuracy trade-off.
- RAM: 4 GB minimum; 8 GB+ recommended for GPU mode.
- Files: Files under 10 minutes process noticeably faster.
```text
Transcription Results
Generated: 2024-10-26 15:39:48
Source: lecture.mp4
Device: GPU
Compute Type: float16
Model Size: medium
Language: en (99.8% confidence)

--- TRANSCRIPTION ---

[00:00.000] Welcome to the lecture.
[00:05.230] Today we are covering...
[00:10.450] Each segment includes precise timing.
```
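The header lines are simple `Key: Value` pairs above the `--- TRANSCRIPTION ---` marker, so pulling out the metadata is straightforward; an illustrative parser (not part of the project):

```python
def parse_header(text: str) -> dict:
    """Collect 'Key: Value' metadata lines above the transcription marker."""
    meta = {}
    for line in text.splitlines():
        if line.startswith("--- TRANSCRIPTION ---"):
            break
        if ": " in line:
            key, value = line.split(": ", 1)
            meta[key] = value
    return meta

sample = "Source: lecture.mp4\nDevice: GPU\n--- TRANSCRIPTION ---\n[00:00.000] Hi"
print(parse_header(sample))  # {'Source': 'lecture.mp4', 'Device': 'GPU'}
```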
| Package | Version | Purpose |
|---|---|---|
| `express` | ^4.19.2 | HTTP server |
| `ws` | ^8.17.0 | WebSocket |
| `node-fetch` | ^3.3.2 | HTTP client |
| `uuid` | ^9.0.1 | Unique IDs |
- `faster-whisper` — speech-to-text engine
- `yt-dlp` — HLS/DASH stream downloader
- `nvidia-cublas-cu12` (optional) — CUDA support
- `nvidia-cudnn-cu12==9.*` (optional) — CUDA support

All installed automatically via `npm run setup`, or manually: `pip install -r requirements.txt`.
```shell
# Install Node.js dependencies
npm install

# Install Python dependencies
pip install -r requirements.txt

# GPU libraries (NVIDIA only)
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*

# Run setup menu
npm run setup

# Start relay server only (extension mode)
npm start   # or: node relay.js

# Force reinstall everything
npm run setup:install

# Transcribe a file directly (outputs JSON)
python transcribe.py path/to/audio.mp3
```

- ✅ 100% local processing — no data sent to external servers for transcription
- ✅ No telemetry — no usage tracking or analytics
- ✅ Temporary file cleanup — extension downloads deleted after processing
- ✅ Open source — inspect all code in this repo
- ⚠️ Network — the browser extension communicates only with the local WebSocket server (`localhost:8787`)
Open for contributions. Key areas:
- Additional language support and model tuning
- Better YouTube integration (within legal/technical limits)
- UI/UX improvements for the web viewer
- Performance optimizations
- DRM content — Netflix, Disney+, and similar services use Widevine DRM; this tool cannot bypass it.
- Cookie expiration — very long downloads on authenticated streams may fail if cookies expire.
- HLS/DASH only — the browser extension only detects `.m3u8` and `.mpd` requests; plain `<video>` src tags require direct file links.
- Large files — recordings over 2 hours require significant processing time and memory.
- GPU memory — large models on consumer GPUs may need `compute_type="int8"`.
This project's code is licensed under the MIT License.
When you run transcription, you also use third-party models and libraries (e.g. Whisper / faster-whisper). Please comply with their licenses and with laws in your jurisdiction about recording and transcribing lectures or other audio.
- faster-whisper by SYSTRAN — high-performance Whisper implementation
- OpenAI Whisper — original speech recognition model
- NVIDIA CUDA — GPU acceleration framework