Capture useful context from websites, PDFs, and YouTube transcripts into local JSON, Markdown, and/or SQLite storage you control.
Context Capture Saver is designed for research and AI workflows where you want structured, provenance-aware data instead of ad hoc copy/paste.
- Capture article-focused page content, not raw HTML.
- Save with notes, highlights, and queued highlight workflows.
- YouTube transcript capture with transcript-only, timestamped-segments-only, or both.
- Bubble menu support on YouTube selections, including transcript save actions.
- PDF extraction with metadata and selection-aware capture support.
- Local storage backends:
JSON(default),Markdown,SQLite, andJSON + SQLite(write both). - JSON output location is configurable (
rootorjson/subfolder). - Optional structured JSON chunks (
content.chunks) in addition to fulldocumentText. - SQLite backend uses a graph-ready schema for analytics and AI agent workflows.
- Automatic migration of legacy SQLite databases (
captures-only schema) to the latest schema. - Pending highlights/notes panel now shows queued annotations on-page with delete controls before save.
- Chrome/Chromium (Manifest V3)
- Firefox build output included in the same pipeline
chrome.offscreenis not always available; the extension now detects this and uses module-based fallbacks for PDF extraction.- Clipboard-assisted fallback paths can be more limited on Firefox; on-page selection remains the primary workflow.
- Runtime capability info is captured in diagnostics for PDF captures when diagnostics are enabled.
- Pending highlight rendering uses CSS Highlights when available, and a DOM-wrapper fallback when it is not.
- Manual:
- Verify context menus toggle between regular and YouTube actions based on tab URL.
- Verify PDF capture works with module fallback on Firefox.
- Verify highlight/note queueing works on regular pages and PDF pages.
- Verify pending highlight rendering works on browsers with and without CSS Highlights support.
- Verify JSON, SQLite, JSON+SQLite, and Markdown save modes all write successfully.
- Verify bubble menu actions work and update after settings changes.
- Automated:
- Run
npm run check - Run
npm run typecheck - Run
npm test
- Run
- Install from the Chrome Web Store
- Chrome package:
context-capture-saver-chrome.zip(and compatibility aliascontext-capture-saver.zip) - Firefox package:
context-capture-saver-firefox.zip
npm ci
npm run buildBuild outputs:
- Chrome unpacked extension:
dist/chrome/ - Firefox unpacked add-on:
dist/firefox/ - Chrome zip:
dist/context-capture-saver-chrome.zip - Firefox zip:
dist/context-capture-saver-firefox.zip
Load in browsers:
- Chrome: open
chrome://extensions, enable Developer mode, click Load unpacked, selectdist/chrome/. - Firefox (temporary): open
about:debugging#/runtime/this-firefox, click Load Temporary Add-on, selectdist/firefox/manifest.json.
npm run build # build both Chrome + Firefox
npm run build:chrome # build only Chrome
npm run build:firefox # build only Firefox- Open extension options.
- Choose a local folder with read/write permission.
- Run Test write (optional but recommended).
- Configure:
- storage backend (
JSON,Markdown,SQLite,JSON + SQLite) - JSON folder organization options
- compression and diagnostics options
- logging level (
off,error,info,debug) for runtime observability - optional JSON structured chunk export
- bubble menu layout/style/actions
- storage backend (
Use right-click menus, shortcuts, popup, or bubble actions:
- Save content
- Save content with note
- Save content with highlight
- Highlight
- Highlight with note
On supported YouTube video pages:
- Save transcript
- Save transcript with note
- Selection bubble includes YouTube transcript actions
Transcript storage modes:
- plain transcript text
- timestamped transcript segments
- both
- Extracts text + metadata
- Supports capture and highlight actions
- Browser PDF viewer behavior can still affect some selection UX
Default shortcuts:
- macOS:
Shift + Command + D,Shift + Command + C - Windows/Linux:
Ctrl + Shift + D,Ctrl + Shift + C
In Chrome, edit shortcuts in chrome://extensions/shortcuts.
- Default mode.
- Writes one capture file per save.
- JSON files can be written to the selected folder root or a
json/subfolder. - Optional setting can include
content.chunksfor agent-ready chunked retrieval while keeping full text.
- Writes one
.mdfile per capture with YAML frontmatter metadata. - Body contains the core document text (no structured chunks or timestamped segments).
- Highlights and notes use footnote-style anchors (
[^hl-N]) placed inline where the highlighted text appears, with a "Highlights & Notes" footer section. - Markdown files follow the same folder organization options as JSON (date/type grouping).
- Markdown is a lossy format compared to JSON/SQLite -- optimized for human readability, not round-trip data interchange.
- Writes/updates a single
context-captures.sqlitefile in the selected folder. - Uses SQL.js (no external DB server).
- Automatically upgrades legacy DBs in place.
- Runs guarded maintenance (
ANALYZE) during schema/open checks at a bounded interval, and stores maintenance metadata inmeta.
- Writes both outputs in the same save operation.
- If one backend fails and the other succeeds, successful output is preserved and surfaced.
- Data is written only to your selected local folder.
- No required remote backend.
- No cloud dependency in normal capture flow.
npm ci
npm run lint
npm run typecheck
npm run test:ci
npm run buildnpm run check dynamically syntax-checks all non-vendor runtime modules under src/.
- Keep changes scoped.
- Run checks/tests before opening a PR.
- Include a concise behavior-change summary.
- Add/update message keys in
_locales/en/messages.json. - Add corresponding translations in
_locales/nb/messages.json(or other locale folder). - Use translation lookups in runtime code via
src/i18n.js(t(key, fallback, substitutions)). - Keep sensible fallback strings in code so missing keys do not break UX.
- Run
npm run check,npm run typecheck, andnpm test.
The SQLite backend is normalized for both analytical queries and AI-agent retrieval.
Core principles:
- One
documentper source URL. - One
captureper save event. - Multiple
chunksper capture/document (document chunks, highlights, notes, transcript segments). - Graph tables (
entities,edges,provenance) enable relation modeling with evidence links back to chunks. - Optional derived tables (
annotations,transcript_segments) make common querying cheaper without losing the chunk-centric model.
meta: DB metadata (db_schema_version, migration/backfill flags, FTS availability, maintenance timestamps/mode/result).documents: canonical source objects (url, title/site/language/published metadata).captures: immutable-ish capture events (hashes, transcript payload, diagnostics, optional legacy payload).chunks: agent-ready text units with offsets/type/provenance anchor.annotations: structured highlights/notes derived from capture annotations.transcript_segments: structured transcript lines with stable segment ordering.entities: graph nodes (document, capture, site, annotation, future extracted entities).entity_aliases: alternate labels/mentions for entities.edges: graph relations (predicate,confidence, optional properties).provenance: evidence links from entity/edge subjects back tochunks.chunks_fts(optional virtual table): full-text index when FTS5 is available.captures_legacy(optional): preserved legacy table after auto-migration.
erDiagram
META {
text key PK
text value
}
DOCUMENTS {
text document_id PK
text url UK
text canonical_url
text title
text site
text language
text published_at
text metadata_json
text created_at
}
CAPTURES {
text capture_id PK
text document_id FK
text capture_type
text saved_at
text content_hash
text document_text
int document_text_word_count
int document_text_character_count
text document_text_compressed_json
text transcript_text
text transcript_segments_json
text diagnostics_json
text legacy_record_json
}
CHUNKS {
text chunk_id PK
text document_id FK
text capture_id FK
text chunk_type
text text
text comment
text created_at
int start_offset
int end_offset
text source_hash
int is_preview
}
ANNOTATIONS {
text annotation_id PK
text document_id FK
text capture_id FK
text annotation_type
text selected_text
text comment
text created_at
int start_offset
int end_offset
}
TRANSCRIPT_SEGMENTS {
text segment_id PK
text document_id FK
text capture_id FK
int segment_index
text timestamp_text
text text
text created_at
}
ENTITIES {
text entity_id PK
text entity_type
text name
text canonical_id
text properties_json
text created_at
}
ENTITY_ALIASES {
text entity_id FK
text alias
}
EDGES {
text edge_id PK
text from_entity_id FK
text to_entity_id FK
text predicate
real confidence
text properties_json
text created_at
}
PROVENANCE {
text provenance_id PK
text subject_type
text subject_id
text chunk_id FK
text evidence_text
text created_at
}
CHUNKS_FTS {
text chunk_id
text text
text comment
}
DOCUMENTS ||--o{ CAPTURES : has
DOCUMENTS ||--o{ CHUNKS : contains
CAPTURES ||--o{ CHUNKS : produces
DOCUMENTS ||--o{ ANNOTATIONS : has
CAPTURES ||--o{ ANNOTATIONS : includes
DOCUMENTS ||--o{ TRANSCRIPT_SEGMENTS : has
CAPTURES ||--o{ TRANSCRIPT_SEGMENTS : includes
ENTITIES ||--o{ ENTITY_ALIASES : has
ENTITIES ||--o{ EDGES : connects
CHUNKS ||--o{ PROVENANCE : evidences
CHUNKS ||--o{ CHUNKS_FTS : indexed_as
Note: edges.from_entity_id and edges.to_entity_id both reference entities.entity_id. Also, provenance.subject_id is polymorphic (entity_id or edge_id) based on subject_type, so that relation is enforced in application logic rather than a single SQL foreign key.
This project is licensed under the MIT License. See LICENCE.md.

