Commit b5a9ffb

feat(graph): knowledge graph subpackage with dual-backend support
Add qp_vault.graph subpackage: GraphEngine facade, GraphStorageBackend protocol, PostgreSQL (pg_trgm + recursive CTEs) and SQLite (FTS5 + Python BFS) implementations.

Intelligence services: KnowledgeExtractor, EntityResolver, EntityDetector, EntityMaterializer, WikilinkResolver. Membrane sanitization for LLM extraction. Every mutation fires VaultEvents for capsule audit. Graph-augmented search via graph_boost parameter.

Comprehensive input validation: name/type/relation length caps, properties size cap (50KB), tag limits, weight bounds, null byte stripping, direction enum, self-edge/self-merge rejection, limit capping, graph_schema SQL identifier validation.

177 graph tests + 63 security tests. 1048 total tests passing.

New files:
- src/qp_vault/graph/ (8 modules)
- src/qp_vault/membrane/sanitize.py
- tests/test_graph_*.py (8 test files)
- docs/knowledge-graph.md

Modified:
- enums.py (10 graph EventType values)
- protocols.py (GraphStorageBackend)
- storage/postgres.py (graph DDL, CTE, 20 methods, graph_schema)
- storage/sqlite.py (graph DDL, FTS5, BFS, 20 methods)
- vault.py (vault.graph property, graph_boost search)
- __init__.py (graph exports)
- pyproject.toml (graph extra)
- docs/index.md, architecture.md, api-reference.md, getting-started.md

Made-with: Cursor
1 parent 8b93c81 commit b5a9ffb

29 files changed: 6680 additions, 4 deletions

docs/api-reference.md

Lines changed: 50 additions & 1 deletion
@@ -231,12 +231,15 @@ vault.search(
     as_of: date | None = None, # Point-in-time
     deduplicate: bool = True, # One result per resource
     explain: bool = False, # Include scoring breakdown
+    graph_boost: bool = False, # Boost docs mentioning detected entities
 ) -> list[SearchResult]
 ```
 
 When no embedder is configured, search automatically falls back to text-only mode (`vector_weight=0.0`, `text_weight=1.0`). This ensures search works on day one without requiring an embedding model.
 
-<!-- VERIFIED: vault.py:1051-1063 — text-only fallback -->
+When `graph_boost=True` and `vault.graph` is available, search detects entities in the query text, fetches their backlinks, and applies a 15% relevance boost to documents that mention those entities. Off by default. Best-effort: any failure falls back to standard search.
+
+<!-- VERIFIED: vault.py:1058-1075, 1172-1188 -->
 
 ### search_with_facets()
 
@@ -430,6 +433,52 @@ vault.import_vault(path: str | Path) -> list[Resource]
 
 ---
 
+## Knowledge Graph
+
+Access via `vault.graph`. Returns `GraphEngine` when the storage backend supports graphs, `None` otherwise.
+
+```python
+vault.graph -> GraphEngine | None
+```
+
+Full documentation: [Knowledge Graph Guide](knowledge-graph.md)
+
+Quick reference:
+
+```python
+# Nodes
+node = await vault.graph.create_node(name="Alice", entity_type="person")
+node = await vault.graph.get_node(node_id)
+nodes, total = await vault.graph.list_nodes(entity_type="person", limit=20)
+results = await vault.graph.search_nodes("Alice")
+updated = await vault.graph.update_node(node_id, name="Alice Smith")
+await vault.graph.delete_node(node_id)
+
+# Edges
+edge = await vault.graph.create_edge(source_id=a.id, target_id=b.id, relation_type="knows")
+edges = await vault.graph.get_edges(node_id, direction="outgoing")
+await vault.graph.delete_edge(edge_id)
+
+# Traversal + context
+neighbors = await vault.graph.neighbors(node_id, depth=2)
+context = await vault.graph.context_for([node_id])
+
+# Mentions
+await vault.graph.track_mention(node_id, resource_id, context_snippet="...")
+backlinks = await vault.graph.get_backlinks(node_id)
+
+# Cross-space + merge
+await vault.graph.add_to_space(node_id, space_id)
+merged = await vault.graph.merge_nodes(keep_id, merge_id)
+
+# Scan
+job = await vault.graph.scan(space_id)
+```
+
+<!-- VERIFIED: vault.py:253-257, graph/service.py:53-63 -->
+
+---
+
 ## Status
 
 ### status()
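
The `graph_boost` paragraph added in this file describes a 15% relevance boost with a best-effort fallback. A minimal sketch of that idea follows, assuming the boost is multiplicative and that results are re-sorted afterwards. `Hit`, `apply_graph_boost`, and `search_with_boost` are hypothetical names, not the functions in `vault.py`.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable

GRAPH_BOOST = 0.15  # the 15% boost described in the docs


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for a search result."""
    resource_id: str
    relevance: float


def apply_graph_boost(hits: Iterable[Hit], mentioned_ids: set[str],
                      boost: float = GRAPH_BOOST) -> list[Hit]:
    """Scale relevance for hits whose resource mentions a detected entity."""
    boosted = [
        replace(h, relevance=h.relevance * (1 + boost))
        if h.resource_id in mentioned_ids else h
        for h in hits
    ]
    return sorted(boosted, key=lambda h: h.relevance, reverse=True)


def search_with_boost(hits: list[Hit],
                      fetch_mentioned_ids: Callable[[], set[str]]) -> list[Hit]:
    """Best-effort: if the graph lookup fails, return the standard ranking."""
    try:
        mentioned = fetch_mentioned_ids()
    except Exception:
        return list(hits)
    return apply_graph_boost(hits, mentioned)
```

With hits scored 0.80 and 0.75, boosting the second one lifts it to 0.8625 and moves it to the top, while a failing graph lookup leaves the original order untouched.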

docs/architecture.md

Lines changed: 10 additions & 0 deletions
@@ -30,10 +30,20 @@ src/qp_vault/
     sqlite.py           SQLite + FTS5 (default, zero-config)
     postgres.py         PostgreSQL + pgvector + pg_trgm (production)
 
+  graph/
+    models.py           GraphNode, GraphEdge, GraphMention, NeighborResult, etc.
+    service.py          GraphEngine: CRUD, traversal, merge, scan, audit events
+    extraction.py       KnowledgeExtractor: LLM-based entity/relationship extraction
+    resolution.py       EntityResolver: dedup cascade (exact + search + create)
+    detection.py        EntityDetector: in-memory name matching (no LLM)
+    materialization.py  EntityMaterializer: profile.md + manifest.json generation
+    wikilinks.py        WikilinkResolver: parse + resolve [[Entity Name]] syntax
+
   membrane/
     pipeline.py         MembranePipeline: multi-stage content screening
     innate_scan.py      Pattern-based detection (regex blocklists)
     release_gate.py     Risk-proportionate gating decision
+    sanitize.py         Extraction-time input sanitization for LLM prompts
 
   encryption/
     aes_gcm.py          AES-256-GCM symmetric (FIPS 197)

docs/getting-started.md

Lines changed: 25 additions & 0 deletions
@@ -152,6 +152,30 @@ vault.search("query", tenant_id="site-123")
 vault = Vault("./knowledge", tenant_id="site-123")
 ```
 
+## Knowledge Graph
+
+Track entities, relationships, and mentions across your vault:
+
+```python
+# Create entities
+alice = vault.graph.create_node(name="Alice", entity_type="person")
+acme = vault.graph.create_node(name="Acme Corp", entity_type="company")
+
+# Connect them
+vault.graph.create_edge(source_id=alice.id, target_id=acme.id, relation_type="works_at")
+
+# Track mentions in documents
+resource = vault.add("Alice leads engineering at Acme Corp.", name="team.md")
+vault.graph.track_mention(alice.id, resource.id, context_snippet="Alice leads engineering")
+
+# Search and traverse
+results = vault.graph.search_nodes("Alice")
+neighbors = vault.graph.neighbors(alice.id, depth=2)
+backlinks = vault.graph.get_backlinks(alice.id)
+```
+
+Works on both PostgreSQL and SQLite. See [Knowledge Graph](knowledge-graph.md) for extraction, detection, and wikilinks.
+
 ## CLI
 
 ```bash
@@ -170,6 +194,7 @@ vault status
 
 ## Next Steps
 
+- [Knowledge Graph](knowledge-graph.md): Entities, relationships, extraction, detection
 - [Trust Tiers](trust-tiers.md): How trust affects search ranking
 - [Encryption](encryption.md): AES-256-GCM, ML-KEM-768, ML-DSA-65
 - [RBAC](rbac.md): Reader/Writer/Admin roles
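
The `neighbors(alice.id, depth=2)` call in the guide above is a depth-limited traversal, which the commit message says the SQLite backend performs with a Python BFS. The sketch below shows the general technique over a plain adjacency dict. It assumes directed edges and is not the backend's implementation, which reads edges from the database.

```python
from collections import deque


def neighbors(adjacency: dict[str, list[str]], start: str, depth: int = 2) -> dict[str, int]:
    """Return {node_id: hops} for every node within `depth` hops of `start`."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        hops = seen[current]
        if hops == depth:
            continue  # do not expand past the depth limit
        for nxt in adjacency.get(current, ()):
            if nxt not in seen:  # first visit is the shortest path in BFS
                seen[nxt] = hops + 1
                queue.append(nxt)
    del seen[start]  # the start node is not its own neighbor
    return seen
```

For a chain alice -> acme -> bob -> carol, a depth of 2 reaches `acme` at one hop and `bob` at two hops, and stops before `carol`.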

docs/index.md

Lines changed: 3 additions & 1 deletion
@@ -9,6 +9,7 @@ Governed knowledge store for autonomous organizations. Every fact has provenance
 | [Getting Started](getting-started.md) | Install, first vault, add, search, grep, verify in 5 minutes |
 | [Architecture](architecture.md) | Package structure, layers, data flow, Protocol interfaces |
 | [API Reference](api-reference.md) | Complete Python SDK: Vault, AsyncVault, all methods |
+| [Knowledge Graph](knowledge-graph.md) | Entities, relationships, mentions, traversal, extraction, detection |
 | [Trust Tiers](trust-tiers.md) | CANONICAL, WORKING, EPHEMERAL, ARCHIVED and search weighting |
 | [Knowledge Lifecycle](lifecycle.md) | State machine, supersession chains, temporal validity |
 | [Memory Layers](memory-layers.md) | OPERATIONAL, STRATEGIC, COMPLIANCE with per-layer defaults |
@@ -39,7 +40,8 @@ print(results[0].content, results[0].trust_tier)
 ## Installation
 
 ```bash
-pip install qp-vault # SQLite, basic search, trust tiers
+pip install qp-vault # SQLite, basic search, trust tiers, knowledge graph
+pip install qp-vault[postgres] # + PostgreSQL + pgvector + pg_trgm
 pip install qp-vault[encryption] # + AES-256-GCM
 pip install qp-vault[pq] # + ML-KEM-768, ML-DSA-65
 pip install qp-vault[all] # Everything
