
Commit 3af2d08

feat(vault): v1.4.0 world-class grep engine
Single-pass FTS5 OR query (SQLite) and ILIKE+trigram query (Postgres) replace the previous N+1 per-keyword search loop.

Three-signal blended scoring: keyword coverage (Lucene coord factor as multiplier), native text rank (FTS5 bm25 / pg_trgm), and term proximity (cover density ranking). Includes snippet highlighting, explain_metadata with full scoring breakdown, trust weighting, SURVEIL re-evaluation, telemetry, and query timeout protection.

New: StorageBackend.grep() protocol, grep_utils.py shared utilities, GrepMatch dataclass, VaultConfig grep scoring weights.

Fixed: encryption test skip guards across test_v1_features, test_encryption, and test_coverage_gaps for missing [encryption] extra.

799 tests passing, 62 new grep tests, zero regressions.
1 parent d822f75 commit 3af2d08

22 files changed, +1242 -79 lines changed

CHANGELOG.md

Lines changed: 22 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -7,6 +7,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10+
## [1.4.0] - 2026-04-08
11+
12+
### Added
13+
- **World-class grep engine**: Complete rewrite of `AsyncVault.grep()` with three-signal blended scoring
14+
- **Single-pass FTS5 OR query** (SQLite): one database round-trip regardless of keyword count, replacing the previous N+1 per-keyword search loop
15+
- **Single-pass ILIKE + trigram query** (PostgreSQL): per-keyword CASE expressions with `GREATEST(similarity(...))` scoring
16+
- **Three-signal scoring**: keyword coverage (Lucene coord factor as multiplier), native text rank (FTS5 bm25 / pg_trgm), term proximity (cover density ranking)
17+
- **Keyword highlighting**: `explain_metadata.snippet` with configurable markers
18+
- **Scoring breakdown**: `explain_metadata` includes `matched_keywords`, `hit_density`, `text_rank`, `proximity`, and `snippet`
19+
- `StorageBackend.grep()` protocol method: dedicated storage-level grep for both SQLite and PostgreSQL backends
20+
- `grep_utils.py`: shared utilities for FTS5 query building, keyword sanitization, snippet generation, keyword matching, and proximity scoring
21+
- `GrepMatch` dataclass: lightweight intermediate result type for storage-to-vault layer communication
22+
- `VaultConfig.grep_rank_weight` and `VaultConfig.grep_proximity_weight`: configurable scoring weights
23+
- 62 new grep tests across `test_grep.py` (51 tests) and `test_grep_utils.py` (31 tests)
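The single-pass PostgreSQL query described in the list above (per-keyword `CASE` expressions plus `GREATEST(similarity(...))`) could be assembled roughly as follows. This is a sketch under stated assumptions, not the code from this commit: the table and column names (`chunks`, `chunk_id`, `content`), the helper names, and the psycopg-style `%s` placeholders are all hypothetical. Only `ILIKE`, `GREATEST`, and pg_trgm's `similarity()` come from PostgreSQL itself.

```python
def escape_like(term: str) -> str:
    """Escape LIKE/ILIKE wildcards so a keyword is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_pg_grep_query(keywords: list[str], top_k: int = 20) -> tuple[str, list]:
    """Build one parameterized query covering every keyword (hypothetical schema)."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    patterns = [f"%{escape_like(k)}%" for k in keywords]
    # One CASE per keyword; the sum is the number of distinct keywords matched.
    cases = " + ".join(
        "CASE WHEN content ILIKE %s THEN 1 ELSE 0 END" for _ in keywords
    )
    # Best trigram similarity across all keywords (similarity() is from pg_trgm).
    sims = ", ".join("similarity(content, %s)" for _ in keywords)
    # OR filter: a row qualifies if it contains any keyword.
    where = " OR ".join("content ILIKE %s" for _ in keywords)
    sql = (
        f"SELECT chunk_id, content, ({cases}) AS matched, "
        f"GREATEST({sims}) AS text_rank "
        f"FROM chunks WHERE {where} "
        "ORDER BY matched DESC, text_rank DESC LIMIT %s"
    )
    # Parameter order mirrors placeholder order: CASEs, similarities, WHERE, LIMIT.
    params = [*patterns, *keywords, *patterns, top_k]
    return sql, params
```

Whatever the keyword count, the result is one statement and one round-trip; only the number of bound parameters grows.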
24+
25+
### Fixed
26+
- Encryption test skip guards: `test_v1_features.py`, `test_coverage_gaps.py`, `test_encryption.py` now correctly skip when `[encryption]` extra is not installed
27+
28+
### Changed
29+
- Grep scoring formula: `coverage * (rank_weight * text_rank + proximity_weight * proximity)` replaces flat density-only scoring
30+
- Coverage acts as a multiplier (Lucene coord factor pattern): 3/3 keywords = full score, 1/3 = 33% score
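The formula above can be written out as a small function. This is an illustrative sketch rather than the vault's implementation: `blended_score` and its parameter names are hypothetical, and the default weights of 0.7 and 0.3 are the values quoted in the API reference text of this commit.

```python
def blended_score(
    matched: int,
    total: int,
    text_rank: float,
    proximity: float,
    rank_weight: float = 0.7,
    proximity_weight: float = 0.3,
) -> float:
    """Coverage-scaled blend of text rank and proximity (hypothetical helper)."""
    if total <= 0:
        return 0.0
    coverage = matched / total  # coord factor: fraction of keywords matched
    return coverage * (rank_weight * text_rank + proximity_weight * proximity)


# With perfect rank and proximity signals, coverage alone scales the score.
full = blended_score(3, 3, text_rank=1.0, proximity=1.0)
partial = blended_score(1, 3, text_rank=1.0, proximity=1.0)
print(round(full, 2), round(partial, 2))  # prints: 1.0 0.33
```

Because coverage multiplies the blend instead of being added to it, a chunk matching one keyword out of three cannot outrank a chunk that matches all three with equal rank and proximity signals.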
31+
1032
## [1.0.0] - 2026-04-07
1133

1234
### Added

README.md

Lines changed: 4 additions & 1 deletion
@@ -142,7 +142,7 @@ pip install qp-vault
142142
| `pip install qp-vault[local]` | + Local embeddings (sentence-transformers, air-gap safe) | + sentence-transformers |
143143
| `pip install qp-vault[openai]` | + OpenAI embeddings (cloud) | + openai |
144144
| `pip install qp-vault[integrity]` | + Near-duplicate + contradiction detection | + numpy |
145-
| `pip install qp-vault[fastapi]` | + REST API (22+ endpoints) | + fastapi |
145+
| `pip install qp-vault[fastapi]` | + REST API (30+ endpoints) | + fastapi |
146146
| `pip install qp-vault[cli]` | + `vault` command-line tool (15 commands) | + typer, rich |
147147
| `pip install qp-vault[all]` | Everything | All of the above |
148148

@@ -170,6 +170,9 @@ vault.add("Incident response: acknowledge within 15 minutes...",
170170
# Trust-weighted search (deduplicated, with freshness decay)
171171
results = vault.search("incident response")
172172

173+
# Multi-keyword grep (single-pass FTS, three-signal scoring)
174+
results = vault.grep(["incident", "response", "P0", "escalation"])
175+
173176
# Retrieve full content
174177
text = vault.get_content(results[0].resource_id)
175178

docs/api-reference.md

Lines changed: 117 additions & 2 deletions
@@ -1,6 +1,6 @@
11
# API Reference
22

3-
Complete Python SDK for qp-vault v1.0.0.
3+
Complete Python SDK for qp-vault v1.4.0.
44

55
## Constructor
66

@@ -101,6 +101,24 @@ Reassembles chunks in order to return the full text content. Quarantined resourc
101101

102102
<!-- VERIFIED: vault.py:406-420 -->
103103

104+
### reprocess()
105+
106+
```python
107+
vault.reprocess(resource_id: str) -> Resource
108+
```
109+
110+
Re-chunks and re-embeds an existing resource. Useful when the embedding model changes or chunking parameters are updated. The resource content is preserved; only chunks and embeddings are regenerated.
111+
112+
```python
113+
# After switching embedding models
114+
updated = vault.reprocess(resource.id)
115+
assert updated.status == "indexed"
116+
```
117+
118+
Emits an `UPDATE` subscriber event with `details={"reprocessed": True}`.
119+
120+
<!-- VERIFIED: vault.py:706-770 -->
121+
104122
### list()
105123

106124
```python
@@ -121,6 +139,26 @@ vault.list(
121139

122140
<!-- VERIFIED: vault.py:373-400 -->
123141

142+
### find_by_name()
143+
144+
```python
145+
vault.find_by_name(
146+
name: str,
147+
*,
148+
tenant_id: str | None = None,
149+
collection_id: str | None = None,
150+
) -> Resource | None
151+
```
152+
153+
Case-insensitive name lookup. Returns the first matching non-deleted resource, or `None`.
154+
155+
```python
156+
resource = vault.find_by_name("STRATEGY.md")
157+
# Also matches "strategy.md", "Strategy.MD"
158+
```
159+
160+
<!-- VERIFIED: vault.py:632-668 -->
161+
124162
### update()
125163

126164
```python
@@ -196,7 +234,9 @@ vault.search(
196234
) -> list[SearchResult]
197235
```
198236

199-
<!-- VERIFIED: vault.py:558-648 -->
237+
When no embedder is configured, search automatically falls back to text-only mode (`vector_weight=0.0`, `text_weight=1.0`). This ensures search works on day one without requiring an embedding model.
238+
239+
<!-- VERIFIED: vault.py:1051-1063 — text-only fallback -->
200240

201241
### search_with_facets()
202242

@@ -208,6 +248,37 @@ Returns `{"results": [...], "total": N, "facets": {"trust_tier": {...}, "resourc
208248

209249
<!-- VERIFIED: vault.py:650-687 -->
210250

251+
### grep()
252+
253+
```python
254+
vault.grep(
255+
keywords: list[str],
256+
*,
257+
tenant_id: str | None = None,
258+
top_k: int = 20,
259+
max_keywords: int = 20,
260+
) -> list[SearchResult]
261+
```
262+
263+
Multi-keyword OR search with three-signal blended scoring. Executes a single FTS5 OR query (SQLite) or ILIKE+trigram query (PostgreSQL) regardless of keyword count.
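To illustrate how a keyword list can collapse into a single FTS5 `MATCH` expression, here is a minimal sketch. The function names and the sanitization rule are assumptions made for illustration; the helpers in `grep_utils.py` may behave differently.

```python
import re


def sanitize_keyword(keyword: str) -> str:
    """Keep word characters, spaces and hyphens only (illustrative rule)."""
    return re.sub(r"[^\w\s-]", "", keyword).strip()


def build_fts5_or_query(keywords: list[str], max_keywords: int = 20) -> str:
    """Join sanitized keywords into one FTS5 OR expression."""
    terms = []
    for keyword in keywords[:max_keywords]:
        cleaned = sanitize_keyword(keyword)
        if cleaned:
            # Double quotes make FTS5 read the term as a string, not as query syntax.
            terms.append(f'"{cleaned}"')
    return " OR ".join(terms)


print(build_fts5_or_query(["incident", "response", "P0", 'esc"alation']))
# prints: "incident" OR "response" OR "P0" OR "escalation"
```

The resulting string would be bound as the single parameter of a `... WHERE <fts_table> MATCH ?` statement, which is what keeps the search to one query however many keywords are supplied.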
264+
265+
**Scoring formula:** `coverage * (0.7 * text_rank + 0.3 * proximity)` where:
266+
- **Coverage** (Lucene coord factor): `matched_keywords / total_keywords`, applied as a multiplier. 3/3 = full score, 1/3 = 33%.
267+
- **Text rank**: native FTS5 bm25 or pg_trgm similarity (0.0-1.0).
268+
- **Proximity**: how close matched keywords appear to each other within the chunk.
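One way to picture the proximity signal is a smallest-window measure: find the shortest run of tokens that contains every matched keyword, then divide the number of keywords by that window's length. The sketch below is an assumption about how such a score could be computed, not the commit's algorithm. `proximity_score`, its tokenization, and its edge-case values are hypothetical.

```python
import re


def proximity_score(text: str, keywords: list[str]) -> float:
    """Score in (0, 1]: 1.0 when all matched keywords sit side by side."""
    tokens = re.findall(r"\w+", text.lower())
    present = {k.lower() for k in keywords} & set(tokens)
    if not present:
        return 0.0
    if len(present) == 1:
        return 1.0  # a single keyword has no distance to measure
    counts: dict[str, int] = {}
    best = len(tokens)
    left = 0
    for right, token in enumerate(tokens):
        if token in present:
            counts[token] = counts.get(token, 0) + 1
        # Shrink from the left while the window still covers every keyword.
        while len(counts) == len(present):
            best = min(best, right - left + 1)
            leaving = tokens[left]
            if leaving in counts:
                counts[leaving] -= 1
                if counts[leaving] == 0:
                    del counts[leaving]
            left += 1
    return len(present) / best


print(proximity_score("Q3 revenue forecast", ["revenue", "Q3", "forecast"]))  # prints: 1.0
```

In "revenue was flat while the forecast slipped", the keywords `revenue` and `forecast` span six tokens, so this sketch would return 2/6.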
269+
270+
```python
271+
results = vault.grep(["revenue", "Q3", "forecast"])
272+
# Results sorted by blended relevance: coverage * (0.7 * text_rank + 0.3 * proximity)
273+
# explain_metadata includes: matched_keywords, hit_density, text_rank, proximity, snippet
274+
print(results[0].explain_metadata["snippet"])
275+
# "...discussed **Q3** **revenue** **forecast** projections..."
276+
```
277+
278+
No embedder required. Single database query. Results deduplicated by resource and trust-weighted.
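The highlighted snippet in `explain_metadata` can be approximated with a regular expression. The sketch below is illustrative only: `highlight_snippet`, its `marker` and `width` parameters, and the `...` truncation markers are assumptions, not the library's API.

```python
import re


def highlight_snippet(
    text: str, keywords: list[str], marker: str = "**", width: int = 40
) -> str:
    """Cut a window around the first hit and wrap every keyword in markers."""
    if not keywords:
        return text[: 2 * width]
    # Longest keywords first so overlapping alternatives prefer the longer match.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return text[: 2 * width]
    start = max(0, first.start() - width)
    end = min(len(text), first.end() + width)
    highlighted = pattern.sub(
        lambda m: f"{marker}{m.group(0)}{marker}", text[start:end]
    )
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return f"{prefix}{highlighted}{suffix}"


print(highlight_snippet(
    "We discussed Q3 revenue forecast projections.",
    ["revenue", "Q3", "forecast"],
))
# prints: We discussed **Q3** **revenue** **forecast** projections.
```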
279+
280+
<!-- VERIFIED: vault.py:1172-1266 -->
281+
211282
**SearchResult fields:**
212283

213284
| Field | Type | Description |
@@ -369,6 +440,50 @@ vault.status() -> dict[str, Any]
369440

370441
---
371442

443+
## Event Subscription
444+
445+
### subscribe()
446+
447+
```python
448+
vault.subscribe(callback: Callable[[VaultEvent], Any]) -> Callable[[], None]
449+
```
450+
451+
Register a callback for vault mutation events. Returns an unsubscribe function. Callbacks can be sync or async; async callbacks are awaited directly. Errors in callbacks are logged and never propagated to the caller.
452+
453+
```python
454+
from qp_vault import AsyncVault, VaultEvent
455+
456+
vault = AsyncVault("./knowledge")
457+
458+
# Sync callback
459+
def on_change(event: VaultEvent) -> None:
460+
    print(f"{event.event_type}: {event.resource_name}")
461+
462+
unsub = vault.subscribe(on_change)
463+
464+
# Add a resource (callback fires with CREATE event)
465+
await vault.add("Content", name="doc.md")
466+
467+
# Stop receiving events
468+
unsub()
469+
```
470+
471+
**Events emitted on:**
472+
473+
| Operation | EventType |
474+
|-----------|-----------|
475+
| `add()` | `CREATE` |
476+
| `update()` | `UPDATE` |
477+
| `delete()` | `DELETE` |
478+
| `reprocess()` | `UPDATE` (with `details.reprocessed=True`) |
479+
| `transition()` | `LIFECYCLE_TRANSITION` |
480+
481+
Multiple subscribers are independent. Unsubscribing one does not affect others. Calling `unsub()` twice is safe.
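The behaviors described in this section (independent subscribers, an idempotent unsubscribe, sync and async callbacks, errors that are logged but not raised) can be modeled with a small registry. This is a sketch of the pattern, not the code in `vault.py`: `SubscriberRegistry`, `notify`, and the logger name are hypothetical.

```python
import inspect
import logging
from typing import Any, Callable

logger = logging.getLogger("vault.events")  # hypothetical logger name


class SubscriberRegistry:
    """Minimal callback registry with idempotent unsubscribe."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Any], Any]] = []

    def subscribe(self, callback: Callable[[Any], Any]) -> Callable[[], None]:
        self._subscribers.append(callback)

        def unsubscribe() -> None:
            # Safe to call twice: an already-removed callback is ignored.
            if callback in self._subscribers:
                self._subscribers.remove(callback)

        return unsubscribe

    async def notify(self, event: Any) -> None:
        # Iterate over a copy so callbacks may unsubscribe during delivery.
        for callback in list(self._subscribers):
            try:
                result = callback(event)
                if inspect.isawaitable(result):
                    await result  # async callbacks are awaited in order
            except Exception:
                # A failing subscriber is logged and never affects the others.
                logger.exception("subscriber callback failed")
```

Returning a closure instead of a subscription ID keeps the caller's side to a single call (`unsub()`) and avoids exposing the registry's internals.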
482+
483+
<!-- VERIFIED: vault.py:289-336 — subscribe + _notify_subscribers -->
484+
485+
---
486+
372487
## Plugin Registration
373488

374489
```python

docs/fastapi.md

Lines changed: 24 additions & 4 deletions
@@ -1,6 +1,6 @@
11
# FastAPI Integration
22

3-
qp-vault provides 22+ ready-made REST API endpoints.
3+
qp-vault provides 30+ ready-made REST API endpoints.
44

55
```bash
66
pip install qp-vault[fastapi]
@@ -49,12 +49,14 @@ app.include_router(router, prefix="/v1/vault")
4949
| `GET` | `/resources/{id}/proof` | Export Merkle proof |
5050
| `GET` | `/verify` | Verify entire vault |
5151

52-
### Search
52+
### Search & Retrieval
5353

5454
| Method | Path | Description |
5555
|--------|------|-------------|
5656
| `POST` | `/search` | Trust-weighted hybrid search |
5757
| `POST` | `/search/faceted` | Search with facet counts |
58+
| `POST` | `/grep` | Multi-keyword OR search (three-signal blended scoring) |
59+
| `GET` | `/resources/by-name` | Find resource by name (case-insensitive) |
5860

5961
### Collections
6062

@@ -63,12 +65,27 @@ app.include_router(router, prefix="/v1/vault")
6365
| `GET` | `/collections` | List collections |
6466
| `POST` | `/collections` | Create collection |
6567

66-
### Batch & Export
68+
### Processing
69+
70+
| Method | Path | Description |
71+
|--------|------|-------------|
72+
| `POST` | `/resources/{id}/reprocess` | Re-chunk and re-embed a resource |
73+
74+
### Batch, Import & Export
6775

6876
| Method | Path | Description |
6977
|--------|------|-------------|
7078
| `POST` | `/batch` | Batch add (max 100 items) |
79+
| `POST` | `/resources/multiple` | Get multiple resources by ID (max 100) |
7180
| `GET` | `/export` | Export vault to JSON |
81+
| `POST` | `/import` | Import resources from export file |
82+
83+
### Adversarial & Diff
84+
85+
| Method | Path | Description |
86+
|--------|------|-------------|
87+
| `PATCH` | `/resources/{id}/adversarial` | Set adversarial verification status |
88+
| `GET` | `/resources/{old_id}/diff/{new_id}` | Unified diff between two resources |
7289

7390
### Intelligence
7491

@@ -78,7 +95,7 @@ app.include_router(router, prefix="/v1/vault")
7895
| `GET` | `/status` | Resource counts and metadata |
7996
| `GET` | `/expiring` | Resources expiring within N days |
8097

81-
<!-- VERIFIED: integrations/fastapi_routes.py:118-310 — all endpoints -->
98+
<!-- VERIFIED: integrations/fastapi_routes.py:118-390 — all endpoints -->
8299

83100
## Input Validation
84101

@@ -94,6 +111,9 @@ All endpoints validate inputs at the API boundary before reaching vault logic.
94111
| `offset` | 0-1,000,000 | `GET /resources` |
95112
| Batch sources | Max 100 items | `POST /batch` |
96113
| `as_of` | Valid ISO date | `POST /search` |
114+
| `keywords` | Max 20 items | `POST /grep` |
115+
| `name` | Max 255 characters | `GET /resources/by-name` |
116+
| `resource_ids` | Max 100 items | `POST /resources/multiple` |
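The three limits added to this table amount to simple boundary checks. The sketch below shows them as plain functions for illustration. The constant and function names are hypothetical, and the real routes may rely on framework validators instead.

```python
MAX_KEYWORDS = 20        # POST /grep
MAX_NAME_LENGTH = 255    # GET /resources/by-name
MAX_RESOURCE_IDS = 100   # POST /resources/multiple


def check_max_items(field: str, items: list[str], limit: int) -> list[str]:
    """Reject lists longer than the documented limit."""
    if len(items) > limit:
        raise ValueError(f"{field}: at most {limit} items allowed, got {len(items)}")
    return items


def check_max_length(field: str, value: str, limit: int) -> str:
    """Reject strings longer than the documented limit."""
    if len(value) > limit:
        raise ValueError(
            f"{field}: at most {limit} characters allowed, got {len(value)}"
        )
    return value


# Inputs within the limits pass through unchanged.
check_max_items("keywords", ["incident", "response"], MAX_KEYWORDS)
check_max_length("name", "STRATEGY.md", MAX_NAME_LENGTH)
```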
97117

98118
<!-- VERIFIED: integrations/fastapi_routes.py:40 — content max_length -->
99119
<!-- VERIFIED: integrations/fastapi_routes.py:51-53 — SearchRequest validators -->

docs/getting-started.md

Lines changed: 16 additions & 0 deletions
@@ -80,6 +80,22 @@ for r in results:
8080

8181
Results are deduplicated (one per resource), ranked by: `(vector + text) * trust * freshness * layer_boost`.
8282

83+
## Multi-Keyword Grep
84+
85+
```python
86+
# Find documents where multiple concepts converge
87+
results = vault.grep(["incident", "response", "P0", "escalation"])
88+
89+
for r in results:
90+
    meta = r.explain_metadata
91+
    print(f"{r.resource_name}: {len(meta['matched_keywords'])}/4 keywords matched")
92+
    print(f"  snippet: {meta['snippet']}")
93+
```
94+
95+
Single-pass FTS5 query. Scored by keyword coverage (coord factor), text relevance, and term proximity. No embedder required.
96+
97+
<!-- VERIFIED: vault.py:1172-1285 — grep method -->
98+
8399
## Retrieve Content
84100

85101
```python

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ Governed knowledge store for autonomous organizations. Every fact has provenance
66

77
| Guide | Description |
88
|-------|-------------|
9-
| [Getting Started](getting-started.md) | Install, first vault, add, search, verify in 5 minutes |
9+
| [Getting Started](getting-started.md) | Install, first vault, add, search, grep, verify in 5 minutes |
1010
| [Architecture](architecture.md) | Package structure, layers, data flow, Protocol interfaces |
1111
| [API Reference](api-reference.md) | Complete Python SDK: Vault, AsyncVault, all methods |
1212
| [Trust Tiers](trust-tiers.md) | CANONICAL, WORKING, EPHEMERAL, ARCHIVED and search weighting |
@@ -20,7 +20,7 @@ Governed knowledge store for autonomous organizations. Every fact has provenance
2020
| [Security Model](security.md) | SHA3-256, Merkle trees, input validation, threat model |
2121
| [Streaming & Telemetry](streaming-and-telemetry.md) | Real-time events, operation metrics |
2222
| [CLI Reference](cli.md) | All 15 commands |
23-
| [FastAPI Integration](fastapi.md) | 22+ REST endpoints |
23+
| [FastAPI Integration](fastapi.md) | 30+ REST endpoints |
2424
| [Migration Guide](migration.md) | Breaking changes from v0.x to v1.0 |
2525
| [Deployment Guide](deployment.md) | PostgreSQL, SSL, encryption, production checklist |
2626
| [Troubleshooting](troubleshooting.md) | Error codes (VAULT_000-700), common issues |

docs/streaming-and-telemetry.md

Lines changed: 43 additions & 2 deletions
@@ -2,9 +2,50 @@
22

33
Real-time event streaming and operation telemetry for autonomous AI systems.
44

5-
## Event Streaming
5+
## Event Subscription (Recommended)
66

7-
Subscribe to vault mutations in real-time.
7+
The simplest way to react to vault mutations. Register a callback directly on the vault instance:
8+
9+
```python
10+
from qp_vault import AsyncVault, VaultEvent
11+
12+
vault = AsyncVault("./knowledge")
13+
14+
def on_change(event: VaultEvent) -> None:
15+
    print(f"{event.event_type}: {event.resource_name}")
16+
    if event.event_type.value == "create":
17+
        trigger_indexing(event.resource_id)
18+
19+
unsub = vault.subscribe(on_change)
20+
21+
# Every mutation (add, update, delete, transition, reprocess) fires the callback
22+
await vault.add("New document", name="report.md")
23+
# Output: create: report.md
24+
25+
# Stop receiving events
26+
unsub()
27+
```
28+
29+
Async callbacks are also supported:
30+
31+
```python
32+
async def on_change_async(event: VaultEvent) -> None:
33+
    await notify_downstream(event)
34+
35+
vault.subscribe(on_change_async)
36+
```
37+
38+
**Key behaviors:**
39+
- Multiple subscribers are independent
40+
- Errors in callbacks are logged, never propagated
41+
- Calling `unsub()` twice is safe (no error)
42+
- Events are delivered synchronously in mutation order
43+
44+
<!-- VERIFIED: vault.py:289-336 — subscribe + _notify_subscribers -->
45+
46+
## Event Streaming (Advanced)
47+
48+
For async-iterator consumption patterns (e.g., WebSocket broadcasting), use `VaultEventStream`:
849

950
```python
1051
from qp_vault import AsyncVault

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
44

55
[project]
66
name = "qp-vault"
7-
version = "1.3.0"
7+
version = "1.4.0"
88
description = "Governed knowledge store for autonomous organizations. Trust tiers, cryptographic audit trails, content-addressed storage, air-gap native."
99
readme = "README.md"
1010
license = "Apache-2.0"

src/qp_vault/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
2626
Docs: https://github.com/quantumpipes/vault
2727
"""
2828

29-
__version__ = "1.3.0"
29+
__version__ = "1.4.0"
3030
__author__ = "Quantum Pipes Technologies, LLC"
3131
__license__ = "Apache-2.0"
3232
