
Commit c6eebdc

Paul Kyle and claude committed
feat: v0.6.0 — write-time contradiction check, ambient context search, Claude Code plugin + skills
New features:
- Write-time contradiction check (ADR-004): detects conflicting memories at save time
- Ambient context search (ADR-008): CWD-based project boosting for multi-project memory stores
- Claude Code plugin scaffold for marketplace submission
- 3 Claude Code skills: palinode-claude-code, palinode-memory, palinode-session
- ADR-004 through ADR-008 documenting consolidation and search architecture

Docs:
- WHY-LOCAL-MEMORY.md positioning document
- Research paper: Memory Compaction and Augmented Recall for Persistent AI Agents
- PRD (product requirements document)
- v0.6.0 CHANGELOG

Changed:
- MCP server: context-aware search, Streamable HTTP transport
- API server: cleaned up endpoints, removed internal migration tooling
- CLI: updated search/save, removed internal migrate commands
- Consolidation: cron improvements, write-time worker
- Version bump: 0.5.0 → 0.6.0

Removed:
- palinode/migration/ (internal tooling)
- plugin/ (replaced by claude-plugin/ scaffold)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent b012832 commit c6eebdc


54 files changed: +6633 −2601 lines

.gitignore

Lines changed: 1 addition & 0 deletions
Lines changed: 1 addition & 0 deletions

```diff
@@ -41,6 +41,7 @@ plugin/dist/
 palinode.egg-info/
 __pycache__/
 *.pyc
+*.cover
 venv/

 # Server logs
```

ADR-004-event-driven-consolidation.md

Lines changed: 340 additions & 0 deletions
Lines changed: 113 additions & 0 deletions
# ADR-005: Debounced Reflection Executor (Tier 2b)

**Status:** Planned — not yet drafted
**Depends on:** ADR-004 shipped and validated in production for at least 2 weeks
**Related:** Consolidation improvements

## What this ADR will decide

Whether to build a background "reflection executor" that coalesces write-time contradiction checks into batched reflection passes, fired by a debounce threshold rather than a wall clock.

Pattern reference: LangMem's `ReflectionExecutor` — fire on N pending events OR K minutes idle, whichever comes first, resetting the timer on new events.

## What this ADR will NOT decide

- Whether to remove the write-time synchronous path from ADR-004. That stays regardless.
- On-read reconsolidation (ADR-006)
- Access metadata schema (ADR-007)

## Prerequisites before drafting

- [ ] ADR-004 (tier 2a) is shipped and has been running in production for ≥ 2 weeks
- [ ] We have real data on write-time check latency under typical load
- [ ] We have real data on LLM call volume and cost under typical load
- [ ] We have real data on contradiction-check accuracy (false positives, missed contradictions)
- [ ] We know whether the async queue from ADR-004 is actually getting backed up, or whether per-save checks are fast enough not to need coalescing

## Questions to think through when writing

### Is this actually needed?

**The whole premise of ADR-005 is that write-time checks need to be coalesced.** If real-world data from ADR-004 shows per-save LLM calls are cheap enough and fast enough that no coalescing is needed, ADR-005 doesn't exist — we just let the tier 2a queue drain naturally and we're done.

Write the first section of ADR-005 as "when ADR-005 should NOT be written": include the concrete metrics (p50 check latency, queue depth, LLM call cost) that would make ADR-005 unnecessary. If those metrics are met by ADR-004 alone, close this ADR with status "not needed."

### What's the trigger function?

Options to consider:
- **Count-based:** fire after N pending checks accumulate
- **Time-based:** fire K minutes after the most recent save
- **Entity-based:** fire when the same entity file has been touched N times
- **Weight-based:** each save contributes "pending weight" based on type/importance; fire at threshold
- **Hybrid:** count OR time, whichever comes first (LangMem's choice)

Don't pick one until we have real data from ADR-004 about which signal is actually predictive of "this is a good time to consolidate."
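For reference, the hybrid option (count OR idle time, whichever comes first, with the timer reset on every new event) fits in a few lines of asyncio. This is an illustrative sketch only — `DebouncedReflectionQueue`, its parameters, and the `fire` callback are hypothetical names, not the real palinode API:

```python
import asyncio


class DebouncedReflectionQueue:
    """Hybrid debounce: fire after max_pending events OR idle_seconds of
    quiet, whichever comes first. New events reset the idle timer.
    Illustrative sketch only -- not the real palinode implementation."""

    def __init__(self, fire, max_pending=10, idle_seconds=300):
        self.fire = fire              # async callback: receives the pending batch
        self.max_pending = max_pending
        self.idle_seconds = idle_seconds
        self.pending = []
        self._timer = None

    async def add(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.max_pending:
            await self._flush()       # count threshold hit: fire immediately
            return
        if self._timer:
            self._timer.cancel()      # new event: reset the idle timer
        self._timer = asyncio.get_running_loop().call_later(
            self.idle_seconds,
            lambda: asyncio.ensure_future(self._flush()),
        )

    async def _flush(self):
        if self._timer:
            self._timer.cancel()
            self._timer = None
        batch, self.pending = self.pending, []
        if batch:
            await self.fire(batch)    # one reflection pass over the batch
```

The idle-timer reset is the piece that distinguishes this from a plain interval: a burst of saves keeps pushing the flush out until either the burst ends or the count threshold caps the wait.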
### What does "scoped to touched entities" mean?

The v1 draft said "reflection pass scoped to touched entities." Concretely:
- If 5 facts about `projects/palinode` arrive in 15 minutes, does the reflection pass only look at `projects/palinode.md` and `projects/palinode-status.md`?
- Or does it also pull in related entities via graph walk?
- How do we compute the scope without scanning the whole store?
- Does the scope include facts that share vector similarity even if they don't share an entity ref?

### Where does the pending-state live?

Options:
- **In-process queue** (in the API server, like ADR-004's write-time queue)
- **SQLite table** (survives restarts, queryable for dashboards)
- **File-based markers** (matches ADR-004's CLI/plugin path, disk-durable)

The answer probably depends on whether ADR-004's asyncio queue has been causing problems. If it has, ADR-005 might need a more durable store.
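If the durable option wins, the SQLite table can be minimal. A sketch, under the assumption that pending state is just (entity, timestamp, weight) rows — the table name `pending_reflections` and the helper functions are hypothetical, not the real schema:

```python
import sqlite3

# Hypothetical durable pending-state table: survives restarts and is
# queryable for dashboards. Names are illustrative, not palinode's schema.
def init_pending(conn):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS pending_reflections (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            entity_path TEXT NOT NULL,
            enqueued_at REAL NOT NULL,
            weight      REAL NOT NULL DEFAULT 1.0
        )
    """)

def enqueue(conn, entity_path, now, weight=1.0):
    conn.execute(
        "INSERT INTO pending_reflections (entity_path, enqueued_at, weight)"
        " VALUES (?, ?, ?)",
        (entity_path, now, weight),
    )

def drain(conn):
    """Pop all pending rows in insertion order, e.g. when the debounce fires."""
    rows = conn.execute(
        "SELECT id, entity_path FROM pending_reflections ORDER BY id"
    ).fetchall()
    conn.execute("DELETE FROM pending_reflections")
    return [path for _, path in rows]
```

Because the rows persist, a crashed daemon can resume by calling `drain` on startup — the restart-resume failure mode below falls out for free.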
### What happens to the nightly cron?

This is the decision that belongs in ADR-005, not ADR-004. Options:
- **Remove nightly cron entirely when tier 2b ships** — cleanest, but loses a fallback for very idle systems
- **Keep nightly cron as "catch-up" for entities that didn't cross the debounce threshold** — hybrid, might be redundant
- **Re-scope nightly to "sanity check only"** — fire once a day, emit a warning if any entity has unconsolidated facts older than 24h, don't actually do the LLM work

My current lean (not a decision): remove entirely. If an entity doesn't hit the debounce threshold in a day, it probably doesn't need consolidation. But wait for real data.

### What happens to the weekly cron?

This is also an ADR-005 decision, not ADR-004. Options:
- **Keep weekly as "global view" safety net** for MERGE and ARCHIVE operations that need the full corpus
- **Remove weekly entirely** — if tier 2b fires on all touched entities, weekly is redundant
- **Re-scope weekly to "storage maintenance only"** — archive old daily notes, compact SQLite, run the linter, but don't do any LLM-driven consolidation

The weakness of v1's "keep weekly as safety net" was that I couldn't justify it on concrete grounds. Before ADR-005 ships, I need a concrete answer to "what specific operation does weekly do that tier 2b can't?"

### Failure modes to cover

- Queue daemon dies — how do pending reflections resume?
- LLM down for an extended period — what's the backlog policy?
- Debounce trigger fires during an already-running reflection pass — does it queue or drop?
- Entity gets N contradicting facts in 30 seconds — does debounce wait or fire immediately?
- Two separate reflection passes touch the same file — lock? serialize? last-write-wins?

### What observability do we need?

- Queue depth metric
- Reflection pass latency metric
- Op counts per pass (how much work is each firing actually doing?)
- LLM call count + token count per day
- Contradiction-catch rate (how often does a reflection pass actually find something to UPDATE/SUPERSEDE?)

If the catch rate is low (most passes return NOOP), the whole tier is overengineered and we should stop.
## Draft scope when written

Single decision: whether and how to coalesce write-time contradiction checks into debounced reflection passes. Includes the concrete decisions about nightly and weekly cron fate because those are downstream of this one.

Explicitly out of scope: on-read reconsolidation (ADR-006), access metadata schema (ADR-007).

## Revisit trigger

Draft this ADR when any of the following is true:
- ADR-004 has been in production ≥ 2 weeks AND write-time check latency p95 > 3s
- OR ADR-004's queue has backed up past 500 pending jobs at any point
- OR LLM call cost from ADR-004 exceeds acceptable budget
- OR contradiction-catch rate from ADR-004 is high enough that coalescing would meaningfully reduce LLM calls

Otherwise: leave as planned, don't force it.

ADR-006-on-read-reconsolidation.md

Lines changed: 102 additions & 0 deletions
# ADR-006: On-Read Reconsolidation (Tier 4)

**Status:** Speculative — not yet drafted, may never be
**Depends on:** ADR-004 shipped; ADR-007 (access metadata) shipped
**Related:** Consolidation improvements

## What this ADR would decide

Whether to trigger a targeted consolidation pass on a specific memory file when that file is retrieved and then followed within some window by a save that contradicts or extends the retrieved content.

Biological analog: reconsolidation (Nader, Schafe & LeDoux 2000) — a retrieved memory becomes labile and requires re-stabilization when prediction error or novel information is present.

## Why this ADR is marked speculative

When I first wrote about this tier in ADR-004-v1, I was seduced by the biological analogy and underestimated how vague the implementation specifics are. Cold-eye review: every piece of "when a retrieved memory is followed within N minutes by a contradicting save, trigger a re-stabilization pass" is its own decision I haven't made.

This ADR may never be written. It's here so we don't lose the idea, not because we're committed to building it.

## Prerequisites before drafting

- [ ] ADR-004 (tier 2a) shipped and validated
- [ ] ADR-007 (access metadata) shipped — we need to know which files have been recently retrieved before we can trigger anything on retrieval
- [ ] Real data on the typical "read-then-save" pattern — how often does it happen? What's the typical gap?
- [ ] Real data on how often write-time dedup from ADR-004 *already* catches the contradictions this tier would catch (if write-time dedup catches them all, this tier is redundant)

## Questions to think through when writing

### Is this distinguishable from tier 2a in practice?

The biggest threat to ADR-006 being useful: **maybe write-time dedup from ADR-004 already does the work.** When you save a fact, `_check_contradictions` looks at the top-k similar existing memories. If the memory you just retrieved is semantically close to the memory you're saving, it'll show up in the top-k regardless of whether you retrieved it. So the contradiction gets caught at write time anyway.

The scenario where tier 4 adds value: you retrieved a memory, and the memory you're about to save is semantically *different* from the retrieved one but contradicts it anyway. Is this even a real scenario? Example:
- You retrieve `decisions/api-rest-vs-graphql.md` (about REST)
- You save `insights/graphql-subscriptions-are-actually-fine.md` (not semantically identical, but it contradicts the earlier decision)
- Vector similarity might not surface the REST decision in the top-k for the new GraphQL insight
- But the chronological read-then-save pattern would

Before drafting ADR-006, check whether this scenario actually happens in real usage data, or whether it's a theoretical concern.

### What counts as a "read"?

Options:
- MCP tool calls (`palinode_read`, `palinode_search`)
- API `GET /files/{path}` calls
- CLI `palinode read` / `palinode search`
- Automatic context injection (core memory, top-k search results)
- File watcher detecting a manual `cat` or editor open

Each of these is a different signal strength. Automatic injection is much noisier than an explicit tool call — every turn injects core memory, and we probably don't want every subsequent save to trigger a reconsolidation on every core file.

### What counts as "contradicts or extends"?

This is an LLM judgment call. Options:
- Re-run `_check_contradictions` but with the retrieved file forced into the top-k regardless of similarity
- Add a separate LLM call that asks "does the new save contradict the retrieved file specifically?"
- Heuristic: if the new save mentions any entity or keyword from the retrieved file, treat it as potentially contradicting

Each option adds LLM calls. Measure the cost before committing.

### What's the window?

- Minutes (N=5, N=15, N=60)?
- Turns (next N agent turns after the read)?
- Session (same Claude Code session)?
- No window (any save while the read is in "recently retrieved" metadata triggers)?
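Whatever window is chosen, the mechanism is small: a registry of recently retrieved paths that a save can consult. A minute-window sketch, assuming a minutes-based window — `RecentReads` and its method names are hypothetical, not part of palinode:

```python
import time


class RecentReads:
    """Track recently retrieved memory files so a subsequent save can be
    checked against them. Minutes-based window; names are illustrative."""

    def __init__(self, window_seconds=900):   # 15-minute window
        self.window_seconds = window_seconds
        self._reads = {}                      # path -> last read timestamp

    def record_read(self, path, now=None):
        self._reads[path] = time.time() if now is None else now

    def candidates_for_save(self, now=None):
        """Files read within the window: candidates for a targeted
        contradiction check against the incoming save."""
        now = time.time() if now is None else now
        return [p for p, t in self._reads.items()
                if now - t <= self.window_seconds]
```

The turn-based and session-based options would swap the timestamp for a turn counter or session id, but the shape — record on read, query on save — stays the same.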
### Does this compose with tier 2a?

If a read triggers a write-time contradiction check against the retrieved file, and the save also triggers its own write-time contradiction check (ADR-004), does the system do two LLM calls or one? If two, that's wasteful. If one, which one takes precedence?

### What happens for pure-read sessions?

If a user reads 20 files and never saves anything, nothing triggers. No cost. But then what's the value of tracking access metadata at all? Access metadata has value beyond this tier (ranker boost, decay), but ADR-006 specifically doesn't need it unless there's a save.

### Failure modes

- Retrieved file is deleted between read and save
- Retrieved file is already being consolidated by tier 2b
- User retrieves 50 files, saves 1 — check all 50 or just some subset?
- LLM hallucinates a contradiction that doesn't exist → a bad UPDATE gets applied

The last one is the scariest. Without a human in the loop, a confidently wrong LLM can corrupt memory. ADR-006 needs a safety mechanism (low-confidence flag? dry-run only? surface to a human for approval?) before shipping.

## Draft scope when written

This ADR is narrow: does retrieval + subsequent save trigger a targeted consolidation pass, yes or no, and if yes, how.

Out of scope: access metadata schema (ADR-007, which is a prerequisite), ranker weighting of access metadata (ADR-007), decay based on retrieval count (ADR-007).

## Revisit trigger

Draft this ADR only if **all** of the following are true:
- ADR-004 and ADR-007 are shipped
- Real-world data shows that the "read-then-contradicting-save" pattern happens often enough to matter
- Real-world data shows that write-time dedup (tier 2a) *isn't* already catching those contradictions
- We've designed a safety mechanism against LLM-hallucinated contradictions

If any of these is missing, leave this ADR as speculative and don't write it.

## Possible outcome

It is likely that this ADR gets closed as "not needed" after ADR-004 ships and we see that tier 2a already catches the contradictions that motivated this design. That's a fine outcome. Mark it "not needed" and move on.
Lines changed: 127 additions & 0 deletions
# ADR-007: Access Metadata Schema + DecayConfig Enable

**Status:** Planned — not yet drafted
**Depends on:** ADR-004 shipped
**Related:** Consolidation improvements

## What this ADR will decide

Two coupled decisions:

1. **Schema for access metadata** — how to track `last_retrieved_at`, `retrieval_count`, and possibly `last_modified_at` per memory file.
2. **Enable `DecayConfig`** — turn on the existing (but disabled) decay configuration in `palinode/core/config.py` and use its tau values as a ranker term.

These are in the same ADR because the decay ranker term can't exist without access metadata to feed it.

## What this ADR will NOT decide

- Whether to delete files based on decay. Nothing gets deleted. Decay is a ranker weight only.
- Whether to compact or archive based on decay. Separate decision (probably ADR-005 territory).
- On-read reconsolidation (ADR-006) — although access metadata is a prerequisite for ADR-006, the decision of *whether* to build ADR-006 is independent of this one.

## Prerequisites before drafting

- [ ] ADR-004 shipped (write-time contradiction check) — this ADR touches the same save path
- [ ] Real data on how often files are actually retrieved in a typical week (so tau tuning has ground truth)
- [ ] Real data on the retrieval distribution — long-tailed or uniform? This informs whether decay actually matters

## Questions to think through when writing

### Where does access metadata live?

**Three-way tension:**

1. **Frontmatter in the file itself.** Source of truth, git-diffable, survives everything. But: every retrieval causes a file write, which causes a git commit, which causes a watcher reindex, which is expensive and noisy.
2. **SQLite table in `.palinode.db`.** Fast to update, fast to query from the ranker. But: not part of the source-of-truth markdown, can drift from file state, lost if the db is rebuilt.
3. **Hybrid: SQLite is live, with a periodic flush to frontmatter.** Best of both, but adds a "flush" operation that has its own failure modes.

My current lean (not a decision): **SQLite only for live access metadata, no frontmatter persistence.** Rationale: access metadata is ephemeral by nature — it's a signal about what's hot *right now*. If the db is rebuilt, the ranker resets to "everything is equally cold," which is a fine reset state. We're not losing information we can't reconstruct.

But: if ADR-006 ever wants to use retrieval history ("this file was retrieved 10 times last week, so it's important"), SQLite-only breaks that. Worth weighing.

### What fields does the schema need?

Minimum:
- `file_path` (PK)
- `last_retrieved_at` (timestamp)
- `retrieval_count` (integer, monotonic)

Possibly:
- `last_retrieved_by` (session id? user id? source surface like "claude-code"?)
- `retrieval_count_7d` / `retrieval_count_30d` (decay-window counts, maintained by a sweeper)
- `last_modified_at` (file mtime at last index)
- `contradiction_check_count` (how many times tier 2a has run on this file)

Don't add fields speculatively. Add only what the ranker or a future ADR concretely needs.
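As a sketch, the minimum fields above map to one small SQLite table. The table name, column types, and the upsert helper are assumptions for illustration, not the real palinode schema:

```python
import sqlite3

# Illustrative minimum access-metadata table (SQLite-only option).
# Names and types are assumptions, not palinode's actual schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS access_metadata (
    file_path         TEXT PRIMARY KEY,
    last_retrieved_at REAL NOT NULL,
    retrieval_count   INTEGER NOT NULL DEFAULT 0
)
"""

def record_retrieval(conn, file_path, now):
    # Upsert: insert on first read; on later reads, bump the monotonic
    # count and refresh the timestamp.
    conn.execute("""
        INSERT INTO access_metadata (file_path, last_retrieved_at, retrieval_count)
        VALUES (?, ?, 1)
        ON CONFLICT(file_path) DO UPDATE SET
            last_retrieved_at = excluded.last_retrieved_at,
            retrieval_count = retrieval_count + 1
    """, (file_path, now))
```

A single upsert per explicit read keeps the write path cheap, and the ranker can join against this table at query time.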
### What's a "retrieval"?

Same question as ADR-006 (see that ADR for the option list). The choice here matters for decay:
- If automatic context injection counts as a retrieval, every file in core memory gets bumped on every turn → "core" never decays → access metadata is useless as a ranker signal
- If only explicit tool calls count, decay has signal but core files look stale → the ranker might downrank core files, which is wrong

**Likely answer:** access metadata distinguishes *explicit* reads (tool calls) from *passive* injection. Only explicit reads update `retrieval_count`. Passive injection updates `last_injected_at` separately, or not at all.

This is a schema-design decision that has to be made in ADR-007.

### Is decay actually useful?

The existing `DecayConfig` has tau values (in days):
- `tau_critical: 180`
- `tau_decisions: 60`
- `tau_insights: 90`
- `tau_general: 30`
- `tau_status: 7`
- `tau_ephemeral: 1`

These are guesses from before any data existed. Before this ADR can ship a "use decay in the ranker" decision, we need:
- Evidence that unused memories are actually hurting the ranker (e.g., stale status files surfacing in search results ahead of current ones)
- Empirical tau values, tuned from real retrieval patterns

If neither of those is in hand, the ADR should narrow to **"add access metadata schema only, defer decay enabling."** Half an ADR.

### How does this interact with the ranker?

The current ranker (`palinode/core/store.py`) uses hybrid BM25 + vector similarity fused via RRF. Adding a decay term means:
- Either multiply the existing RRF score by a decay factor (e.g., `score * exp(-age / tau)`)
- Or add decay as a third signal in the fusion

The ADR needs to pick one and justify it. Multiplication is simpler but lets decay dominate when age is large; fusion is more principled but introduces a tuning parameter.
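The multiplicative option follows directly from the tau values above. A sketch — the function name and the assumption that age is measured in days since last retrieval are illustrative:

```python
import math

# Tau values (days) from the existing DecayConfig, per memory category.
TAU_DAYS = {
    "critical": 180, "decisions": 60, "insights": 90,
    "general": 30, "status": 7, "ephemeral": 1,
}

def decayed_score(rrf_score, age_days, category):
    """Multiplicative option: scale the fused RRF score by an exponential
    decay factor based on age since last retrieval. Illustrative sketch."""
    tau = TAU_DAYS[category]
    return rrf_score * math.exp(-age_days / tau)
```

The spread between categories is the point: a status file unretrieved for 14 days keeps exp(−14/7) ≈ 13.5% of its score, while a decisions file of the same age keeps exp(−14/60) ≈ 79% — stale status sinks fastest, which is exactly the "stale status files outranking current ones" problem this term targets.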
### How does this interact with the watcher?

The watcher daemon reindexes on file modify. If access metadata is in frontmatter, every retrieval triggers a file modify, which triggers a reindex, which could create a feedback loop.

If access metadata is SQLite-only (my current lean), the watcher doesn't need to know about it at all — it stays focused on content indexing.

### Failure modes

- SQLite table corrupts → ranker loses the decay signal, but search still works (falls back to BM25+vector only)
- Access metadata gets out of sync with actual file state → stale `last_retrieved_at`, but it self-corrects on the next retrieval
- Ranker heavily favors recently-retrieved files → recency bias → user complains about "why does it keep showing me the thing I just read?" Need a brake.

### What telemetry do we need to tune tau?

- Histogram of retrieval intervals per file (time between retrievals)
- Distribution of `retrieval_count` per file at the end of a week
- Correlation between "last retrieved > X days ago" and "never retrieved again" (the predictive value of decay)

Without telemetry, tuning tau is guessing. The ADR should ship telemetry alongside the feature.

## Draft scope when written

Single coupled decision: access metadata schema + enabling decay as a ranker term. The two are bundled because one can't ship without the other (decay needs metadata to read from).

Possible split: ship the access metadata schema alone first (ADR-007a), then decide decay enabling separately (ADR-007b) after the metadata has been collecting data for a few weeks. This is probably the right move if real data isn't available at draft time.

Explicitly out of scope: on-read reconsolidation (ADR-006), decay-based deletion (never — not a thing we do), decay-based archival (ADR-005 territory).

## Revisit trigger

Draft this ADR when:
- ADR-004 has been in production long enough that we can instrument retrieval
- OR we hit a concrete ranker problem that access metadata would solve (stale results outranking fresh ones in user-visible search output)
- OR we decide we want ADR-006 and need access metadata as a prerequisite

No rush. This ADR is valuable but not urgent.
